Ms Swapnil Shrivastava is working as Principal Technical Officer in…
Due to emerging business requirements and technological evolution, Enterprise Applications are gradually being transformed into Enterprise Service Ecosystems. Conventional Enterprise Applications are complex, scalable, component-based, distributed and mission-critical systems designed to operate in a business setting. Examples include Workflow Management Systems, Enterprise Resource Planning and Customer Relationship Management. An Enterprise Service Ecosystem, by contrast, is an environment made up of inter-organizational, loosely coupled, distributed and heterogeneous components that orchestrate amongst themselves in a seamless manner.
The Aadhaar Authentication Ecosystem is one such ecosystem. The key distinguishing factor between the two is that an ecosystem comprises inter-organizational components, i.e. components that exist beyond organizational boundaries. These ecosystem components coordinate amongst themselves towards a single objective, but they are managed and operated by different ecosystem partners. Each component records the relevant events in a specific structure for audit purposes; these records are called audit logs or event logs.
We believe that getting a comprehensive view of events in such an environment, for bottleneck detection or process conformance checking, is a challenging task that cannot be performed effectively with existing tools. We considered the Aadhaar Authentication Ecosystem in order to demonstrate the limitations of existing Process Mining tools for this requirement.
Aadhaar Authentication is an online process wherein a resident’s Aadhaar Number and Personal Identity Data (demographic and/or biometric) are submitted to the Aadhaar Authentication Service. The service performs a 1:1 match and returns “yes/no” as the response. The Aadhaar Authentication Service ecosystem is shown in figure 1, and its components are described as follows:
- Aadhaar Holder: the resident with a valid Aadhaar number.
- Central Identity Data Repository (CIDR): contains the identity information of all Aadhaar holders.
- Authentication Service Provider (AuSP): offers Aadhaar-based authentication service on behalf of UIDAI.
- Authentication Service Agency (ASA): provides secure network connectivity with the CIDR.
- Authentication User Agency (AUA): uses Aadhaar authentication to enable its services, or transmits authentication requests from Sub AUAs to the ASA.
- Sub AUA: accesses the authentication service through an existing AUA.
- Authentication device (represented by “D” in the diagram): collects identity information, prepares it for transmission, transmits the authentication packets through the AUA/Sub AUA and receives the authentication results from them.
Each of these components records authentication request and response events in its audit logs, depending on the activity it performs. The event data includes the Aadhaar Number, device id, Sub AUA/AUA/ASA code, type of authentication, response, various timestamps and so on. Generating a comprehensive view that spans the ecosystem components is currently an unaddressed task. Hence there is a strong need to capture all the relevant event data from the ecosystem components and correlate these multiple information sources into a coherent view, so that bottlenecks and process conformance issues can be identified in a timely manner. We explored existing Process Mining tools to address this requirement.
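To make the correlation step concrete, here is a minimal Python sketch. It assumes, hypothetically, that every component writes records carrying a shared transaction identifier (`txn_id`), a timestamp (`ts`) and an activity name; these field names, the component labels and the sample records are illustrative only and are not taken from the actual Aadhaar audit log formats. The sketch merges the per-component logs into one time-ordered trace per transaction and then measures the waiting time between consecutive steps, which is where bottlenecks would show up.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical audit records from three ecosystem components.
# Field names (txn_id, ts, activity) are illustrative assumptions.
device_log = [
    {"txn_id": "T1", "ts": "2019-10-01T10:00:00", "activity": "capture_pid"},
]
aua_log = [
    {"txn_id": "T1", "ts": "2019-10-01T10:00:02", "activity": "aua_forward"},
    {"txn_id": "T1", "ts": "2019-10-01T10:00:07", "activity": "aua_response"},
]
asa_log = [
    {"txn_id": "T1", "ts": "2019-10-01T10:00:03", "activity": "asa_forward"},
]

def correlate(sources):
    """Merge per-component logs into one time-ordered trace per transaction."""
    traces = defaultdict(list)
    for component, records in sources.items():
        for rec in records:
            # Tag each event with its source component and parse the timestamp.
            event = dict(rec, component=component, ts=datetime.fromisoformat(rec["ts"]))
            traces[rec["txn_id"]].append(event)
    for events in traces.values():
        events.sort(key=lambda e: e["ts"])
    return dict(traces)

def waiting_times(events):
    """Seconds elapsed between consecutive events; large gaps hint at bottlenecks."""
    return [
        (a["activity"], b["activity"], (b["ts"] - a["ts"]).total_seconds())
        for a, b in zip(events, events[1:])
    ]

traces = correlate({"device": device_log, "aua": aua_log, "asa": asa_log})
for step in waiting_times(traces["T1"]):
    print(step)
```

Keying on a shared transaction identifier is the simplest correlation strategy; if components do not share such an identifier, events would have to be matched on combinations of fields such as Aadhaar Number, device id and time window, which is considerably harder.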
Process mining aims to discover, monitor and improve real processes by extracting knowledge from event logs readily available in today’s information systems. It aims at the automatic construction of models that explain the behavior observed in the event log. The major use cases of Process Mining are Process Discovery, Conformance Checking and Process Enhancement. Two of the most popular Process Mining tools are ProM and Disco. ProM is an open source Process Mining toolset maintained by Eindhoven University of Technology.
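Many discovery algorithms, such as the Alpha algorithm, start from the directly-follows relation: how often activity a is immediately followed by activity b within the same case. The following minimal Python sketch computes that relation from a small, made-up log in which each list is the ordered activity sequence of one case; it illustrates only this first step, not a complete discovery algorithm.

```python
from collections import Counter

def directly_follows(traces):
    """Count how often activity a is directly followed by activity b in a case."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

# Hypothetical log: each list is the ordered activities of one case.
log = [
    ["request", "forward", "match", "respond"],
    ["request", "forward", "match", "respond"],
    ["request", "forward", "reject"],
]
dfg = directly_follows(log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The resulting counts can be drawn as a graph whose arc weights show the main flow and the less frequent variants (here, the single `forward -> reject` path).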
ProM mainly caters to the academic and research community. Plugins, added on demand, make it possible to tackle complex process exploration. ProM imports event logs in the MXML or XES formats and can load process model definitions in different standards. Some of its main features are: discovering the control-flow perspective of a process, social network analysis, analyzing the resource and performance perspectives of a process, discovering decision rules, and conformance checking with a variety of algorithms. ProM provides several export formats such as CSV and PNG.
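Since both tools accept XES, one way to prepare ecosystem data for them is to serialize it into that format. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`; the input structure (`{case_id: [(activity, timestamp), ...]}`) and the sample values are assumptions for illustration. It declares the standard Concept and Time extensions and writes one `trace` per case with `concept:name` and `time:timestamp` attributes. A real export would usually add further log-level attributes and classifiers and should be checked against the XES standard (IEEE 1849).

```python
import xml.etree.ElementTree as ET

XES_NS = "http://www.xes-standard.org/"

def to_xes(traces):
    """Serialize {case_id: [(activity, iso_timestamp), ...]} as a minimal XES log."""
    log = ET.Element("log", {"xes.version": "1.0", "xmlns": XES_NS})
    # Declare the standard extensions whose keys (concept:, time:) are used below.
    ET.SubElement(log, "extension", {"name": "Concept", "prefix": "concept",
                                     "uri": "http://www.xes-standard.org/concept.xesext"})
    ET.SubElement(log, "extension", {"name": "Time", "prefix": "time",
                                     "uri": "http://www.xes-standard.org/time.xesext"})
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        for activity, ts in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": ts})
    return ET.tostring(log, encoding="unicode")

# Hypothetical case with two events.
xes = to_xes({"T1": [("capture_pid", "2019-10-01T10:00:00"),
                     ("aua_forward", "2019-10-01T10:00:02")]})
print(xes)
```

Writing the returned string to a `.xes` file gives the kind of file-based input that both tools expect.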
Disco is a commercial process mining tool developed by Fluxicon. It supports a wide range of event log import formats, including CSV, MS Excel, MXML, XES, FXL Disco logs and DSC Disco project files. Its features include automated process discovery, animation of process maps, event log filtering on various parameters, project management and detailed statistics. Disco is fully compatible with ProM.
In an Enterprise Service Ecosystem, specified activities are performed by ecosystem components and the corresponding event data is recorded. The event data comprises the actor who executes or initiates the activity, the timestamp of the event, the activity performed, the location and other related data. To get a comprehensive view of the ecosystem, event data should be selected from the components, fused and then analyzed.
Hidden patterns, such as malicious activities, could be found by analyzing the relationships between actors and events across the ecosystem components. Process Mining is a scientific discipline that deals with process models and event data. Since event data is the common factor, we investigated the effectiveness of Process Mining techniques in ecosystems. We studied the applicability of two popular Process Mining tools, viz. ProM and Disco.
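One established way to study relationships between actors in process mining is the handover-of-work metric used for social network analysis: count how often work passes from one actor to another within the same case. The minimal Python sketch below applies it to hypothetical cases given as (actor, activity) pairs; the actor codes are made up. Because Sub AUAs are expected to reach the authentication service only through an existing AUA, the sketch flags any direct Sub AUA to ASA handover as worth investigating. This rule is an illustrative assumption, not a complete fraud-detection method.

```python
from collections import Counter

def handovers(traces):
    """Count handovers of work between consecutive, distinct actors in each case."""
    counts = Counter()
    for trace in traces:
        actors = [actor for actor, _activity in trace]
        for src, dst in zip(actors, actors[1:]):
            if src != dst:
                counts[(src, dst)] += 1
    return counts

# Hypothetical cases as (actor, activity) pairs; actor codes are illustrative.
cases = [
    [("device", "capture"), ("subaua_7", "submit"), ("aua_1", "forward"), ("asa_2", "forward")],
    [("device", "capture"), ("subaua_7", "submit"), ("asa_2", "forward")],
]
net = handovers(cases)

# Sub AUAs are expected to reach the ASA only through an AUA, so a direct
# Sub AUA -> ASA handover is flagged for investigation.
suspicious = {pair: n for pair, n in net.items()
              if pair[0].startswith("subaua") and pair[1].startswith("asa")}
print(suspicious)
```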
We made the following observations about ProM and Disco, based on the functionality they offer and the requirements of an Enterprise Service Ecosystem.
- Both tools support only offline analytics, i.e. they import data in a specified format from a file. Event data captured in databases, which are likely to be a commonly used data store in ecosystem components, cannot be fed directly into these tools. The data must first be transformed into the specified format and stored as a file, which is then imported into the tool.
- Both Disco and ProM take their input data from a single data source. The ecosystem, however, comprises distributed components that coordinate amongst themselves to achieve a common goal. Hence, to access the event data spread across the ecosystem components, the event logs stored in multiple sources must be extracted, transformed into the specified format and merged into a single file that is fed to these tools, as shown in figure 2.
- Because these tools import data from files, they support only static, offline analytics. Fraud detection in an Enterprise Service Ecosystem, however, must be performed in a timely manner, and neither Disco nor ProM provides functionality to detect service or process violations in real time.
- Neither Disco nor ProM provides API support. Such support would have enabled integration of Process Mining functionality into client applications.
- The unlicensed version of Disco is limited to 100 event log entries; full access requires a commercial license.
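The extract, transform and merge steps these file-based tools require can be sketched in a few lines of Python. In this minimal sketch, in-memory SQLite databases stand in for two components' audit stores; the `audit_log` table, its columns and the sample rows are hypothetical. Rows from every source are tagged with their component, merged, ordered by case and timestamp, and written as one CSV with case id, activity and timestamp columns, the kind of single file that can then be imported.

```python
import csv
import io
import sqlite3

def make_component_db(rows):
    """Stand-in for one component's audit database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE audit_log (txn_id TEXT, activity TEXT, ts TEXT)")
    db.executemany("INSERT INTO audit_log VALUES (?, ?, ?)", rows)
    return db

sources = {
    "aua": make_component_db([("T1", "aua_forward", "2019-10-01T10:00:02"),
                              ("T1", "aua_response", "2019-10-01T10:00:07")]),
    "asa": make_component_db([("T1", "asa_forward", "2019-10-01T10:00:03")]),
}

# Extract: pull rows from every source and tag each with its component.
merged = []
for component, db in sources.items():
    for txn_id, activity, ts in db.execute("SELECT txn_id, activity, ts FROM audit_log"):
        merged.append((txn_id, activity, ts, component))

# Transform: ISO-8601 strings in one format sort chronologically.
merged.sort(key=lambda row: (row[0], row[2]))

# Load: a single CSV file (here an in-memory buffer) for import into the tool.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["case_id", "activity", "timestamp", "component"])
writer.writerows(merged)
print(out.getvalue())
```

In practice `out` would be a file opened with `newline=""`, and the whole job would have to be rerun whenever fresh data is needed, which is exactly the offline limitation noted above.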
A similar set of challenges has been identified for Process Mining in Big Data scenarios. In today’s Big Data landscape, Process Mining could help analysts transform event data into actionable information, but this requires comprehensive data analysis. Even though Process Mining techniques are increasingly used in commercial settings, many of the algorithms developed so far are designed to work in a static fashion and are not easily applicable to real-time event streams. In such scenarios it is important to define a suitable Process Mining methodology that can handle online data and collect and save it into data stores.
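By contrast with file-based analysis, an online check processes each event as it arrives. The minimal Python sketch below assumes that the allowed transitions between activities are known in advance; the reference model and activity names are hypothetical. It keeps only the last activity seen per case and reports a violation as soon as a disallowed transition appears. This is a deliberately simple directly-follows check, not a full conformance-checking algorithm such as token replay or alignments.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical reference model: allowed directly-follows transitions.
ALLOWED = {
    ("START", "capture"), ("capture", "submit"), ("submit", "forward"),
    ("forward", "match"), ("match", "respond"),
}

def monitor(stream: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str, str]]:
    """Yield (case_id, previous, activity) for each transition not in the model."""
    last = {}  # most recent activity seen per case
    for case_id, activity in stream:
        prev = last.get(case_id, "START")
        if (prev, activity) not in ALLOWED:
            yield case_id, prev, activity
        last[case_id] = activity

# Events from two interleaved cases, in arrival order.
events = [("T1", "capture"), ("T2", "capture"), ("T1", "submit"),
          ("T2", "match"), ("T1", "forward")]
for violation in monitor(events):
    print("violation:", violation)
```

Because `monitor` is a generator, it can consume an unbounded stream; a production monitor would also evict state for completed cases so that memory does not grow indefinitely.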
In our exploratory work, we found that the existing Process Mining tools can import event data only in specific file formats, from a centralized location and in an offline manner. These tools therefore cannot be used in their present form for Enterprise Service Ecosystems, which have distributed event data and require service/process violation information in real time.
Ms Swapnil Shrivastava is working as Principal Technical Officer in the Big Data Analytics group at C-DAC, Bangalore. She has worked in various capacities and at different locations of C-DAC on research projects, Mission Mode Projects and turnkey projects, and has provided technical consultancy. She has a few international publications to her credit and has also contributed to various professional courses and workshops.