Guide to PM4Py: Python Framework for Process Mining Algorithms

Processes are all around us: any series of tasks that together achieve an objective can be called a process. Thanks to the digital revolution, copious amounts of data related to diverse processes are being generated and accumulated. In data science, analysing operational processes and drawing insights from them is of particular importance. Modelling a process allows us to perform conformance checks and even gives us the ability to improve the process. This kind of extraction of insights from event data is called Process Mining. In this article, let's dive deeper into process mining techniques with Python.

Process Mining

Process Mining is the amalgamation of computational intelligence, data mining and process management. It refers to the data-oriented analysis techniques used to draw insights into organizational processes.


Real-world events and business processes drive software systems, which in turn generate event logs. Each log entry records an activity along with extra information such as a timestamp, the event type and the context of the event. The availability of this kind of data is crucial for applying process mining. A model built on top of this data can present the underlying processes in an actionable way.
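For illustration, a single event-log entry can be pictured as a small record of the kind described above. The field names below are hypothetical, chosen for this sketch rather than taken from any standard:

```python
from datetime import datetime

# A hypothetical event-log entry: one activity within one process instance (case).
event = {
    "case_id": "loan-application-42",          # which process instance the event belongs to
    "activity": "Submit Application",          # name of the activity that occurred
    "timestamp": datetime(2020, 3, 1, 9, 30),  # when it occurred
    "resource": "customer-portal",             # extra context: who/what triggered it
}

print(sorted(event.keys()))
```

A whole event log is then just many such records, grouped by `case_id` and ordered by `timestamp`.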

Model Discovery

Process Mining consists of three main components: model discovery, conformance checking and model enhancement. Discovery is the process of automatically generating, without any prior knowledge, a model from event logs that can explain the logs themselves. Several algorithms can be used for this discovery step.

[Figure: an example process model generated by an automated platform]

Conformance Checking

The second component of process mining is conformance checking. In this step, we juxtapose the event logs with the process model of the same process, which reveals any non-conformances. For example: transactions over 1 lakh rupees require the user's PAN card. This constraint can be expressed in the process model, and we can then check all the event logs to verify that the rule is followed.
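As a toy illustration of such a rule check (plain Python, not pm4py code; the event fields are made up for this sketch):

```python
# Rule: any transaction over 100,000 rupees must carry a PAN card number.
def violates_pan_rule(event):
    return event["amount"] > 100_000 and not event.get("pan_card")

events = [
    {"case_id": "t1", "amount": 50_000,  "pan_card": None},
    {"case_id": "t2", "amount": 250_000, "pan_card": "ABCDE1234F"},
    {"case_id": "t3", "amount": 150_000, "pan_card": None},  # non-conforming
]

violations = [e["case_id"] for e in events if violates_pan_rule(e)]
print(violations)  # ['t3']
```

Real conformance checking generalises this idea: instead of one hand-written rule, the whole process model acts as the specification that each trace is replayed against.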

Model Enhancement

In the third step, we use the discovered process model and the results of the conformance checks to identify process bottlenecks, loops and undesired aberrations. Equipped with this knowledge, a new enhanced process is implemented and a target process model is built. This new process model is again enhanced using the same steps; repeating them over and over results in continuous improvement of organizational processes.

PM4Py

Setup

PM4Py is an open-source Python library built by the Fraunhofer Institute for Applied Information Technology to support process mining. Following is the command for installation.

!pip install -U pm4py

Data Loading

This library supports tabular data input like CSV with the help of pandas, but the recommended data format for event logs is XES (eXtensible Event Stream). This is an XML-based, hierarchical, tag-based log storage format standardised by the IEEE.
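To get a feel for the format, here is a hand-written minimal XES-style fragment parsed with Python's standard library. This is a simplified sketch for illustration only; real XES files carry more metadata than shown here:

```python
import xml.etree.ElementTree as ET

# A minimal, hand-written XES-style fragment: one trace (case) with two events.
xes = """
<log>
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="Create Transfer"/>
      <date key="time:timestamp" value="2015-01-05T10:00:00"/>
    </event>
    <event>
      <string key="concept:name" value="Execute Transfer"/>
      <date key="time:timestamp" value="2015-01-05T10:05:00"/>
    </event>
  </trace>
</log>
"""

root = ET.fromstring(xes)
for trace in root.findall("trace"):
    # The activity name of each event lives in its concept:name attribute.
    activities = [
        e.find("string[@key='concept:name']").attrib["value"]
        for e in trace.findall("event")
    ]
    print(activities)  # ['Create Transfer', 'Execute Transfer']
```

The hierarchy visible here (log → trace → event, with typed key/value attributes) is what makes XES richer than a flat CSV.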

Let’s load some bank transaction logs stored in XES format. Data is downloaded from this website.

 from pm4py.objects.log.importer.xes import importer as xes_importer
 log = xes_importer.apply('/content/banktransfer(2000-all-noise).xes')

If we prefer to use pandas to analyse the data, we can convert the imported log as follows.

 import pandas as pd
 from pm4py.objects.conversion.log import converter as log_converter
 df = log_converter.apply(log, variant=log_converter.Variants.TO_DATA_FRAME)
 df.to_csv('banktransfer.csv')
 df

We can see that the three most important attributes (case id, timestamp and the name of the event) are present. Let us reduce the number of rows by limiting the number of traces. This can be done with pm4py's own suite of filtering functions.


 from pm4py.algo.filtering.log.timestamp import timestamp_filter
 filtered_log = timestamp_filter.filter_traces_contained(log, "2013-01-01 00:00:00", "2020-01-01 23:59:59") 
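Conceptually, `filter_traces_contained` keeps only those traces whose events all fall inside the given time window. A plain-Python sketch of that idea, on toy data (this is not pm4py's internal implementation):

```python
from datetime import datetime

def trace_contained(trace, start, end):
    """Keep a trace only if every one of its events lies within [start, end]."""
    return all(start <= e["timestamp"] <= end for e in trace)

traces = [
    [{"timestamp": datetime(2014, 5, 1)}, {"timestamp": datetime(2014, 6, 1)}],
    [{"timestamp": datetime(2012, 1, 1)}, {"timestamp": datetime(2014, 1, 1)}],  # starts before the window
]

start, end = datetime(2013, 1, 1), datetime(2020, 1, 1, 23, 59, 59)
kept = [t for t in traces if trace_contained(t, start, end)]
print(len(kept))  # 1
```

Note the "contained" semantics: a trace with even one event outside the window is dropped entirely, rather than being truncated.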

Model Discovery

PM4Py supports three formalisms for representing process models: Petri nets (place/transition nets), directly-follows graphs and process trees. We will confine ourselves to Petri nets in this article.

Petri nets can be obtained using several different mining algorithms. We will use one such algorithm, the alpha miner.

 from pm4py.algo.discovery.alpha import algorithm as alpha_miner
 net, initial_marking, final_marking = alpha_miner.apply(filtered_log) 
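To see what the returned (net, initial marking, final marking) triple represents, here is a toy token-game simulation in plain Python. This is purely illustrative (a 1-safe simplification using sets); pm4py has its own `PetriNet` classes:

```python
# A Petri net as transitions mapping input places -> output places.
# A transition may fire when all its input places hold a token;
# firing consumes those tokens and puts tokens in the output places.
transitions = {
    "register": ({"start"}, {"p1"}),
    "transfer": ({"p1"}, {"end"}),
}

def fire(marking, name):
    inputs, outputs = transitions[name]
    if not inputs <= marking:        # enabled only if all input places are marked
        raise ValueError(f"{name} is not enabled")
    return (marking - inputs) | outputs

marking = {"start"}                  # initial marking
marking = fire(marking, "register")
marking = fire(marking, "transfer")
print(marking == {"end"})            # reached the final marking -> True
```

A trace "fits" the model when some firing sequence replays all its events and ends in the final marking, which is exactly what conformance checking tests below.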

Visualizing a Petri net

 from pm4py.visualization.petrinet import visualizer as pn_visualizer
 gviz = pn_visualizer.apply(net, initial_marking, final_marking)
 pn_visualizer.view(gviz) 

Conformance Checking

Following is an example of conformance checking. We generate a model using part of the log and then validate the entire log against it.

 from pm4py.algo.discovery.inductive import algorithm as inductive_miner
 from pm4py.algo.filtering.log.auto_filter.auto_filter import apply_auto_filter
 from pm4py.algo.conformance.tokenreplay.diagnostics import duration_diagnostics
 #Generating model using only a part of the log
 filtered_log = apply_auto_filter(log)
 net, initial_marking, final_marking = inductive_miner.apply(filtered_log)
 #Checking the entire log for conformance with the model
 from pm4py.algo.conformance.tokenreplay import algorithm as token_based_replay
 parameters_tbr = {token_based_replay.Variants.TOKEN_REPLAY.value.Parameters.DISABLE_VARIANTS: True, token_based_replay.Variants.TOKEN_REPLAY.value.Parameters.ENABLE_PLTR_FITNESS: True}
 replayed_traces, place_fitness, trans_fitness, unwanted_activities = token_based_replay.apply(log, net,
                                                                                               initial_marking,
                                                                                               final_marking,
                                                                                               parameters=parameters_tbr)
 #Displaying Diagnostics Information
 act_diagnostics = duration_diagnostics.diagnose_from_notexisting_activities(log, unwanted_activities)
 for act in act_diagnostics:
     print(act, act_diagnostics[act]) 
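Each entry of `replayed_traces` is a per-trace dictionary that, per the token-replay output, includes a `trace_is_fit` flag and a `trace_fitness` value. A sketch of summarising those results (the sample data below is made up to mimic that shape):

```python
# Hypothetical replay results shaped like token-replay output entries.
replayed_traces = [
    {"trace_is_fit": True,  "trace_fitness": 1.0},
    {"trace_is_fit": False, "trace_fitness": 0.6},
    {"trace_is_fit": True,  "trace_fitness": 1.0},
]

n = len(replayed_traces)
fit_share = sum(t["trace_is_fit"] for t in replayed_traces) / n
avg_fitness = sum(t["trace_fitness"] for t in replayed_traces) / n
print(f"{fit_share:.2f} of traces fit; average fitness {avg_fitness:.2f}")
```

Aggregates like these give a quick health check of the log before drilling into the per-activity diagnostics printed above.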
