
Data requirements, challenges, and solutions for successful process mining in organizations
Amin Amirkhalili
1- Introduction (nature of the topic)
This essay examines process mining from a data-centric perspective, treating certain types of data as the foundational raw material for process mining within organizations. Event logs are the primary type of data used in process mining, and this essay explores them in detail. For two decades, some organizations have used process mining to enhance their performance, drawing on various data types to uncover their actual business processes. Despite its clear benefits for both practitioners and researchers, only a limited number of companies have adopted this analytical approach. This essay explores potential reasons for such reluctance, whether they arise from external factors such as competition or rapidly changing economic and social environments, or from internal challenges such as limited resources, resistance to change, or a lack of knowledge. Among the factors contributing to managers’ lack of interest in a process mining approach, issues related to the data itself, or to its management and processing, could also be at fault. As noted by González López de Murillas et al. (2019), the collection and utilization of event logs in real-life scenarios are far from straightforward, and retrieving data from storage systems and formatting it properly for real-time use is considerably complex. In essence, the data issue concerns acquiring, transforming, organizing, and extracting valuable information from databases, which is unachievable without some form of data integration within organizations (Numminen, 2023). By identifying these barriers, referred to here as process mining killers, the essay proposes a framework for those interested in implementing process mining more effectively in their organizations. Thus, one objective of this essay is to present a comprehensive framework that captures process mining concerns and necessities as shaped by both internal and external limitations.
The next section defines process mining and its attributes and gives a concise history of its evolution and its current standing among researchers and practitioners in the academic and business communities. The essay then discusses the data requirements for process mining and the organizational constraints that inhibit its use, before presenting a set of guidelines to facilitate better application within organizations. Finally, it briefly addresses future challenges and opportunities, as well as the potential scope of the concept.
2- Definition, history, and current situation of Process Mining
Process mining is a method used to gain a true understanding of business processes through the analysis of operational data from enterprise applications (Lawton, 2023). At its core, it entails scrutinizing data with a focus on the process. While some research expands the definition of process mining beyond a mere data-driven technique to include an entire field encompassing tools, techniques, and methodologies (Emamjome et al., 2019), this essay treats both perspectives equivalently for the sake of a broad overview, thus preserving the essay’s coherence.
Determining whether process mining is appropriate for an organization and its leadership involves pondering questions such as: What is the current state of the actual process? Are there redundant and inefficient steps that could be eliminated? Where do bottlenecks occur? Are there deviations from established rules and designated processes? While these questions have traditionally been addressed by business process engineering and reengineering efforts, the advent of process mining, bolstered by data mining techniques and the use of information systems byproducts often referred to as “event logs”, provides fresh perspectives on age-old questions. This aligns with the principles of design science research, where creating artifacts aims to offer novel solutions to pre-existing problems, as illustrated by the shift from routine designs to enhancement designs in Fig 1 (Gregor & Hevner, 2013).
Fig 1. Design science research knowledge contribution framework (Gregor & Hevner, 2013)
Indeed, the presence of processes within an organization does not automatically equate to improved operational or organizational performance. Rather, processes can be viewed as instrumental in achieving such improvements. Similarly, process mining in itself does not inherently generate value. Its true worth is realized when its outputs enable managers and process experts to conduct analyses that lead to actionable insights. These analyses may involve comparing actual process execution against a predefined model or identifying inefficiencies, such as bottlenecks, that contribute to longer operation times, reduced quality of products or services, or the inability of involved parties to achieve collective goals.
In practice, the objectives of process mining are often categorized into three main types: conformance checking, process enhancement, and process discovery. Conformance checking involves verifying whether reality, as recorded in event logs, conforms to the designed process model. Process enhancement refers to improving an existing process by analyzing and optimizing the actual process based on the information gathered from process mining. Process discovery is the technique of creating a process model from scratch based on the available event log data, with no a priori process model (Numminen, 2023). These three facets underscore the multifaceted utility of process mining in achieving operational excellence and organizational synergy. Section 2.1 describes them alongside a fourth type, task mining.
2.1. Four Types of Process Mining
- Process discovery
Process discovery is a primary application of process mining in organizations, aimed at uncovering processes by analyzing data from enterprise applications such as ERP and CRM systems. The findings may align with existing knowledge of processes or reveal entirely new insights. Typically, outcomes are visualized in formats such as BPMN diagrams or Petri nets.
- Conformance checking
Conformance checking is another application of process mining, which involves comparing the discovered process models from event logs with pre-existing process models. This requires two steps: first, running process mining to discover the actual processes, and second, having a reference model that represents the intended process.
- Process enhancement
Process enhancement goes further, using process mining to analyze various process performance metrics, including cycle times, waiting times, and processing times. Additional data points recorded in the event logs, such as activity costs and quality or quantity specifications of products or services, as well as feedback from customers or other stakeholders, can provide a comprehensive performance analysis. Process mining can also detect outliers, which is critical for identifying and addressing issues like fraud or system misuse.
- Task mining
Task mining, often overlooked, extends the capabilities of process mining to analyze human-computer interactions. Combining process mining data with insights from other techniques, such as eye-tracking systems or natural language processing, can lead to improvements in user experience and system design.
Fig 2. The three basic types of process mining explained in terms of input and output (van der Aalst et al., 2011)
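The three basic types can be illustrated with a minimal Python sketch. It is not the method of any particular tool: the log, the activity names, and the reference model are hypothetical, discovery is reduced to counting directly-follows relations, and conformance is reduced to checking each observed step against the transitions an assumed reference model allows (a simplified stand-in for the comparison of discovered and reference models described above).

```python
from collections import Counter
from datetime import datetime

# Hypothetical traces: case ID -> time-ordered list of (activity, ISO timestamp).
log = {
    "c1": [("Receive", "2024-03-01T09:00:00"), ("Check", "2024-03-01T10:00:00"),
           ("Approve", "2024-03-01T12:00:00")],
    "c2": [("Receive", "2024-03-02T09:00:00"), ("Approve", "2024-03-02T09:30:00")],
    "c3": [("Receive", "2024-03-03T09:00:00"), ("Check", "2024-03-03T13:00:00"),
           ("Approve", "2024-03-03T14:00:00")],
}


def discover_directly_follows(traces):
    """Discovery: count how often activity a is directly followed by activity b."""
    counts = Counter()
    for steps in traces.values():
        activities = [activity for activity, _ in steps]
        counts.update(zip(activities, activities[1:]))
    return counts


def find_deviating_cases(traces, allowed_pairs):
    """Conformance: list cases containing a step the reference model does not allow."""
    deviating = []
    for case_id, steps in traces.items():
        activities = [activity for activity, _ in steps]
        if any(pair not in allowed_pairs for pair in zip(activities, activities[1:])):
            deviating.append(case_id)
    return deviating


def cycle_time_hours(traces):
    """Enhancement: elapsed hours between the first and last event of each case."""
    result = {}
    for case_id, steps in traces.items():
        start = datetime.fromisoformat(steps[0][1])
        end = datetime.fromisoformat(steps[-1][1])
        result[case_id] = (end - start).total_seconds() / 3600
    return result


# Assumed reference model: every case must be checked before approval.
reference_model = {("Receive", "Check"), ("Check", "Approve")}

dfg = discover_directly_follows(log)
deviations = find_deviating_cases(log, reference_model)
durations = cycle_time_hours(log)

print("Directly-follows counts:", dict(dfg))
print("Deviating cases:", deviations)
print("Cycle times (hours):", durations)
```

In this toy log, case c2 skips the check step, so it is the one flagged as deviating, while the cycle times show which cases took longest.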
2.2. Event logs
Event logs serve as the foundational element in process mining, recording system events in chronological order. They consist predominantly of events or activities, each associated with a specific process instance (identified by a case ID) and ordered by timestamp. This structured data enables the analysis of how processes unfold over time within a system.
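As a rough illustration of this structure, the sketch below groups a handful of events by case ID and orders each group by timestamp to obtain one trace per process instance. The field names (`case_id`, `activity`, `timestamp`) and the sample events are assumptions made for the example, not a prescribed log format.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: each event carries a case ID, an activity, and a timestamp.
events = [
    {"case_id": "order-2", "activity": "Create order", "timestamp": "2024-01-02T09:00:00"},
    {"case_id": "order-1", "activity": "Ship goods", "timestamp": "2024-01-01T15:30:00"},
    {"case_id": "order-1", "activity": "Create order", "timestamp": "2024-01-01T08:00:00"},
    {"case_id": "order-1", "activity": "Approve order", "timestamp": "2024-01-01T10:15:00"},
    {"case_id": "order-2", "activity": "Cancel order", "timestamp": "2024-01-02T11:45:00"},
]


def build_traces(event_list):
    """Group events by case ID and order each group by timestamp."""
    grouped = defaultdict(list)
    for event in event_list:
        grouped[event["case_id"]].append(event)

    traces = {}
    for case_id, case_events in grouped.items():
        ordered = sorted(case_events, key=lambda e: datetime.fromisoformat(e["timestamp"]))
        traces[case_id] = [e["activity"] for e in ordered]
    return traces


traces = build_traces(events)
for case_id, activities in sorted(traces.items()):
    print(case_id, "->", " > ".join(activities))
```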
2.3. History of process mining
Tracing the academic discourse on the interplay between business processes and information systems reveals three distinct periods, from the inception of information systems in management and business scholarship to the present day. Prior to the integration of information systems, business processes and models were predominantly crafted and executed manually. The primary aim during this period was to devise business conceptual models characterized by a high degree of comprehensiveness, popularity, and validity.
Following the introduction of information systems, the discourse shifted towards exploring how these systems could automate business processes. Yet, as the new millennium dawned, marking an era increasingly dominated by data, the focus pivoted towards data management and subsequently, data analytics. Information systems began to be viewed not merely as tools for automation but as instruments for business integration and value creation. In the past two decades, the study of business processes has increasingly adopted a data-centric lens, leading to the development of new generations of applications designed to assist business analysts in designing and analyzing processes through data derived from enterprise applications.
Process mining has evolved significantly throughout this period. Early iterations of process mining technology were limited to analyzing a single process at a time. Recent advancements, however, have expanded the capabilities of process mining, enabling the examination of multiple processes and their interconnections simultaneously.
2.4. Process mining Techniques
Process mining primarily employs statistical methodologies alongside techniques such as machine learning, data mining, and predictive analytics to extract process patterns from data. Van der Aalst et al. (2011) have laid out guidelines that not only advocate for the development and standardization of process mining techniques but also serve as a comprehensive guide for both practitioners and researchers in the field. This call to action underscores the importance of refining process mining methodologies to enhance their applicability and effectiveness in uncovering valuable insights from process data.
2.5. Process Mining methodology (How to implement process mining)
Based on the process mining life cycle outlined by van der Aalst et al. (2011) and expanded by Emamjome et al. (2019), the methodology comprises several key phases, running from initial planning to operational support, with added emphasis on defining research questions, stakeholder evaluation, and implementation for comprehensive process enhancement. Phases 0 to 4 are outlined below:
Phase 0 – Plan and Justify: This foundational phase emphasizes the importance of thoroughly planning the process mining project and justifying its necessity. It involves defining the scope, objectives, and desired outcomes of the project, crucial for securing stakeholder support and resources. Identifying the processes for analysis, the data sources, and the key performance indicators (KPIs) to be targeted sets the stage for the subsequent phases.
Phase 1 – Extract: Data extraction is pivotal, focusing on gathering relevant data from information systems, primarily through event logs that capture the execution of processes. Ensuring the accuracy, completeness, and analysis-ready format of this data is critical for a truthful representation of the actual processes.
Phase 2 – Create a Control-Flow Model and Connect it to the Event Log: In this phase, a control-flow model is developed from the extracted event logs, illustrating the sequence and interconnection of process activities. Validating this model against real process data aids in identifying process deviations, bottlenecks, and inefficiencies.
Phase 3 – Create an Integrated Process Model: Building on the control-flow model, this stage aims to develop an integrated process model that includes various perspectives such as resources, data flows, and organizational structures, thereby providing a holistic understanding of the process and enabling informed decision-making.
Phase 4 – Provide Operational Support: The final phase focuses on leveraging the insights obtained from process mining to enhance operational support. This may encompass recommendations for process improvements, predictive analytics for forecasting process behaviors, and decision support systems to manage process executions more effectively, ultimately aiming to boost operational efficiency and effectiveness.
Emamjome et al. (2019) enhance the process mining methodology by adding critical elements that further its application and effectiveness. They stress the importance of starting with clearly defined research questions to direct the scope and objectives of the process mining project. This early clarity is crucial for ensuring that the project addresses the most pertinent issues. Additionally, they introduce a phase dedicated to stakeholder evaluation, advocating for feedback loops in which those impacted by or involved in the processes assess the validity, accuracy, and relevance of the process mining findings. This step is vital for aligning the project’s outcomes with organizational goals and stakeholder expectations. Moreover, they underscore the significance of an implementation phase, where the insights and recommendations derived from process mining are put into action. This not only realizes the potential improvements identified through the analysis but also solidifies process mining’s role in driving process enhancement. Together, these additions by Emamjome et al. (2019) provide a comprehensive framework that extends from the initial planning stages to the practical implementation of findings, ensuring that process mining projects are both strategic in their conception and actionable in their execution.
Fig 3. Process Mining Methodology (van der Aalst et al., 2011; Emamjome et al., 2019)
2.6. Process mining applications in various domains
The graph below shows the distribution of articles reviewed by Emamjome et al. (2019) across various domains.
Fig 4. Frequency of process mining research in various domains
It is observed that healthcare and education emerged as two of the most prevalent domains for researchers to undertake their process mining projects.
2.7. Benefits and drawbacks of process mining
One significant advantage of process mining, as highlighted by Emamjome et al. (2019), is its ability to provide an authentic depiction of the actual sequence of activities within organizations. This data-driven method potentially offers superior insights compared to traditional process analyses, which often depend more on conceptual understandings than on empirical evidence.
3- Data requirements and company limitations
3.1. Structured data
The foundational element of process mining is the event log, which is categorized as structured data. This category encompasses various data forms, including arrays, records, sets, and files, with event logs designed specifically for logging, monitoring, and analyzing system or process events (Sorber, Van Barel, & De Lathauwer, 2015). Essential components of event logs, such as activities, case IDs, and timestamps, form a highly structured and predictably organized dataset that facilitates efficient data processing and analysis by computers, applications, and systems (Bayomie et al., 2015). This organization greatly simplifies complex operations, such as searches and analytical processes, that are harder with unstructured data, which requires intricate processing techniques to yield meaningful insights. In essence, the hallmark of structured data is its compatibility with systems, which can interpret it directly. This presents a challenge for organizations in competitive fields striving to leverage unstructured data, seen as a critical resource today (Doan et al., 2009). For process mining, however, the emphasis is on accessing and utilizing structured data, predominantly event logs. This necessitates comprehensive management of structured data, encompassing creation or collection, cleansing, storage, and dissemination.
The availability of structured data, especially event logs, signifies that an organization can embark on process mining. This indicates that possessing sophisticated information systems or data mining tools is not a prerequisite for obtaining event logs. Given that process mining fundamentally relies on analyzing logs, organizations are required to furnish structured data that inherently includes at least three critical components: Case IDs, Activities, and Timestamps.
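A quick readiness check for these three components might look like the following sketch. The column names in `REQUIRED_FIELDS` and the sample CSV export are assumptions for illustration; real systems will label these fields differently, so a mapping step is usually needed first.

```python
import csv
import io

# Assumed column names for the three mandatory components of an event log.
REQUIRED_FIELDS = ("case_id", "activity", "timestamp")


def missing_required_fields(header):
    """Return the required fields that are absent from a CSV header row."""
    present = {name.strip().lower() for name in header}
    return [field for field in REQUIRED_FIELDS if field not in present]


# Hypothetical export that lacks a timestamp column.
sample_csv = "case_id,activity,resource\n1001,Create invoice,alice\n"
reader = csv.reader(io.StringIO(sample_csv))
header = next(reader)

missing = missing_required_fields(header)
print("Missing fields:", missing)
```

An export that fails this check can still describe what happened, but it cannot be ordered into traces, so it is not yet usable as an event log.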
3.2. Data Integration
Data integration is essential for the success of process mining. Without effectively combining data from diverse sources, the ability of process mining to improve business processes could be significantly diminished. Data integration serves as a critical element in ensuring the seamless alignment of an organization’s information systems. This alignment often represents a principal goal within IT projects, such as IT Master Plans or Enterprise Architecture initiatives, which strive to harmonize business operations with the IT infrastructure, making data integration a key deliverable.
The essence of this concept is to ensure a tight integration between business operations and IT infrastructure, as illustrated in the figure below. Van der Aalst and Weijters (2004) discuss how process mining utilizes data from event logs to decipher actual business processes, underscoring the importance of having unified data for an exhaustive analysis of operational processes.
Fig 5. Organizational design and information systems design activities (Hevner et al., 2004)
Azevedo and Santos (2014) explore the pivotal role of integrating data mining into business intelligence systems, emphasizing its necessity for enabling managers to make well-informed decisions grounded in a comprehensive understanding of the business landscape. This integration is instrumental in fostering a competitive edge by offering deeper insights into operational dynamics.
Similarly, Homayounfar (2012) delves into the significance of process mining within the realm of hospital information systems, showcasing how crucial data integration is for the enhancement of healthcare processes. The success of process mining is closely linked to the meticulous organization and amalgamation of data, highlighting the broader relevance of data integration in healthcare contexts.
Wegener and Rüping (2010) further discuss the criticality of embedding data mining within business processes. They pinpoint challenges associated with the integration of complex data mining services into business process management frameworks, such as clarifying role definitions and ensuring the adaptability of data mining services. This underlines the intricate relationship between data integration and the refinement of business processes.
However, it’s important to recognize that data integration extends beyond the confines of information system integration. While system integration is a key aspect, it’s not always feasible, prompting the exploration of alternative techniques like cloud computing, ETL processes, and API integrations. Crucially, companies must first assess their business requirements and gain a deep understanding of both their internal and external environments before proceeding with data integration efforts.
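To make the ETL option concrete, the sketch below extracts rows from two hypothetical sources, transforms them into one shared schema, and loads them into a single time-ordered log. The source systems, field names, and date formats are all invented for the example; the point is only that each source needs its own mapping before events can be combined.

```python
from datetime import datetime

# Hypothetical extracts from two systems with different field names and date formats.
erp_rows = [
    {"order_no": "A1", "step": "Create order", "ts": "2024-05-01 08:00:00"},
    {"order_no": "A1", "step": "Ship goods", "ts": "2024-05-03 16:00:00"},
]
crm_rows = [
    {"OrderRef": "A1", "Event": "Customer complaint", "When": "02/05/2024 11:30"},
]


def transform_erp(row):
    """Map an ERP row onto the shared event schema."""
    return {
        "case_id": row["order_no"],
        "activity": row["step"],
        "timestamp": datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S"),
        "source": "ERP",
    }


def transform_crm(row):
    """Map a CRM row (day/month/year dates) onto the shared event schema."""
    return {
        "case_id": row["OrderRef"],
        "activity": row["Event"],
        "timestamp": datetime.strptime(row["When"], "%d/%m/%Y %H:%M"),
        "source": "CRM",
    }


def integrate(erp, crm):
    """Combine both sources and order events by case and time."""
    unified = [transform_erp(r) for r in erp] + [transform_crm(r) for r in crm]
    return sorted(unified, key=lambda e: (e["case_id"], e["timestamp"]))


unified_log = integrate(erp_rows, crm_rows)
for event in unified_log:
    print(event["case_id"], event["timestamp"].isoformat(), event["activity"], event["source"])
```

Only after integration does the complaint appear between order creation and shipping, a sequence that neither system shows on its own.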
3.3. Data quality control
Utilizing process mining tools and techniques on subpar data can present significant risks, potentially causing more harm than incorrect process implementation itself (Wynn & Sadiq, 2019). Misimplemented processes usually reveal their flaws promptly, whereas a process mining model operating on flawed data might not readily exhibit failure signs. This obscurity complicates problem identification, particularly in the immediate term, and becomes more problematic when such data informs operational or strategic decisions. The quest for high-quality, comprehensive data from diverse sources within intricate enterprise systems represents a formidable barrier for organizations seeking to exploit process mining effectively (van der Aalst & Weijters, 2004). Challenges such as navigating complex process constructs, handling temporal discrepancies, noise, partial logs, and events at varying granularities demand robust techniques to ensure data integrity. Critical measures include data cleansing to mitigate noise, anomalies, and gaps; data authentication to confirm data fidelity; and data validation to comply with established formats and standards. These practices not only enhance data quality but also facilitate smoother data integration, potentially easing the burdens of integration in terms of time and resources.
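The cleansing and validation measures mentioned above can be sketched as a simple filter that separates usable events from rejected ones and records why each was rejected. The sample events, field names, and the choice of ISO 8601 as the expected timestamp format are assumptions for illustration; production pipelines would apply richer rules.

```python
from datetime import datetime

# Hypothetical raw events showing typical defects.
raw_events = [
    {"case_id": "c1", "activity": "Register", "timestamp": "2024-02-01T09:00:00"},
    {"case_id": "c1", "activity": "Register", "timestamp": "2024-02-01T09:00:00"},  # duplicate
    {"case_id": "c1", "activity": "Review", "timestamp": ""},                       # missing timestamp
    {"case_id": "", "activity": "Close", "timestamp": "2024-02-01T12:00:00"},       # missing case ID
    {"case_id": "c1", "activity": "Close", "timestamp": "01.02.2024 13:00"},        # unexpected format
    {"case_id": "c1", "activity": "Archive", "timestamp": "2024-02-01T14:00:00"},
]


def clean_events(event_list):
    """Split events into clean ones and (event, reason) pairs for rejected ones."""
    clean, rejected, seen = [], [], set()
    for event in event_list:
        # Cleansing: drop events with gaps in any mandatory field.
        if not all(event.get(f) for f in ("case_id", "activity", "timestamp")):
            rejected.append((event, "missing value"))
            continue
        # Validation: timestamps must follow the assumed ISO 8601 format.
        try:
            datetime.fromisoformat(event["timestamp"])
        except ValueError:
            rejected.append((event, "invalid timestamp"))
            continue
        # Cleansing: drop exact repeats of an already accepted event.
        key = (event["case_id"], event["activity"], event["timestamp"])
        if key in seen:
            rejected.append((event, "duplicate"))
            continue
        seen.add(key)
        clean.append(event)
    return clean, rejected


clean, rejected = clean_events(raw_events)
print("Kept:", [e["activity"] for e in clean])
print("Rejected:", [reason for _, reason in rejected])
```

Keeping the rejection reasons, rather than silently discarding rows, makes the otherwise hidden data problems described above visible to analysts.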
3.4. Data compliance and security
Accessing and sharing data inherently pose IT and strategic risks, necessitating a balance between stringent data control, with restricted access, and the efficiency of data transmission and process analysis. The dissemination and utilization of event logs, which encompass critical data, also entail potential IT hazards. Specifically, if information from event logs, including activities, their sequences, durations, and associated parties, falls into the wrong hands, whether through intentional or accidental disclosure, companies may incur substantial financial and reputational damages. Consequently, it is imperative to implement rigorous security and compliance measures, such as data encryption and data masking. Data encryption employs cryptographic methods to safeguard data from unauthorized access, thereby preserving its confidentiality. On the other hand, data masking involves concealing sensitive information, ensuring the data’s privacy is maintained throughout the process mining endeavors (Numminen, 2023).
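Data masking can be illustrated with a small pseudonymization sketch using a keyed hash from the Python standard library. The `resource` field, the sample addresses, and the hard-coded key are hypothetical; in practice the key would come from a managed secret store, and pseudonymization alone does not guarantee anonymity, since activity patterns can still identify individuals.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; never hard-code a real secret.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value, key=SECRET_KEY):
    """Replace a sensitive value with a stable keyed-hash pseudonym."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user-" + digest[:12]


def mask_event(event, sensitive_fields=("resource",)):
    """Return a copy of the event with its sensitive fields pseudonymized."""
    masked = dict(event)
    for field in sensitive_fields:
        if field in masked:
            masked[field] = pseudonymize(masked[field])
    return masked


events = [
    {"case_id": "c1", "activity": "Approve loan", "resource": "alice@example.com"},
    {"case_id": "c2", "activity": "Approve loan", "resource": "alice@example.com"},
    {"case_id": "c2", "activity": "Pay out", "resource": "bob@example.com"},
]

masked_events = [mask_event(e) for e in events]
for event in masked_events:
    print(event)
```

Because the same input always maps to the same pseudonym, analyses such as workload per resource remain possible on the masked log without exposing real identities.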
3.5. Company factors: employee skills and limited resources
In the realm of process mining, organizational elements like employee expertise and budget limitations can markedly impact the successful deployment of such initiatives within a company. Employee proficiency in leveraging process mining tools is essential, influencing the caliber of insights obtained and subsequent business decisions. Similarly, constrained budgets may hinder the adoption of sophisticated process mining methods, which typically demand significant investments in technology and personnel training. These constraints underscore the need for meticulous planning and prioritization to utilize process mining effectively for organizational enhancements. Nemati and Barko (2003) highlighted implementation factors crucial to the success of data mining projects, akin to process mining concerning data use and analysis. These factors include allocating ample resources and fostering employee skills to manage intricate data-driven endeavors. Grisold et al. (2020) delve into the operational challenges that process managers encounter in implementing and overseeing process mining, pinpointing phases where issues like planning, process choice, and resource distribution emerge. They propose that existing frameworks for business process management may require expansion to adequately confront these obstacles. These investigations emphasize the critical role of both human and financial resources in orchestrating process mining projects.
4- Discussion and future opportunities
After two decades of research in the field of process mining, some scholars, including Emamjome et al. (2019), argue that the discipline has reached a level of maturity. This stage marks an opportunity for researchers to introduce new techniques and solutions that could enhance the efficacy of process mining activities within organizations. Furthermore, the issue of unstructured data, a critical asset for many organizations and anticipated to become one of the future’s principal resources, presents a challenge. Given that event logs are considered structured data, organizations aiming to implement process mining cannot solely depend on unstructured data; they need to strike a balance between their structured and unstructured data repositories.
The ongoing global economic downturn, from which economies have yet to recover since the COVID-19 pandemic, may render data analytics practices such as process mining a luxury for organizations unless business analysts and researchers can convincingly demonstrate its value to all stakeholders. Research by Emamjome et al. (2019) suggests that recent studies on process mining have somewhat deviated from a rigorous methodology. By faithfully applying a comprehensive process mining methodology, particularly in the stages that involve soliciting stakeholder feedback, the significance of process mining in contemporary organizations can be reaffirmed and valued more than ever.
The essay concludes by proposing a framework that synthesizes the discussions into a cohesive strategy for companies to navigate their diverse objectives in employing process mining. These guidelines span various levels, from the environmental context and the process mining discipline itself to broader business considerations, data management, and system infrastructure, providing a holistic approach for organizations embarking on process mining ventures.
Table 1. Framework of multidimensional requirements for process mining purposes
Levels | Process discovery | Conformance checking | Process enhancement | Task mining |
Environment | Fair employee skill; fair budget | High employee skill; moderate budget | High employee skill; high budget | High employee skill; fair budget |
Process mining | – | Process discovery | Process discovery; conformance checking | – |
Business | – | Designed process | Designed process; designed process model | – |
Data | Event logs; integrated data; fair-quality data; fair security control | Event logs; integrated data; moderate-quality data; fair security control | Event logs; integrated data; high-quality data; high security control | Event logs; high-quality data; high security control |
References:
- Azevedo, A., & Santos, M. F. (2014). Integration of Data Mining in Business Intelligence Systems (1st ed.). IGI Global.
- Nemati, H. R., & Barko, C. D. (2003). Key factors for achieving organizational data-mining success. Industrial Management & Data Systems, 103(4), 282-292.
- Esser and Fahland (2019): For instance, Esser and Fahland (2019) discuss using graph data structures for event logs to enable complex queries and analyses, showcasing the structured nature of event logs in facilitating process mining.
- Grisold, T., Mendling, J., Otto, M., & vom Brocke, J. (2020). Adoption, use and management of process mining in practice. Business Process Management Journal, 27(2), 369-387.
- Rozinat, A. (2012, 03 February). Data Requirements for Process Mining. Fluxicon. https://fluxicon.com/blog/2012/02/data-requirements-for-process-mining
- Lawton, G. (2023, November). Process mining. TechTarget. https://www.techtarget.com/searchcio/definition/process-mining
- Numminen, L. (2023). Data Requirements for Process Mining. Workfellow. https://www.workfellow.ai/learn/data-requirements-for-process-mining
- van der Aalst, W. M. P., & Weijters, A. J. M. M. (2004). Process mining: A research agenda. Computers in Industry, 53(3), 231-244.
- van der Aalst, W. M. P., Adriansyah, A., de Medeiros, A. K. A., Arcieri, F., Baier, T., Blickle, T., … & Wynn, M. T. (2011). Process mining manifesto. In Business Process Management Workshops (pp. 169-194). Springer, Berlin, Heidelberg.
- Emamjome, F. Z., Andrews, R., & ter Hofstede, A. H. M. (2019). A case study lens on process mining in practice. On the Move to Meaningful Internet Systems: OTM 2019 Conferences (pp. 167-186). Springer, Cham.
- Bayomie, D., Helal, I. M. A., Awad, A., Ezat, E., & Bastawissi, A. (2015). Deducing Case IDs for Unlabeled Event Logs.
- Doan, A., Naughton, J., Ramakrishnan, R., Baid, A., Chai, X., Chen, F., Chen, T., Chu, E., DeRose, P., Gao, B. J., Gokhale, C. S., Huang, J., Shen, W., & Vuong, B.-Q. (2009). Information extraction challenges in managing unstructured data. SIGMOD Rec., 37, 14-20.
- Gregor, S., & Hevner, A. R. (2013). Positioning and Presenting Design Science Research for Maximum Impact. MIS Quarterly, 37(2), 337-356.
- Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design Science in Information Systems Research. MIS Quarterly, 28(1), 75-105.
- Homayounfar, P. (2012). Process mining challenges in hospital information systems. 2012 Federated Conference on Computer Science and Information Systems (FedCSIS), 1135-1140.
- González López de Murillas, E., Reijers, H. A., & van der Aalst, W. M. P. (2019). Connecting databases with process mining: a meta model and toolset. Software and Systems Modeling, 18(2), 1209-1247.
- Sorber, L., Van Barel, M., & De Lathauwer, L. (2015). Structured Data Fusion. IEEE Journal of Selected Topics in Signal Processing, 9(4), 586-600.
- Wegener, D., & Rüping, S. (2010). On Integrating Data Mining into Business Processes. In Business Information Systems (Vol. 47, pp. 183-194).