process mining on documents and clickstreams

Date:

Tools: Python, REST APIs, Docker, networkx, pandas

Methods:Several NLP and graph techniques such as 

  • Extraction of Entities , decisions , OCR from documents and automated process model generation.
  • Page-rank algorithm to identify critical application
  • Heuristic automation amenable candidate discovery using KPIs such as Time duration , Execution frequency , Centrality etc.

    Description: Banking Client POC to identify automatically business process models from 1) unstructured documents with text ,images 2) banking applications user click data . This work helped client to better understand the process behind their Invoice processing , Fraudulent transaction detection systems by analysing the in-efficiencies , enhancements , conformance issues , application interaction , automation recommendations.
    1) Process mining on documents : Invoice processing process documents contain information in the form text descriptions and flow chart images. Our aim is to identify process elements from both text and image modalities , then combining results to generate discovered. Process model in BPMN format.We used pre-trained model for learning structure of the document using CCS (Corpus conversion service) and CCS tool can be customised to documents of different styles.
    i)Text2process: Extraction of task nodes , activity description, decision nodes of the process from text
    ii) Image2process: Extraction of task nodes from flow diagram using CCS OCR
    iii) Discovered process BPMN: Automated BPMN process model generation combining Text2process and Image2process.

    2) Process mining on application click data : Fraudulent transaction process contains different applications with users accessing them.Click streams data modality captures user activity at application level. We use click data to discover process using Record2process to represent interaction between different applications at the window level .  i)Record2process:
    a) Main process at the application level to show the user control flow between different applications
    Sub-process for each application at window level to understand the user interaction within each application
    ii) Cognitive load estimation :
    a) For each application , human cognitive effort required is estimated based on user actions
    iii) Automation amenability:
    a) For each application , automation feasibility is measured based on determinism w.r.to time duration spent in the application. 

    ​​ Related papers:
  • CCS - https://arxiv.org/abs/1806.02284

    Realted Patents :
  • Discovering automation amenable candidates