Enterprise NLP: Scalable Knowledge Curation from Technical Documents

Date:

Tools: Python, REST APIs, Docker, Elasticsearch, MongoDB, Watson Discovery, Watson Assistant.

Methods: Several NLP techniques, including:

  • Extraction of entities, pivot sentences, procedures, symptoms, conditional blocks, images, and tables
  • Improving search relevance using boosting and learning to rank (Learn2rank)
  • Text classification using the Universal Sentence Encoder (USE) and Watson NLP
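
As an illustration of the boosting bullet, the sketch below builds an Elasticsearch `multi_match` request body in which matches in some fields count more than matches in others. The field names (`title`, `symptoms`, `procedure_steps`, `body`), the boost values, and the function name `build_boosted_query` are hypothetical, not the project's actual index schema or tuning; the function only builds the request body and does not contact a cluster.

```python
import json


def build_boosted_query(user_query: str, field_boosts: dict, size: int = 10) -> dict:
    """Return an Elasticsearch request body that boosts matches in selected fields.

    Field names and boost values are hypothetical examples, not a real index schema.
    """
    # Elasticsearch's "field^boost" syntax scales the score of matches in that field.
    fields = [f"{name}^{boost}" for name, boost in field_boosts.items()]
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": user_query,
                "fields": fields,
                "type": "best_fields",
            }
        },
    }


# Hypothetical fields that component extraction might populate.
boosts = {"title": 3, "symptoms": 2, "procedure_steps": 1.5, "body": 1}
body = build_boosted_query("backup job fails with disk full error", boosts)
print(json.dumps(body, indent=2))
```

Fixed boosts like these are a simple baseline; a learning-to-rank step such as Learn2rank would instead re-score the top hits with a model trained on relevance judgments.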

    Description: The IT industry generates a wide variety of technical documents. To build generic AI solutions faster that can serve many kinds of enterprise IT scenarios, knowledge must be extracted from these documents automatically and generically. The challenges include (a) the lack of standardization in size, style, and organization across documentation, and (b) structural patterns specific to IT documents that distinguish them from generic documents. To address these challenges in the automated understanding of technical documents, we propose a framework for representing technical documents, inspired by how humans read and comprehend them. We also describe how components are automatically extracted from technical documents and how they relate to each other, and we demonstrate the value of the approach through two example applications: search and question answering (QA) systems.
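
To make "extracting components and relating them to each other" concrete, here is a minimal sketch that splits a plain-text document into sections and attaches numbered procedure steps to the heading they fall under. The heuristics (lines such as "1. ..." or "Step 3: ..." are steps; short lines without terminal punctuation are headings) and the names `Section` and `parse_document` are illustrative assumptions, not the framework described here, which also covers components such as symptoms, conditional blocks, images, and tables.

```python
import re
from dataclasses import dataclass, field

UNTITLED = "(untitled)"
# Assumed heuristic: a step looks like "1. ...", "2) ...", or "Step 3: ..." (case-insensitive).
STEP_RE = re.compile(r"^(?:step\s+)?(\d+)[.):]\s+(.*\S)", re.IGNORECASE)


@dataclass
class Section:
    heading: str
    text: list = field(default_factory=list)       # ordinary sentences
    procedure: list = field(default_factory=list)  # ordered step texts

    def has_content(self) -> bool:
        return self.heading != UNTITLED or bool(self.text) or bool(self.procedure)


def parse_document(raw: str) -> list:
    """Split plain text into sections and link each procedure step to its heading."""
    sections = []
    current = Section(heading=UNTITLED)
    for raw_line in raw.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        step = STEP_RE.match(line)
        if step:
            current.procedure.append(step.group(2))
        elif len(line.split()) <= 8 and not line.endswith((".", ":", "?", "!")):
            # Assumed heuristic: a short line without terminal punctuation starts a new section.
            if current.has_content():
                sections.append(current)
            current = Section(heading=line)
        else:
            current.text.append(line)
    if current.has_content():
        sections.append(current)
    return sections


doc = """Resolving a full backup disk
The backup job fails when the target disk is full.
1. Check free space on the target volume
2. Delete expired snapshots
Step 3: Restart the backup service
Verifying the fix
Run the job again and confirm it completes.
"""

for sec in parse_document(doc):
    print(f"{sec.heading}: {len(sec.procedure)} steps, {len(sec.text)} sentences")
```

Real documentation needs stronger cues (markup, font size, numbering depth), but even this crude pass captures the kind of relation the description refers to: a procedure belongs to the section whose heading it sits under, so a search or QA system can return the steps together with their context.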

    Outcomes:
  • Included in the IBM GBS Cognitive Assistant Platform product
  • Deployed for over 300 clients
  • A-level Research Accomplishment, with over USD 25M in revenue per year
  • Two OTAA awards from IBM and the client

    Related papers:
  • CCS - https://arxiv.org/abs/1806.02284
  • QQI paper - https://ojs.aaai.org//index.php/AAAI/article/view/7024
  • BERT - https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/reports/default/15848021.pdf
  • AAAI submission - Document anatomy framework

    Related patents:
  • Evaluating chatbots with respect to gap analysis
  • Adding context to non-factoid questions using document structure