A large volume of information flows through an organisation's documents, and understanding document structure makes it possible to extract relevant, useful data. Documents come in a variety of forms, including native PDFs, web pages and scanned images, and in a variety of templates, which makes processing and interpreting them a chore. Financial documents in particular, such as audited reports, bank statements, financial reports and exchange filings, are critical and must be assessed for compliance and risk-related applications, including underwriting and risk rating.
At DLDC 2021, organised by the Association of Data Scientists (ADaSci), Rahul Ghosh, VP of AI Research and Services at American Express AI Labs, spoke on AI-Powered Document Intelligence for Enterprises. Rahul also gave a sneak peek into the R&D efforts at American Express and demonstrated how Document AI-enabled products could drive innovation and efficiency at scale.
To start with, document intelligence is a way to tap into the opportunities offered by unstructured document data: faster and more informed decisions, greater operational efficiency, stronger data governance, integrity and compliance, and a better customer experience. Next, Rahul presented a general document AI stack for an enterprise. The stack is shown below.
This was followed by a simple explanation of the different types of documents, which include:
- Form type documents are short, typically one or two pages and fewer than five, with a well-defined structure and layout, for example invoices or bank statements. Use cases include understanding invoice spend patterns, getting cash flow insights from bank statements, etc.
- Verbose documents are longer and pack much more information into a single place, for example a construction contract agreement, where data appears as text, images, tables and more. Use cases include reviewing marketing creatives before a campaign launch, highlighting key clauses in documents, etc.
Talking about the challenges in information extraction from verbose documents, Rahul took a research paper as an example. A research paper consists of text, tables, titles, etc. Moreover, tables come with different cell formats, and the content changes completely from one paper to the next. Hence, a simple rule-based or template-based approach will fail to extract tables whose types vary across documents. Form type documents come with a few extraction challenges of their own, which include:
- Form type documents also have diverse templates.
- Rule-based approaches can’t handle unseen templates and are difficult to manage.
- NLP-based approaches assign tags to each portion of the text, while CNN-based approaches can capture information irrespective of variations in templates.
- Both NLP-based and CNN-based approaches have limitations in cases where information is embedded in the spatial arrangement of the layout, not in the text itself.
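The point about rule-based approaches and unseen templates can be illustrated with a minimal sketch. It is not taken from the talk: the rule, the function name `extract_total` and both sample invoices are hypothetical, chosen only to show a rule written for one template silently missing the same field in another.

```python
import re

# Hypothetical rule written for a single invoice template,
# where the total appears on its own line as "Total: $1,234.50".
TOTAL_RULE = re.compile(r"^Total:\s*\$([\d,]+\.\d{2})$", re.MULTILINE)


def extract_total(text):
    """Return the invoice total as a float, or None if the rule does not match."""
    match = TOTAL_RULE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting to a number.
    return float(match.group(1).replace(",", ""))


# Template the rule was written for (made-up sample text).
seen_template = "Invoice 001\nTotal: $1,234.50\n"

# Same information, different wording and layout (made-up sample text).
unseen_template = "Invoice 002\nAmount due (USD)    1,234.50\n"

print(extract_total(seen_template))    # rule matches this template
print(extract_total(unseen_template))  # rule finds nothing here
```

Every new template needs another rule of this kind, which is why such rule sets become hard to manage as the number of templates grows.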
A typical pipeline for extracting information from documents is shown below.
Rahul said, “R-CNN is an approach for extracting information from images and other things that are there. Although the approach was good, it was computationally expensive and had a longer training time. To remove this bottleneck, Microsoft Research came up with Fast R-CNN. With Fast R-CNN, the problem was that at testing time, you needed to explicitly tell where the objects were located. Then, somebody had to manually feed that information. That itself is a bottleneck.”
Faster R-CNN addressed this by learning to propose object regions within the network itself. On top of Faster R-CNN, there are still areas for improvement. Adding iterative refinement helps in identifying the location of objects. Even so, in more complicated document use cases, iterative refinement may not capture all the details accurately. The results obtained by the American Express team are shown below.
Documents are the key source of unstructured data for an enterprise. AI-powered document intelligence can understand the structure of a document and extract contents, leading to significant process efficiency. In addition, deep learning approaches borrowed from image processing, computer vision and other related disciplines can significantly outperform naive rule-based approaches.