Documents such as contracts, agreements, and letters are the foundation of any modern business. They capture the trust between the transacting parties. Computers and the internet were the biggest change for business documents since the printing press. Yet ever since their advent, the core ideas of document computing have remained folders and documents, concepts carried over directly from the physical world.
The process of drafting, maintaining, and storing these documents has remained the same. The mobile and machine learning revolutions are yet to change the document experience, specifically the reading and review of documents. Reading these documents is painful because they are too long. Furthermore, document review is restricted to experts, which adds cost and delay.
While there have been fixes, most available cloud-based editors allow sharing and nothing more. In the end, document computing systems may have moved to a cloud server, but document review itself remains stuck in the dark ages.
Deconstructing documents with cognitive technologies
Cognitive technologies such as AI are set to solve everyday business problems as they finally break the prevailing trade-offs between speed, cost, and quality. While computers still cannot replace humans, some technologies can automate tasks that previously required human cognitive skills such as planning, reasoning, and learning. This also means that AI can finally help people draft documents and unlock billions of dollars in increased efficiency, improved compliance, and business insights.
Machine learning works by feeding a system data to analyse so that it can identify patterns and make decisions with minimal human intervention, but preparing these datasets is labour-intensive and expensive for most individual companies. What is required is a technology that can identify the small datasets intrinsic to small businesses and to business functions within an organisation.
Revv – Tackling unstructured data with structured learning!
Automation opens up new frontiers for businesses. The augmentation technology available today can pull and analyse data from text, images, video, and numbers, then identify and augment the information, remodelling and learning as it develops.
The machine learning capability of Revv has enabled algorithms to be trained on large volumes of data and to extract data more efficiently. Tagging started with a small set of clauses and contracts, which was expanded from 5000 to 25000 clauses by applying the ‘data augmentation’ technique. The tagging took place in stages and went through multiple iterations, with hundreds more contracts and clauses added to the dataset.
The confusion matrix of the model's predictions was analysed, which gave a better view of tagging bias. Tagging then covered 22 clause types, chosen after consulting legal experts and going through the clause list published by IACCM in its most negotiated terms report.
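As a rough illustration of how a confusion matrix can expose tagging bias, the sketch below counts (true, predicted) label pairs and lists the most frequent mix-ups. The labels and the helper names `confusion_counts` and `most_confused` are hypothetical and are not taken from Revv's pipeline.

```python
from collections import Counter

def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) pairs; off-diagonal pairs are misclassifications."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))

def most_confused(counts, top_n=3):
    """Return the off-diagonal (true, predicted) pairs with the highest counts."""
    errors = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
    return sorted(errors.items(), key=lambda item: (-item[1], item[0]))[:top_n]

# Hypothetical labels, for illustration only.
y_true = ["termination", "renewal", "renewal", "liability", "insurance", "renewal"]
y_pred = ["termination", "termination", "termination", "liability", "liability", "renewal"]

counts = confusion_counts(y_true, y_pred)
for (true_label, predicted_label), n in most_confused(counts):
    print(f"{true_label} tagged as {predicted_label}: {n}")
```

A pair that dominates this list, such as renewal clauses repeatedly predicted as termination, would suggest that the two classes were tagged inconsistently or need more distinct examples.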
Clauses and their count at the first iteration of tagging
Revv's first deployable model had around 5000 rows of data and 22 classes to predict, with all 22 classes balanced against each other. Augmentation took the dataset from 5000 rows to 25000 rows. A convolutional neural network (CNN) was used to create a model that achieved a precision of 0.94, recall of 0.97, and F1 of 0.96. The next step was to identify clauses that were associated with other clauses.
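For readers unfamiliar with these metrics, here is a minimal sketch of how precision, recall, and F1 can be computed per class and then averaged. The article does not say how its scores were averaged, so the macro-averaging used here is an assumption, and the labels are invented for illustration.

```python
def precision_recall_f1(true_labels, predicted_labels, target):
    """Precision, recall, and F1 for a single class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_average(true_labels, predicted_labels):
    """Unweighted mean of the per-class precision, recall, and F1."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    scores = [precision_recall_f1(true_labels, predicted_labels, c) for c in classes]
    return tuple(sum(s[i] for s in scores) / len(classes) for i in range(3))

# Hypothetical labels, for illustration only.
y_true = ["renewal", "renewal", "termination", "termination"]
y_pred = ["renewal", "termination", "termination", "termination"]

precision, recall, f1 = macro_average(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Note that a macro-averaged F1 is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.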
Clauses and their count at later stages of tagging
During the second training process, clause pairs such as publicity and confidentiality, insurance and liability, renewal and termination, and compliance and insurance were combined to create more instances. This augmentation increased the data size from 5000 to 25000 rows. It also helped solve the second part of the problem: training the model to identify clauses intrinsic to a particular business or to a business function within the organisation.
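The article does not describe how the clauses were combined, so the following is only a sketch under one assumption: that a new instance is formed by joining one clause from each type in a related pair and labelling it with both types. The clause texts and the `combine_clauses` helper are hypothetical.

```python
import itertools
import random

# Clause-type pairs named in the article.
RELATED_PAIRS = [
    ("publicity", "confidentiality"),
    ("insurance", "liability"),
    ("renewal", "termination"),
    ("compliance", "insurance"),
]

def combine_clauses(clauses_by_type, pairs, max_per_pair=None, seed=0):
    """Build multi-label rows by joining one clause from each type in a pair."""
    rng = random.Random(seed)
    rows = []
    for first, second in pairs:
        candidates = list(itertools.product(
            clauses_by_type.get(first, []), clauses_by_type.get(second, [])))
        # Optionally cap the number of combinations so no pair dominates the dataset.
        if max_per_pair is not None and len(candidates) > max_per_pair:
            candidates = rng.sample(candidates, max_per_pair)
        for a, b in candidates:
            rows.append({"text": f"{a} {b}", "labels": (first, second)})
    return rows

# Hypothetical clause texts, for illustration only.
sample = {
    "renewal": ["This agreement renews annually.", "Renewal requires written notice."],
    "termination": ["Either party may terminate.", "Termination takes effect in 30 days."],
    "insurance": ["The vendor shall maintain insurance."],
    "liability": ["Liability is capped at the fees paid."],
}

augmented = combine_clauses(sample, RELATED_PAIRS)
print(len(augmented), "combined instances")
```

Because every clause of one type is paired with every clause of the other, the number of instances grows multiplicatively, which is why the cap in `max_per_pair` can be useful for keeping classes balanced.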
Scaling to success
Industries generate millions of documents every month as a byproduct of business processes, and each of these documents contains meaningful snippets of information hidden deep inside. Once the datasets are identified, collected, and cleansed, the next step is to give meaning to the text extracted from digital assets such as documents, text files, and scanned images, and to use these datasets to feed downstream business apps, set up workflows, and optimise business processes.
Previous attempts at using AI to understand documents have failed because they focused on the co-occurrence of individual words and phrases within individual business documents. It was time to move beyond that by creating tools that could understand the different portions of a document and their unique usage in any organisation. The result is an intelligent model that can look for specific entities such as dates, contract numbers, and purchase order numbers across different documents in minutes to generate meaningful insights and accelerate business outcomes.
Think itinerary processing, financial compliance, auditing, renewal follow-up, invoice processing, and so on, all reviewed, identified, and automated. Think of new possibilities and improved outcomes for solopreneurs, consultants, lawyers, and entrepreneurs, all from identifying and extracting data from unstructured and structured business documents in minutes.
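To make the idea of entity extraction concrete, here is a deliberately simple, rule-based sketch that pulls dates, contract numbers, and purchase order numbers from text. It uses regular expressions rather than a trained model, so it is a stand-in and not the approach the article describes; the `CN-` and `PO-` formats and the ISO date format are assumptions made for the example.

```python
import re

# Hypothetical formats; real documents need patterns tuned to their own conventions.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "contract_number": re.compile(r"\bCN-\d{4,}\b"),
    "purchase_order_number": re.compile(r"\bPO-\d{4,}\b"),
}

def extract_entities(text):
    """Return every match for each entity type, in document order."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

# Invented sample text, for illustration only.
sample_text = (
    "Contract CN-20311 was signed on 2021-03-15 against purchase order "
    "PO-88172 and renews on 2022-03-15."
)

entities = extract_entities(sample_text)
for name, values in entities.items():
    print(name, values)
```

Fixed patterns like these break as soon as formats vary between organisations, which is the gap a model that learns each organisation's usage is meant to close.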
Rishi started his entrepreneurial journey ten years ago when he founded 1Click, which was acquired by Freshworks. He spent the next few years building and selling products at Freshworks. Besides managing the product and mentoring fledgling startups, he has a ‘green thumb’ and is a health and fitness enthusiast.