JPMorgan Announces DocLLM for Multimodal Document Understanding

JPMorgan has introduced DocLLM, a generative language model designed for multimodal document understanding. DocLLM is a lightweight extension to LLMs for analysing enterprise documents such as forms, invoices, reports and contracts, which carry intricate semantics at the intersection of textual and spatial modalities.

Unlike existing multimodal LLMs, DocLLM deliberately avoids expensive image encoders and relies exclusively on bounding box information to incorporate spatial layout structure. The model introduces a disentangled spatial attention mechanism, which decomposes the attention mechanism of classical transformers into a set of disentangled matrices.
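
The article gives no equations, so the following is only a minimal NumPy sketch of what such a disentangled attention head could look like. It assumes that the decomposition yields four score matrices (text-to-text, text-to-spatial, spatial-to-text and spatial-to-spatial) summed with scalar weights, that each token's bounding box is turned into a hidden vector by a single linear projection, and that attention is causal as in a decoder-only LLM. The function name, the projection names (`Wq_t`, `W_box`, etc.) and the weight values are hypothetical, not JPMorgan's code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def disentangled_spatial_attention(text_h, boxes, params, lambdas):
    """One causal attention head mixing text and layout scores (sketch).

    text_h:  (n, d) hidden states of the text tokens
    boxes:   (n, 4) bounding box per token, e.g. (x0, y0, x1, y1)
    params:  dict of (hypothetical) projection matrices
    lambdas: (lam_ts, lam_st, lam_ss) weights on the cross and spatial terms
    """
    # Assumption: a plain linear map embeds each 4-d box into the hidden size.
    spatial_h = boxes @ params["W_box"]                      # (n, d)
    q_t, k_t = text_h @ params["Wq_t"], text_h @ params["Wk_t"]
    v_t = text_h @ params["Wv_t"]
    q_s, k_s = spatial_h @ params["Wq_s"], spatial_h @ params["Wk_s"]
    lam_ts, lam_st, lam_ss = lambdas

    # Four disentangled score matrices instead of a single Q K^T.
    scores = (q_t @ k_t.T
              + lam_ts * (q_t @ k_s.T)
              + lam_st * (q_s @ k_t.T)
              + lam_ss * (q_s @ k_s.T))
    n, d = text_h.shape
    scores = scores / np.sqrt(d)

    # Causal mask: token i may only attend to tokens j <= i.
    future = np.triu(np.ones((n, n)), k=1).astype(bool)
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v_t, weights


# Toy usage with random weights and random boxes, for illustration only.
rng = np.random.default_rng(0)
n, d = 5, 8
params = {name: rng.normal(size=(d, d)) * 0.1
          for name in ["Wq_t", "Wk_t", "Wv_t", "Wq_s", "Wk_s"]}
params["W_box"] = rng.normal(size=(4, d))
text_h = rng.normal(size=(n, d))
boxes = rng.uniform(size=(n, 4))
out, w = disentangled_spatial_attention(text_h, boxes, params, (1.0, 1.0, 1.0))
print(out.shape)  # (5, 8)
```

Setting all three weights to zero reduces the head to ordinary text-only attention, which shows how the layout signal enters purely through extra score terms rather than through image features.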

DocLLM tackles irregular layouts and heterogeneous content in visual documents by employing a pre-training objective that focuses on learning to infill text segments.
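
The article does not say how segments are chosen or formatted, so here is a minimal sketch of a block-infilling training example, assuming that whole text blocks (for instance OCR lines) are hidden behind a mask token in the context and regenerated, in document order, in the target. The token strings `[M]`, `[S]` and `[E]`, the function `make_infilling_example` and the `mask_ratio` parameter are hypothetical choices for illustration.

```python
import random

# Hypothetical special tokens: mask placeholder, start and end of an infilled block.
MASK, START, END = "[M]", "[S]", "[E]"


def make_infilling_example(blocks, mask_ratio=0.3, seed=None):
    """Build one (context, target) pair for block infilling.

    blocks:     list of text segments, e.g. OCR lines of a document.
    mask_ratio: fraction of blocks to hide (at least one is always hidden).
    """
    if not blocks:
        raise ValueError("need at least one text block")
    rng = random.Random(seed)
    n_mask = min(len(blocks), max(1, round(mask_ratio * len(blocks))))
    masked = set(rng.sample(range(len(blocks)), n_mask))

    context, target = [], []
    for i, block in enumerate(blocks):
        if i in masked:
            # Hide the block in the context...
            context.append(MASK)
            # ...and ask the model to regenerate it, in document order.
            target.extend([START, block, END])
        else:
            context.append(block)
    return " ".join(context), " ".join(target)


blocks = ["INVOICE", "Date: 2023-12-31", "Total: $120.00", "Thank you"]
context, target = make_infilling_example(blocks, mask_ratio=0.5, seed=1)
print(context)
print(target)
```

During training, a decoder-only model would typically read `context` followed by `target` and be scored only on the target tokens, so each hidden block is predicted with both the text before it and the text after it in view, rather than from the left context alone.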

To summarise, the model combines two key components: a disentangled spatial attention mechanism that enables cross-alignment between the text and layout modalities, and an infilling pre-training objective that handles irregular layouts effectively.

For pre-training DocLLM, data was gathered from two primary sources: IIT-CDIP Test Collection 1.0 and DocBank. The former comprises over 5 million documents related to legal proceedings against the tobacco industry during the 1990s, while the latter consists of 500,000 documents, each featuring distinct layouts.

Extensive evaluation across a range of document intelligence tasks shows DocLLM outperforming state-of-the-art LLMs: it beats equivalent models on 14 of 16 known datasets and generalises well to previously unseen datasets in 4 out of 5 settings.

Looking ahead, JPMorgan plans to incorporate vision into DocLLM in a similarly lightweight manner to further enhance its capabilities.

Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.