Data Extraction Just Got Smarter With ML: AWS Announces Textract

Amazon Web Services (AWS), the cloud computing arm of Amazon, has launched Textract, a machine learning (ML) service for automated text and data extraction. The service is fully cloud-hosted and managed by AWS, and lets users pull text and structured data out of a wide range of documents.

AWS positions the service as more than an optical character recognition (OCR) engine: beyond reading text from whole pages, scans, PDFs and photos, it identifies form fields and tables. Preserving that structure keeps the extracted data in context, which should yield cleaner datasets and deeper insights.
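To illustrate what identifying tables means in practice, here is a minimal Python sketch that rebuilds a table from Textract-style output. It assumes the Block-based JSON layout that Textract's AnalyzeDocument API returns when the TABLES feature is requested: TABLE blocks link to CELL blocks, which carry 1-based RowIndex and ColumnIndex values and link to WORD blocks. The function name `tables_from_blocks` and the toy sample are hypothetical, and real responses include many more fields (Geometry, Confidence and so on) that this sketch ignores.

```python
def tables_from_blocks(blocks):
    """Rebuild each TABLE block as a list of rows of cell strings.

    Assumes Textract's Block layout: TABLE -> CHILD -> CELL blocks,
    each CELL with RowIndex/ColumnIndex and CHILD links to WORD blocks.
    """
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block):
        # Collect the Ids from every CHILD relationship of a block.
        ids = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                ids.extend(rel["Ids"])
        return ids

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for cid in child_ids(block):
            cell = by_id[cid]
            if cell["BlockType"] != "CELL":
                continue
            # Join the words inside the cell; skip non-WORD children
            # such as selection elements (checkboxes).
            words = [by_id[w]["Text"] for w in child_ids(cell)
                     if by_id[w]["BlockType"] == "WORD"]
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        n_rows = max((r for r, _ in cells), default=0)
        n_cols = max((c for _, c in cells), default=0)
        # Missing positions become empty strings.
        tables.append([[cells.get((r, c), "") for c in range(1, n_cols + 1)]
                       for r in range(1, n_rows + 1)])
    return tables


# A hand-written toy response, heavily trimmed (no Geometry or Confidence).
sample = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c11", "c12", "c21", "c22"]}]},
    {"Id": "c11", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c12", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c21", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "c22", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w4", "w5"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Cost"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Paper"},
    {"Id": "w4", "BlockType": "WORD", "Text": "4.50"},
    {"Id": "w5", "BlockType": "WORD", "Text": "USD"},
]
print(tables_from_blocks(sample))
# [[['Item', 'Cost'], ['Paper', '4.50 USD']]]
```

With boto3, the blocks list would come from `boto3.client("textract").analyze_document(Document={"Bytes": data}, FeatureTypes=["TABLES"])["Blocks"]`. Cells spanning several rows or columns and multi-page documents are not handled in this sketch.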


The company states that Textract can process millions of document pages "accurately" in just a few hours. Results are returned as JSON, which can be fed easily into other AWS ML services. According to AWS, what sets the product apart is that users do not need to write or maintain custom code or templates, and no ML experience is required to use it.
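Because results come back as JSON, consuming them needs no special tooling. The sketch below assumes the response shape returned by Textract's DetectDocumentText API: a top-level "Blocks" list in which LINE entries carry "Text" and a 0 to 100 "Confidence" score. It pulls out the recognized lines of text. The helper `lines_from_response`, its `min_confidence` filter and the sample invoice text are hypothetical, added only for illustration.

```python
import json


def lines_from_response(response, min_confidence=0.0):
    """Return the text of every LINE block, in response order.

    min_confidence is an illustrative filter for this sketch,
    not a Textract API parameter.
    """
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block["BlockType"] == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


# Hand-written toy response; real responses also include Geometry,
# Relationships and other fields.
raw = """
{"Blocks": [
  {"BlockType": "PAGE", "Id": "p1"},
  {"BlockType": "LINE", "Id": "l1", "Text": "Invoice #1042", "Confidence": 99.1},
  {"BlockType": "WORD", "Id": "w1", "Text": "Invoice", "Confidence": 99.3},
  {"BlockType": "LINE", "Id": "l2", "Text": "Total: 120.00", "Confidence": 62.4}
]}
"""
response = json.loads(raw)
print(lines_from_response(response))
# ['Invoice #1042', 'Total: 120.00']
print(lines_from_response(response, min_confidence=90))
# ['Invoice #1042']
```

In a live setting, the dict would come from `boto3.client("textract").detect_document_text(Document={"Bytes": image_bytes})` rather than a hard-coded string; the confidence filter shows one simple way to route low-confidence lines to human review.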

Amazon states that it has trained Textract on "tens of millions of documents from virtually every industry", which, the company says, makes it suitable for almost any use case. It can "automatically detect a document's layout", preserving the key elements on the page and extracting data accurately by understanding how the individual pieces of data relate to one another.

Amazon is pitching Textract as a cheaper and easier alternative to manual data entry. As with most AWS services, it is billed on a pay-as-you-go basis and accessed through APIs. Swami Sivasubramanian, Vice President, Amazon Machine Learning, stated:

“Amazon Textract makes it possible for customers to gain real meaning from their file collections, operate more efficiently, improve security compliance, automate data entry, and facilitate faster business decisions.”

The service is currently available in the US East (Ohio), US East (N. Virginia), US West (Oregon) and EU (Ireland) regions, and Amazon says it will expand to further regions within the year.

Several prominent organisations have already begun using the service, including The Globe and Mail, a Canadian media outlet; the Met Office, the UK's national weather service; and PricewaterhouseCoopers, one of the world's biggest accounting firms. The rise of accessible ML models for data extraction might mark the beginning of the end for routine jobs such as manual data entry.

Anirudh VK
I am an AI enthusiast and love keeping up with the latest events in the space. I love video games and pizza.
