Data Extraction Just Got Smarter With ML: AWS Announces Textract

Amazon Web Services, the cloud computing arm of the e-commerce giant, recently launched an ML service for automated text and data extraction. The service, known as Textract, is fully cloud-hosted and managed by AWS, and allows users to parse various forms of data easily.

AWS positions the service as more than an optical character recognition (OCR) algorithm: it can parse data tables, whole pages, forms, scans, PDFs, photos and more. It also identifies fields and tables, which gives the extracted data context and allows for the collection of cleaner datasets with deeper insights.

The company states that Textract can process millions of document pages “accurately” in just a few hours. All the data is exported in JSON format and integrates easily with other ML-based AWS services. What sets the product apart is that there is no code or template to maintain, and no ML experience is required to operate or manage it.
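As a rough illustration of what working with that JSON output might look like, the sketch below pulls the plain text lines out of a response. The sample document is hand-written for this example and is not real service output; the `Blocks`, `BlockType` and `Text` field names follow the block layout Textract responses are understood to use, and the `extract_lines` helper is hypothetical.

```python
import json

# Hand-written sample shaped like a Textract response (an assumption for
# illustration only; this is not real output from the service).
SAMPLE_JSON = """
{
  "Blocks": [
    {"BlockType": "PAGE", "Id": "p1"},
    {"BlockType": "LINE", "Id": "l1", "Text": "Invoice 1001"},
    {"BlockType": "WORD", "Id": "w1", "Text": "Invoice"},
    {"BlockType": "WORD", "Id": "w2", "Text": "1001"},
    {"BlockType": "LINE", "Id": "l2", "Text": "Total: 42.00"}
  ]
}
"""


def extract_lines(response):
    """Return the text of every LINE block, in the order the blocks appear."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


response = json.loads(SAMPLE_JSON)
lines = extract_lines(response)
print(lines)
```

Filtering on the block type keeps the page-level and word-level entries out of the result, so each detected line of text appears once.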

Amazon states that it has trained Textract on “tens of millions of documents from virtually every industry”, which it says makes the service suitable for a wide range of scenarios. Textract can “automatically detect a document’s layout”, preserve the key elements on the page and collect data by understanding the relationships between those elements.
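To make the idea of relationships between data a little more concrete, here is a minimal sketch of how a form field and its value could be paired up from the service's output. It assumes the response links blocks by ID, with key blocks pointing to value blocks and both pointing to their child words; the sample blocks are hand-written, and the helper names `child_text` and `key_value_pairs` are hypothetical.

```python
# Hand-written blocks shaped like Textract form output (an assumption for
# illustration only; this is not real output from the service).
SAMPLE_BLOCKS = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Full"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
]


def child_text(block, by_id):
    """Join the text of the WORD blocks listed as children of `block`."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks):
    """Map each key's text to the text of the value block(s) it points to."""
    by_id = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        values = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                values.extend(child_text(by_id[v], by_id) for v in rel["Ids"])
        pairs[child_text(block, by_id)] = " ".join(values)
    return pairs


pairs = key_value_pairs(SAMPLE_BLOCKS)
print(pairs)
```

Because the links are stored as IDs rather than nesting, building the `by_id` lookup first keeps each relationship hop a simple dictionary access.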

Amazon is billing it as a lower-cost alternative to manual data entry, with ease-of-use benefits. As with most cloud computing services, it is provided on a pay-as-you-go basis and is accessed through APIs. Swami Sivasubramanian, Vice President, Amazon Machine Learning, stated:

“Amazon Textract makes it possible for customers to gain real meaning from their file collections, operate more efficiently, improve security compliance, automate data entry, and facilitate faster business decisions.”

Currently, the service is available in US East (Ohio), US East (N. Virginia), US West (Oregon) and EU (Ireland), with Amazon stating that it will expand to further regions within the year.

Several prominent organisations have already begun using the service, including The Globe and Mail, a Canadian media outlet; the Met Office, the UK’s national weather service; and PricewaterhouseCoopers, one of the world’s biggest accounting firms. The rise of accessible ML models for data extraction might be the beginning of the end for low-level jobs such as data entry.

Anirudh VK
I am an AI enthusiast and love keeping up with the latest events in the space. I love video games and pizza.
