Andrew Ng has rolled out a new course called “Preprocessing Unstructured Data for LLM Applications,” this time in collaboration with San Francisco-based startup Unstructured. Unstructured essentially captures unstructured data wherever it is stored and transforms it into AI-friendly JSON files for companies eager to incorporate AI into their business.
Taught by Matt Robinson, head of product at Unstructured, the course is free for a limited time and takes about an hour to complete.
You’ll learn to extract and standardise content from various document types, such as PDFs, PowerPoint presentations, Word documents, and HTML files, as well as tables and images, into a common JSON format. This will broaden the range of information available for your LLM applications. Enriching your content with metadata will improve retrieval-augmented generation (RAG) results and enable more nuanced search capabilities.
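To make the idea of a "common JSON format" concrete, here is a minimal sketch that uses only the Python standard library. It is not the Unstructured library or its schema: the `TAG_TYPES` mapping, the `ElementExtractor` class, the `html_to_elements` function, and the metadata field names are all hypothetical, chosen only to illustrate turning an HTML document into a flat list of typed elements that carry metadata.

```python
import json
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to element types (an assumption for
# illustration, not the schema used by Unstructured).
TAG_TYPES = {"h1": "Title", "h2": "Title", "p": "NarrativeText", "li": "ListItem"}


class ElementExtractor(HTMLParser):
    """Collects text from selected tags as a flat list of dict elements."""

    def __init__(self, filename):
        super().__init__()
        self.filename = filename
        self.elements = []
        self._current = None  # tag currently being captured, if any
        self._buffer = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag in TAG_TYPES:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace so the text is normalised.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.elements.append({
                    "type": TAG_TYPES[tag],
                    "text": text,
                    # Metadata travels with each element so a retrieval step
                    # can filter or rank on it later.
                    "metadata": {
                        "filename": self.filename,
                        "filetype": "text/html",
                        "element_index": len(self.elements),
                    },
                })
            self._current = None


def html_to_elements(html, filename):
    """Parse an HTML string and return a list of JSON-serialisable elements."""
    parser = ElementExtractor(filename)
    parser.feed(html)
    parser.close()
    return parser.elements


sample = "<h1>Report</h1><p>Revenue grew  in Q1.</p><ul><li>Item one</li></ul>"
print(json.dumps(html_to_elements(sample, "report.html"), indent=2))
```

Because every element has the same shape regardless of where it came from, a downstream RAG pipeline could, for example, restrict a search to elements whose `metadata["filename"]` matches a given report. A real pipeline would need separate handling for PDFs, slides, and images, which this sketch does not attempt.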
The course covers techniques for document image analysis, including layout detection, vision transformers, and table transformers. You’ll discover how to apply these methods to preprocess PDFs, images, and tables. It is suitable for anyone interested in effectively processing diverse data types and formats to build high-performing LLM-based RAG systems.