OpenAI Unveils Data Partnerships Program to Propel AGI Ambitions

OpenAI is interested in datasets that reflect human society, encompassing various modalities such as text, images, audio, or video

OpenAI has introduced ‘OpenAI Data Partnerships,’ inviting organisations to collaborate in producing both public and private datasets for training AI models. The initiative aims to enhance AI’s understanding of various subjects, industries, cultures, and languages, facilitating the development of AGI, as stated by OpenAI.

The ChatGPT creator is interested in large-scale datasets that reflect human society, encompassing various modalities such as text, images, audio, or video. The focus is on data expressing human intention, including long-form writing or conversations across different languages, topics, and formats.

The company has already partnered with several organisations to incorporate curated datasets into AI training. Notable collaborations include working with the Icelandic Government and Miðeind ehf to improve GPT-4’s ability to comprehend the Icelandic language.

Additionally, OpenAI joined forces with the non-profit organisation Free Law Project, incorporating its extensive collection of legal documents into AI training, with the aim of democratising access to legal understanding.

OpenAI is offering organisations two ways to participate. The first involves creating an open-source dataset for training language models, promoting collaboration within the wider AI community. The second allows organisations to contribute private datasets, keeping sensitive information confidential while enabling OpenAI’s models to gain a deeper understanding of specific domains.

Interestingly, at its first-ever DevDay, OpenAI launched the Copyright Shield program, which aims to provide financial support and legal defence to enterprise-level users of ChatGPT against copyright infringement claims.

While unveiling the program, Sam Altman emphasised the company’s efforts to ensure copyright compliance within its AI systems, which are trained on a combination of licensed and publicly available data sources.

Siddharth Jindal
Siddharth is a media graduate who loves exploring tech through journalism and putting forward ideas worth pondering in the era of artificial intelligence.
