OpenAI has introduced ‘OpenAI Data Partnerships,’ inviting organisations to collaborate in producing both public and private datasets for training AI models. The initiative aims to enhance AI’s understanding of various subjects, industries, cultures, and languages, facilitating the development of AGI, as stated by OpenAI.
The ChatGPT creator is interested in large-scale datasets that reflect human society, encompassing various modalities such as text, images, audio, or video. The focus is on data expressing human intention, including long-form writing or conversations across different languages, topics, and formats.
The company has already partnered with several organisations to incorporate curated datasets into AI training. Notable collaborations include working with the Icelandic Government and Miðeind ehf to improve GPT-4’s ability to comprehend the Icelandic language.
Additionally, OpenAI joined forces with the non-profit organisation Free Law Project, incorporating its extensive collection of legal documents into AI training with the aim of democratising access to legal understanding.
OpenAI is offering organisations two ways to participate. The first involves creating an open-source dataset for training language models, promoting collaboration within the wider AI community. The second allows organisations to contribute private datasets, keeping sensitive information confidential while enabling OpenAI’s models to gain a deeper understanding of specific domains.
Separately, at its first-ever DevDay, OpenAI launched the Copyright Shield program, which aims to provide financial support and legal defence to enterprise-level users of ChatGPT facing copyright infringement claims.
While unveiling the program, Sam Altman emphasised the company’s efforts to ensure copyright compliance within its AI systems, which are trained on a combination of licensed and publicly available data sources.