OpenAI seeks partnerships to generate AI training data

(Reuters) – ChatGPT maker OpenAI said on Thursday it intends to work with organizations to produce public and private datasets for training artificial intelligence (AI) models.

Popular chatbot ChatGPT, which can generate poems and prose from simple prompts, is based on large language models that are trained largely on publicly available data from the Internet.

The company’s latest effort could help it produce more nuanced training data that are more conversational in style.

“We’re particularly looking for data that expresses human intention, across any language, topic and format,” the company said in a blog post.

OpenAI said it is seeking partners to help it create an open-source dataset for training language models. This dataset would be public for anyone to use in AI model training, it said.

The company said it is also preparing private datasets for training proprietary AI models.

(Reporting by Jaspreet Singh in Bengaluru; Editing by Shilpi Majumdar)