
OpenAI Says ChatGPT Would Be Impossible Without Copyrighted Content

Jan 09, 2024 2 min read

OpenAI admits the integral role of copyrighted content in ChatGPT's innovation.

OpenAI, the creator of ChatGPT, has stated that it would be impossible to train today's leading AI models without copyrighted material. The statement comes as artificial intelligence firms face increasing pressure over the content used to train their products. AI models such as ChatGPT and the image generator DALL-E gain their abilities through training on large amounts of content scraped from the Internet without the permission of its rights holders. This type of free-for-all scraping has long been prevalent in academic machine learning research, but since deep learning AI models became commercially available, the practice has come under intense scrutiny.

"Because copyright today covers virtually every sort of human expression—including blog posts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today's leading AI models without using copyrighted materials," OpenAI said in its submission to the House of Lords.

Furthermore, OpenAI claims that limiting training data to public domain books and drawings "created more than a century ago" would result in AI systems that do not "meet the needs of today's citizens."

This statement comes after The New York Times filed a lawsuit last month against OpenAI and Microsoft, a major investor in the company, over the alleged unlawful use of the newspaper's content in their products. The 69-page lawsuit alleges that OpenAI illegally used The New York Times' work to develop AI systems that compete with media companies.

The lawsuit claims that OpenAI's tools produce "output that recites Times content verbatim, closely summarizes it, and mimics its expressive style," and it cites scores of examples in support.

Getty Images, which owns one of the world's largest photo libraries, is suing Stability AI, the creator of Stable Diffusion, in both the United States and England and Wales for alleged copyright infringement. In the United States, a group of music publishers, including Universal Music, is suing Anthropic, the Amazon-backed company behind the Claude chatbot, accusing it of using "innumerable" copyrighted song lyrics to train its model.

If you want to know more about the case, read the entire submission here.

 


Resources: Image courtesy - Mariia Shalabaieva & Levart_Photographer

Tags: chatGPT, copyrights, OpenAI

