For the past few months, Morten Blichfeldt Andersen has spent many hours scouring OpenAI’s GPT Store. Since it launched in January, the marketplace for bespoke bots has filled up with a deep bench of useful and sometimes quirky AI tools. Cartoon generators spin up New Yorker–style illustrations and vivid anime stills. Programming and writing assistants offer shortcuts for crafting code and prose. There’s also a color analysis bot, a spider identifier, and a dating coach called RizzGPT. Yet Blichfeldt Andersen is hunting for only one very specific type of bot: those built on his employer’s copyright-protected textbooks without permission.
Blichfeldt Andersen is publishing director at Praxis, a Danish textbook purveyor. The company has been embracing AI and created its own custom chatbots. But it is currently engaged in a game of whack-a-mole in the GPT Store, and Blichfeldt Andersen is the man holding the mallet.
“I’ve been personally searching for infringements and reporting them,” Blichfeldt Andersen says. “They just keep coming up.” He suspects the culprits are primarily young people uploading material from textbooks to create custom bots to share with classmates—and that he has uncovered only a tiny fraction of the infringing bots in the GPT Store. “Tip of the iceberg,” Blichfeldt Andersen says.
It is easy to find bots in the GPT Store whose descriptions suggest they might be tapping copyrighted content in some way, as TechCrunch noted in a recent article claiming OpenAI’s store was overrun with “spam.” Using copyrighted material without permission is permissible in some contexts, but in others rights holders can take legal action. WIRED found a GPT called Westeros Writer that claims to “write like George R.R. Martin,” the creator of Game of Thrones. Another, Voice of Atwood, claims to imitate the writer Margaret Atwood. Yet another, Write Like Stephen, is intended to emulate Stephen King.
When WIRED tried to trick the King bot into revealing the “system prompt” that tunes its responses, the output suggested it had access to King’s memoir On Writing. Write Like Stephen was able to reproduce passages from the book verbatim on demand, even noting which page the material came from. (WIRED could not make contact with the bot’s developer, because it did not provide an email address, phone number, or external social profile.)
OpenAI spokesperson Kayla Wood says the company responds to takedown requests against GPTs made with copyrighted content but declined to answer WIRED’s questions about how frequently it fulfills such requests. She also says OpenAI proactively looks for problem GPTs. “We use a combination of automated systems, human review, and user reports to find and assess GPTs that potentially violate our policies, including the use of content from third parties without necessary permission,” Wood says.
New Disputes
The GPT Store’s copyright problem could add to OpenAI’s existing legal headaches. The company is facing a number of high-profile lawsuits alleging copyright infringement, including one brought by The New York Times and several brought by groups of fiction and nonfiction authors, among them big names like George R.R. Martin.
Chatbots offered in OpenAI’s GPT Store are based on the same technology as its own ChatGPT but are created by outside developers for specific functions. To tailor their bot, a developer can upload extra information that it can tap to augment the knowledge baked into OpenAI’s technology. The process of consulting this additional information to respond to a person’s queries is called retrieval-augmented generation, or RAG. Blichfeldt Andersen is convinced that the RAG files behind the bots in the GPT Store are a hotbed of copyrighted materials uploaded without permission.
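To make the retrieval step concrete, here is a minimal Python sketch of retrieval-augmented generation. It is not OpenAI’s implementation: the function names are hypothetical, and it ranks chunks of uploaded text by simple word overlap, where production systems typically use vector embeddings. What it illustrates is that whatever text a developer uploads can be pasted, word for word, into the context the model sees, which is why a bot can end up quoting a book verbatim.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; a stand-in for the embeddings a real system would likely use.
    return re.findall(r"[a-z0-9']+", text.lower())


def chunk(document: str, size: int = 50) -> list[str]:
    # Split an uploaded file into fixed-size chunks of words.
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def score(query: str, passage: str) -> int:
    # Count query-word occurrences that also appear in the passage.
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[w], p[w]) for w in q)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank every chunk from every uploaded document and keep the top k.
    chunks = [c for doc in documents for c in chunk(doc)]
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str]) -> str:
    # Retrieved text is inserted verbatim ahead of the user's question.
    context = "\n---\n".join(retrieve(query, documents))
    return f"Use the following reference material to answer.\n{context}\n\nQuestion: {query}"
```

For example, with two uploaded snippets, one about photosynthesis and one stating that the French Revolution began in 1789, `retrieve("When did the French Revolution begin?", docs, k=1)` returns the 1789 sentence, and `build_prompt` places it unchanged in front of the question.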
OpenAI’s terms for the GPT Store explicitly prohibit “using content from third parties without the necessary permissions,” but right now there’s no way for outsiders to check whether their copyrighted material has been uploaded by the developers creating GPTs. That means concerned copyright holders have to go hunting.
Blichfeldt Andersen uses keywords to comb the GPT Store for chatbots that might be using material from his company’s books. He then has to engage each bot he finds in conversation to try to divine whether it has been trained on Praxis titles. It’s tedious work but is getting results: He has successfully prompted several bots to reproduce specific passages from Praxis textbooks. “You have to trick the language model to reveal itself,” he says.
The lawsuits accusing OpenAI of scraping copyrighted material without permission to train its systems may take years to resolve, but disputes over material uploaded to the GPT Store could have more immediate repercussions. “GPTs change the relationship between OpenAI and its users in an important way for copyright,” says James Grimmelmann, a professor of internet law at Cornell University. When online platforms allow users to upload their own content—for example, YouTube allowing regular people to publish personal videos—they are subject to the Digital Millennium Copyright Act, part of US copyright law that allows copyright holders to file complaints if their intellectual property is disseminated without their permission. So if, say, a YouTuber posts a clip with music in the background that they didn’t license, sometimes music labels will file complaints and get the videos taken down. Since the GPT Store allows developers to upload their work, it is governed by these rules.
“Infringing” Bots
Intended as an anti-piracy statute, the Digital Millennium Copyright Act now has outsize importance in copyright enforcement, as it gives copyright holders a relatively zippy way to demand that their work be removed when people put it online without their permission: DMCA takedown notices.
After Blichfeldt Andersen found his first few examples of Praxis textbooks in the GPT Store, he filed DMCA takedown notices with OpenAI. He says the company didn’t respond until he asked the Danish Rights Alliance, which represents the interests of creative workers in Denmark, to help out. The DRA has a hard-charging approach to protecting members’ copyright in the age of AI. Last year it got a collection of over 196,000 books used for generative AI training temporarily taken offline by filing DMCA takedown notices.
Thomas Heldrup, the DRA’s head of content protection and enforcement, often leads its AI crusades. He played a central role in taking on the GPT Store, too, filing complaints on behalf of Praxis that led to OpenAI taking down bots that the publisher considered infringing.
“They have been pretty quick to remove infringing GPTs that we have reported to them,” Heldrup says. Still, he’d like to see the company make changes. “There needs to be better tools at the disposal of rights holders to search for these infringing GPTs,” Heldrup says.
Blichfeldt Andersen says Praxis is considering legal action against OpenAI if conditions on the GPT Store do not improve. He would like to see the company and other AI developers add more robust systems that scan for copyrighted material in uploaded RAG content, similar to the Content ID system in place to protect copyrighted materials from appearing on YouTube. (When asked if it plans to introduce a Content ID–like system, OpenAI did not answer directly, but OpenAI’s Wood tells WIRED it does screen GPTs proactively.)
Startups are already appearing that offer to help AI companies scan for infringing output. Anand Kannappan, CEO and founder of Patronus AI, says its recently launched Copyright Catcher service, designed to detect copyrighted text, could “absolutely” detect potential infringement in custom GPTs.
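A Content ID–style check for text could start from something as simple as comparing long runs of words. The Python sketch below assumes a plain-text copy of the protected work is available and measures what fraction of a bot’s output (or of an uploaded file) shares eight-word sequences with that reference. The function names, the eight-word window, and the 0.5 threshold are illustrative assumptions; this is not how YouTube’s Content ID, OpenAI’s screening, or Patronus AI’s Copyright Catcher actually work, and a high score signals likely verbatim copying, not a legal finding of infringement.

```python
import re


def shingles(text: str, n: int = 8) -> set[tuple[str, ...]]:
    # Overlapping word n-grams ("shingles"); long shared runs suggest verbatim copying.
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(candidate: str, reference: str, n: int = 8) -> float:
    # Fraction of the candidate's shingles that also occur in the reference text.
    cand = shingles(candidate, n)
    if not cand:
        return 0.0  # Text shorter than n words cannot be judged this way.
    return len(cand & shingles(reference, n)) / len(cand)


def looks_copied(candidate: str, reference: str, n: int = 8, threshold: float = 0.5) -> bool:
    # The threshold is a hypothetical tuning knob, not a legal standard.
    return overlap_ratio(candidate, reference, n) >= threshold
```

Comparing lowercased word sequences rather than raw strings means changes in capitalization, punctuation, or line breaks do not hide a copied passage, while an eight-word window keeps common short phrases from triggering false matches.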
But although OpenAI has complied with some DMCA takedown requests aimed at its GPT Store, some intellectual property experts believe that the company could argue that the concept of fair use protects some GPTs reliant on copyrighted works.
“I think it would be really hasty to say you can’t upload anything that’s copyrighted to these tools without permission, because that rules out hugely important education and research functions,” says Meredith Jacob, the project director of copyright and open licensing at American University Washington College of Law. She sees the creation of GPTs that help students understand their textbooks as something that could easily be protected by fair use.
Without a simple way for outsiders to see what’s been uploaded in the supplementary files for the GPT Store’s bots, copyright holders worried about infringements either have to trust that OpenAI’s automated systems are catching violations—or take the time-consuming approach of investigating each suspicious bot individually. “It’s like finding a needle in a haystack,” says Blichfeldt Andersen.
OpenAI’s GPT Store Faces Copyright Complaints Due to Content Usage
OpenAI, the artificial intelligence company behind ChatGPT, has found itself facing complaints over the use of copyrighted content in its GPT Store. The GPT Store, which launched in January 2024, is a marketplace for custom chatbots, called GPTs, that outside developers build for specific tasks such as writing, illustration, and coding. However, the platform has raised concerns among content creators and copyright holders who believe their intellectual property is being exploited without proper authorization.
The GPTs in the store run on OpenAI’s GPT-4 family of language models, which have gained significant attention for their ability to generate human-like text. These models are trained on vast amounts of data from the internet, including copyrighted material, and developers can also upload their own files for a GPT to draw on when answering questions. While OpenAI has implemented measures to prevent the generation of explicit or illegal content, copyrighted works appear to have slipped through the cracks.
Several content creators have come forward with complaints, stating that their work has been used without permission or proper attribution. They argue that OpenAI’s GPT Store is essentially profiting from their intellectual property without compensating them or seeking their consent. This has sparked a debate about the ethical implications of AI-generated content and the responsibility of platforms like the GPT Store in ensuring copyright compliance.
OpenAI has responded to the concerns raised by content creators and copyright holders. A company spokesperson says OpenAI responds to takedown requests against GPTs made with copyrighted content and proactively screens for GPTs that use third-party content without the necessary permission, though the company has not said whether it plans to introduce a Content ID–like system to scan the files developers upload.
However, implementing an effective copyright compliance system for AI-generated content is no easy task. The nature of large language models makes it challenging to identify and filter out copyrighted material accurately. OpenAI says it relies on a combination of automated systems, human review, and user reports to find and remove problematic GPTs, but it is an ongoing process that requires continuous improvement.
The copyright complaints faced by OpenAI’s GPT Store highlight the complexities surrounding AI-generated content and the need for a comprehensive legal framework to address these concerns. As AI technology continues to advance, it becomes increasingly crucial to strike a balance between innovation and protecting intellectual property rights.
Content creators and copyright holders are not the only ones affected by the copyright issues surrounding AI-generated content. Developers who upload copyrighted material to build GPTs, and users who share what those bots produce, may also face legal consequences if that material is used without proper authorization. It is essential for them to be aware of copyright law and to secure the necessary permissions before uploading or reusing protected content.
OpenAI’s GPT Store has made it far easier for outside developers to build and share specialized AI tools. However, it also brings to light the challenges and responsibilities that come with AI-generated content. As OpenAI continues to refine its platform and address copyright concerns, the hope is that a balance can be struck between enabling innovation and respecting intellectual property rights.