Be careful of the parts of news stories that rely on ifs and coulds and maybes. Having said that, this is a fascinating exploration of what’s at stake. It’s a fact that OpenAI’s business model involved training its large language model on copyrighted material, without acquiring legal permission to do so, and without any mechanism to share the revenue OpenAI stands to gain from supplying answers that incorporate and remix those copyrighted works.
I suspect that the owners of the NYT would rather license its content in return for a cut of the AI’s profits, but the creative and intellectual work of many smaller publishers, indie authors, and random bloggers was also scraped and ingested in the same manner.
If OpenAI is found to have violated any copyrights in this process, federal law allows for the infringing articles to be destroyed at the end of the case.
In other words, if a federal judge finds that OpenAI illegally copied the Times’ articles to train its AI model, the court could order the company to destroy ChatGPT’s dataset, forcing the company to recreate it using only work that it is authorized to use.
Federal copyright law also carries stiff financial penalties, with violators facing fines up to $150,000 for each infringement “committed willfully.”
“If you’re copying millions of works, you can see how that becomes a number that becomes potentially fatal for a company,” said Daniel Gervais, the co-director of the intellectual property program at Vanderbilt University who studies generative AI. “Copyright law is a sword that’s going to hang over the heads of AI companies for several years unless they figure out how to negotiate a solution.”