OpenAI responds to NYT lawsuit - says training is fair use, "regurgitations" are rare

OpenAI has responded to the recent copyright infringement lawsuit filed by the New York Times in a blog post in which it again insists that the training of generative AI models with existing content is fair use. However, as a good corporate citizen, the AI firm says it is happy to provide publishers with the option to opt out, which - it adds - the NYT took in August 2023.

It also responds to claims in the NYT lawsuit that, with “minimal prompting”, OpenAI’s models will “recite large portions” of the newspaper’s articles “verbatim”. Such “regurgitation”, OpenAI insists, is “a rare bug that we are working to drive to zero”, adding: “We have measures in place to limit inadvertent memorisation and prevent regurgitation in model outputs”.

The battle between the technology sector and the content industries over the copyright obligations of AI companies is gaining momentum, of course. Copyright owners insist that AI firms must get permission before using existing content to train generative AI models. But most tech companies argue that they don’t need permission because of exceptions in copyright law or, in the context of American law, the always tricky principle of fair use.

Therefore, OpenAI reckons, the NYT’s allegations of copyright infringement are unfounded. “Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents”, its blog post argues. “We view this principle as fair to creators, necessary for innovators and critical for US competitiveness”.

“That being said”, it goes on, “legal right is less important to us than being good citizens. We have led the AI industry in providing a simple opt-out process for publishers to prevent our tools from accessing their sites”. In the European Union, a data mining copyright exception in law has an inbuilt opt-out for copyright owners, so basically OpenAI is applying that more generally.

The AI company is also very keen to stress that it is working in partnership with many copyright owners, including in the news business. And, indeed, it thought talks about such a partnership with NYT were going well until the newspaper company suddenly went legal in late December.

As for the NYT’s complaints about regurgitation, OpenAI says that the newspaper firm “repeatedly refused to share any examples, despite our commitment to investigate and fix any issues”. Also, it reckons, regurgitations induced by the NYT on its platform generally related to articles that had been widely published on third-party websites, and even then those articles were likely only heavily cited by its model as a result of very specific prompts.

None of which is likely to placate the NYT or any other critics of the AI firm within the copyright industries. Still, blog posts like this demonstrate the template increasingly employed by AI companies on copyright matters: “It’s all fair use, but hey, we’re collaborating with the savvy content owners, and anyway, we’re in this for the good of humanity, so let us innovate, otherwise, you know, big bad China will end up owning AI, and nobody wants that”.
