The ‘fair use’ defence used by AI companies to justify training their models with unlicensed content was scrutinised in the US courts last week, with the judge overseeing a key lawsuit filed by a group of authors against Meta telling the tech company, “I just don't understand how that can be fair use”.
That comment came in the context of the ongoing legal battle between the authors and Meta over its use of content from thousands of books that it torrented to train its Llama model. At times Judge Vince Chhabria seemed wholly unconvinced that the fair use defence applies to AI training.
According to Reuters, he told Meta’s lawyers that their use of copyright-protected material to train their AI means that they can “create a product that is capable of producing an infinite number of competing products”. As a result Meta is “dramatically changing - you might even say obliterating - the market for that person’s work, and you’re saying that you don't even have to pay a licence [fee] to that person”. He then concluded, somewhat strikingly, “I just don't understand how that can be fair use”.
While that suggests that Chhabria may favour the arguments put forward by authors in this dispute, he also told the writers’ legal reps that the case would likely swing on whether or not they can demonstrate that Meta’s Llama model will damage the commercial potential of the authors’ works. That, he added, was something that has not yet been demonstrated.
Comedian Sarah Silverman and novelist Richard Kadrey are among the authors who sued Meta, accusing the tech company of copyright infringement for making copies of their books without permission when collating a training dataset for Llama. Meta claims that AI training is fair use and therefore it didn't need permission. Needless to say, the authors strongly disagree. Both sides are seeking summary judgements in their favour, requests that Chhabria was considering in court last week.
The fair use argument is at the centre of numerous lawsuits filed by American copyright owners, including record labels and music publishers, against AI companies in relation to their generative AI models. Which means that, as the first cases get to court, rulings on the fair use defence will impact on all the other litigation.
A judge previously rejected the fair use defence in a legal battle between Thomson Reuters and AI company Ross Intelligence, but that related to an AI-powered search engine, not a generative AI model. Therefore what happens in the Llama case is more directly relevant to all the other lawsuits.
There are four main criteria for assessing whether or not the use of copyright protected works constitutes fair use. First, the purpose and character of the use, including whether it’s a ‘transformative’ use. Second, the nature of the copyrighted work. Third, the amount and substantiality of the portion taken. And fourth, the effect of the use upon the potential market for or value of the copyright-protected work.
AI companies are leaning heavily on the argument that their use of existing works is highly transformative. Meanwhile, some copyright owners have argued that the AI companies have often scraped content from unlicensed sources online, which should alone prevent them from using the fair use defence, citing precedent in case law that says there cannot be fair use of pirated material.
However, last week Chhabria was keen to focus on the fourth factor for assessing fair use claims, ie the effect Meta’s use of the authors’ works has on the potential market for or value of those very works.
That’s what prompted the judge to observe that, by using the authors’ books to train Llama, Meta is “dramatically changing - you might even say obliterating - the market for that person’s work”. So much so, “I just don't understand how that can be fair use”.
However, that’s not to say the authors have definitely demonstrated that “the market for their actual copyrighted work is going to be dramatically affected” by Llama.
“It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected by the billions of things that Llama will ultimately be capable of producing”, the judge told the writers’ attorneys. But, he added, it’s “not obvious” to him that that is definitely the case.
One challenge here is that the existence of generative AI models like Llama might have more of a negative commercial impact on future works written by the authors - or more generally on the careers of these and other writers - than on the specific books that were copied as part of past training processes.
According to Wired, Chhabria alluded to this in court by providing a hypothetical music scenario. Having wondered if Taylor Swift would be harmed if her music was used to train a generative AI model, he added, “what about the next Taylor Swift?”, positing that a “relatively unknown artist” whose work was ingested by Meta would likely have their career hampered if the model produced “a billion pop songs” in their style.
These are all things Chhabria needs to consider in more detail as he further reviews the arguments presented by both sides in this case. The judge acknowledged that his conclusions will have a big impact on other legal battles and the entire AI sector, and therefore warrant proper time and consideration.
At the conclusion of last week’s hearing the judge joked, “I will issue a ruling later today - just kidding - I will take a lot longer to think about it”.