The British boss of Microsoft’s consumer AI unit Microsoft AI, Mustafa Suleyman, shared a curious viewpoint on the status of online content at an event in Aspen last week, one somewhat at odds with the basic principles of copyright law. That is interesting given that both Microsoft and OpenAI - in which Microsoft holds a 49% stake - are currently fighting copyright lawsuits over the training of generative AI models.
Suleyman, also a co-founder of the now Google-owned AI company DeepMind, was interviewed at the Aspen Ideas Festival by journalist Andrew Ross Sorkin. He was asked whether companies developing generative AI models have “effectively stolen the world’s IP” by scraping vast amounts of existing content from the internet to use when training their models.
Suleyman responded, “with respect to content that’s already on the open web, the social contract of that content since the 90s has been that it is ‘fair use’. Anyone can copy it, recreate with it, reproduce with it. That has been ‘freeware’ if you like, that’s been the understanding”.
He did then concede that there might be some restrictions to that ‘social contract’, perhaps mindful that Sorkin, as a journalist, would probably not consider his journalism to be ‘freeware’. Or maybe Suleyman just remembered that he himself sits on the board of news publisher The Economist Group, which also puts quite a lot of value on its journalistic output.
“There’s a separate category”, he continued, “where a website, or a publisher, or a news organisation, had explicitly said ‘do not scrape or crawl me for any other reason than indexing me so that other people can find this content’. That’s a grey area and I think it’s going to work its way through the courts”.
It is true that there is a grey area when it comes to the copyright obligations of generative AI companies making use of existing content when training models, though not in the way Suleyman expressed.
Copyright is generally an automatic right that gives creators control over what happens to the outputs of their creativity, and there is no default assumption that anyone can do whatever they like with creative works unless a corporate entity says otherwise. In fact, it’s the opposite: if a creator is happy for people to make use of their work without permission or licence, that needs to be declared.
The big dispute between copyright owners and AI companies does centre on the principle of ‘fair use’ under US law. However, fair use doesn’t relate to the content itself or how it was published, but rather to how a third party makes use of that content.
In some scenarios a third party’s use of the content would be fair use under the American system, meaning they wouldn’t need permission from the copyright owner.
The key question currently being posed in multiple lawsuits is whether the use of existing content to train an AI model is one of those scenarios. The copyright industries - including the music industry - are adamant that it is not.
Suleyman is right, however, that that question will work its way through the courts. Lawsuits that address the big question of what is, and is not, fair use include those filed by the music publishers against Anthropic and by the record companies against Suno and Udio, as well as similar lawsuits filed by authors and newspaper owners against Microsoft and OpenAI.