Apr 29, 2026

UK government should clarify whether it’s investing in AI companies that use unlicensed training data, say campaigners

The UK government has set up a Sovereign AI Unit to support British AI companies. But it’s not clear if supported companies are obliged to get licences for any copyright protected works used in AI training. That’s a problem, and a missed opportunity, according to House Of Lords member Beeban Kidron.


The UK government is now actively supporting British AI start-ups through a new body called the Sovereign AI Unit and an accompanying investment fund. However, it’s not clear if companies benefiting from that support are obliged to license any copyright protected content they may use when training their AI models.

And that’s a problem, according to Beeban Kidron, the member of the House Of Lords who has long fought for stronger protections for creators and copyright owners in the context of AI.

It’s also a missed opportunity, she argues, because the Sovereign AI Unit could encourage the AI sector at large to respect copyright and forge licensing deals with the UK’s creative industries, extending its influence beyond the companies it directly backs.

So, even if the government doesn’t get involved in any music AI start-ups, it could still use its influence to demonstrate that licensing works for AI training is desirable and do-able. 

Kidron has used an op-ed in The Times to call on the government to publish the assessment criteria being employed by the Sovereign AI Unit when picking what businesses to support. 

That follows a blog post published last week by AI expert and campaigner Ed Newton-Rex, who says that the people running the Sovereign AI Unit have so far been very non-committal indeed when it comes to copyright matters. 

The government describes the Sovereign AI Unit as “a £500 million first-of-its-kind national effort to back Britain’s smartest founders and keep the future of AI built on British shores”. Earlier this month the unit announced the first seven AI companies that it is supporting, while also revealing it is in talks with 30 other possible partners. 

At least some of those companies will be using existing copyright protected works to train their models, which brings us to the big debate over whether or not AI companies making use of existing works need to first get permission from the relevant copyright owners. 

The copyright industries, including the music industry, insist that they do. But many AI companies argue they don’t, because they can rely on text and data mining copyright exceptions in certain countries, or the US principle of fair use.

UK copyright law doesn’t provide a commercial text and data mining exception. The current government proposed introducing one, but then abandoned that plan after a massive backlash from the creative industries. However, some countries do provide exceptions that AI companies can possibly rely on. 

In his blog post, Newton-Rex says he has asked various people involved in the Sovereign AI Unit “whether it would invest in companies that train on copyrighted work without a licence”. The only person to respond to that question was the unit’s chair, James Wise. Though, Newton-Rex explains, “he never answered the question that had been asked”. 

Wise did say, “we will only invest in companies that follow the law on this issue. It is really important - and we have been clear (many times!) that copyright rules should be respected, use of copyright works to train AI in the UK will require a licence unless an exception exists”. 

However, that statement doesn’t really answer Newton-Rex’s question. That’s because, if a UK-based AI company actually trained its model in another country where a relevant copyright exception applies, then it could say copyright rules had been respected and it hadn’t trained an AI in the UK without a licence. 

Which means, according to Wise’s answer, the UK government could in fact invest in AI companies that are using unlicensed training data. The copyright industries argue that the obligation to license training data should apply to any AI model available in the UK, regardless of where it was trained, but in the legal dispute between Getty and Stability AI, Getty failed to demonstrate that such an obligation exists in law.

In her Times piece, Kidron writes that - when AI companies use unlicensed training data - “the harm to the creative industries is existential”, because “if a creator cannot make money from their work, the road to creative professions is closed to all but the independently wealthy”. 

She goes on, “the government has pointed out that it cannot control how AI companies build their products abroad - true. But the sovereign AI fund is not concerned with overseas AI companies. It can choose the kinds of companies that are deserving of its funds”. 

It’s “entirely possible” to train AI models using “copyright-cleared training data”, she continues, and the Sovereign AI Unit could “use its public stake - our money - to nudge companies towards licensing practices that support both UK AI and UK creators. There is a clear public interest in doing so”. 

With that in mind, Kidron says that “access to public funding and compute should come with clear requirements” including “public eligibility rules that ensure the chosen AI firms license the UK content that fuels their models”. 

And given the ambiguities around the unit’s current criteria for picking what companies to support, she concludes, “the government must publish its assessment criteria and urgently update them if, as seems likely, they fall short of these basic requirements”.  
