Published: 17th June 2025

In this blog we explore if, and to what extent, the data protection legal framework in the UK can prevent third party intellectual property (IP), and more particularly material protected by copyright and related rights, from being used to train AI models. This is all the more relevant in light of the government’s consultation on whether to adopt a text and data mining (TDM) exemption similar to that currently in force in the EU.

We will look briefly at the EU TDM exemption, the current TDM exception under the Copyright, Designs and Patents Act 1988 (CDPA) and the government’s consultation on whether to adopt a similar exemption in the UK. We will then look at the implications for AI developers of using publicly available material protected by IP rights that may contain personal data to train their models, and the steps needed to address those implications. Finally, we will conclude that, to the extent that material protected by IP rights does include personal data, this could significantly impair the ability of AI model developers to use such material in training their models.

The EU TDM exemption

According to Article 2(2) of the EU Directive on copyright in the Digital Single Market (the DSM Directive)[1], TDM means:

“Any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations”.

A majority of generative AI models are trained on content that is publicly available. This is done using software which extracts information available online. The DSM Directive allows such extraction without permission from the copyright owners of that content in two situations:

  • under Article 3(1) for reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access, and
  • under Articles 4(1) and 4(3), for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining […] on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their right holders in an appropriate manner, such as machine-readable means in the case of content made publicly available online (one possible machine-readable means is illustrated below).
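By way of illustration only, a website’s robots.txt file is often cited as one possible “machine-readable means” of reserving rights under Article 4(3), although the Directive itself does not prescribe any particular mechanism. The short Python sketch below, which assumes a hypothetical crawler name (“ExampleAIBot”) and hypothetical URLs, shows how a crawler might check for such a reservation before mining a page; it is a minimal sketch under those assumptions, not a description of any particular developer’s practice.

  # Minimal sketch: check a site's robots.txt before mining a page.
  # "ExampleAIBot" and the URLs below are hypothetical placeholders.
  from urllib import robotparser

  parser = robotparser.RobotFileParser()
  parser.set_url("https://example.com/robots.txt")
  parser.read()  # fetch and parse the site's robots.txt

  page = "https://example.com/articles/some-work.html"
  if parser.can_fetch("ExampleAIBot", page):
      print("No machine-readable reservation found for this crawler.")
  else:
      print("The right holder has reserved this content; do not mine it.")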

The CDPA exception

TDM is not totally prohibited in the UK. Although the CDPA does not make specific reference to a TDM exception, Section 29A clearly allows such operations for non-commercial research, providing that:

“The making of a copy of a work by a person who has lawful access to the work does not infringe copyright in the work provided that the copy is made to carry out a computational analysis of anything recorded in the work for the sole purpose of research for a non-commercial purpose […]”.

The UK government’s consultation on copyright and AI

The government maintains that the current uncertainty in the UK around the use of works protected by copyright and related rights to train AI models both holds back the growth of the AI sector and prevents rights holders from being remunerated for the use of their works in AI training. The government’s Copyright and AI consultation (which closed in February 2025) aims to resolve this uncertainty and unlock growth. One of the interventions the government proposes to achieve this aim is the adoption of a “text and data mining” exception to copyright law, similar to the EU TDM exemption.

The government maintains that:

“This approach seeks expressly to balance the objectives of access to works by AI developers with control over works by right holders supported by increased trust and transparency for all parties.”

In practice, this means that AI developers could train their models on lawfully accessible, publicly available works without permission from the right holders, unless those right holders have expressly opted out of their works being used in this way.

Personal data and training AI models

Although the most obvious source of material for training AI models is publicly available material online, it is not the only one. AI developers may also access non-publicly available material for training their models through licensing arrangements. In both cases the material may contain personal data, in which case applicable data protection laws must be observed. For the purposes of this blog, it is personal data available online that we are primarily concerned with, given the nature of TDM as the main tool for searching information and data on websites.

Personal data publicly available

The Information Commissioner’s Office (ICO) makes it clear that the fact that personal data is publicly available does not mean that the data can be freely used:

“If you obtain personal data from publicly accessible sources (such as social media, the open electoral register and Companies House), you still need to provide individuals with privacy information, unless you are relying on an exception or an exemption.”

However, even if AI developers rely on the exception that providing a privacy notice would be impossible, they “must carry out a DPIA [a data protection impact assessment] in order to identify and mitigate the risks associated with” further use of personal data (see What common issues might come up in practice? | ICO).

In other words, if AI developers use publicly available material which contains personal data, they must ensure that they have a lawful basis for using the data and must also provide a privacy notice to data subjects. Even if they are excused from providing a privacy notice, they will still need to carry out a DPIA where the processing poses a high risk to data subjects’ rights and freedoms.

Of note, the ICO emphasises that “where the use is less likely to be expected, or could significantly affect individuals”, privacy notices must be provided as soon as possible after the data is obtained (which would mean much sooner than the one-month long-stop date for doing so).

The need to carry out a DPIA is an additional burden that AI developers will need to bear. The ways in which any outputs of the AI model may be used, and whether the outputs themselves will contain personal data, must be taken into account when carrying out a DPIA, as such uses may pose a higher risk to data subjects.

AI developers as controllers of personal data

In addition to the above, AI developers, as controllers of personal data, will have to comply with a number of further obligations under data protection laws, including to:

  • comply with the principles listed in Article 5 of the UK GDPR, i.e. ensure that personal data are processed lawfully, fairly and in a transparent manner; are collected for specified, explicit and legitimate purposes; are adequate, relevant and limited to what is necessary for the purposes of processing; are accurate and kept up to date; are kept for no longer than is necessary for those purposes; and are processed securely;
  • inform data subjects of their rights to request rectification or erasure of their data or to restrict its processing, their right to lodge a complaint and the existence of any automated decision-making involving their data; and
  • facilitate the exercise of the data subject’s rights.

Therefore, there are numerous practical issues that AI developers should consider before embarking on the use of publicly available material protected by intellectual property rights that may contain personal data. These involve a considerable administrative, and potentially financial, burden for the AI industry, especially taking into account the commercial uses an EU-style TDM exemption would allow. Although it is true that the ICO has issued only a limited number of fines, its policy on the issue could change considerably if AI developers derive substantial profits from the use of personal data to train their AI models for commercial purposes.

Of some relevance, in August 2023 the ICO, alongside eleven other data protection and privacy authorities, published a joint statement calling for the protection of people’s personal data from unlawful data scraping on social media sites. Although the statement focused on social media, on the back of a number of mass data scraping incidents, it makes clear that such incidents may be reportable to the ICO. The Copyright and AI consultation states that the ICO is currently reviewing the intersection of generative AI and data protection with a view to issuing guidance in due course; at the time of writing, that guidance is still forthcoming.

So, can data protection help protect IP in AI?

Undoubtedly, AI developers wishing to use material protected by copyright and related rights that contains personal data will need to comply with data protection legislation on top of the protections provided under copyright laws. This is already the case in the UK, given that TDM is permitted to a limited extent under the CDPA for non-commercial research purposes, but it will become more significant if the government decides to proceed with adopting an exemption similar to the EU TDM exemption, allowing such content to be used to train AI models for commercial purposes.

Since commercial uses may pose a higher risk to data subjects, data protection legislation can indeed go some way towards protecting works protected by IP rights from being used to train AI models. The requirement to carry out a DPIA could, on its own, deter AI developers from using copyright-protected content that contains personal data.

The exercise of data subject rights can also impose a disproportionate administrative burden on AI developers. Finally, the ICO’s approach to exercising its fining powers may change if personal data are used for highly profitable purposes.

[1] Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC.

About the Author

Reveka qualified as a solicitor in England and Wales in 2018 while working in-house at various universities (including Russell Group institutions) from 2014 to 2022, when she joined Shakespeare Martineau. She is experienced in a wide range of contracts, whether with individuals, SMEs, larger organisations, local authorities or charities, and is able to appreciate differing client objectives and needs. In addition, Reveka’s previous experience in another civil law jurisdiction, as well as her work with universities’ international partners and contractors, means she is able to appreciate the risks and advise on transactions with an international element. She can…