
2 April 2024

Dear ICO,

We welcome the ICO’s efforts to provide greater clarity on the legitimate purposes required under current data protection law as they apply to generative AI. We agree it is necessary to ensure that ‘the UK’s regulatory regime keeps pace with and responds to new challenges and opportunities presented by AI’[1]. We share the ICO’s view that personal data must only be repurposed under a lawful basis, such as consent, licence, or permission by law. Without such a lawful basis, there is a real risk of harm to individuals, to the proper development of this still nascent technology, and to our community, whose responsibilities as publishers include safeguarding trusted and verified content.

We are encouraged by the ICO’s clear understanding that different AI systems may produce different AI models, and that it is vital that at each stage of these models’ life cycle, individual data and legally protected content are used for legitimate purposes. Each of these stages is potentially a new processing activity, and the reuse and potential republication of an individual’s data or content without consent must not be ignored. Many in the publishing industry understand this chain of activity in generative AI development well and have developed agreements that explicitly allow content to be reused within well-defined parameters. We therefore question the suggestion that it is impossible to define and regulate each stage in the generative AI chain, and we support the ICO’s policy position as articulated: ‘Data protection considerations are relevant for any activity that involves processing personal data’[2].

We agree with the ICO that each stage of generative AI development may involve different data processing activities for different purposes. We support all measures to ensure that the reuse of personal data and content satisfies the purpose limitation test, including that such reuse is not in breach of intellectual property laws. We have concerns that many AI developers do not understand the importance of articulating an explicit purpose for the reuse of personal data, or do not appreciate that such data is often embedded in copyright-protected works. Despite the ICO offering clear guidance in its April 2023 ‘Eight questions that developers and users need to ask’, we are concerned that these useful instructions are not being adhered to. We look forward to more meaningful action on the part of AI developers to ensure compliance with data protection and copyright rules, and we support the ICO in acting as a regulator in this regard.

We also agree with the ICO that it is essential that legitimate purposes are understood by all relevant parties: the organisation developing the AI model; the people whose data is used for training; the people whose data is used during deployment; and the ICO. It is crucial that a specified purpose is articulated each time data or legally protected content is reutilised, and that each necessary party is aware of that purpose. This will enable individuals, and the publishers that represent them, to exert sufficient control over their own personal data and intellectual property, thereby ensuring humans remain in the AI loop.

We have grave concerns that people’s data and intellectual property are currently being used outside of any individual’s reasonable expectations and with only a vaguely identified general purpose. How AI developers meaningfully explain to each relevant party what, how, and why data is being used is a challenge. We believe it is possible to create sufficiently detailed summaries explaining where all training data derives from and for what legitimate purposes, and indicating whether it was reused under licence, obtained through web scraping, or used under some other presumed permission. However, we do not currently have the impression that AI developers see the urgency of these transparency requirements. We welcome the ICO’s regulatory approach to ensure that safe, reliable, and permitted AI models are developed.

Transparency requirements should facilitate parties pursuing their legitimate interests, including the right of copyright holders to exercise and enforce their rights. Transparency requirements may include declaring where the main data collections or sets were procured from; the categories of works or content included in training data; and a clear understanding of who collected the personal data and how that data was obtained. Quantities of content per sector, discipline, territory, or other indicators may also assist. We understand that disclosure requirements may change over time, particularly once individuals receive more information on how their content has already been processed without permission, but we believe it is important that all stakeholders in the generative AI chain, including the individuals who provide the content that AI models repurpose, have immediate clarity on how their content has been reproduced and for what reasons.

We challenge the notion that general large language models are incapable of any sufficient identification of purpose to justify legitimate processing. Although we recognise that different transparency requirements may be suitable for different AI models, even the broadest of these models may be explained as tools for processing, analysing, and generating natural language or images using different types of analysis, from rule-based approaches to statistical models and deep learning. It is possible to detail what type of text or image analysis the developer is working on, and then identify whether any data reuse is necessary, lawful, transparent, fair, and in alignment with the stated scope of the training activity. We welcome an open discussion on how to specify the purpose of holding, processing, and in some cases duplicating personal data.

We see a future where generative AI models may be based on a variety of content, and it may be possible for such training to exclude personal data; however, without AI developers explaining what training data they are using, it is difficult to discern the extent to which individuals and rights holders have already suffered harm. When assessing the extent to which personal data has been used improperly, we agree with the ICO that a key factor must be the individual’s reasonable expectations at the time processing began. As we have little evidence of how many AI developers have taken this personal data to date, it is difficult to discern what individuals’ reasonable expectations were at the time AI companies harvested, kept, and processed their personal information. We encourage the ICO to consider not only regulations for future data collection in the context of AI training activities but also how we may best retroactively ensure compliance with data protection principles.

The collection, storage, management, and proper safeguarding of vast amounts of content is expensive and challenging; our members are well aware of the costs and investments necessary to maintain trusted data and publication resources. We do not think that AI developers should be immune from the responsibilities involved in the proper handling of an individual's personal information and content, and we welcome all partnerships that allow for a transparent understanding of how data and content are repurposed during each stage of the AI model development process.

Strong enforcement of existing data protection and intellectual property regulations, as well as a clear articulation of how these principles apply to all stages of generative AI development, would be welcomed by ALPSP members. Our communities are often the guardians of vast amounts of content and are keenly aware of the fraud, disinformation, and deterioration in trust that are often the consequences of the misuse of personal data or legally protected content. We urge the ICO to take timely action to uphold the UK government’s vision of a pro-innovation approach to AI regulation and its intention to embed considerations of fairness into AI. Respecting the legitimate purposes behind the reuse of individuals' data or legally protected content is key to this fairness, and we welcome all opportunities to consult with the ICO further.

The Association of Learned and Professional Society Publishers (ALPSP) is the international trade association which supports and represents not-for-profit organizations that publish scholarly and professional content and those that collaborate with them. Our diverse membership encompasses society, university, and traditional publishers, alongside their associated communities, thereby representing those who are simultaneously at the forefront of AI innovation as well as being responsible for upholding the principles of accuracy, reliability, and trustworthiness in information dissemination. 

Please do not hesitate to contact me if I can be of any further assistance.

With Kind Regards, 
Wayne Sime
Chief Executive
The Association of Learned & Professional Society Publishers (ALPSP) 

[1] ICO, ‘Guidance on AI and data protection’
[2] ICO, ‘Generative AI second call for evidence: Purpose limitation in the generative AI lifecycle’