1 March 2024
Dear ICO,
We welcome this opportunity to provide feedback on your consultation ‘Generative AI First Call for Evidence’ on behalf of the Association of Learned and Professional Society Publishers (ALPSP). ALPSP is the international trade association that supports and represents not-for-profit organizations that publish scholarly and professional content, and those that collaborate with them. Our diverse membership encompasses society, university, and traditional publishers, alongside their associated communities. It therefore represents those who are at the forefront of AI innovation and who are also responsible for upholding the principles of accuracy, reliability, and trustworthiness in information dissemination.
We strongly agree with the ICO’s assertion that AI developers must adhere to the lawfulness principle of data protection, ensuring compliance with all applicable laws, including copyright legislation, and maintaining a valid lawful basis under the UK GDPR. We commend the ICO’s proactive regulatory stance aimed at safeguarding individuals’ fundamental rights and freedoms. We also support the ICO’s recognition of private contractual measures (for example, licensing) as a tool that allows personal data to be protected and controlled and that helps mitigate the risks posed by web scraping.
A key concern for our members is the transparency of AI developers’ training methodologies and its implications for downstream data protection and individual rights. Transparent disclosure of what content is sourced, and how it is sourced, processed, and used by AI systems, is essential to ensuring informed consent and to enabling individuals to manage their own personal data and intellectual property.
We recognize the intricate intersection between the regulations governing the reuse of personal data and those protecting individuals’ copyrighted content. Frequently, it is the unique personal expression inherent in individuals’ personal data that is appropriated without authorization. We commend all efforts to safeguard individuals’ data and original content and to ensure that such assets are used only with explicit consent.
We share the ICO’s concerns about web scraping, particularly when individuals are unaware that their personal data or intellectual property is being harvested and processed by undisclosed third parties, or when that data has been made publicly available without the owner’s consent, often in violation of current copyright law. We are also troubled by the growing body of literature on the harms of web scraping, spanning both upstream and downstream consequences.
We would welcome more detail on the mitigation measures the ICO envisions to ensure that personal data is used for a legitimate purpose. All our members support scientific, academic, and research communities, and downstream risks that threaten research integrity are an ever-growing concern. Robust rules that allow control at the point of ingestion and prevent upstream harms would go a long way towards mitigating these risks.
We do, however, challenge your assertion that only extremely large datasets work. As explained in a recent study published in Nature Communications, the quality of training data, rather than its quantity, may be the more important factor in achieving accuracy. This mirrors feedback from many of our members, who are directly licensing limited content libraries for AI training and development. Carefully curated datasets improve AI functionality and are necessary to combat bias, whereas large, unlicensed datasets are fertile ground in which false or misleading information can be amplified, with harmful consequences. Maintaining curated datasets requires ongoing diligence: keeping data current, honouring rights to withdraw, and providing comprehensive content free of intentional or unintentional errors. A combination of regulatory action, enforcement of existing legislation such as copyright law, and private contracts will be required to meet these challenges. We therefore urge the ICO to support responsible data management, storage, and processing that complies with legal requirements, including copyright law and the provisions of the UK GDPR.
We are concerned about how to future-proof any regulation effectively in this fast-moving landscape. This is why we believe it is paramount that immediate action be taken to ensure individuals do not lose control over their personal data and intellectual property. Given the wholesale taking of content already carried out via web scraping, without regard to what that content contains, steps must be taken as soon as possible to require AI developers to respect individual rights and property.
Given the sensitivity of individuals’ personal data, and individuals’ limited capacity to safeguard it on their own, we applaud the ICO’s pivotal role in scrutinizing AI development and enforcing compliance standards from the outset. We welcome the ICO’s ongoing consultation efforts and stand ready to contribute constructively to shaping regulatory measures that uphold data protection principles while fostering responsible innovation.
Please do not hesitate to contact me if I can be of any further assistance.
With kind regards,
Wayne Sime, Chief Executive
The Association of Learned & Professional Society Publishers (ALPSP)