Consultation on the data protection-compliant handling of personal data in AI models
Here you will find all information on the BfDI's current consultation procedure on the data protection-compliant handling of personal data in AI models. The deadline for comments is 31 August 2025.
You can also download the consultation paper with the consultation questions as a PDF document.
Background of the consultation
AI models, especially large language models (LLMs), are usually trained on enormous amounts of data, which often include personal data. In its Opinion 28/2024 of 18 December 2024, the European Data Protection Board (EDPB) stated that AI models may contain personal data if they were trained with personal data (a phenomenon often referred to as memorization). For the purposes of the present consultation, the term memorization is deliberately understood more broadly than in the generally accepted scientific usage: in addition to literally (verbatim) reproduced training data, it also covers analogous reproductions, since these too are relevant under data protection law insofar as they contribute to the identification of a person.
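For illustration only, the following minimal sketch shows a common way of probing for memorization in the narrow, verbatim sense: the model is prompted with the beginning of a known training snippet and checked for whether it reproduces the remainder exactly. It assumes the Hugging Face transformers library; the model name "gpt2" and the sample snippet are hypothetical stand-ins, and the consultation's broader notion would additionally require checks for near-verbatim ("analogous") reproductions.

```python
# Minimal sketch: probing a causal LM for verbatim memorization.
# Assumptions: the Hugging Face transformers library is installed;
# "gpt2" stands in for the model under examination; the snippet is
# a hypothetical example of personal data from the training corpus.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

snippet = "Jane Doe, born 1 May 1980, lives at 12 Example Street, Berlin."
prefix_len = 8  # number of tokens handed to the model as a prompt

ids = tokenizer(snippet, return_tensors="pt").input_ids[0]
prefix, true_suffix = ids[:prefix_len], ids[prefix_len:]

with torch.no_grad():
    out = model.generate(
        prefix.unsqueeze(0),
        max_new_tokens=len(true_suffix),
        do_sample=False,  # greedy decoding: the strictest reproduction test
    )
generated_suffix = out[0, prefix_len:]

# True only if the model completes the prefix with exactly the tokens
# seen in training; analogous reproductions would need fuzzier matching.
print(torch.equal(generated_suffix, true_suffix))
```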
The EDPB's finding raises significant questions regarding the protection of personal data. The Federal Commissioner for Data Protection and Freedom of Information (BfDI) recognizes the need to systematically address data protection challenges in the planning, training, and use of such models.
Objective of the consultation
In view of the technical and legal complexity, the consultation aims to gather concrete practical experience, technical assessments, and normative considerations from stakeholders in various fields. The focus is on large language models; wherever AI models are mentioned in the following, large language models are meant.
The results are intended to contribute to the development of data protection-compliant approaches for dealing with memorized data. The main results will be summarized in a consultation report, which will be published on the BfDI website.
The BfDI invites all interested parties to contribute to the discussion, especially representatives from science, business and civil society.
Submission of comments
Please send your contributions at the latest by 31 August 2025 to:
Konsultation2025@bfdi.bund.de
For organizational reasons, comments received after this date may not be taken into account.
Please note that the submitted comments will be published on the BfDI’s website. Please indicate whether you consent to the publication of any personal data contained therein. Otherwise, the personal data will be redacted. Anonymous responses are also possible.
Further information
This consultation does not constitute a prior determination; it serves solely to gather information on the technical and legal aspects of handling personal data in AI models.
Consultation questions
If AI training is conducted using anonymous data, the GDPR is not applicable to the training. However, given the data volume used for training, complete anonymization of AI models is generally not reliably possible.
1. According to Recital 26, sentence 3 of the GDPR, when determining whether a natural person is identifiable, account should be taken of all means reasonably likely to be used by the controller or by another person to identify that natural person, directly or indirectly. Taking into account the procedures listed in EDPB Opinion 28/2024, paras. 35 et seq., under what circumstances could an LLM be considered anonymous?
2. What technical measures do you already use or plan to use to prevent data memorization (such as deduplication, use of anonymous or anonymized training data, fine-tuning without personal data, differential privacy, etc.)? What experience have you had with them? For illustration, a minimal sketch of one such measure follows after question 3.
3. How do you assess the risk of personal data being extracted from an LLM? Explain your assessment, if possible, using concrete examples, individual cases, or empirical observations.
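To make the measures named in question 2 more tangible, here is a minimal sketch of the simplest of them: exact-match deduplication of training documents via hashing. The corpus and the normalization step are illustrative assumptions; production pipelines typically add near-duplicate detection (e.g. MinHash), since repeated and near-repeated documents are known to increase the likelihood of memorization.

```python
# Minimal sketch: exact-match deduplication of a training corpus,
# one of the measures named in question 2. The corpus is a
# hypothetical placeholder; near-duplicate detection is omitted.
import hashlib

corpus = [
    "Jane Doe lives at 12 Example Street, Berlin.",
    "Jane  Doe lives at 12 Example Street, Berlin.",  # whitespace variant
    "An unrelated document without personal data.",
]

def fingerprint(doc: str) -> str:
    # Light normalization so trivial whitespace/case variants map to
    # the same hash before comparison.
    normalized = " ".join(doc.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

seen: set[str] = set()
deduplicated = []
for doc in corpus:
    h = fingerprint(doc)
    if h not in seen:
        seen.add(h)
        deduplicated.append(doc)

print(len(corpus), "->", len(deduplicated))  # prints: 3 -> 2
```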
Processing of memorized data
4. Data protection law attaches to the processing of personal data. Each prompt entered triggers a computation in the AI model in which the (personal) data represented in the form of parameters influences the calculation result. Does this computation constitute processing of those data within the meaning of Article 4 No. 2 GDPR, even if the calculation result, i.e. the output of the AI model, is not personal?
Intensity of intervention
For an assessment under data protection law, e.g. when choosing a legal basis, the intensity of the data processing operation may need to be evaluated.
5. Do you already have experience with methods that estimate the amount and type of memorized personal data, or that determine whether the AI model in use contains personal data of a specific individual (e.g. privacy attacks/PII extraction attacks)? If so, how do you assess their informative value and possible limitations? A minimal sketch of one such method follows after question 6.
6. In the AI models known to you, what is the amount of memorized personal data (as a percentage of the training data and in absolute terms)?
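As an illustration of the attack family referred to in question 5, the following is a minimal sketch of a loss-based membership inference test: texts that were part of the training data tend to receive a conspicuously low loss. It assumes a Hugging Face causal language model; "gpt2", the candidate text and the threshold are hypothetical, and a serious audit would calibrate the threshold against reference data and combine several signals.

```python
# Minimal sketch: loss-based membership inference against a causal LM,
# one of the "privacy attacks" referred to in question 5. Intuition:
# sequences seen during training tend to get unusually low loss.
# "gpt2", the candidate text and the threshold are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def mean_token_loss(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns the mean
        # next-token cross-entropy over the sequence.
        out = model(ids, labels=ids)
    return out.loss.item()

candidate = "Jane Doe, born 1 May 1980, lives at 12 Example Street."
threshold = 3.0  # hypothetical; must be calibrated in practice

loss = mean_token_loss(candidate)
print(f"loss={loss:.2f} -> flagged as memorized? {loss < threshold}")
```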
The data subjects’ rights
The black-box architecture of AI models poses a challenge for the effective guarantee of data subjects’ rights, in particular with regard to the rights of access, rectification and erasure pursuant to Articles 15–17 GDPR.
7. How do you proceed if a person exercises their right to access, rectify or erase their personal data in the AI model?
Further aspects
8. From your perspective, are there other aspects that play a role in the protection of personal data in AI models?