The Question of Legitimate Interest As Lawful Basis for Training AI Models
- WAI CONTENT TEAM

By Petruta Pirvan and Sonal Makhija
The growing reliance on legitimate interest as the primary lawful basis for AI training has been broadly accepted by EU data protection authorities. Compared with the impracticality of obtaining consent at scale, legitimate interest offers a more pragmatic path. However, its applicability depends on the specific context - namely the purpose of the AI training, the assessment of what is strictly necessary, and the safeguards implemented to reduce risks to individuals.
In this article, Petruta Pirvan and Sonal Makhija explore several real-world use cases and highlight the practical challenges that organizations commonly encounter.
Petruta is an AI Governance Specialist and a DPO focused on developing responsible AI frameworks, compliance policies, and risk‑management practices for enterprise environments. She drives ethical, secure, and transparent AI adoption by aligning organizational objectives with regulatory requirements and emerging industry standards.
Sonal is a data privacy and AI lawyer based in Stockholm. She has proven experience and expertise in leading the AI Act implementation and AI governance for H&M Group, with a focus on regulatory compliance, risk management, and responsible AI.
_____________________________________________________________
Overview and regulatory position
The General Data Protection Regulation (GDPR) requires that organizations have a “lawful basis” to process personal data - essentially, a legal justification for why the data is being used. Under the GDPR, there are six recognised legal bases: consent, contract, legal obligation, vital interests, public task, and legitimate interests. Organizations need to choose the appropriate legal basis, based on the context and purpose of the processing, before any processing of personal data begins. This is not merely a compliance formality: it ensures that data is handled transparently, that individuals whose data is processed can exercise their rights, and that risks are managed and mitigated across the data lifecycle.
For many organizations, in the context of AI training and providing digital services, consent and legitimate interests are the most commonly relied upon legal bases.
In practice, legitimate interest under Art. 6(1)(f) GDPR is considered a more practical legal basis than consent for processing personal data in the context of AI development. This is primarily because AI models require very large and diverse datasets, making it operationally unrealistic to obtain valid consent from all individuals whose data may be used in AI development and training. Consent becomes even less feasible where data originates from publicly accessible sources, legacy datasets or web‑scraped repositories, since re‑identifying and contacting millions of individuals would be impossible in practice. Moreover, even if consent were obtained, relying on it would introduce high operational uncertainty: the withdrawal of consent may affect the functionality of an AI system, particularly when the input data has been integrated into a vast training dataset alongside non‑personal data or data relating to other individuals. Once data is incorporated into the model through training, removing a single individual’s personal data may require retraining or modifying the entire model – an approach that sits uneasily with the GDPR’s requirement that consent be freely withdrawable at any time. The unpredictability of AI outcomes and the opacity of AI systems make transparency difficult and compliance with the GDPR all the more challenging.
Against this background, legitimate interest is frequently relied upon to enable the scale of data processing necessary for effective AI training. In this context, in December 2024 the European Data Protection Board (EDPB), the authority that ensures consistent application of the GDPR across the EU, adopted Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models. The Opinion offered clarifications relevant to assessing the use of legitimate interest as a lawful basis. Among the clarifications, the EDPB Opinion reiterated that controllers may rely on legitimate interest as a lawful basis for processing personal data only when three cumulative conditions are met:
● Legitimate interest: The interest pursued must be lawful, clearly defined, and genuine rather than abstract or hypothetical.
● Necessity of the processing: The processing must be strictly necessary for achieving the intended objective, meaning that the objective cannot be achieved equally and effectively by less intrusive means.
● Balancing test: The data subject’s interests or fundamental rights and freedoms must not override the legitimate interest.
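The cumulative nature of these three conditions can be captured in a simple documentation aid. The sketch below is purely illustrative - the class, field names, and boolean simplifications are our own assumptions, not the EDPB's wording, and no such record substitutes for a proper legal analysis:

```python
from dataclasses import dataclass, field

@dataclass
class LegitimateInterestAssessment:
    """Illustrative record of the EDPB's three cumulative conditions.

    All fields are hypothetical documentation aids; real assessments
    involve nuanced legal judgment, not boolean flags.
    """
    interest_description: str
    interest_is_lawful: bool                 # condition 1: lawful...
    interest_is_clearly_defined: bool        # ...clearly defined...
    interest_is_real_not_hypothetical: bool  # ...and genuine, not abstract
    processing_is_strictly_necessary: bool   # condition 2: no equally effective, less intrusive means
    balancing_favours_controller: bool       # condition 3: data subjects' rights do not override
    less_intrusive_alternatives_considered: list = field(default_factory=list)
    safeguards: list = field(default_factory=list)

    def may_rely_on_legitimate_interest(self) -> bool:
        # The conditions are cumulative: failing any one is fatal.
        condition_1 = (self.interest_is_lawful
                       and self.interest_is_clearly_defined
                       and self.interest_is_real_not_hypothetical)
        condition_2 = self.processing_is_strictly_necessary
        condition_3 = self.balancing_favours_controller
        return condition_1 and condition_2 and condition_3
```

The point such a structure makes explicit is that a strong commercial interest cannot compensate for a failed balancing test, or vice versa - each condition must hold independently.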
Thus, organizations need to ensure accountability by conducting and thoroughly documenting a legitimate interest assessment (LIA). This was evident in the 2024 GEDI-OpenAI case, where GEDI Gruppo Editoriale planned to provide OpenAI with large volumes of its editorial archives containing sensitive personal data, for use both in real‑time news querying and in training AI models. The Italian Data Protection Authority (Garante) issued a formal warning that legitimate interest cannot bypass the prohibitions on processing sensitive data under Article 9 of the GDPR. Further, the Garante questioned whether, in the context of large‑scale AI training, a company’s commercial interests can outweigh the rights and freedoms of individuals whose personal information may be swept into the training dataset, signalling that convenience and feasibility cannot override privacy rights.
Legitimate interest in the AI training context
In light of the Garante's response, the EDPB Opinion provides some useful clarifications. However, it does not provide operational guidance on how legitimate interest should be assessed in the context of AI training. As a result, organizations have to develop their own consistent and legally tenable approach in a rapidly evolving AI environment, amid often conflicting global regulations.
In the AI training context, understanding the purpose and nature of the training activity is essential for assessing whether legitimate interest can be used as a lawful basis. This assessment requires determining:
● Whether the organization has a real and specific legitimate interest in developing and improving the AI system;
● Whether the use of personal data is genuinely necessary to achieve that objective; and
● Whether the overall processing remains proportionate, taking into account the impact on individuals.
In practical terms, this requires clearly defining what the AI system is intended to achieve and how its training will improve the service or functionality being offered. It also means assessing whether the resulting benefits meaningfully support the individuals whose data are being processed, or whether they primarily advance the organization’s commercial interests.
These considerations become clearer when we examine AI development and deployment use cases and their likely risk classification under the AI Act. An AI-driven fraud detection tool may be trained on personal data to stop fraudulent transactions or identity theft and strengthen overall security. For an AI-driven candidate matching tool, training may focus on using candidate CVs to improve accuracy in aligning applicants with an organization's hiring needs. In the healthcare sector, AI models may rely on patient medical data to predict illnesses based on symptoms, supporting earlier diagnosis and improving clinical decision-making. Each of these use cases involves different purposes, levels of necessity, and types of benefits - factors that are central to determining whether legitimate interest offers an appropriate legal basis for the training activity.
The LIA must justify why personal data is necessary for model improvement and cannot be replaced by non-personal data, how that necessity weighs against individual privacy rights, and how transparency is ensured. Demonstrating the safeguards in place to protect personal data, and that only data strictly necessary is used, adds further evidence of accountability and compliance - for example, where an AI-driven diagnostic tool can use aggregated or anonymised health data instead of personally identifiable data to improve accuracy.
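The aggregation point above can be illustrated with a toy data-minimisation step. The function and field names below are hypothetical, and this crude counting-and-suppression approach is only a stand-in for real anonymisation techniques (such as k-anonymity or differential privacy), which require considerably more rigour:

```python
from collections import defaultdict

def aggregate_by_symptom(records, min_group_size=5):
    """Toy illustration: replace per-patient rows with symptom-level counts.

    Direct identifiers are discarded entirely, and rare combinations
    that could single out an individual are suppressed.
    """
    counts = defaultdict(int)
    for record in records:
        # Only the symptom and diagnosis survive; name/ID are dropped.
        counts[(record["symptom"], record["diagnosis"])] += 1
    # Suppress groups too small to hide an individual in.
    return {key: n for key, n in counts.items() if n >= min_group_size}
```

An LIA could then point to such a step as concrete evidence that the model is improved using aggregates rather than identifiable patient records.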
A further essential consideration is the processing context: whether the organization is developing the model using its own data for its own purposes, or whether a third-party provider is training its models, and whether individuals may reasonably expect their data to be used in this manner. In practice, this may appear as routine product improvement, but from a legal perspective it often constitutes a secondary purpose, raising concerns around purpose limitation, reasonable expectations, transparency, the shifting boundary between processor and controller roles, and who ultimately benefits from the improvement.
Conclusion
While the benefits to end users can support the controller’s position in relying on legitimate interest as a lawful basis for training AI models, questions on necessity, the scale of data required, transparency about the use of data, and data minimisation still apply, as recommended by the French DPA (CNIL) in its guidance on the use of legitimate interest for AI system development. Further, a LIA does not remove the requirement to conduct a data protection impact assessment (DPIA). Large-scale AI training, particularly where novel technologies or special categories of data are involved, will typically require a DPIA even when the processing is based on legitimate interest.
At the heart of using legitimate interest as a lawful basis for training and developing AI models is the promotion of ethical, safe, and responsible AI innovation in the EU. Developing an AI model for an ethically questionable purpose, or in ways that involve unlawful processing of personal data or risk harm to individuals, would clearly fail the legitimate interest assessment. Organizations should take this into account when training or developing AI systems. Where the processing is intrusive, unexpected, or involves sensitive data, reliance on legitimate interest is less likely to be appropriate.
Finally, where AI systems are subject to the EU AI Act, its risk-based framework provides additional guidance on data governance in training, validation, and testing, including the processing of special categories of personal data. The GDPR and the AI Act form a complementary framework that supports organizations in ensuring AI innovation is both lawful and trustworthy.
Links
EDPB opinion on AI models: GDPR principles support responsible AI | European Data Protection Board (https://www.edpb.europa.eu/news/news/2024/edpb-opinion-ai-models-gdpr-principles-support-responsible-ai_en)
CNIL: Relying on the legal basis of legitimate interests to develop an AI system
Garante: Order in the 2024 GEDI-OpenAI case
_____________________________________________________________
Collaborate with us!
As always, we appreciate you taking the time to read our blog post.
If you have news relevant to our global WAI community or expertise in AI and law, we invite you to contribute to the WAI Legal Insights Blog! To explore this opportunity, please contact WAI editors Silvia A. Carretta - WAI Chief Legal Officer (via LinkedIn or silvia@womeninai.co) or Dina Blikshteyn (dina@womeninai.co).
Silvia A. Carretta and Dina Blikshteyn
- Editors
