Factlen Explainer · Machine Unlearning · Jun 18, 2026, 2:40 AM · 6 min read

How AI Learns to Forget: The Rise of 'Machine Unlearning'

As privacy laws demand the right to be forgotten, researchers are developing techniques to make AI models surgically erase specific knowledge without the massive cost of retraining from scratch.

By Factlen Editorial Team

Privacy Advocates & Regulators 35% · AI Developers & Engineers 35% · Security Researchers & Skeptics 30%
Privacy Advocates & Regulators
Demand verifiable deletion of user data to comply with laws like the GDPR, ensuring data is truly removed and not just hidden.
AI Developers & Engineers
Focus on the computational efficiency of unlearning techniques, viewing them as essential alternatives to the massive costs of retraining.
Security Researchers & Skeptics
Warn that current unlearning methods often just mask data, leaving it vulnerable to extraction and reversibility attacks.

What's not represented

  • End-users whose personal data was memorized
  • Copyright holders seeking compensation for ingested works

Why this matters

If AI companies cannot efficiently remove copyrighted material, biased data, or personal information from their models, they face massive regulatory fines and public distrust. Machine unlearning offers a technical bridge between powerful AI capabilities and strict human privacy rights.

Key points

  • AI models absorb massive amounts of data, making it difficult to remove sensitive or copyrighted information once trained.
  • Privacy laws like the GDPR require companies to honor the 'right to be forgotten,' necessitating new data deletion methods.
  • Retraining an AI model from scratch to remove a single data point is financially and environmentally unfeasible.
  • Machine unlearning allows developers to surgically remove the influence of specific data without destroying the model's capabilities.
  • Recent research reveals that some unlearning methods only suppress data, leaving models vulnerable to 'reversibility attacks.'
  • 100%: compute required for traditional retraining
  • < 1%: compute required for targeted unlearning
  • 2015: first major statistical unlearning paper

Modern artificial intelligence models, particularly Large Language Models (LLMs), are essentially vast digital sponges. During their initial training phases, they absorb trillions of words from across the internet, internalizing facts, linguistic structures, and complex reasoning patterns. However, this indiscriminate absorption comes with a significant drawback: they also ingest things they shouldn't. A model might memorize a user's private medical records, copyrighted novels, toxic biases, or dangerous instructions on how to synthesize illicit materials. Once this information is baked into the billions of parameters that make up the neural network, it becomes incredibly difficult to isolate. It is akin to trying to remove a single drop of red dye after it has been thoroughly mixed into a large vat of paint.[1][2]

This technical reality is increasingly colliding with strict legal frameworks. The European Union's General Data Protection Regulation (GDPR) grants individuals a right to erasure, widely known as the "right to be forgotten," and the California Consumer Privacy Act (CCPA) gives consumers a comparable right to request deletion of their personal information. If a user requests that a technology company delete their personal data, the company is generally obligated to comply, subject to the exemptions each law provides. However, deleting the original text file from a server database may not be enough. If an AI model has already trained on that file, the model itself can retain patterns and traces of that sensitive information, meaning the data has not truly been removed from the system.[1][3]

Historically, the only guaranteed way to completely remove a specific piece of data from an AI system was the brute-force approach: deleting the offending data from the training set and retraining the entire model from scratch. For modern foundation models, this is a logistical and financial nightmare. Training a state-of-the-art LLM requires tens of thousands of specialized GPUs running continuously for months, consuming millions of dollars in electricity and generating a massive carbon footprint. Forcing a company to execute a full retraining cycle every time a single user invokes their right to be forgotten is fundamentally incompatible with the rapid pace of AI development and deployment.[4][7]

Retraining a foundation model from scratch is financially unfeasible for routine data deletion requests.

To solve this bottleneck, researchers have developed an emerging subfield of artificial intelligence known as "machine unlearning." The goal of machine unlearning is to surgically remove the influence of a specific subset of training data from a fully trained model, without having to rebuild the model entirely. If successful, the updated model should behave as if the removed data had never been included in the original training run. This allows AI developers to maintain the model's overall intelligence, predictive power, and linguistic capabilities while efficiently excising the problematic information, saving immense amounts of time and computational resources.[1][2][4]

One of the earliest and most intuitive methods for achieving this is the SISA framework, which stands for Sharded, Isolated, Sliced, and Aggregated learning. Instead of training one massive model on a single monolithic dataset, developers divide the training data into multiple distinct "shards." Each shard is used to train an independent sub-model in total isolation. When a user queries the AI, the system aggregates the predictions from all the sub-models to deliver a final answer. If a specific piece of data needs to be deleted, the engineers only need to identify which shard contained that data, remove it, and retrain that one specific sub-model. The rest of the system remains untouched, drastically reducing the computational burden.[1][4]

The SISA framework allows developers to delete data by only retraining a small, isolated fraction of the overall system.
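The sharding idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published SISA implementation: the nearest-centroid "sub-model," the `SisaEnsemble` class, and the rule that assigns a record to shard `record_id % num_shards` are all hypothetical stand-ins, and the "slicing" and checkpointing steps of the real framework are left out.

```python
from collections import Counter, defaultdict


def train_submodel(records):
    """Toy learner (an assumption): store the mean feature value per label."""
    sums = defaultdict(float)
    counts = Counter()
    for x, label in records.values():
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


def predict_submodel(model, x):
    """Predict the label whose stored centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


class SisaEnsemble:
    """Hypothetical sharded ensemble: one isolated sub-model per shard."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]
        self.models = [None] * num_shards
        self.retrain_count = 0  # sub-model retrainings caused by deletions

    def _shard_of(self, record_id):
        # Deterministic assignment so a record's shard can be found later.
        return record_id % len(self.shards)

    def fit(self, dataset):
        for record_id, example in dataset.items():
            self.shards[self._shard_of(record_id)][record_id] = example
        # Each sub-model only ever sees its own shard.
        self.models = [train_submodel(shard) for shard in self.shards]

    def predict(self, x):
        # Aggregate by majority vote over the sub-models.
        votes = Counter(predict_submodel(m, x) for m in self.models if m)
        return votes.most_common(1)[0][0]

    def unlearn(self, record_id):
        # Drop the record and retrain only the shard that contained it.
        idx = self._shard_of(record_id)
        del self.shards[idx][record_id]
        self.models[idx] = train_submodel(self.shards[idx])
        self.retrain_count += 1
        return idx


# Toy dataset: record_id -> (feature, label)
dataset = {i: (float(i % 10), "low" if i % 10 < 5 else "high") for i in range(40)}
ensemble = SisaEnsemble(num_shards=4)
ensemble.fit(dataset)
touched = ensemble.unlearn(5)
print("retrained shard:", touched, "| prediction for 8.0:", ensemble.predict(8.0))
```

With four shards, a single deletion retrains a sub-model on roughly a quarter of the data while the other three stay untouched. The trade-off in a design like this is that each sub-model sees less data than a single monolithic model would.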

While the SISA framework works well for simpler machine learning tasks, it scales poorly to massive, deeply interconnected Large Language Models. For these behemoths, researchers employ a technique known as "deep unlearning," typically built on gradient ascent. In standard machine learning, a model uses a mathematical process called gradient descent to minimize its error and "learn" the data. Deep unlearning essentially runs this process in reverse on the data to be erased. Engineers compile a "forget set" containing that data and a "retain set" containing similar data that the model should keep. By ascending the loss gradient on the forget set while simultaneously descending it on the retain set, the update deliberately degrades the model's performance on the targeted information, so that it "forgets" it, without catastrophically damaging its broader capabilities.[4][7]
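The combined update described above, descending on the retain set while ascending on the forget set, can be illustrated on a toy logistic-regression model. This is a sketch under loud assumptions rather than how any production LLM is unlearned: the two-feature data, the `forget_weight` and `target` parameters, and the early-stopping rule are all hypothetical. The toy data is also deliberately constructed so that the forgotten pattern sits on its own feature, which is far cleaner than the entangled knowledge inside a real neural network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def mean_loss_gradient(w, data):
    """Gradient of the mean cross-entropy loss of a logistic model on data."""
    grad = [0.0] * len(w)
    for x, y in data:
        error = predict_proba(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += error * xj / len(data)
    return grad


def train(data, steps=200, lr=0.5):
    """Ordinary gradient descent: this is the 'learning' phase."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = mean_loss_gradient(w, data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def forget_confidence(w, forget_set):
    """Mean probability the model assigns to the true labels of the forget set."""
    probs = [predict_proba(w, x) if y == 1 else 1.0 - predict_proba(w, x)
             for x, y in forget_set]
    return sum(probs) / len(probs)


def accuracy(w, data):
    return sum((predict_proba(w, x) > 0.5) == (y == 1) for x, y in data) / len(data)


def unlearn(w, forget_set, retain_set, max_steps=200, lr=0.5,
            forget_weight=1.0, target=0.1):
    """Minimize retain loss minus forget_weight * forget loss.

    Stops early once confidence on the forget set falls below `target`
    (an assumed stopping rule), because unbounded ascent would keep
    pushing the weights away indefinitely.
    """
    w = list(w)
    for _ in range(max_steps):
        if forget_confidence(w, forget_set) < target:
            break
        g_retain = mean_loss_gradient(w, retain_set)
        g_forget = mean_loss_gradient(w, forget_set)
        # Descend on the retain gradient, ascend on the forget gradient.
        w = [wi - lr * (gr - forget_weight * gf)
             for wi, gr, gf in zip(w, g_retain, g_forget)]
    return w


retain_set = [((-2.0, 0.0), 0), ((-1.0, 0.0), 0), ((1.0, 0.0), 1), ((2.0, 0.0), 1)]
forget_set = [((0.0, 1.0), 1)]

trained = train(retain_set + forget_set)
unlearned = unlearn(trained, forget_set, retain_set)
print("forget confidence:", forget_confidence(trained, forget_set),
      "->", forget_confidence(unlearned, forget_set))
print("retain accuracy:", accuracy(trained, retain_set),
      "->", accuracy(unlearned, retain_set))
```

The stopping threshold matters in a sketch like this: ascent has no natural minimum to settle into, so without a cap the update keeps eroding the model long after the targeted pattern has stopped influencing its predictions.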

The applications of machine unlearning extend far beyond simple privacy compliance. Researchers at institutions like Stanford University view it as a critical mechanism for "corrective unlearning." If an AI model is discovered to harbor deep-seated racial or gender biases acquired from its training data, unlearning algorithms can be deployed to specifically target and neutralize those biased connections. Similarly, if a model has ingested stale or factually incorrect information, or if it demonstrates dangerous capabilities like generating phishing code, unlearning provides a post-training risk mitigation tool. It allows developers to continuously edit and refine the safety of an AI system long after the initial training run has concluded.[2][3]

However, as the field of machine unlearning matures, a critical debate has emerged regarding the true nature of this digital forgetting. Is the targeted information genuinely erased from the neural network, or is it merely being suppressed and hidden from view? Recent studies, including research from Carnegie Mellon University and work presented at venues such as the International Conference on Learning Representations (ICLR), suggest that current unlearning techniques may be creating an illusion of deletion. While the unlearned models appear to have forgotten the data when tested with standard prompts, the underlying knowledge often remains dormant within the model's weights.[5][6]

This vulnerability is exposed through what researchers call "reversibility attacks" or "relearning attacks." In these scenarios, security analysts take an unlearned model and subject it to a very small amount of fine-tuning using benign, publicly available data that is only loosely related to the deleted information. Shockingly, this minimal intervention is often enough to "jog" the AI's memory, causing the supposedly erased data to resurface. The model rapidly recovers its ability to generate the sensitive or copyrighted information, proving that the unlearning algorithm merely built a fragile barrier around the knowledge rather than extracting it at the root.[5][7]

Recent research shows that some unlearning methods only suppress data, which can be recovered through 'reversibility attacks.'

This phenomenon of reversibility highlights a major evaluation gap in the current AI safety landscape. Traditional metrics for measuring unlearning, such as checking whether the model's accuracy on the forget set drops to zero, can be misleading, because a model that merely suppresses the data passes the same test as one that has erased it. Researchers are now developing representation-level diagnostic toolkits to measure how far the model's internal representations have actually drifted. These tools aim to distinguish between "catastrophic forgetting" (where the model breaks entirely), "reversible suppression" (where the data is hidden but recoverable), and the holy grail of "irreversible, non-catastrophic forgetting" (where the data is truly gone but the model remains functional).[6]
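The three outcomes named above can be made concrete with a small decision rule. The sketch below is a behavioral stand-in, not the representation-level toolkit the researchers describe: it assumes accuracy has been measured on the retain set before and after unlearning, and on the forget set both after unlearning and after a light relearning probe. The `UnlearningReport` fields and every threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnlearningReport:
    """Hypothetical measurements taken around one unlearning run."""
    retain_acc_before: float
    retain_acc_after: float
    forget_acc_after: float
    forget_acc_after_relearning: float  # after a brief fine-tuning probe


def classify_outcome(report, utility_drop=0.10, forgotten_below=0.20,
                     recovered_above=0.50):
    """Map measurements to the outcome categories (assumed thresholds)."""
    if report.forget_acc_after >= forgotten_below:
        return "not forgotten"
    if report.retain_acc_before - report.retain_acc_after > utility_drop:
        return "catastrophic forgetting"
    if report.forget_acc_after_relearning >= recovered_above:
        return "reversible suppression"
    return "irreversible, non-catastrophic forgetting"


# Forget accuracy looks low after unlearning, but the probe brings it back.
print(classify_outcome(UnlearningReport(0.90, 0.88, 0.05, 0.70)))
```

The point of the example is the third check: a report that looks only at `forget_acc_after` cannot tell the last two categories apart.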

Despite these ongoing challenges, machine unlearning represents a vital and optimistic frontier in artificial intelligence. It marks a fundamental shift away from viewing AI models as static, immutable black boxes, and toward treating them as dynamic, editable systems capable of continuous ethical refinement. As researchers close the gap between approximate suppression and guaranteed deletion, machine unlearning will become the foundational technology that allows society to harness the immense power of foundation models while uncompromisingly protecting individual privacy, enforcing copyright, and ensuring algorithmic fairness.[1][2][7]

How we got here

  • 2015: Researchers publish the first major paper on statistical query unlearning, laying the groundwork for the field.
  • 2018: The European Union implements the GDPR, legally enshrining the 'right to be forgotten' for digital data.
  • 2019: The SISA framework is introduced, offering a practical way to unlearn data by dividing training sets into isolated shards.
  • 2024-2025: The focus shifts to 'deep unlearning' for massive Large Language Models, alongside the discovery of 'reversibility attacks.'

Viewpoints in depth

Privacy Advocates & Regulators

Focus on the legal necessity of unlearning to satisfy the 'right to be forgotten'.

For privacy advocates and international regulators, machine unlearning is not just a neat technical trick—it is a strict legal requirement. Under frameworks like the GDPR, if an individual requests their data be deleted, a company cannot simply claim that extracting it from an AI model is 'too hard.' Regulators argue that if an AI system cannot prove it has genuinely forgotten a user's data, that system is operating illegally. They push for stringent auditing standards to ensure that unlearning algorithms provide verifiable, permanent deletion rather than just surface-level suppression.

AI Developers & Engineers

Focus on the computational efficiency of unlearning compared to retraining.

From an engineering perspective, machine unlearning is an economic necessity. Training a modern foundation model costs tens of millions of dollars and requires months of continuous supercomputer operation. If developers had to execute a full retraining cycle every time a user submitted a data deletion request, the entire generative AI industry would grind to a halt. Engineers view techniques like the SISA framework and gradient ascent as the only sustainable path forward, allowing them to rapidly patch models, remove toxic biases, and comply with privacy laws for pennies on the dollar.

Security Researchers & Skeptics

Focus on the vulnerabilities of current unlearning methods and the 'illusion of forgetting.'

Security analysts and academic researchers remain highly skeptical of current unlearning claims. Through rigorous testing, they have demonstrated that many 'unlearned' models have simply built a fragile guardrail around the deleted data. By executing 'reversibility attacks'—where the model is lightly fine-tuned on loosely related public data—researchers can easily jog the AI's memory and force it to regurgitate the supposedly erased information. This camp argues that until the industry can achieve 'irreversible, non-catastrophic forgetting' at the representation level, machine unlearning provides a false sense of security.

What we don't know

  • It remains unclear if current 'approximate unlearning' techniques legally satisfy the strict data deletion requirements of the GDPR.
  • Researchers do not yet know how to achieve guaranteed, irreversible unlearning in massive Large Language Models without degrading their overall intelligence.
  • The long-term impact of repeated, continuous unlearning cycles on a single foundation model's stability is still being studied.

Key terms

Machine Unlearning
The process of selectively removing the influence of specific training data from an AI model without retraining it from scratch.
Right to be Forgotten
A legal principle, prominent in the GDPR, granting individuals the right to have their personal data deleted by organizations.
SISA Framework
A method of training AI in isolated 'shards' so that data can be deleted by only retraining a small fraction of the system.
Gradient Ascent
A mathematical technique used in deep unlearning that runs the learning process in reverse to make a model 'forget' specific patterns.
Reversibility Attack
A security test where researchers use minimal fine-tuning to 'jog' an AI's memory, proving that unlearned data was only hidden, not erased.

Frequently asked

Can't companies just delete the original data file?

No. While deleting the file removes it from the database, the AI model has already internalized the patterns and information from that file into its neural network weights.

Does unlearning damage the AI's overall intelligence?

It can, which is why researchers use a 'retain set' during the unlearning process to anchor the model's general knowledge while it forgets the targeted data.

Is machine unlearning legally recognized as true deletion?

It is currently a legal gray area. Regulators are still determining if 'approximate forgetting' satisfies the strict legal requirements of laws like the GDPR.

Sources

Source coverage

7 outlets

3 viewpoints surfaced

  [1] Medium, "A Journey into Machine Unlearning: Transforming Data Privacy in Machine Learning" (Privacy Advocates & Regulators)
  [2] Stanford AI Lab, "Machine Unlearning in 2024" (AI Developers & Engineers)
  [3] Pecan AI, "The Importance of Machine Unlearning in Data Privacy" (Privacy Advocates & Regulators)
  [4] Probably Private, "How does machine unlearning work?" (AI Developers & Engineers)
  [5] Carnegie Mellon University, "Machine unlearning in LLMs is susceptible to relearning attacks" (Security Researchers & Skeptics)
  [6] OpenReview, "Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs" (Security Researchers & Skeptics)
  [7] Factlen Editorial Team, "Synthesis by Factlen editorial team" (Security Researchers & Skeptics)