Supreme Court Rules AI Training is 'Fair Use' in Landmark Copyright Decision
In a 6-3 decision, the US Supreme Court ruled that training generative AI models on copyrighted data constitutes fair use, provided the models do not regurgitate exact excerpts.
By Factlen Editorial Team
- AI Developers & Tech Industry: View training as statistical analysis of facts and language, arguing that learning from public data is a fundamental right akin to a human reading a library.
- Publishers & Creators: View AI as a massive commercial extraction machine that threatens the economic viability of human creation by acting as a direct market substitute.
- Digital Rights & Open Source: Prioritize keeping AI development accessible to startups and academics, fearing that mandatory licensing would create a corporate oligopoly.
- Legal & Market Analysts: Focus on the practical implementation of the ruling, noting the difficulty of enforcing the 50-word threshold and the resulting market impacts.
What's not represented
- International regulators enforcing stricter copyright regimes
- Independent visual artists whose styles are mimicked but not exactly copied
Why this matters
This ruling removes the existential legal threat to the trillion-dollar generative AI industry, ensuring developers do not owe retroactive billions for web scraping. However, it forces publishers and creators to find new ways to protect and monetize their work in an AI-dominated internet.
Key points
- The Supreme Court ruled 6-3 that ingesting copyrighted text to train AI models is protected under the Fair Use doctrine.
- The Court drew a hard legal line between 'training' (legal) and 'output' (potentially infringing if substantially similar).
- A new legal threshold places the burden of proof on AI developers if a model outputs more than 50 consecutive words of a copyrighted text.
- The decision effectively ends the push for mandatory blanket licensing fees for AI training data in the United States.
- The ruling explicitly leaves the copyright status of multimodal AI (audio and video generation) unresolved.
The Supreme Court of the United States has fundamentally reshaped the digital economy, ruling in a 6-3 decision that the ingestion of copyrighted material to train generative artificial intelligence models constitutes "fair use." The landmark opinion in The New York Times Co. v. OpenAI resolves years of existential legal ambiguity for the tech sector. By determining that the process of algorithmic training does not inherently violate the Copyright Act of 1976, the Court has shielded the $1.5 trillion AI industry from retroactive liabilities that could have forced the destruction of foundational models.[1][2]
The ruling rests on Justice Elena Kagan’s majority opinion, which heavily weighed the first factor of the fair use doctrine: the purpose and character of the use. The Court concluded that training a Large Language Model (LLM) is a highly "transformative" act. It accepted technical evidence that models do not act as databases that store and retrieve protected expression but instead map statistical relationships between tokens. Because the models extract the uncopyrightable ideas, facts, and linguistic structures from the text rather than the expression itself, the ingestion phase is legally permissible.[1][7]
However, the ruling draws a sharp, highly consequential line between the training of a model and the output it generates. While developers are protected when building the underlying neural network, they remain fully liable if the resulting product acts as a market substitute by regurgitating protected work. The Court established a new evidentiary standard: if an AI system outputs more than 50 consecutive words of a copyrighted text without a license, the burden of proof shifts entirely to the AI developer to prove the generation was not infringing.[1][4]
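As a rough illustration of how the 50-word line might be measured, the Python sketch below finds the longest run of consecutive words that a model output shares with a protected text. It is not the Court's test: the function names, the word tokenization, and the case-insensitive matching are all assumptions, since the reporting does not say how words are to be counted.

```python
import re

WORD_THRESHOLD = 50  # the threshold reported in the ruling


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_shared_run(output: str, protected: str) -> int:
    """Return the length, in words, of the longest consecutive run found in both texts."""
    a, b = tokenize(output), tokenize(protected)
    best = 0
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous output word
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of words.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def burden_shifts(output: str, protected: str, threshold: int = WORD_THRESHOLD) -> bool:
    """True when the shared run is strictly longer than the threshold."""
    return longest_shared_run(output, protected) > threshold
```

Under these assumptions, an output that repeats exactly 50 consecutive words from an article would not cross the line, while one that repeats 51 would. How punctuation, paraphrase, or source code should be counted is left open here.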

This output threshold represents a compromise between the tech industry's need for vast training data and publishers' need to protect their core products. The New York Times, which initiated the flagship lawsuit in late 2023, had presented extensive evidence of ChatGPT reproducing verbatim paragraphs of its paywalled journalism. While the Times lost its argument that the training process itself was theft, the Court's burden-shifting standard for outputs gives publishers a powerful new weapon to police how AI tools are actually used by consumers.[1][3]
In a blistering dissent, Justice Sonia Sotomayor challenged the majority's interpretation of the fourth fair use factor—the effect on the potential market. The dissent cited economic analyses demonstrating that generative AI search tools directly cannibalize the web traffic and ad revenue that sustain digital journalism. Sotomayor argued that labeling the wholesale, commercial extraction of human knowledge as "transformative" simply because it is processed by a computer fundamentally misreads the constitutional purpose of copyright, which is to incentivize human creators.[1][3]
The evidentiary record before the Court featured heavily contested definitions of how AI actually functions. Amicus briefs from the Stanford Institute for Human-Centered AI proved highly influential for the majority. Stanford researchers successfully argued that forcing AI developers to license training data would be mathematically and logistically impossible, effectively freezing American AI development. The Court cited this brief directly, noting that copyright law should not be interpreted in a way that blocks the progress of science and useful arts when the underlying mechanism is statistical learning.[1][7]
Digital rights organizations, including the Electronic Frontier Foundation, celebrated the ruling as a massive victory for the open-source ecosystem. During oral arguments, evidence was presented showing that a mandatory licensing regime would consolidate AI development into a corporate oligopoly. Only mega-cap technology firms possess the capital to negotiate blanket licenses with global media conglomerates. By protecting training under fair use, the Court ensured that academic institutions, startups, and open-source communities can continue building foundational models.[1][6]
Conversely, the Authors Guild and other bodies representing the creator economy warned that the decision effectively legalizes the strip-mining of human culture. Their evidentiary submissions highlighted that the quality of AI models degrades rapidly when trained purely on synthetic or public domain data, which they argue shows that the models depend on the unique value of contemporary, copyrighted human expression. The Guild announced immediate plans to lobby Congress for a statutory amendment, arguing the judiciary has failed to protect the creative class.[8]
Financial markets reacted violently to the publication of the opinion. Major technology equities surged, with AI infrastructure and foundation model developers adding hundreds of billions in market capitalization within hours. Meanwhile, traditional media conglomerates and digital publishing stocks saw sharp declines. Investors had priced in the possibility of lucrative, mandatory data-licensing consortiums—a revenue stream that the Supreme Court has now definitively removed from the table.[5]

Despite the clarity on text-based LLMs, the ruling leaves significant evidentiary gaps and legal uncertainty for multimodal artificial intelligence. The majority opinion explicitly declined to extend its fair use safe harbor to models trained on copyrighted audio, music, and video. The Court noted that audio generation, where AI voice clones can perfectly mimic a specific singer's vocal timbre, presents distinct market substitution risks that lower courts must evaluate on a case-by-case basis.[1][4]
Another critical area of uncertainty involves Retrieval-Augmented Generation (RAG) systems. Unlike base models that rely on internal weights, RAG systems actively search the live internet, pull specific copyrighted articles into their context window, and summarize them for the user. The Court strongly signaled that RAG applications operate much closer to traditional copyright infringement, as they directly utilize the protected expression to satisfy a user's query, rather than just learning from it in the abstract.[1][2]
The decision also creates a stark jurisdictional fracture in global technology policy. While the United States has now enshrined AI training as fair use, the European Union's AI Act, which enters full enforcement this year, requires developers to provide detailed summaries of training data and strictly honor machine-readable copyright opt-outs. Multinational AI companies will now have to maintain bifurcated data pipelines, aggressively scraping the US web while carefully filtering European domains to avoid massive regulatory fines.[2][7]
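A bifurcated pipeline of this kind could be sketched as follows. This is a hypothetical Python illustration, not any company's actual system: the `CrawledPage` fields, the `ExampleAIBot` crawler name, and the choice of robots.txt as the machine-readable opt-out signal are all assumptions, since the reporting does not specify how opt-outs are expressed or how a page's region is determined.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser

CRAWLER_NAME = "ExampleAIBot"  # hypothetical crawler user agent


@dataclass
class CrawledPage:
    url: str
    region: str      # "US" or "EU"; how a region is assigned is an assumption
    robots_txt: str  # raw robots.txt text served by the page's host


def opted_out(page: CrawledPage) -> bool:
    """Treat a robots.txt rule that blocks our crawler as a machine-readable opt-out."""
    parser = RobotFileParser()
    parser.parse(page.robots_txt.splitlines())
    return not parser.can_fetch(CRAWLER_NAME, page.url)


def select_for_training(pages: list[CrawledPage]) -> list[CrawledPage]:
    """Keep every US page; keep EU pages only when no opt-out is present."""
    kept = []
    for page in pages:
        if page.region == "EU" and opted_out(page):
            continue  # honor the opt-out for EU-hosted content
        kept.append(page)
    return kept
```

The sketch mirrors the split described above: the same opt-out signal is ignored for US pages and honored for EU pages. A real pipeline would also need the training-data summaries the AI Act is said to require, which are not modeled here.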
Moving forward, the technical focus of the AI industry will pivot rapidly from data acquisition to output filtering. Because the Court shifted the burden of proof to developers whenever a model regurgitates more than 50 consecutive words, AI developers must invest heavily in unlearning algorithms and real-time output classifiers. If a user attempts to prompt a model to reproduce a copyrighted recipe, article, or code snippet, the system must reliably refuse the request, or the developer will have to prove that the output was not infringing.[4][6]
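One way a real-time output filter might work is to index every fixed-length word run in a set of protected texts and halt generation the moment the model's output would complete one of those runs. The Python sketch below is a minimal illustration under stated assumptions: the function names, the whitespace tokenization, and the placeholder message are hypothetical, and `run_length` would be set to 51 to catch "more than 50" consecutive words.

```python
from collections import deque
from typing import Iterable, Iterator


def build_shingle_index(protected_texts: Iterable[str], run_length: int) -> set[tuple[str, ...]]:
    """Index every run of `run_length` consecutive words in the protected texts."""
    index: set[tuple[str, ...]] = set()
    for text in protected_texts:
        words = text.lower().split()
        for start in range(len(words) - run_length + 1):
            index.add(tuple(words[start:start + run_length]))
    return index


def guarded_stream(words: Iterable[str], index: set[tuple[str, ...]], run_length: int) -> Iterator[str]:
    """Yield words until the next one would complete an indexed run, then stop."""
    window: deque[str] = deque(maxlen=run_length)  # the most recent words seen
    for word in words:
        window.append(word.lower())
        if len(window) == run_length and tuple(window) in index:
            # Withhold the word that would complete the protected run.
            yield "[output withheld]"
            return
        yield word
```

With `run_length=4`, for example, a stream that starts quoting a protected sentence is cut off after three matching words. An exact-match index like this would miss lightly paraphrased text, which is one reason the article's mention of classifiers and unlearning goes beyond what this sketch covers.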

Ultimately, the Supreme Court has decided that the architectural foundation of the AI era will remain open and legally protected. By separating the act of machine learning from the act of machine reproduction, the judiciary has attempted to thread a needle that preserves American technological supremacy while leaving a narrow path for creators to sue over direct market substitution. The battleground now shifts from the server farms where models are trained to the chat interfaces where they are used.[1][2][3]
How we got here
Late 2022
Generative AI models like ChatGPT launch globally, sparking widespread debate over the legality of their training data.
Dec 2023
The New York Times sues OpenAI and Microsoft for copyright infringement, becoming the flagship case for the publishing industry.
Mid 2025
Appellate courts issue split decisions on AI fair use, fast-tracking the issue to the Supreme Court.
June 2026
The Supreme Court issues its landmark 6-3 decision protecting AI training under fair use.
Viewpoints in depth
AI Developers & Tech Industry
Argue that learning from public data is a fundamental right, akin to a human reading a library.
Technology companies and their legal representatives maintain that copyright law was designed to protect the expression of ideas, not the ideas themselves. They argue that Large Language Models do not store copies of books or articles; rather, they analyze the statistical relationships between words. From this perspective, forcing an AI to pay to 'read' the internet would be as illogical as forcing a human student to pay a licensing fee to remember facts they learned in a public library. They view the Court's decision as a necessary protection that prevents copyright law from being weaponized to halt mathematical and scientific progress.
Publishers & Creators
Argue that AI models are commercial products built entirely on uncompensated human labor.
Media conglomerates, authors, and journalists view generative AI not as a 'student' learning from the world, but as a massive commercial extraction machine. They argue that AI models are valuable only because they have ingested billions of dollars' worth of human creativity and investigative journalism without permission or compensation. From their viewpoint, when an AI answers a user's query about a recent news event, it acts as a direct market substitute for the original article, siphoning away the web traffic and advertising revenue required to fund future human creation.
Digital Rights & Open Source
Argue that requiring training licenses would destroy open-source AI, leaving only a few mega-corporations in control.
Organizations like the Electronic Frontier Foundation approach the debate from an anti-monopoly perspective. They acknowledge the concerns of creators but warn that dismantling the fair use defense would have catastrophic consequences for the internet ecosystem. If training an AI required negotiating licenses with millions of copyright holders, only trillion-dollar companies like Google and Microsoft could afford to build models. By protecting training as fair use, this camp argues the Court has ensured that university researchers, independent developers, and open-source communities can continue to audit, build, and democratize artificial intelligence.
What we don't know
- How lower courts will practically apply the '50-word regurgitation threshold' to complex outputs like software code, poetry, or highly technical manuals.
- Whether this ruling will eventually be extended to multimodal AI, such as models trained on copyrighted music, voice recordings, and video.
- How US-based AI companies will navigate the conflicting, stricter copyright opt-out requirements under the European Union's AI Act.
Key terms
- Fair Use: A US legal doctrine that permits limited use of copyrighted material without acquiring permission, based on four factors including the 'transformative' purpose of the use.
- Transformative Use: Using copyrighted material in a completely new or unexpected way, altering the original with new expression, meaning, or message.
- Regurgitation: When an AI model outputs exact or substantially similar copies of the data it was trained on, rather than generating novel text.
- RAG (Retrieval-Augmented Generation): An AI technique that fetches specific, real-time documents from the internet to answer a prompt, which the Court noted carries higher copyright risk than standard model weights.
- Large Language Model (LLM): A type of artificial intelligence algorithm that uses deep learning techniques and massively large data sets to understand, summarize, generate, and predict new content.
Frequently asked
Does this mean AI companies can steal my work?
They can legally use your publicly available work to train their models' statistical weights. However, they cannot legally generate exact or substantially similar copies of your work for users.
Can I opt out of AI training?
Under this US ruling, AI companies are not legally required to honor 'opt-out' requests for training, though many still do voluntarily. In contrast, the EU AI Act does require honoring machine-readable opt-outs.
What happens to the existing AI lawsuits?
Lawsuits focused purely on the ingestion of data will likely be dismissed. Lawsuits that can prove models output copyrighted material to users will proceed under the new 50-word threshold.
Does this apply to AI image and music generators?
Not definitively. The Supreme Court explicitly limited this specific fair use safe harbor to text-based Large Language Models, leaving multimodal AI (audio/video) open to future lower court rulings.
Sources
[1] Supreme Court of the United States, "Opinion of the Court: The New York Times Co. v. OpenAI" (Legal & Market Analysts)
[2] Reuters, "Supreme Court shields AI industry, rules data training is fair use" (Legal & Market Analysts)
[3] The New York Times, "Supreme Court Deals Blow to Publishers in Landmark AI Copyright Ruling" (Publishers & Creators)
[4] The Verge, "AI just won its biggest legal battle yet at the Supreme Court" (AI Developers & Tech Industry)
[5] The Wall Street Journal, "Tech Stocks Surge as Supreme Court Removes AI Copyright Cloud" (AI Developers & Tech Industry)
[6] Electronic Frontier Foundation, "Supreme Court Victory Ensures Open Source AI Can Survive" (Digital Rights & Open Source)
[7] Stanford HAI, "Analyzing the Supreme Court's 'Transformative' AI Decision" (Digital Rights & Open Source)
[8] Authors Guild, "Supreme Court Decision a Devastating Blow to Human Creators" (Publishers & Creators)













