US Appeals Court Establishes the 'Two-Phase' AI Copyright Rule in Landmark Fair Use Decision
A federal appeals court has ruled that training AI models on copyrighted data constitutes fair use, but generating outputs that closely mimic the original works is copyright infringement.
By Factlen Editorial Team
- AI Developers & Technologists: Argue that machine learning is a fundamental computational right akin to human reading, requiring fair use protection to exist.
- Publishers & Creators: Argue that AI models are commercial products built on uncompensated labor that threaten to replace the original creators.
- Legal Scholars: Focus on the mechanics of applying 20th-century copyright law to statistical models, emphasizing the extraction/expression divide.
What's not represented
- Open-source AI developers who cannot afford regurgitation filters
- International regulators managing cross-border data flows
Why this matters
This ruling removes the threat of an immediate shutdown of the US generative AI industry while giving publishers and artists a legal mechanism to sue when models regurgitate their specific works. Together, the two holdings could rewrite the economics of digital content.
Key points
- Appeals court establishes a 'Two-Phase' doctrine for AI copyright.
- Ingesting copyrighted data to train models is ruled as transformative fair use.
- Generating outputs that closely mimic specific training data is ruled as copyright infringement.
- AI developers must now implement strict 'regurgitation filters' to avoid massive liability.
- The ruling leaves the legality of mimicking an artist's general 'style' largely unresolved.
The legal cloud hanging over the $2.5 trillion generative AI industry has finally broken. In a landmark decision on Wednesday, the Second Circuit Court of Appeals issued its long-awaited ruling on a consolidated batch of copyright lawsuits against major AI developers, establishing a new "Two-Phase" doctrine for artificial intelligence.[1][2]
The ruling fundamentally splits the mechanics of generative AI into two distinct legal categories: the act of training the model, and the act of generating outputs. By treating these as separate legal events, the court attempted to thread the needle between technological innovation and creator compensation.[6]
For the tech industry, the court delivered a massive victory on the first phase. The judges ruled 2-1 that ingesting copyrighted works to train a large language model or image generator constitutes "transformative fair use" under U.S. law, meaning developers do not need to license the data they scrape from the public web.[1][5]
The court reasoned that AI models do not store copies of the books, articles, or images they ingest. Instead, they extract statistical correlations and uncopyrightable facts—a process digital rights groups have likened to a human student reading a library of books to learn how to write.[3][5]

"To declare the extraction of statistical patterns as copyright infringement would effectively grant authors a monopoly over the mathematical properties of language itself," the majority opinion stated, noting that copyright protects expression, not underlying ideas or data relationships.[2]
However, the court delivered an equally forceful victory to publishers and creators on the second phase: the outputs. The ruling explicitly strips fair use protection from any AI-generated output that exhibits "substantial similarity" to a specific ingested work.[1][4]
This means that if a user prompts an AI to generate an article that closely mimics a specific investigative report, or an image that replicates the exact composition of a copyrighted photograph, the AI developer can be held directly liable for copyright infringement.[1][6]
The U.S. Copyright Office had previously signaled support for this bifurcated approach in its early 2026 policy reports, noting that the economic harm to creators occurs at the point of market substitution, not at the point of computational analysis.[4]
The most contested aspect of the ruling centered on the fourth factor of the fair use test: the effect on the potential market for the original work. This is where the legal mechanics of AI ingestion faced their toughest scrutiny.[3]

Plaintiffs representing authors and media organizations argued that the mere existence of a model trained on their data creates a competing product that destroys their market value, regardless of whether the model outputs exact copies.[1]
The court disagreed with this broad interpretation, citing its own precedent in the Google Books case, Authors Guild v. Google, which the Supreme Court declined to review. The Second Circuit ruled that a tool that makes information more accessible or creates a new statistical utility does not inherently usurp the market for the underlying expressive works unless it actively regurgitates them.[2][3]
By placing liability entirely on the output phase, the court has effectively mandated a major engineering change for AI developers, turning a legal standard into a technical requirement.[6]
Companies will now be legally required to implement robust "regurgitation filters" and similarity-detection systems to ensure their models cannot output memorized training data, even when deliberately provoked by users.[1][2]
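To make the idea of a "regurgitation filter" concrete, here is a minimal Python sketch of one possible similarity check: it measures how many of an output's word sequences also appear verbatim in a protected work. The function names, the 8-word window, and the 0.5 threshold are illustrative assumptions, not anything taken from the ruling, and word overlap is only a rough proxy for the legal test of "substantial similarity".

```python
from typing import Iterable, Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int = 8) -> Set[NGram]:
    """Return the set of lowercase n-word sequences found in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, protected: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that also occur in the protected work."""
    output_grams = word_ngrams(output, n)
    if not output_grams:
        # Outputs shorter than n words have no n-grams to compare.
        return 0.0
    shared = output_grams & word_ngrams(protected, n)
    return len(shared) / len(output_grams)


def should_block(
    output: str,
    protected_works: Iterable[str],
    n: int = 8,
    threshold: float = 0.5,  # hypothetical cutoff, chosen only for illustration
) -> bool:
    """Flag an output whose overlap with any protected work meets the threshold."""
    return any(
        overlap_ratio(output, work, n) >= threshold for work in protected_works
    )
```

A production system would need fuzzy matching, image comparison, and an index over a very large corpus. This sketch shows only the basic pattern of comparing each output against known works before returning it to the user.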
If a model is found to routinely bypass these filters and output copyrighted material, the developer could face statutory damages of up to $150,000 per infringed work, the ceiling U.S. copyright law sets for willful infringement. Applied at scale, that penalty could bankrupt even the most well-funded AI labs.[4][6]
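The scale of that exposure is simple multiplication, as the short Python sketch below shows. It uses the $150,000 per-work maximum cited above. The work counts are hypothetical, and actual awards are set by courts and can be far lower than the maximum.

```python
MAX_STATUTORY_DAMAGES_PER_WORK = 150_000  # per-work maximum cited in the article


def max_exposure(
    infringed_works: int, per_work: int = MAX_STATUTORY_DAMAGES_PER_WORK
) -> int:
    """Upper bound on statutory damages for a given number of infringed works."""
    if infringed_works < 0:
        raise ValueError("infringed_works must be non-negative")
    return infringed_works * per_work


# Hypothetical work counts, for illustration only.
for works in (1, 1_000, 100_000):
    print(f"{works:>7,} works -> up to ${max_exposure(works):,}")
```

At 100,000 infringed works, the theoretical ceiling is $15 billion.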

While the ruling provides clarity on direct memorization, it leaves a massive gray area regarding "style." The boundary between a model generating a substantially similar copy and a model generating a new work in the style of a specific artist remains undefined.[3]
The dissenting judge argued that the majority's framework fails to protect visual artists whose distinct aesthetic styles are perfectly mimicked by AI. Copyright law traditionally does not protect style, but the unprecedented scale of AI mimicry is testing the limits of that doctrine.[1][3]
The Second Circuit's ruling places the United States on a distinct path compared to other global jurisdictions, setting up a fractured international landscape for AI development.[6]
The European Union relies heavily on an "opt-out" framework: its 2019 copyright directive lets rightsholders reserve their works from text and data mining, and the AI Act requires developers of general-purpose models to respect those machine-readable reservations. Meanwhile, Japan has adopted a highly permissive stance, explicitly legalizing almost all data ingestion for machine learning to boost its domestic tech sector.[4][6]
Because this ruling comes from a federal appeals court and creates a novel legal framework for a multi-trillion-dollar industry, legal analysts widely expect the case to be appealed to the U.S. Supreme Court.[2]
Until the Supreme Court weighs in, likely in 2027, the "Two-Phase" doctrine is binding only within the Second Circuit but will serve as the de facto operating standard for the industry, forcing AI companies to rapidly re-engineer their output filters while allowing them to continue scraping the public web.[1][6]
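One common machine-readable signal is a site's robots.txt file. The Python sketch below uses the standard library's `urllib.robotparser` to show how a crawler might check such a flag before collecting a page for training. The crawler name `ExampleTrainingBot` and the sample file are hypothetical, and the article does not specify which opt-out formats EU rules recognize, so treat robots.txt here as an assumed example rather than the required mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in which a publisher blocks one training crawler
# while leaving the site open to every other user agent.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch_for_training(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/investigations/report.html"
    print(may_fetch_for_training(ROBOTS_TXT, "ExampleTrainingBot", url))
    print(may_fetch_for_training(ROBOTS_TXT, "SomeOtherBot", url))
```

The rules are parsed from a string here so the example runs without network access. A real crawler would fetch each site's live robots.txt and would likely check other reservation signals as well.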

How we got here
Late 2023
Major publishers and authors file a wave of class-action copyright lawsuits against AI labs.
Mid 2024
Discovery phases reveal internal communications regarding data scraping practices.
Early 2025
District courts issue mixed rulings, prompting fast-tracked appeals.
June 2026
The Second Circuit issues the landmark 'Two-Phase' doctrine ruling.
Viewpoints in depth
AI Developers & Technologists
Argue that machine learning is a fundamental computational right akin to human reading.
This camp views the ingestion of data not as copying, but as statistical analysis. They argue that if a human is allowed to read a copyrighted book to learn how to write better, a machine should be allowed to process that same book to adjust its neural weights. They view the 'fair use' ruling on training as essential to maintaining US technological leadership, warning that requiring licenses for billions of data points would make AI development impossible for anyone except the largest tech monopolies.
Publishers & Creators
Argue that AI models are commercial products built on uncompensated labor that threaten to replace the original creators.
Creators argue that the 'human reading' analogy is fundamentally flawed because AI models operate at an industrial scale to create direct market substitutes for the very people whose work they ingested. While they welcome the strict liability on regurgitated outputs, many in this camp feel the ruling falls short by allowing tech companies to build multi-billion-dollar commercial products using their intellectual property without offering compensation or an opt-out mechanism.
Legal Scholars
Focus on the mechanics of applying 20th-century copyright law to statistical models.
Legal analysts view this ruling as a pragmatic, if imperfect, attempt to map the 1976 Copyright Act onto 21st-century technology. They emphasize the 'extraction vs. expression' divide, noting that copyright was designed to protect the specific expression of an idea, not the underlying facts, styles, or statistical correlations. However, scholars warn that the burden of proving 'substantial similarity' in AI outputs will lead to years of messy, expensive litigation over exactly what percentage of a generated image or text constitutes infringement.
What we don't know
- Whether the U.S. Supreme Court will uphold, modify, or strike down this framework.
- Exactly what mathematical threshold of similarity will trigger infringement liability in future jury trials.
- How open-source AI models, which lack centralized output filters, will be regulated under this standard.
Key terms
- Fair Use: A US legal doctrine permitting limited use of copyrighted material without acquiring permission, based on a four-factor test.
- Transformative Use: A use that adds something new, with a further purpose or different character, rather than substituting for the original use.
- Substantial Similarity: The legal standard used to determine whether a defendant has copied enough of a plaintiff's expression to constitute infringement.
- Regurgitation: In AI, when a model outputs an exact or near-exact copy of a specific piece of data it was trained on.
Frequently asked
Does this mean AI companies have to pay for training data?
No. The court ruled that the act of training itself is fair use and does not require licensing or payment, provided the data was publicly accessible.
Can I sue if an AI copies my specific artwork?
Yes. If the AI generates an output that is 'substantially similar' to your specific copyrighted work, the developer can be held liable for infringement.
Is it illegal for an AI to copy an artist's general style?
Currently, no. Copyright law protects specific expressions, not general styles or concepts, though this remains a heavily debated gray area that the court left largely unresolved.
Is this the final word on AI copyright?
Unlikely. Legal experts expect this decision to be appealed to the U.S. Supreme Court for a final, nationwide ruling.
Sources
[1] Reuters (Publishers & Creators). U.S. appeals court rules AI training is fair use, but outputs face strict copyright tests.
[2] Bloomberg Law (AI Developers & Technologists). Second Circuit's 'Two-Phase' AI Ruling Reshapes Digital Copyright.
[3] Stanford Center for Internet and Society (Legal Scholars). Extraction vs. Expression: The Legal Mechanics of AI Ingestion.
[4] U.S. Copyright Office (Publishers & Creators). Report on Copyright and Artificial Intelligence: Part III - Generative Models.
[5] Electronic Frontier Foundation (AI Developers & Technologists). The Second Circuit Protects the 'Right to Learn' for AI Models.
[6] Factlen Editorial Team (Legal Scholars). Synthesis by Factlen editorial team.