Formal Verification · Evidence Pack · Jun 8, 2026, 3:02 AM · 5 min read · #3 of 3 in science

AI Systems Solve Decades-Old Erdős Math Problems, Sparking a New Era of Formal Verification

Back-to-back breakthroughs by Google DeepMind and OpenAI demonstrate that AI can now generate novel, research-level mathematical discoveries when paired with formal proof assistants.

By Factlen Editorial Team

AI-Assisted Mathematicians 45% · Mathematical Traditionalists 30% · Institutional Researchers 25%

AI-Assisted Mathematicians: View AI and formal verification as a revolutionary co-pilot that accelerates discovery and ensures rigorous foundations.
Mathematical Traditionalists: Emphasize that mathematical proofs must provide human insight and conceptual understanding, not just machine-verified correctness.
Institutional Researchers: Focus on measuring and integrating these tools systematically into active research environments to quantify their real-world impact.

What's not represented

· Educators adapting math curricula for an AI-native generation
· Pure mathematicians working in fields resistant to formalization

Why this matters

For decades, advanced mathematics has been constrained by the limits of human cognition and the slow pace of manual peer review. The integration of AI with formal proof assistants is removing these bottlenecks, promising to accelerate discoveries not just in abstract geometry, but in the software and hardware verification that secures our digital infrastructure.

Key points

OpenAI's model generated a counterexample to the 80-year-old Erdős unit distance conjecture.
Google DeepMind's AlphaProof Nexus solved nine open Erdős problems autonomously.
Breakthroughs rely on pairing generative AI with the Lean formal proof assistant.
Lean acts as a logical referee, rejecting flawed proof steps that an AI model has hallucinated.
DARPA awarded $2.6 million to study AI's impact on active mathematical research.
Some mathematicians worry AI proofs lack the conceptual insight of human reasoning.

9

Open Erdős problems solved by AlphaProof Nexus

80 years

Age of the unit distance puzzle solved by OpenAI

$2.6M

DARPA grant to study AI in math research

250,000+

Theorems formalized in Lean's mathlib4

For decades, artificial intelligence has served mathematics primarily as a high-speed calculator, crunching numbers while humans provided the structural insight. In late May 2026, that dynamic fundamentally shifted. Two of the world's leading AI laboratories announced back-to-back breakthroughs in which neural networks produced genuinely novel, research-level mathematical discoveries.[1][2]

The first milestone arrived when OpenAI researchers revealed that their model had resolved the "unit distance problem," a notorious combinatorial geometry puzzle proposed nearly 80 years ago by the legendary Hungarian mathematician Paul Erdős. Rather than validating the long-standing conjecture, the AI generated a complex counterexample: a point arrangement with more unit-distance pairs than the conjectured upper bound allowed.[2][3]
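To make the quantity at stake concrete, here is a minimal Python sketch that counts unit-distance pairs in a point set. It is an illustration only and has no connection to OpenAI's construction: the function names are hypothetical, and the square grid is used simply because integer coordinates keep the arithmetic exact.

```python
from itertools import combinations


def count_unit_distances(points):
    """Count unordered pairs of points at distance exactly 1.

    Points are integer (x, y) tuples, so squared distances are exact
    integers and no floating-point tolerance is needed.
    """
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == 1
    )


def square_grid(side):
    """Return the side-by-side block of integer lattice points."""
    return [(x, y) for x in range(side) for y in range(side)]


grid = square_grid(3)
print(len(grid), count_unit_distances(grid))  # prints: 9 12
```

The unit distance problem asks how large this count can be for the best possible arrangement of n points, so a counterexample is a family of arrangements whose count grows faster than the conjectured limit.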

Just a day later, Google DeepMind unveiled AlphaProof Nexus, an AI system that autonomously solved nine open Erdős problems. These problems, spanning graph theory and combinatorics, included two that had remained unsolved for 56 years. DeepMind achieved this at a compute cost of only a few hundred dollars per problem, and the system also proved 44 open conjectures from the On-Line Encyclopedia of Integer Sequences (OEIS).[1]

The engine driving this sudden acceleration is not just a larger language model, but the marriage of generative AI with formal proof assistants. Large language models (LLMs) are highly capable of proposing creative mathematical steps, but they are prone to logical hallucinations. To solve this, researchers paired them with Lean, a rigorous programming language and theorem prover.[1][5]

Lean acts as an uncompromising referee. When the LLM generates a potential proof step, Lean checks its logical validity against foundational axioms. If the step is flawed, the system rejects it, forcing the AI to try a new path. This iterative feedback loop—propose, verify, repeat—allows the AI to explore vast mathematical spaces without drifting into error.[5][6]
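The propose, verify, repeat loop can be sketched in a few lines of Python. This is a toy under loud assumptions: the "proposer" is a hard-coded list of guesses rather than a language model, the "checker" tests divisibility rather than running Lean, and every name here is hypothetical. It does not describe the architecture of AlphaProof Nexus or any OpenAI system.

```python
from typing import Callable, Iterable, Optional


def verify(n: int, divisor: int) -> bool:
    """Stand-in for the proof checker: accept only a genuine nontrivial divisor."""
    return 1 < divisor < n and n % divisor == 0


def propose_verify_repeat(
    n: int,
    proposals: Iterable[int],
    check: Callable[[int, int], bool] = verify,
    budget: int = 100,
) -> Optional[int]:
    """Return the first proposal the checker accepts, or None if the budget runs out."""
    for attempt, candidate in enumerate(proposals):
        if attempt >= budget:
            break
        if check(n, candidate):
            return candidate  # verified: safe to build on
        # rejected: discard the candidate and move on to the next proposal
    return None


# Hypothetical proposer output: plausible-looking guesses, most of them wrong.
guesses = [4, 6, 9, 7, 13]
print(propose_verify_repeat(91, guesses))  # prints: 7
```

The design point is that the proposer is allowed to be wrong as often as it likes, because nothing it suggests is kept unless the checker independently accepts it.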

The infrastructure supporting this neuro-symbolic approach has been quietly building for years. Lean’s collaborative mathematical library, mathlib4, expanded rapidly through community effort, surpassing 250,000 formalized theorems and 120,000 definitions by late 2025. This massive repository of machine-readable mathematics provides the training ground and the building blocks for systems like AlphaProof Nexus.[5]
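As a small illustration of what "building blocks" means here, the following Lean 4 sketch reuses existing library material instead of proving everything from scratch. It assumes a project with Mathlib installed; the two examples are chosen for illustration and are not drawn from AlphaProof Nexus.

```lean
import Mathlib

-- Reuse an existing lemma rather than reproving commutativity of addition.
example (a b : ℕ) : a + b = b + a := Nat.add_comm a b

-- A Mathlib tactic discharges a routine algebraic identity over the reals.
example (x y : ℝ) : (x + y) ^ 2 = x ^ 2 + 2 * x * y + y ^ 2 := by
  ring
```

Each formalized theorem in the library can be cited in this way, which is why a larger library shortens the distance between a new statement and a checked proof.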

The transition of AI from a parlor trick to a foundational research tool is now attracting major institutional backing. In May 2026, the Defense Advanced Research Projects Agency (DARPA) awarded a $2.6 million grant to researchers at UC Irvine and USC. The three-year project is designed to measure exactly how AI tools accelerate progress when deployed in active, real-world mathematical research environments.[4]

Rather than testing AI on closed problem sets with known answers, the DARPA-funded initiative convenes expert mathematicians to tackle unsolved problems in number theory and partial differential equations alongside AI co-pilots. The goal is to quantify AI's impact on genuine mathematical discovery, moving beyond simple problem-solving accuracy to evaluate its role in frontier research.[4]

Despite the undeniable results, the rise of AI-generated proofs has sparked a fierce debate over the philosophical purpose of mathematics. The core tension lies in the difference between knowing that a theorem is true and understanding why it is true. Proofs that are verifiable in Lean are not always parseable by human minds.[8]

David Bessis, a mathematician and science writer, has voiced concerns that autoformalization and AI-generated proofs could reduce the conceptual benefit the mathematics community gains from new discoveries. If an AI solves an 80-year-old problem using a convoluted, million-step logical derivation that no human can follow, the field gains a factual answer but no new intuition or theoretical framework.[8]

Proponents counter that this is a temporary bottleneck. Researchers are actively developing bidirectional translation tools to convert dense Lean code back into natural-language sketches and lemmas that humans can engage with. Furthermore, the Lean Focused Research Organization (FRO), established in 2023, is explicitly tasked with improving Lean's usability, documentation, and proof automation to bridge the gap between human mathematicians and machine verification.[6][8]

The implications extend far beyond abstract geometry. The ability to formally verify complex logic at scale has immediate applications in software and hardware verification, ensuring that critical systems—from cloud infrastructure to medical devices—operate without catastrophic bugs. The same neuro-symbolic architecture solving Erdős problems is already being adapted to verify AWS security policies and optimize numerical algorithms.[5][6]

The cultural shift is already visible in how the next generation of mathematicians is being trained. In early 2026, the International Centre for Mathematical Sciences (ICMS) hosted a dedicated residency on AI and mathematics, training students in the emerging pipeline of conjecture generation, autoformalization, and automated theorem proving. Participants explored how modern neuro-symbolic systems blend deep learning with formal reasoning, signaling that fluency in AI tools is becoming as essential as fluency in calculus.[7]

Mathematics is undergoing a structural phase transition. The May 2026 breakthroughs demonstrate that AI is no longer just an assistant for tedious calculations; it is an autonomous engine for discovery. As the mathematics community navigates the tension between machine-verified truth and human comprehension, the discipline is expanding into territories that neither humans nor machines could explore alone.[2][3][5]

How we got here

2013: Leonardo de Moura creates the Lean programming language and theorem prover.
July 2023: The Lean Focused Research Organization (FRO) is formed to scale formal verification.
December 2025: Lean's mathlib4 library surpasses 250,000 formalized theorems.
May 2026: OpenAI and Google DeepMind announce back-to-back breakthroughs solving multiple open Erdős problems.

Viewpoints in depth

AI-Assisted Mathematicians

Advocates who believe human intuition paired with machine-verified logic is the ultimate frontier.

This camp argues that the historical bottleneck in mathematics has been the slow, error-prone nature of manual peer review. By utilizing formal proof assistants like Lean, researchers can prevent the hallucinations that plague standard language models. They view AI not as a replacement for human mathematicians, but as an untiring co-pilot that can safely explore vast mathematical spaces, allowing humans to focus on high-level architectural ideas rather than tedious logical derivations.

Mathematical Traditionalists

Critics who argue that mathematical proofs must provide conceptual understanding, not just binary verification.

Traditionalists emphasize that the purpose of a mathematical proof is not merely to establish that a statement is true, but to explain why it is true. They warn against the proliferation of "AI slop"—million-line proofs that satisfy a computer checker but remain entirely unreadable to humans. For this camp, a verified theorem that offers no new theoretical framework or intuition is mathematically hollow, potentially degrading the culture of insight that drives the discipline forward.

Institutional Researchers

Pragmatists focused on measuring and integrating AI tools systematically into active research environments.

Less concerned with the philosophical debates, this group is focused on empirical measurement. Organizations like DARPA and university research teams are actively studying how AI co-pilots affect the productivity of working mathematicians. Their goal is to build frameworks that quantify AI's impact on unsolved questions, ensuring that these powerful new tools are seamlessly integrated into the daily workflows of frontier scientific research.

What we don't know

Whether bidirectional translation tools will successfully make all AI-generated Lean proofs readable to humans.
How quickly these neuro-symbolic AI systems can be adapted to solve problems in other highly abstract fields, such as category theory.
The long-term impact of AI co-pilots on the educational pipeline for early-career mathematicians.

Key terms

Formal Verification: The use of software to check, step by step, that a mathematical proof follows logically from its axioms, removing human error from the checking process.
Autoformalization: The process of using AI to translate natural-language mathematical statements and proofs into formal, machine-checkable code.
Combinatorics: A branch of mathematics focused on counting, arranging, and finding patterns in complex sets of structures.
Counterexample: A specific case or arrangement that disproves a general mathematical statement or conjecture.
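
To illustrate autoformalization from the terms above, here is one possible Lean 4 rendering of a simple informal sentence. It is a hand-written sketch assuming a project with Mathlib installed, not the output of any AI system; the theorem name is hypothetical, and `Even.add` is the Mathlib lemma used to close the proof.

```lean
import Mathlib

-- Informal statement: "The sum of two even natural numbers is even."
-- One possible formalization of that sentence and its proof.
theorem sum_of_evens_is_even (a b : ℕ) (ha : Even a) (hb : Even b) :
    Even (a + b) :=
  Even.add ha hb
```

The hard part of autoformalization is usually the statement rather than the proof: the formal version has to say exactly what the informal sentence meant, including which numbers it ranges over.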

Frequently asked

What is an Erdős problem?

Problems posed by the prolific mathematician Paul Erdős, often in combinatorics and graph theory, known for being easy to state but incredibly difficult to solve.

How does Lean prevent AI hallucinations?

Lean acts as a strict logical referee. It checks every step of an AI's proposed proof against foundational mathematical axioms, rejecting any step that contains a logical flaw.
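
A minimal Lean 4 sketch of that referee role, written for illustration and not taken from either lab's system: the first theorem is accepted, while the second, if uncommented, makes Lean report an error rather than accept a false claim.

```lean
-- Accepted: Lean evaluates both sides and confirms they agree.
theorem two_add_two : 2 + 2 = 4 := rfl

-- Rejected if uncommented: `rfl` cannot prove a false equation,
-- so Lean reports an error instead of accepting the claim.
-- theorem two_add_two_bad : 2 + 2 = 5 := rfl
```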

Did AI prove the unit distance conjecture?

No. OpenAI's model generated a counterexample that exceeds the conjectured upper bound, disproving the long-standing conjecture.

Why are some AI proofs hard for humans to read?

AI systems often generate highly convoluted, non-intuitive logical steps that satisfy the computer checker but lack the narrative structure and conceptual insight of human proofs.

Sources

[1] The Rundown AI (AI-Assisted Mathematicians)
Google tops OpenAI's math breakthrough — 9 to 1
[2] Tech News (Institutional Researchers)
AI makes a math breakthrough & AI boom, bubbles, and backlash
[3] Morocco World News (Institutional Researchers)
AI Solves 80-Year-Old Math Puzzle That Stumped Generations of Researchers
[4] UC Irvine News (Institutional Researchers)
UC Irvine, USC receive $2.6 million DARPA grant for AI to drive math breakthroughs
[5] arXiv (AI-Assisted Mathematicians)
AI for Mathematics: Progress, Challenges, and Prospects
[6] Lean FRO (AI-Assisted Mathematicians)
About — Lean Lang
[7] ICMS (Institutional Researchers)
AI × Mathematics 2026
[8] David Bessis Substack (Mathematical Traditionalists)
AI-led solutions of Erdős problems spark debate over the future of mathematics
