AI Theorem Proving · Trend Analysis · Jun 20, 2026, 12:01 AM · 5 min read

How AI and a Strict Coding Language Are Unlocking Decades-Old Math Mysteries

Artificial intelligence systems are crossing a historic threshold in mathematics, solving 80-year-old conjectures by pairing neural creativity with the unforgiving logic of formal verification.

By Factlen Editorial Team


Working Mathematicians 40% · AI Researchers 35% · Formalization Advocates 25%

Working Mathematicians: View AI as a powerful co-pilot for tedious lemmas, but emphasize the continued need for human intuition in conjecture generation.
AI Researchers: Focus on scaling neuro-symbolic systems and using formal verifiers to create self-improving data flywheels.
Formalization Advocates: Believe the systematic translation of mathematics into machine-readable code (Lean) is the true foundation of this revolution.

What's not represented

· Mathematics Educators
· Pure Intuitionists

Why this matters

By pairing the creative pattern-recognition of AI with the strict logic of formal coding languages, mathematicians are automating the tedious aspects of proof verification. This accelerates the pace of scientific discovery, lowers the barrier to entry for students, and promises to unlock solutions to centuries-old problems that underpin modern cryptography, physics, and computer science.

Key points

Artificial intelligence systems have advanced from solving high school math exercises to cracking decades-old, research-level conjectures.
The breakthrough relies on pairing large language models with Lean, a strict programming language that mechanically verifies logical steps.
This 'neuro-symbolic' approach guards against AI hallucinations: Lean provides immediate, ground-truth error feedback whenever a logical step is flawed, so unverified proofs are never accepted.
To test whether AI is truly reasoning, mathematicians launched the 'First Proof' initiative, challenging models with unpublished problems whose answers were kept encrypted.
While AI still struggles with entirely unguided deep research, it is rapidly becoming an indispensable co-pilot for verifying tedious mathematical lemmas.

80 years

Age of the planar unit distance problem solved by AI

80 minutes

Time taken by AI to solve the next step of Erdős #1196

90%

Accuracy of Goedel-Prover-V2 on the miniF2F benchmark

10

Unpublished lemmas in the First Proof challenge

Jared Duker Lichtman spent four years of his doctorate proving that the prime numbers form the most "efficient" primitive set (a set in which no member divides another), eventually publishing a celebrated proof of the Erdős Primitive Set Conjecture. He then spent the next seven years chasing the next open question in that same mathematical family. In early 2026, an artificial intelligence system solved that next step in approximately 80 minutes. The resulting proof was so elegant that Lichtman described it as a "Book Proof," a reference to Paul Erdős's concept of a divine text containing the universe's most perfect mathematical solutions.[1]
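For readers who want the precise statement behind "most efficient": a set A of integers greater than 1 is called primitive if no element of A divides another. The conjecture Lichtman proved says that, among all such sets, the primes maximize the weighted sum below. The article itself does not give the formula; this is the standard formulation, shown for reference.

```latex
% f(A): Erdős's weighted sum over a primitive set A \subset \{2, 3, 4, \dots\}
f(A) \;=\; \sum_{a \in A} \frac{1}{a \log a}
\qquad\text{and, for every primitive } A,\qquad
f(A) \;\le\; \sum_{p\ \text{prime}} \frac{1}{p \log p} \;=\; f(\mathcal{P}).
```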

This moment represents a historic inflection point. For years, artificial intelligence has steadily mastered high school mathematics and Olympiad-level competition problems. But in 2025 and 2026, the technology crossed a critical threshold, moving from solving known exercises to cracking open, research-level conjectures that have stumped the world's best minds for decades.[1][4]

The breakthroughs are arriving at a rapid pace. In May 2026, OpenAI announced that its reasoning models had tackled the planar unit distance problem, a geometric puzzle Erdős first posed 80 years earlier. Drawing on diverse branches of mathematics, the AI discovered an entirely new family of geometric constructions that broke an assumption mathematicians had held for decades.[2]
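The unit distance problem asks how many pairs of points, among n points in the plane, can lie exactly distance 1 apart. The short sketch below only counts unit distances in a given configuration, to make the quantity concrete; it is not the AI's construction, and the two example point sets (a unit square and a regular hexagon of side 1 with its center) are illustrative choices.

```python
import math
from itertools import combinations

Point = tuple[float, float]


def count_unit_distances(points: list[Point], tol: float = 1e-9) -> int:
    """Count unordered pairs of points whose Euclidean distance is 1 (within tol)."""
    return sum(
        1
        for p, q in combinations(points, 2)
        if abs(math.dist(p, q) - 1.0) <= tol
    )


def hexagon_with_center() -> list[Point]:
    """Regular hexagon of side 1 plus its center: 6 rim edges + 6 spokes are unit length."""
    ring = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
    return [(0.0, 0.0)] + ring


unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(count_unit_distances(unit_square))            # 4 (the diagonals have length sqrt(2))
print(count_unit_distances(hexagon_with_center()))  # 12
```

Clever constructions are about packing in more unit pairs than such simple configurations allow as n grows; the tolerance `tol` only absorbs floating-point rounding.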

Simultaneously, Google DeepMind's AlphaProof Nexus system demonstrated the ability to navigate complex research literature autonomously, proving nine open Erdős conjectures drawn from a formalized database. These milestones raise a fundamental question: why are large language models, which have long been unreliable at basic arithmetic, suddenly capable of profound mathematical discovery?[4]

The answer lies in a paradigm shift known as neuro-symbolic reasoning. Large language models work by predicting the most likely next word in a sequence. This probabilistic approach is well suited to writing poetry or drafting code, but it is fatal in mathematics, where a single hallucinated variable or skipped logical step collapses an entire proof.[3]
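A back-of-the-envelope calculation shows why one slip is so costly. Assume, purely for illustration, that each step of an n-step proof is independently correct with probability p. Then the chance that the whole proof is correct shrinks geometrically:

```latex
P(\text{proof correct}) = p^{\,n},
\qquad
p = 0.99,\; n = 100:\quad 0.99^{100} \approx 0.366
```

So even 99% per-step reliability yields a fully correct 100-step proof only about 37% of the time. A verifier does not raise p; it catches the failures, so only proofs in which every step checks are accepted.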

How AI models use strict programming languages to verify their own logic and eliminate hallucinations.

To solve the hallucination problem, researchers paired the creative intuition of neural networks with the unforgiving logic of a strict referee. That referee is Lean, an open-source functional programming language and interactive theorem prover.[3][6]

In Lean, every mathematical definition, theorem, and proof must be written as code that is checked mechanically. Lean acts as an absolute gatekeeper: it tracks the logical state of a proof step by step, and if an AI proposes a logical leap that does not follow, Lean reports an error and rejects the proof.[4][6]
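A minimal sketch of what that gatekeeping looks like in practice, assuming a recent Lean 4 toolchain (which ships the `omega` tactic) and no Mathlib. The theorem names are illustrative. The first two proofs are accepted; the last line, if uncommented, is rejected.

```lean
-- Accepted: the term `Nat.add_comm a b` has exactly the stated type.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Accepted: the `omega` decision procedure closes this linear-arithmetic goal,
-- using the hypothesis `h` from the local context.
theorem lt_example (a b : Nat) (h : a < b) : a + 1 ≤ b := by
  omega

-- Rejected: uncommenting this line makes Lean report an error,
-- because `2 + 2` does not evaluate to `5`.
-- example : 2 + 2 = 5 := rfl
```

An AI prover produces text like this; whether it "counts" is decided entirely by Lean's checker, not by how plausible the text looks.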


This pairing creates a powerful "data flywheel." The AI model acts as the creative engine, proposing possible tactical steps to solve a problem. Lean acts as the verifier. When the AI makes a mistake, Lean provides immediate, ground-truth error feedback, allowing the model to try again and iteratively self-correct without needing a human to grade its work.[3][4]
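The propose-verify-retry loop can be sketched in a few lines. Everything here is a hypothetical toy: `propose` stands in for the language model, `verify` stands in for Lean, and the "theorem" for each problem is simply "there exists n with n * n == problem". The point is the control flow: errors feed back into the next guess, and only verifier-accepted results are kept as new training data (the retraining step itself is omitted).

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    problem: int
    candidate: int
    error: Optional[str]  # None means the verifier accepted the candidate


def verify(problem: int, candidate: int) -> Optional[str]:
    """Toy stand-in for Lean: an exact, deterministic check with an error message."""
    if candidate * candidate == problem:
        return None
    return f"{candidate} * {candidate} = {candidate * candidate}, expected {problem}"


def propose(problem: int, history: list[Attempt], rng: random.Random) -> Optional[int]:
    """Toy stand-in for the model: guess a witness in [0, problem],
    using verifier feedback to avoid repeating rejected guesses."""
    rejected = {a.candidate for a in history if a.error is not None}
    pool = [n for n in range(problem + 1) if n not in rejected]
    return rng.choice(pool) if pool else None


def prove(problem: int, budget: int, rng: random.Random) -> tuple[Optional[int], list[Attempt]]:
    """Propose, verify, retry: stop at the first accepted witness or when out of budget."""
    history: list[Attempt] = []
    for _ in range(budget):
        candidate = propose(problem, history, rng)
        if candidate is None:  # search space exhausted: no witness exists
            break
        error = verify(problem, candidate)
        history.append(Attempt(problem, candidate, error))
        if error is None:
            return candidate, history
    return None, history


def flywheel_round(problems: list[int], budget: int, seed: int = 0) -> list[tuple[int, int]]:
    """One turn of the flywheel: only verified (problem, witness) pairs become training data."""
    rng = random.Random(seed)
    dataset = []
    for p in problems:
        witness, _ = prove(p, budget, rng)
        if witness is not None:
            dataset.append((p, witness))
    return dataset


print(flywheel_round([0, 4, 7, 9, 16], budget=100))  # [(0, 0), (4, 2), (9, 3), (16, 4)]
```

Problem 7 has no integer square root, so every guess is rejected and nothing enters the dataset; no human grading is needed at any point.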

The results of this self-correction loop have been staggering. Researchers at Princeton University used this architecture to build Goedel-Prover-V2, an open-source system that achieved a 90% accuracy rate on miniF2F, a benchmark of formalized competition-level math problems. By relying heavily on Lean's feedback, the Princeton team achieved state-of-the-art results using models 80 times smaller than those deployed by commercial tech giants.[3]
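The article does not say exactly how the 90% figure was measured. Prover benchmarks are often reported as pass@k: the fraction of problems for which at least one of k sampled proofs passes the verifier. A minimal sketch of the standard unbiased pass@k estimator (originally popularized for code-generation benchmarks) follows; the per-problem results are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled attempts, c of which the verifier accepted."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset of attempts contains an accepted one
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (attempts, accepted)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results for four problems, 8 verified attempts each.
results = [(8, 8), (8, 1), (8, 0), (8, 4)]
print(benchmark_score(results, k=1))  # 0.40625
print(benchmark_score(results, k=8))  # 0.75
```

Because the verifier is exact, "accepted" here means a machine-checked proof, which is what makes such scores meaningful for formal provers.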

Self-correcting models have seen rapid accuracy gains on standardized mathematical benchmarks.

Crucially, the AI is not merely brute-forcing its way through millions of possibilities; it is producing proofs of genuine elegance. In solving the von Mangoldt weight identity, the AI found a conceptual opening that human convention had overlooked, suggesting that machine reasoning can expand the stylistic boundaries of the discipline.[1]
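The article does not state which von Mangoldt identity was involved. For background only, the sketch below implements the classical von Mangoldt function Λ(n) (log p when n is a power of a prime p, otherwise 0) and numerically checks the textbook identity that summing Λ(d) over all divisors d of n gives log n. It is not the AI's result.

```python
import math


def von_mangoldt(n: int) -> float:
    """Λ(n) = log p if n = p**k for a prime p and k >= 1, else 0."""
    if n < 2:
        return 0.0
    # The smallest divisor >= 2 of n is always prime; if none is <= sqrt(n), n is prime.
    p = next((d for d in range(2, math.isqrt(n) + 1) if n % d == 0), n)
    # n is a prime power exactly when dividing out p repeatedly leaves 1.
    m = n
    while m % p == 0:
        m //= p
    return math.log(p) if m == 1 else 0.0


def divisor_sum(n: int) -> float:
    """Sum of Λ(d) over all positive divisors d of n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)


# Classical identity: the divisor sum of Λ equals log n (up to float rounding).
for n in (1, 12, 97, 360):
    print(n, divisor_sum(n), math.log(n))
```

For n = 12 the nonzero terms come from the divisors 2, 3 and 4, giving log 2 + log 3 + log 2 = log 12.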

Despite these triumphs, a core skepticism remains within the mathematical community: the problem of data contamination. Because modern AI models are trained on vast swaths of the internet, it is incredibly difficult to prove that a model is genuinely reasoning through a novel problem rather than simply regurgitating a similar proof it memorized during training.[5]

To definitively test the limits of machine reasoning, 11 leading mathematicians launched the "First Proof" initiative in February 2026. The researchers selected ten unpublished lemmas—minor component proofs—from their own active, ongoing research. Because these problems had never been posted online, they were guaranteed to be absent from any AI's training data.[5]

The First Proof initiative used unpublished problems with encrypted answers to test whether AI systems were truly reasoning.

The mathematicians encrypted the answers and gave AI systems a one-week window to attempt the problems. The preliminary results offered a sobering reality check: while AI systems are powerful, they still struggle to solve entirely novel research problems autonomously, without human guidance.[5]
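The article does not describe the encryption scheme the First Proof organizers used. A closely related and simpler way to publish answers in advance without revealing them is a salted hash commitment, sketched below as an illustration of the idea rather than the initiative's actual method. The answer strings are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(answer: str) -> tuple[str, bytes]:
    """Publish the digest now; keep the salt (and the answer) private until the reveal."""
    salt = secrets.token_bytes(16)  # random salt prevents guessing short answers by brute force
    digest = hashlib.sha256(salt + answer.encode("utf-8")).hexdigest()
    return digest, salt


def verify_reveal(digest: str, salt: bytes, answer: str) -> bool:
    """After the deadline, anyone can check the revealed answer against the published digest."""
    recomputed = hashlib.sha256(salt + answer.encode("utf-8")).hexdigest()
    return hmac.compare_digest(recomputed, digest)


published, secret_salt = commit("Lemma 7: the bound holds with constant 2.")
print(verify_reveal(published, secret_salt, "Lemma 7: the bound holds with constant 2."))  # True
print(verify_reveal(published, secret_salt, "Lemma 7: the bound holds with constant 3."))  # False
```

Unlike encryption, a commitment cannot be decrypted at all; the organizers simply reveal the answer and salt later, and the published digest proves the answer was fixed before any AI attempt.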

Rather than replacing mathematicians, AI is rapidly becoming an indispensable co-pilot. Working researchers are beginning to offload the tedious verification of minor lemmas to AI systems, freeing their own cognitive bandwidth for high-level architectural thinking and the formulation of new conjectures.[4][5]

This collaborative dynamic is democratizing the field. With Lean and AI assistants, students and researchers can verify their own proofs instantly, lowering the barrier to entry for rigorous mathematics. As the fusion of human intuition and formal verification continues to mature, mathematics is entering a golden age of accelerated discovery.[3][6]

How we got here

2013
The Lean theorem prover is developed, laying the groundwork for machine-readable mathematics.
July 2025
Princeton researchers release Goedel-Prover-V2, demonstrating rapid gains in self-correcting AI theorem proving.
February 2026
The 'First Proof' initiative launches, testing AI models on unpublished mathematical lemmas whose answers were encrypted.
May 2026
AI systems successfully tackle the 80-year-old planar unit distance problem and multiple open Erdős conjectures.

Viewpoints in depth

AI Researchers

Focus on scaling neuro-symbolic systems and using formal verifiers to create self-improving data flywheels.

Computer scientists and AI lab directors argue that the integration of large language models with formal verifiers like Lean solves the fundamental 'hallucination' problem that has historically plagued AI in mathematics. By allowing models to receive immediate, ground-truth feedback on their logical steps, researchers believe they have unlocked a self-correcting 'data flywheel.' This mechanism allows AI to scale its reasoning capabilities exponentially without relying entirely on human-annotated training data.

Working Mathematicians

View AI as a powerful co-pilot for tedious lemmas, but emphasize the continued need for human intuition in conjecture generation.

Many academic mathematicians remain cautiously optimistic. They celebrate AI's new ability to automate the verification of tedious component proofs—often referred to as lemmas—which frees up human cognitive bandwidth. However, they maintain that AI still struggles with the 'blank page' problem. Without a human to pose the right questions, frame the overarching architecture of a proof, and guide the system through deep, unmapped conceptual territory, the AI remains a highly capable assistant rather than an autonomous discoverer.

Formalization Advocates

Believe the systematic translation of mathematics into machine-readable code (Lean) is the true foundation of this revolution.

Contributors to open-source libraries like Mathlib argue that the real hero of this breakthrough is not the neural network, but the formalization language itself. They contend that translating centuries of human mathematics into strict, machine-readable code is a necessary prerequisite for future progress. In their view, the widespread adoption of Lean will not only empower AI but will fundamentally democratize the field, allowing students and researchers anywhere in the world to verify their work instantly without relying on the traditional peer-review bottleneck.

What we don't know

Whether AI systems can autonomously generate profound, high-level mathematical conjectures without human prompting.
How quickly the broader mathematical community will adopt formal verification languages like Lean into their daily workflows.
If there is a hard ceiling to the 'data flywheel' effect once AI exhausts the current library of human-formalized mathematics.

Key terms

Formal Verification: The process of proving the correctness of a mathematical statement using a computer program that checks every logical step.
Lean: An open-source programming language and interactive theorem prover that acts as a strict referee for mathematical logic.
Neuro-symbolic AI: An approach that combines the creative pattern-recognition of neural networks with the strict, rule-based logic of symbolic systems.
Lemma: A minor, proven proposition that serves as a stepping stone to prove a larger, more significant theorem.
Data Contamination: A testing flaw where an AI model appears to solve a novel problem, but has actually already memorized the solution from its vast training data.

Frequently asked

What is Lean in mathematics?

Lean is an open-source programming language and theorem prover that mechanically verifies mathematical proofs step-by-step, ensuring they contain no logical errors.

Did AI solve the Erdős planar unit distance problem?

According to OpenAI, yes. In May 2026, the company said its reasoning model had tackled the 80-year-old problem by discovering a new family of geometric constructions.

Will AI replace human mathematicians?

Currently, AI acts more like a highly capable research assistant. It excels at verifying tedious lemmas and finding novel tactical steps, but still relies on humans to pose the right questions and guide the overall architecture of a proof.

What was the 'First Proof' initiative?

A February 2026 challenge in which mathematicians posed 10 unpublished problems, with their answers encrypted, to test whether AI could genuinely reason rather than just regurgitate its training data.

Sources

[1] Forbes (Working Mathematicians). AI Solved A Mathematical Problem That Had Stumped The World's Best Minds For Decades.
[2] The Guardian (Working Mathematicians). OpenAI claims advance in AI reasoning after solving 80-year-old maths problem.
[3] Princeton University (AI Researchers). Princeton researchers release open-source theorem prover that verifies its own work.
[4] arXiv (AI Researchers). Advancing Mathematics Research with AI-Driven Formal Proof Search.
[5] United Nations University (Working Mathematicians). The First Proof Initiative: Testing AI's True Mathematical Capabilities.
[6] Lean FRO (Formalization Advocates). A Brief History of Lean.
