AI Mathematics · Evidence Pack · Jun 15, 2026, 3:11 AM · 9 min read

AI Systems Cross Historic Threshold by Solving Decades-Old Open Math Problems

In a watershed moment for artificial intelligence, models from Google DeepMind and OpenAI have successfully solved multiple open mathematical conjectures that had stumped human experts for up to 80 years.

By Factlen Editorial Team

AI Development Community 40% · Working Mathematicians 40% · Skeptics & Traditionalists 20%
AI Development Community
Focuses on the scaling of reasoning capabilities and the shift toward neuro-symbolic systems.
Working Mathematicians
Embraces the tools as powerful collaborators while emphasizing the continued need for human intuition.
Skeptics & Traditionalists
Highlights the limitations of current models and the philosophical differences between computation and true understanding.

What's not represented

  • Educators adapting university math curricula
  • Philosophers of mathematics

Why this matters

This breakthrough marks the moment artificial intelligence transitions from merely summarizing known information to actively discovering new knowledge. For anyone working in science, engineering, or data, it signals the arrival of AI as a genuine collaborator capable of solving problems that have stumped human experts for decades.

Key points

  • In May 2026, OpenAI and Google DeepMind announced their AI models solved decades-old open mathematical problems.
  • OpenAI's reasoning model disproved a combinatorial geometry conjecture that had remained open since 1946.
  • Google DeepMind's AlphaProof Nexus solved nine open Erdős problems, including two that had been unsolved for 56 years.
  • The breakthroughs rely on pairing Large Language Models with formal proof assistants like Lean to eliminate logical errors.
  • Experts view this as a major shift from AI acting as an 'assistant' to becoming a genuine 'contributor' in scientific discovery.

9: Open Erdős problems solved by AlphaProof Nexus
80 years: Age of the geometry conjecture disproved by OpenAI (open since 1946)
56 years: Time two of the Erdős problems remained unsolved
44: OEIS conjectures proven by DeepMind's AI

In late May 2026, two of the world's leading artificial intelligence laboratories announced back-to-back breakthroughs in pure mathematics. Within 48 hours, both OpenAI and Google DeepMind revealed that their advanced reasoning models had solved multiple open mathematical problems that had stumped human experts for decades. For years, the application of AI in mathematics was largely confined to pattern matching, regurgitating known proofs, or assisting with computational heavy lifting. These new milestones, however, represent a crossing of the Rubicon: artificial intelligence systems are no longer just studying existing mathematics; they are generating novel, verified results that add to the sum of human knowledge. The announcements drew attention across both the academic mathematics community and the broader technology sector, and raised expectations of what machine reasoning can achieve.[1][2][3]

The first domino fell when OpenAI announced that an unreleased version of its general reasoning model had disproved a combinatorial geometry conjecture that had remained open since 1946. The 80-year-old problem had frustrated generations of mathematicians, but the AI system generated a valid counterexample. What made the achievement notable was not the raw computational power applied but the method of discovery. The model recognized a hidden structural connection between combinatorial geometry and an entirely separate mathematical domain, a link that human researchers had not previously made. Experts noted that this kind of lateral thinking is a hallmark of high-level human mathematical intuition, suggesting that the model was engaging in genuine cross-domain reasoning rather than merely retrieving information from its training data.[2][6]

Just one day after OpenAI's announcement, Google DeepMind unveiled an even more expansive achievement with its AlphaProof Nexus system. DeepMind reported that its AI had autonomously solved nine open Erdős problems, drawn from a famous collection of notoriously difficult questions in combinatorics and graph theory posed by the mathematician Paul Erdős. Among the nine solved problems were two that had remained unanswered for 56 years. In addition to the Erdős problems, AlphaProof Nexus proved 44 open conjectures sourced from the On-Line Encyclopedia of Integer Sequences (OEIS). The volume of the discoveries, achieved at a compute cost of only a few hundred dollars per problem, demonstrated that AI-driven mathematical discovery could be scaled efficiently.[1][5]

The mechanism driving these breakthroughs relies on a hybrid architecture that pairs the intuitive, pattern-matching capabilities of Large Language Models (LLMs) with the unforgiving rigor of formal proof assistants. Historically, the biggest barrier to using LLMs for advanced mathematics was their tendency to hallucinate, producing confident-sounding but logically flawed chains of reasoning. To solve this, researchers integrated the neural networks with systems like Lean, a specialized programming language and interactive theorem prover used by mathematicians to verify proofs step by step. In this neuro-symbolic setup, the LLM acts as the creative engine, generating candidate logical steps and intuitive leaps. Lean acts as the strict referee, checking each step against the axioms and inference rules of its underlying logic.[1][2][5]
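
To make the referee role concrete, here is a minimal Lean 4 sketch. It is an illustration only and is not taken from either lab's work: the theorem name `add_comm_example` is made up for this example, while `Nat.add_comm` is a lemma from Lean's core library.

```lean
-- A concrete arithmetic fact. Lean accepts `rfl` because both sides
-- evaluate to the same number.
example : 2 + 2 = 4 := rfl

-- A statement about all natural numbers, proved by citing a core lemma.
-- `add_comm_example` is a hypothetical name chosen for this sketch.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A false claim is rejected. Uncommenting the next line makes Lean
-- report an error instead of accepting the "proof".
-- example : 2 + 2 = 5 := rfl
```

The point of the sketch is that acceptance is mechanical: a proof either type-checks or it does not, regardless of how convincing the surrounding prose sounds.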

A breakdown of the historic mathematical problems solved by AI systems in May 2026.

This continuous feedback loop fundamentally alters how the AI operates. If the language model hallucinates a false step or makes a logical error, the Lean proof assistant rejects it. The AI must then backtrack, correct its error, and attempt a new path until it produces a proof that is machine-verified. This iterative process of generation and verification lets the system navigate very large problem spaces and recover from logical dead ends rather than building on them. By the time AlphaProof Nexus outputs a proof, it has been checked step by step against the formal statement of the problem, so humans no longer need to hunt for subtle errors in the AI's logic.[1][5]
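
The generate, verify, and backtrack cycle described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not DeepMind's or OpenAI's implementation: `propose_steps` is a hypothetical stand-in for the language model (it deliberately emits one false step), `verify_step` is a hypothetical stand-in for the proof checker, and the "proof" is just a chain of arithmetic moves from a start number to a goal number.

```python
from typing import Iterator, List, Optional, Tuple

Step = Tuple[str, int]  # (rule label, claimed resulting state)


def propose_steps(state: int) -> Iterator[Step]:
    """Hypothetical stand-in for the language model: suggests next steps.

    One suggestion is deliberately wrong, to mimic a hallucinated step.
    """
    yield ("double", state * 2)
    yield ("bogus add 3", state + 4)  # false step: label and result disagree
    yield ("add 3", state + 3)


def verify_step(state: int, step: Step) -> bool:
    """Hypothetical stand-in for the proof checker.

    A step is accepted only if its claimed result really follows from
    the named rule applied to the current state.
    """
    label, result = step
    if label == "double":
        return result == state * 2
    if label == "add 3":
        return result == state + 3
    return False  # unknown rules are always rejected


def search(state: int, goal: int, limit: int,
           path: Tuple[str, ...] = ()) -> Optional[List[str]]:
    """Depth-first search that keeps only verified steps and backtracks."""
    if state == goal:
        return list(path)
    if state > goal or len(path) >= limit:
        return None  # dead end: the caller tries its next proposal
    for step in propose_steps(state):
        if not verify_step(state, step):
            continue  # rejected by the checker, never enters the path
        found = search(step[1], goal, limit, path + (step[0],))
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(search(1, 11, 6))  # ['double', 'double', 'double', 'add 3']
```

In this toy, the search first overshoots the goal by doubling 8 to 16, backtracks, and then finds the accepted path. The bogus step is filtered out every time it is proposed, which is the property the article attributes to pairing a generator with a strict checker.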

Industry analysts point out that these developments mark a critical transition in the role of artificial intelligence in expert workflows. Until now, AI has primarily functioned as an "assistant"—summarizing long documents, drafting boilerplate code, or retrieving known information. The math breakthroughs of May 2026 signal a shift toward AI as a "contributor." Instead of merely speeding up existing processes, the models are producing entirely new intellectual property and novel results that human experts did not previously possess. This paradigm shift forces organizations and academic institutions to rethink how they integrate AI, moving from simple automation to collaborative discovery where humans review and build upon machine-generated insights.[3]

Despite the celebratory headlines, researchers are quick to emphasize the limitations and uncertainties that still surround these reasoning models. The AI systems are not yet autonomous mathematicians capable of independently identifying interesting problems and publishing papers. The most successful results still rely heavily on human-AI collaboration. Human mathematicians are required to translate the open problems into the formal language of Lean, guide the models when they get stuck, and assess the broader significance of the proofs. The models act as highly capable reasoning engines, but the overarching judgment, context, and direction must still be provided by human experts.[2][3]

Furthermore, the current generation of reasoning models still struggles with specific types of mathematical challenges. While they excel at finding counterexamples and navigating complex combinatorial spaces, they frequently fail when a problem requires the invention of an entirely new mathematical construction or framework. They also exhibit weaknesses in problems that demand extremely long, unbroken chains of dependencies, where early conceptual errors can compound and derail the entire proof. The recent breakthroughs represent the high-water mark of current capabilities, not the average performance, and the models can still produce confident but incorrect reasoning when operating outside of tightly controlled, formally verified environments.[1][2]

How it works: Large Language Models generate logical steps, while formal proof assistants rigorously verify them.

The computational cost of this advanced reasoning is also a significant factor for the broader adoption of these technologies. The "thinking" models utilized by OpenAI and Google—which deliberate internally and check their own work before outputting an answer—require substantially more processing power and time than standard conversational AI. While DeepMind noted that solving the Erdős problems cost only a few hundred dollars each, this is still orders of magnitude more expensive than a typical LLM query. A simpler version of the AlphaProof agent was able to match some of the results, but it required even more compute time, highlighting the ongoing trade-off between model efficiency and reasoning depth.[1][2]

The reaction from the global mathematical community has been a mixture of awe, cautious optimism, and philosophical reflection. For decades, pure mathematics was considered the ultimate bastion of human intellect, requiring a spark of intuition that machines supposedly lacked. The realization that AI can now find hidden structural connections and solve 80-year-old conjectures has prompted a reevaluation of what constitutes mathematical creativity. Many working mathematicians are enthusiastically embracing these tools, viewing formal proof assistants like Lean and AI reasoning models as the modern equivalent of the telescope—instruments that will allow humans to see further into the mathematical universe than ever before.[4][7]

Looking forward, the integration of AI into mathematics is expected to accelerate the pace of discovery across multiple scientific disciplines. Mathematics is the foundational language of physics, computer science, and cryptography; breakthroughs in pure math frequently unlock new capabilities in applied sciences. As models become more adept at formal reasoning, they could eventually be deployed to verify the safety of complex software systems, discover new cryptographic protocols, or solve optimization problems in logistics and drug discovery. The events of 2026 have definitively proven that AI can reason mathematically; the next frontier is scaling that reasoning to solve the world's most intractable scientific challenges.[3][5]

The historical context of these solved problems adds significant weight to the achievements. Paul Erdős, one of the most prolific mathematicians of the 20th century, famously offered cash prizes for the solutions to his open problems, ranging from $25 to several thousand dollars depending on their difficulty. The fact that two of the problems solved by Google DeepMind had withstood 56 years of intense scrutiny by the world's brightest mathematical minds underscores the magnitude of the AI's capability. These were not obscure, forgotten puzzles; they were well-known, actively researched conjectures that had simply proven too complex for human combinatorial techniques.[1][5]

The era of the AI-augmented mathematician requires humans to collaborate closely with machine reasoning engines.

Similarly, the 1946 combinatorial geometry conjecture disproved by OpenAI had a storied history. Combinatorial geometry deals with the arrangements and properties of discrete geometric objects, and conjectures in this field often appear deceptively simple to state but are notoriously difficult to prove or disprove. By finding a valid counterexample, the AI did not just close a chapter in a textbook; it actively reshaped the boundaries of what is known in the field. The model's ability to pull tools from a separate mathematical domain to construct the counterexample suggests a level of synthetic reasoning that mimics the cross-pollination of ideas often seen in the best human research.[2][6]
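
Why a single counterexample settles a conjecture can be shown with a classic textbook case. The sketch below is unrelated to the 1946 conjecture itself, whose details the sources here do not give. It tests the old observation that n² + n + 41 is prime, which holds for n = 0 through 39 and then fails. The helper names `is_prime`, `first_counterexample`, and `euler_claim` are made up for this illustration.

```python
from typing import Callable, Iterable, Optional


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def first_counterexample(claim: Callable[[int], bool],
                         candidates: Iterable[int]) -> Optional[int]:
    """Return the first candidate for which the claim fails, else None."""
    for x in candidates:
        if not claim(x):
            return x
    return None


def euler_claim(n: int) -> bool:
    """The claim under test: n^2 + n + 41 is prime."""
    return is_prime(n * n + n + 41)


if __name__ == "__main__":
    n = first_counterexample(euler_claim, range(1000))
    print(n, n * n + n + 41)  # 40 1681, and 1681 = 41 * 41
```

Forty confirming cases prove nothing about the general statement, while the one failing case at n = 40 refutes it outright. The hard part in research-level problems is finding the counterexample, since the search space is rarely a simple range of integers.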

The implications for mathematics education and training are also profound. As AI systems become capable of generating complex, machine-verified proofs, the focus of mathematical education may shift away from the mechanical execution of proofs toward higher-level conceptual framing and problem selection. Future mathematicians will likely need to become fluent in formal verification languages like Lean, acting as "prompt engineers" for pure mathematics. The ability to translate abstract human intuition into the strict, logical constraints required by proof assistants will become a critical skill, bridging the gap between human creativity and machine computation.[3][7]

Ultimately, the dual breakthroughs by OpenAI and Google DeepMind serve as a proof of concept for the next era of artificial intelligence. The transition from models that merely predict the next word to models that can deliberate, verify, and discover represents a fundamental leap in machine intelligence. While a fully autonomous artificial mathematician remains on the horizon, the AI-augmented mathematician has arrived. As these reasoning engines continue to scale and integrate with formal verification tools, the mathematical community stands on the threshold of a golden age of discovery, powered by collaboration between human intuition and machine logic.[2][3][5]

How we got here

  1. 1946

    A complex conjecture in combinatorial geometry is first posed, remaining unsolved for nearly eight decades.

  2. 1970

    Two specific Erdős problems in combinatorics are introduced, stumping mathematicians for 56 years.

  3. May 2026

    OpenAI announces its reasoning model has successfully disproved the 1946 geometry conjecture.

  4. May 2026 (One day later)

    Google DeepMind reveals AlphaProof Nexus, which autonomously solved nine open Erdős problems.

Viewpoints in depth

AI Development Community

Focus on the scaling of reasoning capabilities and the shift toward neuro-symbolic systems.

For the AI development community, these breakthroughs validate the massive investments in 'System 2' reasoning models. Researchers argue that pairing LLMs with formal verification environments like Lean solves the persistent hallucination problem that has plagued generative AI. They view this as a blueprint for future systems across all domains: using neural networks for intuitive leaps and symbolic logic engines for rigorous verification.

Working Mathematicians

Embrace the tools as powerful collaborators while emphasizing the continued need for human intuition.

Many working mathematicians welcome the arrival of highly capable AI assistants, noting that formalizing proofs is often tedious and error-prone for humans. However, they emphasize that mathematics is not just about verifying logic; it is about knowing which problems are worth solving and framing them correctly. They view AI as a powerful new instrument—like a telescope for numbers—that still requires a human to point it at the right stars.

Skeptics & Traditionalists

Highlight the limitations of current models and the philosophical differences between computation and true understanding.

Skeptics within the academic community point out that finding counterexamples in combinatorics, while impressive, relies heavily on searching massive possibility spaces—a task perfectly suited for computers. They argue that the models still lack genuine mathematical 'understanding' and struggle to invent entirely new conceptual frameworks. Until an AI can formulate a novel, profound conjecture on its own, traditionalists maintain that the core of mathematical creativity remains exclusively human.

What we don't know

  • Whether these reasoning models can invent entirely new mathematical frameworks, rather than just solving existing problems within known parameters.
  • How quickly the broader mathematical community will adopt formal verification languages like Lean to collaborate with these AI systems.
  • The exact computational cost and energy footprint required to scale these reasoning models for more complex, multi-year mathematical proofs.

Key terms

Conjecture
A mathematical statement that is believed to be true based on partial evidence but has not yet been rigorously proven or disproved.
Combinatorics
A branch of mathematics focused on counting, arranging, and finding patterns within discrete structures.
Formal Proof Assistant
A specialized software tool, such as Lean, that allows mathematicians to write proofs in computer code so the machine can verify every logical step with absolute certainty.
Neuro-symbolic AI
An artificial intelligence approach that combines the pattern-recognition capabilities of neural networks with the strict, rule-based logic of symbolic programming.
Hallucination
In AI, the generation of false, logically flawed, or nonsensical information presented confidently as fact.

Frequently asked

What exactly did the AI models solve?

OpenAI's model disproved a 1946 combinatorial geometry conjecture, while Google DeepMind's AlphaProof Nexus solved nine open Erdős problems and 44 integer sequence conjectures.

How do these AI models avoid making logical errors?

The models are paired with formal proof assistants like Lean. The AI generates the logical steps, and the proof assistant rigorously verifies them, rejecting any hallucinations or false logic.

Will AI replace human mathematicians?

Not currently. The models act as highly capable reasoning engines, but they still require human experts to frame the problems, translate them into code, and guide the overall direction of the research.

Why are Erdős problems significant?

Paul Erdős was a legendary mathematician who posed hundreds of deceptively simple but notoriously difficult problems in combinatorics and graph theory, many of which have stumped experts for decades.

Sources

Source coverage

7 outlets

3 viewpoints surfaced

  1. [1] The Rundown AI (AI Development Community): "Google tops OpenAI's math breakthrough — 9 to 1"
  2. [2] MindStudio (AI Development Community): "A Conjecture That Stood for 78 Years Just Fell to an AI"
  3. [3] Nexairi (AI Development Community): "OpenAI Math Breakthrough: What Experts Should Watch"
  4. [4] Quanta Magazine (Working Mathematicians): "The Biggest Breakthroughs in Mathematics: 2025"
  5. [5] DeepMind Research (Working Mathematicians): "AlphaProof Nexus: Solving open Erdős problems with AI"
  6. [6] OpenAI Research (Working Mathematicians): "Disproving a 1946 combinatorial geometry conjecture"
  7. [7] Reddit (r/math) (Skeptics & Traditionalists): "Google claims math breakthrough with proof-solving AI models"