
A Google AI solved math problems that had stood open for 56 years, for a few hundred dollars each

Susan Hill

A research system from Google DeepMind has produced complete, machine-checked proofs for nine open problems first posed by the mathematician Paul Erdős, two of them unsolved for 56 years. The same system settled 44 conjectures drawn from the On-Line Encyclopedia of Integer Sequences (OEIS), closed a 15-year-old question in algebraic geometry, and tightened a known bound in convex optimization. The headline count matters less than the method. Every one of these proofs was verified by a machine, not merely asserted by one.

Erdős, who died in 1996, left behind hundreds of precise, stubborn questions, many of them simple to state and brutally hard to settle. Over decades they became a kind of standing exam for the field. The integer-sequence conjectures come from a public database that working mathematicians mine for patterns, where a guessed formula can sit unproven for years. These are not contrived benchmarks built to flatter a model. They are the real backlog of open mathematics.

The distinction between a verified proof and an asserted one is the whole story. The system, called AlphaProof Nexus, writes its arguments in Lean, a formal language whose compiler rejects any step it cannot confirm. A proof either passes or it does not, with no room for a confident paragraph that later turns out to be wrong. For anyone trying to judge whether an AI ‘discovery’ is real, this is the line between a press release and a result.
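To make "passes or it does not" concrete, here is a toy illustration in Lean 4. It is not taken from the DeepMind paper; the theorem names are invented for this example, and the statements are far simpler than anything on the Erdős list.

```lean
-- Accepted: Lean evaluates both sides and confirms they are equal.
theorem two_add_two : 2 + 2 = 4 := rfl

-- Accepted: the proof cites a lemma from Lean's core library.
theorem swap_sum (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- Rejected: uncommenting the next line makes compilation fail,
-- because no amount of confident wording makes 2 + 2 equal 5.
-- theorem two_add_two_bad : 2 + 2 = 5 := rfl
```

The first two theorems compile. The third would stop the build with an error, which is the property the article is describing.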

Underneath, the prover runs on Gemini 3.1 Pro, with a lighter model handling ranking work. The loop is almost boring. The model drafts a proof in Lean, the compiler hands back errors, and those errors feed into the next attempt. Symbolic feedback, not fluent prose, is what keeps it honest. The team built four versions of rising complexity, including one that breeds and ranks rival proof sketches. Yet the simplest version, a plain model-and-compiler loop, solved all nine Erdős problems on its own.
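The plain model-and-compiler loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not DeepMind's code: `prove_with_feedback`, `draft_proof`, `check_proof`, and the attempt limit are hypothetical names and parameters, and the toy functions stand in for the language model and the Lean compiler.

```python
from typing import Callable, Optional, Tuple


def prove_with_feedback(
    statement: str,
    draft_proof: Callable[[str, str], str],
    check_proof: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 8,  # hypothetical budget, not a figure from the paper
) -> Optional[str]:
    """Draft a proof, check it, and feed checker errors into the next draft."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = draft_proof(statement, feedback)
        ok, errors = check_proof(candidate)
        if ok:
            return candidate  # the checker accepted the proof
        feedback = errors  # symbolic feedback drives the next attempt
    return None  # budget exhausted without an accepted proof


# Toy stand-ins so the sketch runs without a model or a Lean install.
def toy_draft(statement: str, feedback: str) -> str:
    # Pretend model: it only produces a real proof after seeing an error.
    return "rfl" if feedback else "sorry"


def toy_check(candidate: str) -> Tuple[bool, str]:
    # Pretend compiler: accepts "rfl", rejects everything else.
    if candidate == "rfl":
        return True, ""
    return False, "error: declaration uses 'sorry'"


result = prove_with_feedback("2 + 2 = 4", toy_draft, toy_check)
print(result)
```

The design point is that the only signal passed back to the drafting step is the checker's error text, so a fluent but wrong proof gains nothing.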

The economics are the quietly startling part. Each solved problem cost a few hundred dollars in computing time. Questions that had consumed careers were closed for roughly the price of a weekend away. This does not retire the mathematician. Someone still has to choose which problems are worth attacking, frame them in a form the system can read, and decide what an answer means. What it changes is the arithmetic of what is worth trying at all.

The caveats are larger than the headline. Nine solved out of 353 Erdős problems attempted is a hit rate of about 2.5 percent. The integer-sequence figure, 44 out of 492, sits under nine percent. The authors are blunt that most of these problems stay out of reach, let alone problems that require extensive new theory, and that the wins cluster in areas where Lean’s mathematics library is already deep. Take away that human-built scaffolding and the curated list of targets, and the system has little to stand on.
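The two success rates follow directly from the counts quoted above. A short check, with `hit_rate` as a helper name chosen for this example:

```python
def hit_rate(solved: int, attempted: int) -> float:
    """Return the share of attempted problems that were solved, in percent."""
    return 100.0 * solved / attempted


erdos_rate = hit_rate(9, 353)   # Erdős problems
oeis_rate = hit_rate(44, 492)   # integer-sequence conjectures

print(f"Erdős: {erdos_rate:.2f}%")
print(f"OEIS:  {oeis_rate:.2f}%")
```

To two decimal places the figures are 2.55 percent and 8.94 percent, consistent with "about 2.5 percent" and "under nine percent".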

The caution is earned. In a widely mocked episode, a rival lab announced that its model had solved ten Erdős problems, only for mathematicians to point out that the answers already sat in the published literature. The model had found them, not proved them. AlphaProof Nexus is built to be immune to that mistake. A Lean proof of a known result is still a valid proof, and a Lean proof of something genuinely new cannot be bluffed. Demis Hassabis, who runs DeepMind, went out of his way to say the work is not artificial general intelligence, an unusually careful note from a company rarely shy about its models.

There is a subtler payoff the researchers stress. Even the failures were useful. Because every partial proof is formally checked, mathematicians could see exactly which sub-goals the system could and could not close, without re-checking the whole argument by hand. The machine stops being an oracle and becomes a tireless collaborator that shows its work and points to where the hard part still hides.

The result does not stand alone. It lands in the same stretch as a separate claim from a rival reasoning model, reported to have disproved a roughly 80-year-old Erdős conjecture in discrete geometry, a finding that working mathematicians refined and endorsed. Two labs, two methods, one leaning on formal verification and the other on raw chains of reasoning, reached the same frontier within weeks of each other. The contest is no longer about chatbots that sound clever.

The work was laid out in a paper published this month, and the methods lean on open tools, namely Lean and its community-built library, so outside groups can inspect and rerun the proofs rather than trust a corporate blog. DeepMind has not said whether the system will reach researchers outside the company. The number to watch is not nine. It is whether 2.5 percent becomes ten, then twenty, because the day it does, the argument about what these machines are for will have to start over.
