
Google’s Gemini 2.5 Pro has reached a major AI milestone by solving 5 of 6 problems from the 2025 International Mathematical Olympiad (IMO), matching the gold medal cutoff typically reserved for the world’s brightest students. Conducted by researchers from UCLA, this independent study used a self-verifying “generate, criticize, repair” method that mimics how mathematicians iterate through solutions. While not part of the official competition, the results point to a breakthrough in machine reasoning. With minimal prompting and no external tools, Gemini achieved near-human performance, raising new possibilities and challenges for AI in advanced mathematics.
Advanced AI Model Achieves Human-Level Performance in Math Olympiad
The International Mathematical Olympiad (IMO) is one of the world’s most rigorous math competitions, demanding both creativity and precision. In July 2025, UCLA researchers ran a novel experiment to see whether Google’s Gemini 2.5 Pro could hold its own. The model tackled all six of the official IMO 2025 problems and solved five correctly, matching the typical gold medal threshold.
To avoid data leakage, researchers ensured the problems weren’t in Gemini’s training data. The only hints provided were minimal nudges for the first two questions, such as suggesting induction. The remaining four were solved with no additional guidance. This made the outcome more than just impressive; it was unprecedented.
Gemini succeeded by using a structured method: it first generated a solution, then reviewed and critiqued its own work, and finally revised its output until it was verified. This iterative loop, akin to how a human student refines a draft, proved essential for solving high-difficulty problems. Its failure on the sixth problem wasn't due to syntax or calculation but to an incorrect underlying approach, showing the model still has limits in abstract reasoning. Still, matching a human gold medal score marks a new frontier in AI-assisted problem-solving.
New Proof Strategy Mimics Human Reasoning and Self-Correction
Central to this success was the “generate, criticize, repair” framework. Gemini would draft a solution, analyze it for logical flaws, and then attempt a fix. This cycle repeated until a verifier confirmed the solution five times consecutively. Unlike single-shot answers, this iterative structure mirrors how real mathematicians work: drafting, checking, and refining their reasoning over time.
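The control flow described above can be sketched in a few lines. This is a minimal, hypothetical reconstruction, not the researchers' actual pipeline: the `generate`, `critique`, `repair`, and `verify` callables stand in for real model and verifier calls, and the repair-round cap is an assumption added to bound compute.

```python
# Sketch of a "generate, criticize, repair" loop. The four callables are
# hypothetical stand-ins for LLM/verifier API calls; only the control flow
# (draft -> self-critique -> patch -> accept after repeated verification)
# reflects the method described in the article.

REQUIRED_CONSECUTIVE_PASSES = 5  # acceptance threshold described in the study
MAX_REPAIR_ROUNDS = 10           # assumed cap, not stated in the source

def solve(problem, generate, critique, repair, verify):
    """Iteratively refine a candidate solution until it verifies."""
    solution = generate(problem)                     # initial draft
    for _ in range(MAX_REPAIR_ROUNDS):
        flaws = critique(problem, solution)          # model reviews its own work
        if flaws:
            solution = repair(problem, solution, flaws)  # patch the flaws found
            continue
        # No flaws found: demand several consecutive verifier passes, since
        # a single pass of a stochastic checker is an unreliable signal.
        if all(verify(problem, solution)
               for _ in range(REQUIRED_CONSECUTIVE_PASSES)):
            return solution
    return None  # no verified solution within the repair budget
```

Requiring five consecutive passes rather than one trades extra verifier calls for confidence, which matters when the verifier is itself a language model and can approve a flawed proof by chance.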
One standout example was Problem 1, a tiling challenge. Gemini defined formal functions, partitioned the problem space, proved constraints, and even invoked symmetry to finalize its proof. Each step was logically sound and followed established mathematical norms. For the experiment, the model was set with a “thinking budget” of 32,768 tokens, roughly equal to 50–60 pages of text, allowing room for reasoning depth and error checking.
While this level of performance required considerable compute, it demonstrated something profound: when guided correctly, a large language model can not only mimic mathematical skill but execute it with formal rigor. This opens the door to using AI in education, mathematical research, and technical writing. Still, the heavy compute cost, the careful prompt engineering involved, and the reliance on external verification mean the approach isn't ready for mass consumption yet; it is, rather, a taste of things to come.
Gemini’s Olympiad Success Signals New AI Capabilities
Gemini 2.5 Pro’s ability to solve 5 out of 6 IMO problems isn’t just a technical achievement; it’s a conceptual leap for AI. It shows that with structured prompting and iterative reasoning, language models can engage in deep problem-solving once thought impossible for machines. Although the attempt was not part of the official competition and consumed considerable compute, it shows AI gaining ground in expert-level reasoning within complex disciplines. This could transform how science and mathematics are taught, learned, and pursued collaboratively. As tools like Gemini mature, the line between AI assistance and human creativity will only grow blurrier.