Breakthrough at IMO 2025
In a landmark achievement, artificial intelligence systems from Google DeepMind and OpenAI have independently reached gold-medal standard at the 2025 International Mathematical Olympiad (IMO), the world's most prestigious mathematics competition for high school students. Google DeepMind's advanced Gemini model with "Deep Think" technology solved five of six problems perfectly, scoring 35/42 points under official IMO grading protocols. IMO President Prof. Gregor Dolinar confirmed the solutions were "clear, precise, and easy to follow". Remarkably, this was accomplished within the competition's 4.5-hour time limit, a significant leap from 2024's silver-medal performance, which required days of computation.
OpenAI’s experimental reasoning model also scored 35 points, evaluated by former IMO medalists under the same rules as human contestants. While five human contestants achieved perfect scores, this marks the first time AI has reached the gold tier.
Evolution of AI’s Mathematical Capabilities
Just two years ago, AI's mathematical abilities were notoriously limited: early models like ChatGPT struggled with basic arithmetic and were prone to hallucinations. The transformation has been driven by three key innovations:

Reasoning Architectures: Modern "large reasoning models" (LRMs) like Gemini Deep Think and OpenAI's o3 use "internal monologues" to explore multiple solution paths simultaneously. This "parallel thinking" mimics human problem-solving by testing approaches, backtracking from errors, and refining arguments.

Tool Integration: AI now leverages Python interpreters for calculations and the Lean theorem prover for formal proof verification. This hybrid approach reduces errors and ensures logical rigor.

Reinforcement Learning: Systems generate synthetic training data, allowing them to tackle novel problems beyond their original training data.
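To illustrate what formal verification with Lean adds, here is a minimal, hypothetical example (not taken from any of the systems described above): Lean's kernel accepts this proof only if every step is logically valid, which is exactly the rigor guarantee that makes machine-checked proofs attractive for auditing AI-generated arguments.

```lean
-- A trivial Lean 4 theorem: commutativity of natural-number addition.
-- The proof term is checked mechanically by Lean's kernel; an invalid
-- step would be rejected rather than silently accepted.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Real olympiad problems require far longer formalizations, but the checking principle is the same at any scale.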
Table: AI Performance at the International Math Olympiad

| Year | AI System | Score | Medal Level | Time / Cost |
|------|-----------|-------|-------------|-------------|
| 2024 | AlphaProof (DeepMind) | 28 / 42 | Silver | 2–3 days of computation |
| 2025 | Gemini Deep Think | 35 / 42 | Gold | 4.5 hours |
| 2025 | OpenAI Reasoning Model | 35 / 42 | Gold | Within time limit |
Implications for Research and Education

The implications extend far beyond competitions:

Accelerating Scientific Discovery: AI models have already contributed to breakthroughs in knot theory, protein folding (work recognized with the 2024 Nobel Prize in Chemistry), and elliptic curves. At a May 2025 symposium, an AI model solved a Ph.D.-level number theory problem in minutes, work that would take human mathematicians months.
Redefining Mathematics Education:
Harvard professor Michael Brenner redesigned his graduate-level applied math course after AI began solving complex problems. Students now create novel problems to challenge AI, fostering deeper creativity. "That is the dream," Brenner noted.
Collaborative Research: Fields Medalist Terence Tao envisions AI as a "co-pilot" that handles technical steps, freeing mathematicians to focus on high-level direction. Combined with proof assistants like Lean, this could democratize access to cutting-edge mathematics.
Behind the Technology: How AI "Thinks"
Google DeepMind's Deep Think mode enables Gemini to:

- Explore and combine multiple solution strategies in parallel.
- Access curated mathematical corpora and reinforcement learning tailored for theorem proving.
- Generate human-readable proofs directly from natural-language problem statements.

This contrasts with 2024's AlphaProof, which required translating problems into specialized formal languages.

Controversies and Limitations
Despite progress, skepticism remains:
Methodology Concerns: AI models use "best-of-n" strategies, generating dozens of candidate solutions and selecting the strongest. Human competitors, by contrast, submit a single attempt.

Research vs. Olympiad Problems: IMO problems test "clever tricks" within constrained domains, while mathematical research involves open-ended exploration over years. As mathematician Sergei Gukov explains, solving the Riemann hypothesis might require "a million lines of proof", far beyond current AI capabilities.

Hallucination Risks: AI-generated proofs remain error-prone at the research frontier. As one mathematician noted, models consistently make subtle mistakes in advanced category theory. IMO President Dolinar emphasized that while AI's solutions were validated, the organization "cannot validate the methods, including the computing used or human involvement".
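The best-of-n selection loop described above can be sketched in a few lines. The generator and scorer here are stand-in functions (no actual model or grader is involved), but the control flow is the same: sample many independent candidate solutions, then keep only the one a verifier ranks highest.

```python
import random

def generate_candidate(problem, seed):
    """Stand-in for a model sampling one solution attempt.

    A real system would run an LLM here; we just attach a
    reproducible pseudo-random 'quality' score per seed.
    """
    rng = random.Random(seed)
    return {"problem": problem, "quality": rng.random()}

def score(candidate):
    """Stand-in for a verifier or grader ranking a candidate proof."""
    return candidate["quality"]

def best_of_n(problem, n=32):
    # Sample n independent attempts and keep the strongest one --
    # unlike a human contestant, who must submit a single attempt.
    candidates = [generate_candidate(problem, seed) for seed in range(n)]
    return max(candidates, key=score)

best = best_of_n("IMO 2025, Problem 1", n=32)
print(score(best))
```

The asymmetry critics point to is visible in the code: the machine's final answer is the maximum over many tries, which is not directly comparable to a single human submission.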
The Road Ahead
Institutions are rapidly adapting:

NSF's New Institute: Carnegie Mellon University will host the Institute for Computer-Aided Reasoning in Mathematics (ICARM), advancing AI-human collaboration in mathematical research.

DARPA's expMath Initiative: Aims to develop "AI coauthors" capable of decomposing complex problems into manageable subproblems.
New Benchmarks
The FrontierMath test, designed by 60+ mathematicians, evaluates AI on unsolved research problems. Current top models score just 13–19%, highlighting the gap between competition prowess and true innovation. As Terence Tao predicts, AI may soon enable "mass production" of theorems, transforming mathematics into a field where humans direct "AI teams" to explore vast solution spaces. Yet for now, the consensus is clear: AI is a revolutionary tool, not a replacement for human insight.