Axiom Math: Scaling Brilliance with Verified AI

Axiom Math, a seven-month-old company, has secured a significant $200 million Series A funding round, valuing the company at $1.6 billion. This substantial investment underscores a burgeoning belief in the commercial viability of formal mathematics and verified AI, a domain traditionally viewed as niche. Axiom's mission extends beyond simply proving mathematical theorems; they aim to leverage formal methods to scale and compound superintelligence, fundamentally changing how we approach AI development and deployment.

The Math Startup Thesis: Beyond Pure Research

The substantial funding round prompts the question: how can a "math startup" justify such a valuation? Axiom's thesis, akin to Anthropic's early focus on coding, posits that structured and formal data, like that found in mathematics, offers a more horizontal and transferable advantage than initially perceived. While their core DNA is rooted in mathematics, with a mission to create AI that can act as a superhuman mathematician, their work has profound implications for verified reasoning and AI safety.

This approach contrasts with the common perception of formal verification as a tedious, compliance-driven process focused on eliminating errors and hallucinations. For Axiom, verified AI is about "scaling brilliance" and "compounding superintelligence." They draw parallels to mathematicians like Ramanujan, whose intuitive brilliance was amplified and turned into theorems through the rigorous process of proof. Similarly, the structured, step-by-step logical deduction inherent in mathematical proofs, when translated into code, can serve as a powerful foundation for AI.

Verified AI: Scaling Brilliance, Not Fixing Lousiness

Formal verification, a field predating deep learning, has historically been applied in safety-critical domains like aerospace and critical infrastructure. Axiom, however, views it not as a burden but as a performance enhancer. They believe that verified generation increases sample efficiency, allowing a startup with fewer resources than frontier labs to reach superhuman performance on complex tasks.

This was demonstrated by Axiom's performance on the Putnam exam, a prestigious undergraduate mathematics competition. While the best human score was 110 out of 120, Axiom achieved a perfect score of 120, outperforming even leading LLMs like DeepSeek, which scored 103. This achievement signifies a paradigm shift, suggesting that formal mathematical systems, despite potentially less training data, can rival or exceed the performance of informal LLMs.

Axiom's System: Lean Data, RL, and the Putnam Perfect Score

Axiom's success is built on heavy use of "Lean data": proofs written in the Lean formal language. Lean is a proof assistant and programming language for writing machine-checkable mathematical proofs, similar to its cousins Coq and Isabelle. When a proof compiles in Lean, and assuming no "weird things" such as the sorry placeholder or added axioms are used, it is guaranteed to be correct. This is rooted in the Curry-Howard correspondence, under which a proposition corresponds to a type and a proof corresponds to a program of that type.
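To make the proofs-as-programs idea concrete, here is a minimal Lean 4 sketch. It is only an illustration and is not taken from Axiom's data; the name and_swap_example is hypothetical, and the snippet assumes a plain Lean 4 toolchain with no extra libraries.

```lean
-- A proposition is a type; a proof is a term (a program) of that type.
-- Here the "program" takes a proof of p ∧ q and returns a proof of q ∧ p.
theorem and_swap_example (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩

-- The same statement proved with tactics. The tactics build a term,
-- and Lean's kernel checks that term against the stated proposition.
example (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor
  · exact h.2
  · exact h.1

-- Reusing a library lemma is just function application.
example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If any of these terms had the wrong type, the file would fail to compile, which is the sense in which a compiled proof is a checked proof.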

Lean is a Turing-complete, functional programming language that can be used for both coding and mathematics. Axiom leverages Lean's capabilities by fine-tuning foundation models using Reinforcement Learning (RL) for formal mathematics. Their approach involves recursively decomposing proof goals into sub-goals and learning to backtrack, aiming to innovate beyond standard RL pipelines.
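The idea of recursively decomposing a goal into sub-goals and backtracking on failure can be sketched with a toy search in Python. This is not Axiom's system: the rule table RULES, the goal names, and the function prove are all hypothetical, and the alternatives are tried in a fixed order, whereas a learned policy would rank them.

```python
from typing import Optional

# Hypothetical rule table: each goal maps to alternative decompositions.
# An empty decomposition [] means the goal closes immediately.
RULES: dict[str, list[list[str]]] = {
    "goal": [["lemma_a", "dead_end"], ["lemma_a", "lemma_b"]],
    "lemma_a": [[]],
    "lemma_b": [["axiom_1"]],
    "axiom_1": [[]],
    # "dead_end" has no rule, so any decomposition using it fails.
}


def prove(goal: str, rules: dict[str, list[list[str]]],
          depth: int = 0, max_depth: int = 16) -> Optional[tuple]:
    """Return a proof tree (goal, subproofs), or None if the search fails."""
    if depth > max_depth:
        return None
    for subgoals in rules.get(goal, []):
        subproofs = []
        for sub in subgoals:
            proof = prove(sub, rules, depth + 1, max_depth)
            if proof is None:
                break  # backtrack: abandon this decomposition
            subproofs.append(proof)
        else:
            # Every sub-goal was proved, so this decomposition succeeds.
            return (goal, subproofs)
    return None


if __name__ == "__main__":
    print(prove("goal", RULES))
    print(prove("dead_end", RULES))
```

In this toy run the first decomposition of "goal" fails on "dead_end", so the search backtracks and succeeds with the second one.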

Mathematical Discovery: Before the Conjecture

Axiom is also investing heavily in "mathematical discovery" tools, which they plan to open-source. These tools are designed to assist mathematicians and theoretical physicists in formulating conjectures and constructions, a crucial pre-proving step: they can suggest interesting examples, sequences, or graph constructions to aid the initial stages of problem-solving. The initiative is spearheaded by team members with a history of solving complex mathematical problems, including finding counterexamples to conjectures and tackling long-standing problems such as finding global Lyapunov functions.

Rice's Theorem, Incompleteness, and Practical Limits

Theoretical limits such as Rice's Theorem, which states that every non-trivial semantic property of programs is undecidable, mean that no procedure can automatically verify every program. Axiom therefore focuses on verifying the "majority of useful programs." Their vision is to integrate verification into the coding process, allowing AI to generate formally verified code components, from front-end websites to more complex distributed systems. The goal is a system where AI either generates a formally verified program or states that the task is currently too difficult.
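The "verified or too difficult" contract can be sketched as a bounded propose-and-check loop in Python. Everything here is a hypothetical stand-in: generate_verified, propose, check, and the candidate list are invented for illustration, and check tests a finite set of inputs where a real system would run a proof checker.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Verified:
    name: str       # which candidate passed the check
    attempts: int   # how many proposals were needed


@dataclass(frozen=True)
class TooHard:
    attempts: int   # budget spent before giving up


# Hypothetical candidate programs a generator might propose, in order.
CANDIDATES = [("double", lambda x: x + x), ("square", lambda x: x * x)]


def propose(attempt: int):
    """Stand-in for a model proposing a program on each attempt."""
    return CANDIDATES[attempt % len(CANDIDATES)]


def check(spec: Callable[[int], int], program: Callable[[int], int]) -> bool:
    """Stand-in for a proof checker: compares against the spec on 0..9."""
    return all(program(x) == spec(x) for x in range(10))


def generate_verified(spec: Callable[[int], int],
                      budget: int = 2) -> Union[Verified, TooHard]:
    """Return a checked program, or an explicit 'too hard' result."""
    for attempt in range(budget):
        name, program = propose(attempt)
        if check(spec, program):
            return Verified(name, attempt + 1)
    return TooHard(budget)


if __name__ == "__main__":
    print(generate_verified(lambda x: x ** 2))  # a candidate matches
    print(generate_verified(lambda x: x ** 3))  # no candidate matches
```

The design point is that the caller never receives an unchecked program: the only two outcomes are a candidate that passed the check or an explicit refusal.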

The process involves generating code together with its corresponding proof. Axiom's performance on the Verina benchmark, a code verification benchmark, demonstrates this capability. While GPT-4 achieved a low pass rate, Axiom's system achieved a 99% success rate on code-with-proof generation, solving 187 out of 189 problems. This success is attributed to using Lean for proofs and a strongly typed language like Rust for code, creating a more cohesive objective function for RL.
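As a small illustration of what "code with proof" means, the Lean 4 sketch below pairs an executable function with theorems stating and proving its specification. It is not a benchmark problem and not Axiom's output; the name myMax is hypothetical, and for brevity both the code and the proof are written in Lean rather than pairing Lean with Rust. It assumes a recent Lean 4 toolchain in which the omega tactic is available.

```lean
-- Code: an ordinary, executable function.
def myMax (a b : Nat) : Nat :=
  if a ≤ b then b else a

-- Specification, part 1: the result is at least the first input.
theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
  unfold myMax
  split
  · assumption          -- case a ≤ b: the goal is exactly that hypothesis
  · exact Nat.le_refl a -- case ¬ a ≤ b: the goal is a ≤ a

-- Specification, part 2: the result is at least the second input.
theorem myMax_ge_right (a b : Nat) : b ≤ myMax a b := by
  unfold myMax
  split
  · exact Nat.le_refl b -- case a ≤ b: the goal is b ≤ b
  · omega               -- case ¬ a ≤ b: linear arithmetic gives b ≤ a
```

Even in this tiny case the proofs are several times longer than the function they certify.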

Code With Proof — The Verina Benchmark

The Verina benchmark, developed by researchers from Berkeley and Meta, aims to evaluate AI's ability to generate code with accompanying proofs. Axiom's impressive performance highlights the potential of their approach. The challenge lies in ensuring that the generated code truly satisfies the verifiability conditions of the problem statement. While human mathematicians rely on peer review and consensus, Axiom's system leverages the formal guarantees of Lean to ensure correctness.

However, the problem of underspecification remains. Humans are notoriously bad at specifying every desired outcome, which leaves gaps in formal specifications. Axiom acknowledges this limitation and frames its current vision as "anything that can be specified can be proven." They are exploring ways to close the specification gap through techniques like mutation-based LLM unit test generation, which can help propose specifications.
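A simplified Python sketch shows how mutation can expose an underspecified spec. It swaps in hand-written mutants where the source describes LLM-generated tests, so it should be read as an analogy and not as Axiom's technique; weak_spec, strong_spec, MUTANTS, and surviving_mutants are all hypothetical names.

```python
from collections import Counter

# Two candidate specifications for "sort a list".


def weak_spec(inp: list, out: list) -> bool:
    """Underspecified: only requires the output to be ordered."""
    return all(x <= y for x, y in zip(out, out[1:]))


def strong_spec(inp: list, out: list) -> bool:
    """Also requires the output to be a permutation of the input."""
    return weak_spec(inp, out) and Counter(inp) == Counter(out)


# Hand-written wrong implementations ("mutants") of sorting.
MUTANTS = {
    "drop_everything": lambda xs: [],
    "dedupe": lambda xs: sorted(set(xs)),
    "reverse": lambda xs: sorted(xs, reverse=True),
}

INPUTS = [[3, 1, 2], [2, 2, 1], []]


def surviving_mutants(spec, mutants, inputs) -> list:
    """Names of wrong implementations that the spec fails to reject."""
    return sorted(
        name for name, impl in mutants.items()
        if all(spec(xs, impl(xs)) for xs in inputs)
    )


if __name__ == "__main__":
    print(surviving_mutants(weak_spec, MUTANTS, INPUTS))
    print(surviving_mutants(strong_spec, MUTANTS, INPUTS))
```

A mutant that survives is evidence that the specification is missing a clause; here the survivors of weak_spec point directly at the missing permutation requirement.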

Proof Trees, Context Windows, and Scaling Limits

The sheer size of formal proofs can be a challenge. Axiom notes that for every line of code, there can be up to 20 lines of proof. While they haven't established a definitive scaling law for this relationship, they believe their "Axiom Prover," an ensemble system of post-trained models, can handle increasingly complex proof trees. They report that they are not currently running into theoretical constraints and are focusing instead on execution and scaling.

The problem of context windows in LLMs is also a consideration. Axiom's approach involves auto-formalization, where informal proofs are converted into formal ones, and then back into informal summaries. This process helps manage large proof structures and maintain consistency.

Collaboration, Polymath, and Human Attention as the Bottleneck

Axiom believes that verified AI will foster greater collaboration, moving beyond human-human collaboration to human-AI and even AI-AI collaboration. They see verification not as a barrier to entry for closed industries, but as a catalyst for openness. The desire to understand, driven by human curiosity, will ensure that even with AI-generated proofs, humans will continue to engage with and explore mathematical concepts. Human attention, rather than computational power, is seen as the ultimate bottleneck.

Founding Story — Obsession, Law School, and Julie Zhuo

Carina Hong's journey to founding Axiom Math is a testament to her deep-seated passion for mathematics and AI. After pursuing neuroscience at Oxford and AI research at UCL Gatsby, she enrolled in Stanford's JD-PhD program. During her year in law school, she found herself increasingly drawn to technology and the progress of AI, particularly in reasoning. This obsession, coupled with conversations with her CTO, Shubho, led her to realize the immense potential of AI in mathematics.

A pivotal moment came when she met Julie Zhuo, a former Facebook PM, who advised her to "follow your energy." This led Hong to pivot from academia to entrepreneurship, founding Axiom Math with the belief that AI-driven mathematical reasoning would be a significant part of the future.

The Bigger Vision — AGI, Science, and Transfer Learning

Axiom's vision extends beyond mathematics to encompass AGI, AI for science, and even law. They believe that the structured reasoning developed through formal mathematics can unlock capabilities in diverse fields. The core of their strategy lies in the belief that advancements in math and formal verification will drive progress across scientific disciplines and beyond.

The company's success is attributed to its team of expert mathematicians who are also users of the system they are developing, creating a rapid iteration loop. This interdisciplinary approach, combining mathematical rigor with applied ML and codegen expertise, is seen as a key differentiator.

Key Takeaways

Verified AI for Scaling Brilliance: Axiom Math redefines formal verification not as a means to fix errors, but as a tool to scale and compound intelligence.
Math as a Horizontal Advantage: The structured nature of mathematics provides a powerful foundation for AI, with transferable applications beyond pure math.
Lean and Formal Proofs: Axiom leverages the Lean formal language to generate and verify mathematical proofs, ensuring correctness and enabling performance gains.
Mathematical Discovery: Axiom is developing tools to assist in the pre-conjecture stages of mathematical research, aiming to democratize discovery.
Code-with-Proof Generation: The company is pioneering the generation of code accompanied by formal proofs, a critical step towards trustworthy AI.
Human Attention as the Bottleneck: Despite AI's rapid progress, human curiosity and taste will remain crucial in guiding research and development.
Startup Advantage: Axiom believes that a focused startup environment allows for sustained dedication to long-term, ambitious goals, unlike larger organizations that may face shifting priorities.