Import AI 442: Winners and losers in the AI economy; math proof automation; and industrialization of cyber espionage

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe. The era of math proof automation has arrived: …Numina-Lean-Agent shows how math will never be the same… In the past few years, large-scale AI models have become good at coding and have also begun to generalize into other useful disciplines, especially those in math and science. Like with most aspects of AI development, the story has been one of increasing generalization and simplification of the systems as we shift away from highly specialized math models to just leveraging general-purpose foundation models and giving them the right tools to elicit their capabilities in a given domain. The latest example of this is Numina-Lean-Agent, an AI system that uses standard, general foundation models to do mathematical reasoning. With this software, a team of mathematicians have solved all problems in the Putnam 2025 math competition - matching the performance of proprietary systems which use a lot more math-specific stuff - and have also used it to conduct some original math research, working with it to formalize the Brascamp-Lieb theorem. What is Numina-Lean-Agent? The software was built by a team of researchers from the Chinese Academy of Sciences, University of Liverpool, Xi’an Jiaotong-Liverpool University, Tongji University, University of Cambridge, Project Numina, Imperial College London, and the University of Edinburgh. The software is “a formal math reasoner based on a general coding agent”. It has a few key components: Lean-LSP-MCP: Software to allow AI agents to interact with the Lean theorem prover. “empowers models with the capability to deeply comprehend, analyze, and manipulate Lean projects”, and gives models a toolset for semantic awareness and interaction, code execution and strategy exploration, and theorem retrieval. LeanDex: Semantic retrieval of related theorems and definitions - basically, a search tool for theorems. Informal Prover: A system which uses Gemini models to generate informal solutions. The most interesting tool of all: Discussion Partner: A tool which “empowers Claude Code with the ability to ’seek assistance’ during Lean formalization: when encountering obstacles—such as proof bottlenecks, dilemmas in strategy selection, or ambiguities in intermediate lemmas—the primary model can proactively initiate discussions with other LLMs”. Discovering math together: Along with the Putnam demonstration, the authors also used the software as an active partner in some math work, specifically formalizing Brascamp Lieb (I will not pretend to be able to explain what this means). “Over a period of less than two weeks of intermittent collaboration, the two human experts and the agent completed the formalization of more than 8,000 lines of Lean code. During this process, the agent autonomously introduced approximately 70 new definitions, lemmas, and theorems, illustrating its ability to actively extend the formal library and participate in large-scale, sustained formalization efforts,” the authors write. Why this matters - capability overhangs and AI ecologies: Numina-Lean-Agent neatly demonstrates two important things about contemporary AI: 1) AI systems are far more capable than people think and the creation of some specialized frameworks and tools often lets us elicit dramatically better capabilities from our systems (here, math, but it has been demonstrated in many domains), and 2) the AI ecology writ large is composed of many distinct frontier models and it seems like getting these models to interact with one another can lead to some richness, akin to how consulting different types of people about a single problem can reveal a better answer than just talking to one person. Read more: Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics (arXiv). Find out more at the GitHub page (Numina-Lean-Agent, GitHub). *** The industrialization of cyber espionage is nigh: …Some experiments on Opus 4.5 and GPT-5.2 indicate that the cyber environment could be on the cusp of major changes… Independent researcher Sean Heelan recently tested out how well Opus 4.5 and GPT-5.2 could generate exploits for a zeroday vulnerability in the QuickJS Javascript interpreter. Both models did very well, and this has major implications for cybersecurity. “We should prepare for the industrialisation of many of the constituent parts of offensive cyber security. We should start assuming that in the near future the limiting factor on a state or group’s ability to develop exploits, break into networks, escalate privileges and remain in those networks, is going to be their token throughput over time, and not the number of hackers they employ,” he writes. Caveats: QuickJS is a simple Javascript interpreter relative to the ones in Chrome and Firefox. Therefore, it may be harder for LLMs to employ the more complex and more widely deployed ones - though as with all things in AI, we can expect performance to improve quite rapidly. What does industrialized intrusion mean? “We are already at a point where with vulnerability discovery and exploit development you can trade tokens for real results,”: he writes. “The types of problems that you encounter if you want to automate the work of SREs, system admins and developers that manage production networks are conceptually similar to those of a hacker operating within an adversary’s network.” There’s lots of evidence for the above, ranging from things like OpenAI’s Aardvark project (where they find that the more tokens they spend, the more bugs they find), and things like Anthropic’s discovery of an AI-orchestrated hacking system. Why this matters - the cyberworld is about to move at machine speed: My bet is that most parts of cyberoffense and cyberdefense are going to move to running at “machine speed”, where humans get taken out of most of the critical loops. This will both increase the frequency of hacking attacks while also dramatically scaling up the effectiveness of any individual human defender or attacker (as they will be scaled by AI systems which work for them). The true wildcard question is whether this turns out to be offense- or defense-dominant - my guess is we’re heading for an era of offense-dominance as it’ll take a while for defenses to get deployed. In related news, OpenAI CEO Sam Altman said this week he expects OpenAI’s models will soon reach the “Cybersecurity High” level on his company’s preparedness framework - this would mean models were available which “remove existing bottlenecks to scaling cyber operations including by automating end-to-end cyber operations against reasonably hardened targets OR by automating the discovery and exploitation of operationally relevant vulnerabilities” - thanks to Nathan Calvin for pointing this out. Read more: On the Coming Industrialisation of Exploit Generation with LLMs (Sean Heelan blog). *** Economist: AI will be bigger than electricity and semiconductors: …And it’s therefore worth spending a ton of money to reduce AI risks… Stanford economist Charles “Chad” Jones has written a paper which says AI will “likely be the most important technology we have ever developed”, and that “automating intelligence itself arguably has broader effects than electricity or semiconductors”. Why take AI seriously? The gist of the paper is that AI represents a massive technological invention which will contribute to economic growth in the future. In the past, major inventions (e.g, electricity, the internet, cars, etc) have all done the same. In fact, counterintuitively, if you look at US GDP growth you find that despite all these prior technological revolutions, GDP has been steadily increasing at about 2% a year for many, many years. Therefore, the baseline scenario is where AI just does this - and then we don’t live in too crazy a world. But there is a world where things could be different - where AI works…

create your storyflo · everywhere you listen.

Import AI 442: Winners and losers in the AI economy; math proof automation; and industrialization of cyber espionage