Import AI 442: Winners and losers in the AI economy; math proof automation; and industrialization of cyber espionage
Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If youâd like to support this, please subscribe. The era of math proof automation has arrived: âŠNumina-Lean-Agent shows how math will never be the same⊠In the past few years, large-scale AI models have become good at coding and have also begun to generalize into other useful disciplines, especially those in math and science. Like with most aspects of AI development, the story has been one of increasing generalization and simplification of the systems as we shift away from highly specialized math models to just leveraging general-purpose foundation models and giving them the right tools to elicit their capabilities in a given domain. The latest example of this is Numina-Lean-Agent, an AI system that uses standard, general foundation models to do mathematical reasoning. With this software, a team of mathematicians have solved all problems in the Putnam 2025 math competition - matching the performance of proprietary systems which use a lot more math-specific stuff - and have also used it to conduct some original math research, working with it to formalize the Brascamp-Lieb theorem. What is Numina-Lean-Agent? The software was built by a team of researchers from the Chinese Academy of Sciences, University of Liverpool, Xiâan Jiaotong-Liverpool University, Tongji University, University of Cambridge, Project Numina, Imperial College London, and the University of Edinburgh. The software is âa formal math reasoner based on a general coding agentâ. It has a few key components: Lean-LSP-MCP: Software to allow AI agents to interact with the Lean theorem prover. âempowers models with the capability to deeply comprehend, analyze, and manipulate Lean projectsâ, and gives models a toolset for semantic awareness and interaction, code execution and strategy exploration, and theorem retrieval. LeanDex: Semantic retrieval of related theorems and definitions - basically, a search tool for theorems. Informal Prover: A system which uses Gemini models to generate informal solutions. The most interesting tool of all: Discussion Partner: A tool which âempowers Claude Code with the ability to âseek assistanceâ during Lean formalization: when encountering obstaclesâsuch as proof bottlenecks, dilemmas in strategy selection, or ambiguities in intermediate lemmasâthe primary model can proactively initiate discussions with other LLMsâ. Discovering math together: Along with the Putnam demonstration, the authors also used the software as an active partner in some math work, specifically formalizing Brascamp Lieb (I will not pretend to be able to explain what this means). âOver a period of less than two weeks of intermittent collaboration, the two human experts and the agent completed the formalization of more than 8,000 lines of Lean code. During this process, the agent autonomously introduced approximately 70 new definitions, lemmas, and theorems, illustrating its ability to actively extend the formal library and participate in large-scale, sustained formalization efforts,â the authors write. Why this matters - capability overhangs and AI ecologies: Numina-Lean-Agent neatly demonstrates two important things about contemporary AI: 1) AI systems are far more capable than people think and the creation of some specialized frameworks and tools often lets us elicit dramatically better capabilities from our systems (here, math, but it has been demonstrated in many domains), and 2) the AI ecology writ large is composed of many distinct frontier models and it seems like getting these models to interact with one another can lead to some richness, akin to how consulting different types of people about a single problem can reveal a better answer than just talking to one person. Read more: Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics (arXiv). Find out more at the GitHub page (Numina-Lean-Agent, GitHub). *** The industrialization of cyber espionage is nigh: âŠSome experiments on Opus 4.5 and GPT-5.2 indicate that the cyber environment could be on the cusp of major changes⊠Independent researcher Sean Heelan recently tested out how well Opus 4.5 and GPT-5.2 could generate exploits for a zeroday vulnerability in the QuickJS Javascript interpreter. Both models did very well, and this has major implications for cybersecurity. âWe should prepare for the industrialisation of many of the constituent parts of offensive cyber security. We should start assuming that in the near future the limiting factor on a state or groupâs ability to develop exploits, break into networks, escalate privileges and remain in those networks, is going to be their token throughput over time, and not the number of hackers they employ,â he writes. Caveats: QuickJS is a simple Javascript interpreter relative to the ones in Chrome and Firefox. Therefore, it may be harder for LLMs to employ the more complex and more widely deployed ones - though as with all things in AI, we can expect performance to improve quite rapidly. What does industrialized intrusion mean? âWe are already at a point where with vulnerability discovery and exploit development you can trade tokens for real results,â: he writes. âThe types of problems that you encounter if you want to automate the work of SREs, system admins and developers that manage production networks are conceptually similar to those of a hacker operating within an adversaryâs network.â Thereâs lots of evidence for the above, ranging from things like OpenAIâs Aardvark project (where they find that the more tokens they spend, the more bugs they find), and things like Anthropicâs discovery of an AI-orchestrated hacking system. Why this matters - the cyberworld is about to move at machine speed: My bet is that most parts of cyberoffense and cyberdefense are going to move to running at âmachine speedâ, where humans get taken out of most of the critical loops. This will both increase the frequency of hacking attacks while also dramatically scaling up the effectiveness of any individual human defender or attacker (as they will be scaled by AI systems which work for them). The true wildcard question is whether this turns out to be offense- or defense-dominant - my guess is weâre heading for an era of offense-dominance as itâll take a while for defenses to get deployed. In related news, OpenAI CEO Sam Altman said this week he expects OpenAIâs models will soon reach the âCybersecurity Highâ level on his companyâs preparedness framework - this would mean models were available which âremove existing bottlenecks to scaling cyber operations including by automating end-to-end cyber operations against reasonably hardened targets OR by automating the discovery and exploitation of operationally relevant vulnerabilitiesâ - thanks to Nathan Calvin for pointing this out. Read more: On the Coming Industrialisation of Exploit Generation with LLMs (Sean Heelan blog). *** Economist: AI will be bigger than electricity and semiconductors: âŠAnd itâs therefore worth spending a ton of money to reduce AI risks⊠Stanford economist Charles âChadâ Jones has written a paper which says AI will âlikely be the most important technology we have ever developedâ, and that âautomating intelligence itself arguably has broader effects than electricity or semiconductorsâ. Why take AI seriously? The gist of the paper is that AI represents a massive technological invention which will contribute to economic growth in the future. In the past, major inventions (e.g, electricity, the internet, cars, etc) have all done the same. In fact, counterintuitively, if you look at US GDP growth you find that despite all these prior technological revolutions, GDP has been steadily increasing at about 2% a year for many, many years. Therefore, the baseline scenario is where AI just does this - and then we donât live in too crazy a world. But there is a world where things could be different - where AI worksâŠ
Send this story to anyone â or drop the embed into a blog post, Substack, Notion page. Every play sends rev-share back to Import AI.