AGI is not a milestone
With the release of OpenAI’s latest model o3, there is renewed debate about whether Artificial General Intelligence (AGI) has already been achieved. The standard skeptic’s response is that there is no consensus on the definition of AGI. That is true, but it misses the point: if AGI is such a momentous milestone, shouldn’t it be obvious when it has been built?

In this essay, we argue that AGI is not a milestone. It does not represent a discontinuity in the properties or impacts of AI systems. If a company declares that it has built AGI, based on whatever definition, it is not an actionable event. It will have no implications for businesses, developers, policymakers, or safety. Specifically:

- Even if general-purpose AI systems reach some agreed-upon capability threshold, we will need many complementary innovations that allow AI to diffuse across industries to realize its productive impact. Diffusion occurs at human (and societal) timescales, not at the speed of tech development.
- Worries about AGI and catastrophic risk often conflate capabilities with power. Once we distinguish between the two, we can reject the idea of a critical point in AI development at which it becomes infeasible for humanity to remain in control.
- The proliferation of AGI definitions is a symptom, not the disease. AGI is significant because of its presumed impacts, but it must be defined based on the properties of the AI system itself. Yet the link between system properties and impacts is tenuous, and it depends greatly on how we design the environment in which AI systems operate. Whether or not a given AI system will go on to have transformative impacts is thus still undetermined at the moment the system is released, so a determination that an AI system constitutes AGI can only meaningfully be made retrospectively.

Achieving AGI is the explicit goal of companies like OpenAI and much of the AI research community. It is treated as a milestone, in the same way that building and delivering a nuclear weapon was the key goal of the Manhattan Project.

This goal made sense as a milestone in the Manhattan Project for two reasons. The first is observability. In developing nuclear weapons, there can be no doubt about whether you’ve reached the goal: an explosion epitomizes observability. The second is immediate impact. The use of nuclear weapons contributed to a quick end to World War 2. It also ushered in a new world order, a long-term transformation of geopolitics.

Many people have the intuition that AGI will have these properties. It will be so powerful and humanlike that it will be obvious when we’ve built it. And it will immediately bring massive benefits and risks: automation of a big swath of the economy, a great acceleration of innovation (including AI research itself), and potentially catastrophic consequences for humanity from uncontrollable superintelligence.

In this essay, we argue that AGI will be exactly the opposite: it is unobservable, because there is no clear capability threshold that has particular significance; it will have no immediate impact on the world; and even a long-term transformation of the economy is uncertain. In previous essays, we have argued against the likely disastrous policy interventions that some have recommended by analogizing AGI to nuclear weapons. It is striking to us that this analogy reliably generates what we consider to be incorrect predictions and counterproductive recommendations.
Many prominent AI commentators have called o3 a kind of AGI: Tyler Cowen says that if you know AGI when you see it, then he has seen it. Ethan Mollick describes o3 as a jagged AGI.

What is it about o3 that has led to such excitement? The key innovation in o3 is the use of reinforcement learning to teach the model to search the web and use tools as part of its reasoning chain.[1] In this way, it can perform more complex cognitive tasks than LLMs are directly capable of, and it can do so in a way that resembles how people work. Consider a person doing comparison shopping. They might look at a few products, use the reviews of those products to get a better sense of which features even matter, and use that knowledge to iteratively expand or shrink the set of products under consideration. o3 is a generalist agent that does a decent job at this sort of thing (we include a schematic sketch of such a tool-use loop below).

Let’s consider what this means for AGI. To avoid getting bogged down in the details of o3, imagine a future system whose architecture is identical to o3’s but that is much more competent. For example, it can always find the right webpages and knowledge for the task as long as the information is online, no matter how hard it is to locate. It can download and run code from the internet to solve a task if necessary. None of this requires scientific breakthroughs, only engineering improvements and further training. At the same time, without scientific advances, the architecture imposes serious limits. For example, this future system cannot acquire new skills from experience, except through an explicit update to its training. Building AI systems that can learn on the fly is an open research problem.[2]

Would our hypothetical system be AGI? Arguably, yes. What many AGI definitions have in common is the ability to outperform humans at a wide variety of tasks. Depending on how narrowly the set of tasks is defined and how broadly the relevant set of humans for each task is defined, it is quite plausible that this future o3-like agent will meet some of these AGI definitions. For example, it will be superhuman at playing chess, despite the fact that large language models themselves are at best mediocre at chess. Remember that the model can use tools, search the internet, and download and run code. If the task is to play chess, it will download and run a chess engine. Despite human-level or superhuman performance at many tasks, and despite plausibly satisfying some definitions of AGI, it will probably fail badly at many real-world tasks. We’ll get back to the reasons for that.

Does any of this matter? It does. Leaders at AI companies have made very loud predictions and commitments to deliver AGI within a few years. There are enormous incentives for them to declare some near-future system to be AGI, and potentially enormous costs to not doing so. Some of the valuation of AI companies is presumably based on these promises, so a failure to deliver AGI might burst that bubble. Being seen as a leader in AI development can also improve market share, revenues, and access to talent. So, if and when companies claim to have built AGI, what will be the consequences? We analyze that in the rest of this essay.

One argument for treating AGI as a milestone, and for taking declarations of AGI seriously, is that AGI could lead to rapid economic impacts, both positive and negative, such as a world without scarcity, an end to the concept of money, or sudden mass joblessness. But AI’s economic impact is only realized when it is adopted across the economy.
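As an aside, here is the schematic sketch promised above, to make the phrase “use tools as part of its reasoning chain” concrete. It is our own illustration, not a description of OpenAI’s implementation; the function names (call_model, web_search, run_code) are hypothetical placeholders rather than real APIs. The only point is the control flow: the model repeatedly decides whether to search, run code, or answer, and the output of each tool call is fed back into its context for the next reasoning step.

```python
# A minimal, hypothetical sketch of a tool-augmented reasoning loop.
# This is an illustration, not OpenAI's implementation; call_model,
# web_search, and run_code are placeholder functions, not real APIs.

def call_model(context: str) -> dict:
    """Placeholder for one reasoning step; returns the next action, e.g.
    {"action": "search", "query": "..."}, {"action": "run_code", "source": "..."},
    or {"action": "answer", "text": "..."}."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Placeholder for a web search tool; returns retrieved text."""
    raise NotImplementedError

def run_code(source: str) -> str:
    """Placeholder for a sandboxed code-execution tool
    (e.g. running a downloaded chess engine)."""
    raise NotImplementedError

def agent(task: str, max_steps: int = 10) -> str:
    """Interleave model reasoning with tool calls until the model answers."""
    context = f"Task: {task}"
    for _ in range(max_steps):
        step = call_model(context)  # the model decides what to do next
        if step["action"] == "search":
            context += "\nSearch result: " + web_search(step["query"])
        elif step["action"] == "run_code":
            context += "\nCode output: " + run_code(step["source"])
        else:  # "answer": the reasoning chain terminates
            return step["text"]
    return "No answer within the step budget."
```

The chess example fits this loop: given a chess task, the most effective action available to the model is to have the code-execution tool run an engine, rather than to play from its own weights.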
Technical advances are necessary, but not sufficient, to realize AI’s economic impact. For past general-purpose technologies, such as electricity, computing, and the internet, it took decades for the underlying technical advances to diffuse across society. The miracle of the Industrial Revolution wasn’t the high growth rate (annual growth rates averaged below 3%) but the sustained period of decades of growth.

There are many bottlenecks to the diffusion of AI: developing useful products and applications, training the workforce to use these products, implementing organizational changes to enable AI use, and establishing laws and norms that facilitate AI adoption by companies. As with past general-purpose technologies, we expect the economic impacts of AI to be realized over decades, as this process of diffusion unfolds. In the paper AI as Normal Technology, we present a detailed argument for why we think this will be the case. The idea that rapid increases in capability lead to rapid economic impacts is completely inconsistent with the past: earlier general-purpose technologies took decades to diffuse even after the underlying technical advances were in place.