AI existential risk probabilities are too unreliable to inform policy
How seriously should governments take the threat of existential risk from AI, given the lack of consensus among researchers? On the one hand, existential risks (x-risks) are necessarily somewhat speculative: by the time there is concrete evidence, it may be too late. On the other hand, governments must prioritize; after all, they don't worry too much about x-risk from alien invasions.

This is the first in a series of essays laying out an evidence-based approach for policymakers concerned about AI x-risk, an approach that stays grounded in reality while acknowledging that there are "unknown unknowns". In this first essay, we look at one type of evidence: probability estimates.

The AI safety community relies heavily on forecasting the probability of human extinction due to AI (in a given timeframe) to inform decision making and policy. An estimate of 10% over a few decades, for example, would obviously be high enough to make the issue a top priority for society. Our central claim is that AI x-risk forecasts are far too unreliable to be useful for policy, and are in fact highly misleading.

If the two of us predicted an 80% probability of aliens landing on Earth in the next ten years, would you take this possibility seriously? Of course not. You would ask to see our evidence. As obvious as this may seem, the AI x-risk debate appears to have forgotten that probabilities carry no authority by themselves. Because probabilities are usually derived from some grounded method, we have a strong cognitive bias to view quantified risk estimates as more valid than qualitative ones. But it is possible for probabilities to be nothing more than guesses. Keep this in mind throughout this essay (and more broadly in the AI x-risk debate).

If we predicted odds for the Kentucky Derby, we wouldn't have to give you a reason: you could take it or leave it. But if a policymaker takes actions based on probabilities put forth by a forecaster, they had better be able to explain those probabilities to the public (and that explanation must in turn come from the forecaster). Justification is essential to the legitimacy of government and the exercise of power. A core principle of liberal democracy is that the state should not limit people's freedom based on controversial beliefs that reasonable people can reject.

Explanation is especially important when the policies being considered are costly, and even more so when those costs are unevenly distributed among stakeholders. A good example is restricting open releases of AI models. Can governments convince the people and companies who stand to benefit from open models that they should make this sacrifice because of a speculative future risk?

The main aim of this essay is to analyze whether there is any justification for any of the specific x-risk probability estimates that have been cited in the policy debate. We have no objection to AI x-risk forecasting as an academic activity, and forecasts may be helpful to companies and other private decision makers. We only question its use in the context of public policy.

There are basically only three known ways by which a forecaster can try to convince a skeptic: inductive, deductive, and subjective probability estimation. We consider each of these in the following sections. All three require both parties to agree on some basic assumptions about the world (which cannot themselves be proven). The three approaches differ in the empirical and logical ways in which the probability estimate follows from that set of assumptions.
Most risk estimates are inductive: they are based on past observations. For example, insurers base their predictions of an individual's car accident risk on data from past accidents involving similar drivers. The set of observations used for probability estimation is called a reference class. A suitable reference class for car insurance might be the set of drivers who live in the same city. If the analyst has more information about the individual, such as their age or the type of car they drive, the reference class can be further refined (see the sketch at the end of this discussion).

For existential risk from AI, there is no reference class, as it is an event like no other. To be clear, this is a matter of degree, not kind. There is never a single "correct" reference class, and the choice of a reference class in practice comes down to the analyst's intuition. The accuracy of a forecast depends on how similar the process generating the event being forecast is to the process that generated the events in the reference class, and this similarity lies on a spectrum.

For predicting the outcome of a physical system such as a coin toss, past experience is a highly reliable guide. Next, for car accidents, risk estimates might vary by, say, 20% depending on the past dataset used, which is good enough for insurance companies. Further along the spectrum are geopolitical events, where the choice of reference class gets even fuzzier. Forecasting expert Philip Tetlock explains: "Grexit may have looked sui generis, because no country had exited the Eurozone as of 2015, but it could also be viewed as just another instance of a broad comparison class, such as negotiation failures, or of a narrower class, such as nation-states withdrawing from international agreements or, narrower still, of forced currency conversions." He goes on to defend the idea that even seemingly Black Swan events like the collapse of the USSR or the Arab Spring can be modeled as members of reference classes, and that inductive reasoning is useful even for this kind of event. In Tetlock's spectrum, these events represent the "peak" of uniqueness.

When it comes to geopolitical events, that might be true. But even those events are far less unique than extinction from AI. Just look at the attempts to find reference classes for AI x-risk: animal extinctions (as a reference class for human extinction), past global transformations such as the industrial revolution (as a reference class for socioeconomic transformation from AI), or accidents causing mass deaths (as a reference class for accidents causing global catastrophe). Let's get real. None of these tell us anything about the possibility of developing superintelligent AI or losing control over such AI, which are the central sources of uncertainty in AI x-risk forecasting.

To summarize, human extinction due to AI is an outcome so far removed from anything that has happened in the past that we cannot use inductive methods to "predict" the odds. Of course, we can get qualitative insights from past technical breakthroughs as well as past catastrophic events, but AI risk is sufficiently different that quantitative estimates lack the kind of justification needed for legitimacy in policymaking.
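To make the contrast concrete, here is a minimal sketch of what reference-class estimation looks like in the benign case of car insurance. The records, field names, and numbers are made up for illustration; they stand in for the kind of historical data an insurer has and an AI x-risk forecaster does not.

```python
# Reference-class (inductive) estimation on toy, made-up insurance records.
records = [
    # (city, age_band, had_accident_this_year)
    ("Springfield", "25-34", True),
    ("Springfield", "25-34", False),
    ("Springfield", "25-34", False),
    ("Springfield", "35-44", False),
    ("Shelbyville", "25-34", True),
    # ...many more rows in practice
]

def accident_rate(city: str, age_band: str) -> float:
    """Empirical accident frequency within the chosen reference class."""
    ref_class = [r for r in records if r[0] == city and r[1] == age_band]
    if not ref_class:
        raise ValueError("empty reference class; broaden the filter")
    return sum(had_accident for _, _, had_accident in ref_class) / len(ref_class)

# Refining the reference class is just adding filters over observed events.
print(accident_rate("Springfield", "25-34"))  # 1/3 on this toy data
```

Every number such an estimate produces is anchored to observed events; the problem for AI x-risk is that there is no analogous table of past events to filter.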
In Conan Doyle's The Adventure of the Six Napoleons (spoiler alert!), Sherlock Holmes announces before embarking on a stakeout that the probability of catching the suspect is exactly two-thirds. This seems bewildering: how can anything related to human behavior be ascribed a mathematically precise probability?

It turns out that Holmes has deduced the underlying series of events that gave rise to the suspect's seemingly erratic observed behavior: the suspect is methodically searching for a jewel that is known to be hidden inside one of six busts of Napoleon owned by different people in and around London. The details aren't too important, but the key is that neither the suspect nor the detectives know which of the six busts the jewel is in, and everything else about the suspect's behavior is (assumed to be) entirely predictable. Hence the precisely quantifiable uncertainty.

The point is that if we have a model of the world that we can rely upon, we can estimate risk through logical deduction, even without relying on past observations (a toy version of such a calculation is sketched below). Of course, outside of fictional scenarios, the world isn't so neat, especially when we want to project far into the future.

When it comes to x-risk, there is an interesting exception to the ge…
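As promised above, here is a toy version of how a deductive estimate of this kind is computed. The assumptions and numbers are ours for illustration, not a reconstruction of the story's arithmetic: we assume only that the jewel is equally likely to be hidden in any of the six busts and that some of them have already been smashed and searched without result.

```python
# A toy "deductive" probability estimate: given an assumed-complete world
# model, the number follows by logic alone, with no reference class of past
# events required. Assumptions (ours, not Conan Doyle's): the jewel is equally
# likely to be in any of `total_busts`, and `searched_empty` busts have
# already been smashed without finding it.

def prob_jewel_in_watched_bust(total_busts: int, searched_empty: int) -> float:
    """P(jewel is in one particular remaining bust | not found so far)."""
    remaining = total_busts - searched_empty
    if remaining <= 0:
        raise ValueError("no busts left to search")
    return 1 / remaining

print(prob_jewel_in_watched_bust(total_busts=6, searched_empty=3))  # 1/3
```

The precision comes entirely from the assumed world model; relax any assumption and the neat number evaporates, which is the difficulty with deducing probabilities about the far future.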