Does the UK's liver transplant matching algorithm systematically exclude younger patients?
By Arvind Narayanan, Angelina Wang, Sayash Kapoor, and Solon Barocas

Predictive algorithms are used in many life-or-death situations. In the paper Against Predictive Optimization, we argued that the use of predictive logic for making decisions about people has recurring, inherent flaws, and should be rejected in many cases. A wrenching case study comes from the UK's liver allocation algorithm, which appears to discriminate by age, with some younger patients seemingly unable to receive a transplant, no matter how ill. What went wrong here? Can it be fixed? Or should health systems avoid using algorithms for liver transplant matching?

The UK nationalized its liver transplant system in 2018, replacing previous regional systems where livers were prioritized based on disease severity.1 When a liver becomes available, the new algorithm uses predictive logic to calculate how much each patient on the national waiting list would benefit from being given that liver. Specifically, the algorithm predicts how long each patient would live if they were given that liver, and how long they would live if they didn't get a transplant. The difference between the two is the patient's Transplant Benefit Score (TBS). Patients are sorted in decreasing order of the score, and the top patient is offered the liver (if they decline, the next patient is offered, and so on).

Given this description, one would expect the algorithm to favor younger patients, as they will potentially gain many more decades of life through a transplant compared to older patients. If the algorithm has the opposite effect, either the score has been inaccurately portrayed or it is being calculated incorrectly. We'll see which one it is. But first, let's discuss a more basic question.

Discussions of the ethics of algorithmic decision making often narrowly focus on bias, ignoring the question of whether it is legitimate to use an algorithm in the first place.
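The allocation logic described above can be sketched in a few lines. This is a hypothetical illustration, not the NHS implementation: the `predict_survival` function and all names here are stand-ins for the statistical models that actually produce the two survival estimates.

```python
# Sketch of the TBS allocation logic described above. `predict_survival`
# is a stand-in for the NHS's statistical models, not the real system.

def transplant_benefit(patient, liver, predict_survival):
    """TBS: predicted survival with this liver minus predicted survival without."""
    with_tx = predict_survival(patient, liver, transplanted=True)
    without_tx = predict_survival(patient, liver, transplanted=False)
    return with_tx - without_tx

def offer_order(waiting_list, liver, predict_survival):
    """Sort the national waiting list by decreasing TBS; the liver is
    offered to each patient in turn until someone accepts."""
    return sorted(
        waiting_list,
        key=lambda p: transplant_benefit(p, liver, predict_survival),
        reverse=True,
    )
```

Note the design choice this makes explicit: whatever survival model is plugged in, the patient with the largest predicted gain, not the sickest patient, tops the list.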
For example, consider pretrial risk prediction in the criminal justice system. While bias is a serious concern, a deeper question is whether it is morally justified to deny defendants their freedom based on a prediction of what they might do rather than a determination of guilt, especially when that prediction is barely more accurate than a coin flip.

Organ transplantation is different in many ways. The health system needs to make efficient and ethical use of a very limited and valuable resource, and must find some principled way of allocating it among many deserving people, all of whom have reasonable claims to it. There are thousands of potential recipients, and decisions must be made quickly when an organ becomes available. Human judgment doesn't scale.2

Another way to avoid the need for predictive algorithms is to increase the pool of organs so that they are no longer as scarce. Encouraging people to sign up for organ donation is certainly important. But even if the supply of livers were no longer a constraint, it would still be useful to predict which patient will benefit the most from a specific liver.

Sometimes simple statistical formulas provide most of the benefits of predictive AI without the downsides. In fact, the previous liver transplant system in the UK was based on a relatively simple formula for predicting disease severity, the UK End-stage Liver Disease (UKELD) score, which is computed from the blood levels of a few markers. The new system takes into account the benefit of transplantation in addition to disease severity. It is also more of a black box. It is "AI" in the sense that it is derived from a data-driven optimization process and is too complex to be mentally understood by doctors or patients. It uses 28 variables from the donor and recipient to make a prediction.
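For a sense of just how simple that earlier score is, here it is as a one-line formula over four blood markers. The coefficients below follow one commonly published form of UKELD; treat them as illustrative rather than as the authoritative clinical definition.

```python
import math

def ukeld(inr, creatinine, bilirubin, sodium):
    """UK End-stage Liver Disease score, per one commonly published form
    (coefficients illustrative, not authoritative). Creatinine and
    bilirubin in umol/L, sodium in mmol/L."""
    return (5.395 * math.log(inr)
            + 1.485 * math.log(creatinine)
            + 3.13 * math.log(bilirubin)
            - 81.565 * math.log(sodium)
            + 435)
```

A formula like this is transparent enough for a clinician to sanity-check by hand, which is precisely the contrast the article draws with the 28-variable TBS.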
It seems at least plausible that this complexity is justified in this context because health outcomes are much more predictable than who will commit a crime (though this varies by disease). Follow-up studies have confirmed that the matching algorithm does indeed save more lives than the system it replaced. So there isn't necessarily a prima facie case against the use of the algorithm. Instead, we have to look at the details of what went wrong. Let's turn to those details.

In November 2023, the Financial Times published a bombshell investigation about bias in the algorithm. It centers on a 31-year-old patient, Sarah Meredith, with multiple genetic conditions including cystic fibrosis. It describes her accidental discovery that the Transplant Benefit Score algorithm even existed and would decide her fate; her struggle to understand how it worked; her liver doctors' lack of even basic knowledge about the algorithm; and her realization that there was no physician override of the TBS and no appeals process. When she reached out to the National Health Service to ask for explanations, Meredith was repeatedly told she wouldn't understand. The paternalism of health systems combined with the myth of the inscrutability of algorithms is a particularly toxic mix.

Meredith eventually landed on a web app that calculates the TBS, built by Professor Ewen Harrison and his team. Harrison is a surgeon and data scientist who has studied the TBS, and is a co-author of a study of some of the failures of the algorithm. It is through this app that Meredith realized how biased the algorithm is. It also shows why the inscrutability of algorithmic decision making is a myth: even without understanding the internals, it is easy to understand the behavior of the system, especially since a particular patient only cares about how the system behaves in one specific instance.

But this isn't just one patient's experience.
From the Financial Times piece: "If you're below 45 years, no matter how ill, it is impossible for you to score high enough to be given priority scores on the list," said Palak Trivedi, a consultant hepatologist at the University of Birmingham, which has one of the country's largest liver transplant centres.

Finally, a 2024 study in The Lancet has confirmed that the algorithm has a severe bias against younger patients.3

The objective of the matching system is to identify the recipient whose life expectancy would be increased the most by the transplant. The obvious way to do this is to predict each patient's expected survival time with and without the transplant. This is almost what the algorithm does, but not quite: it predicts each patient's likelihood of surviving 5 years with and without the transplant. The problem with this is obvious. A patient group gave this feedback through official channels in 2015, long before the algorithm went into effect:

Capping survival at five years in effect diminishes the benefits for younger patients as it underestimates the gain in life years by predicting lifetime gain over 5 years, as opposed to the total lifetime gain. Paediatric and small adult patients benefit from accessing small adult livers as a national priority in the Current System. However, young adults must compete directly with all other adult patients. In the proposed model, there is no recognition that a death in a younger patient is associated with a greater number of expected years of life lost compared with the death of an older adult patient. There is also no recognition that longer periods waiting has an impact on younger patients' prospects, such as career and family, and contribution to society compared with older adult patients. Younger patients have not yet had the chance to live their lives and consideration should be given to how the cohort of younger waiting list patients is affected by rules applied to calculate their benefit.
This is what leads to the algorithm's behavior. Younger patients are (correctly) predicted to be more likely to survive 5 years without a transplant, and about as likely as older patients to survive 5 years with a transplant. So younger patients' predicted net benefit over the five-year window is small, and they are ranked below older patients no matter how sick they are.
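A toy calculation, with entirely made-up probabilities, shows how the five-year cap compresses the score for younger patients:

```python
# Illustrative probabilities only; not taken from the actual TBS models.
def capped_benefit(p_5y_with, p_5y_without):
    """Net benefit under the five-year cap: the difference in the chance
    of surviving 5 years with vs. without the transplant."""
    return p_5y_with - p_5y_without

young = capped_benefit(p_5y_with=0.90, p_5y_without=0.70)  # ~0.20
old = capped_benefit(p_5y_with=0.85, p_5y_without=0.30)    # ~0.55

# The older patient outscores the younger one, even though a successful
# transplant could give the younger patient decades of extra life:
# everything beyond year five is invisible to the capped score.
assert old > young
```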