<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>COMPAS on Sebastian Spicker</title>
    <link>https://sebastianspicker.github.io/tags/compas/</link>
    <description>Recent content in COMPAS on Sebastian Spicker</description>
    <image>
      <title>Sebastian Spicker</title>
      <url>https://sebastianspicker.github.io/og-image.png</url>
      <link>https://sebastianspicker.github.io/og-image.png</link>
    </image>
    <generator>Hugo -- 0.160.0</generator>
    <language>en</language>
    <lastBuildDate>Fri, 08 Mar 2024 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://sebastianspicker.github.io/tags/compas/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>You Cannot Have All Three: The Fairness Impossibility Theorem</title>
      <link>https://sebastianspicker.github.io/posts/fairness-impossibility-ai-bias/</link>
      <pubDate>Fri, 08 Mar 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/fairness-impossibility-ai-bias/</guid>
      <description>Three natural fairness criteria for an AI classifier — calibration, equal false positive rates, equal false negative rates — cannot all hold simultaneously when base rates differ across groups. This is not an engineering failure. It is a theorem. Choosing which criterion to satisfy is a political decision, not a technical one.</description>
      <content:encoded><![CDATA[<h2 id="summary">Summary</h2>
<p>In 2016 ProPublica published an investigation showing that COMPAS — a widely used recidivism risk
assessment tool — assigned higher risk scores to Black defendants than to White defendants with
equivalent actual recidivism rates. The tool&rsquo;s developer responded that COMPAS is well-calibrated:
among defendants of any race assigned a given score, the subsequent recidivism rates are
consistent with that score. Both claims were correct.</p>
<p>The apparent contradiction between them is resolved by a mathematical result that was proved
independently by two groups the same year. The fairness impossibility theorem establishes that
calibration, equal false positive rates, and equal false negative rates cannot all hold
simultaneously when base rates differ between groups — unless the classifier is perfect.</p>
<p>This is not a property of COMPAS specifically. It is not fixed by a better algorithm, more
diverse training data, or more careful engineering. It is a constraint that holds for any
probabilistic classifier operating on groups with unequal prevalence of the predicted outcome.</p>
<p>The question this forces is not &ldquo;how do we make the algorithm fair?&rdquo; The question is &ldquo;which
fairness criterion do we endorse, and can we defend that choice to the people it disadvantages?&rdquo;
That is not a technical question.</p>
<h2 id="the-compas-investigation">The COMPAS Investigation</h2>
<p>Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner published &ldquo;Machine Bias&rdquo; in ProPublica
on 23 May 2016 (<a href="#ref-angwin2016">Angwin et al., 2016</a>). They had obtained COMPAS risk scores for
approximately 7,000 defendants in Broward County, Florida, along with actual two-year recidivism
data. Their finding: among defendants who did not go on to reoffend, Black defendants were
falsely labelled high-risk at roughly twice the rate of White defendants. In other words, the
false positive rate was substantially higher for Black defendants.</p>
<p>Northpointe (now Equivant), the tool&rsquo;s developer, responded that ProPublica&rsquo;s analysis was
misleading. COMPAS is <em>calibrated</em>: within any given score band, the actual recidivism rate is
the same regardless of race. A score of 7 means approximately the same thing for a Black
defendant as for a White defendant. This is a genuine and important property for a risk assessment
to have.</p>
<p>Both analyses were conducted correctly. The tension between them is not a matter of one side
being wrong. It is a matter of two legitimate fairness criteria that cannot, as a matter of
mathematics, be satisfied at the same time.</p>
<h2 id="three-definitions-of-fairness">Three Definitions of Fairness</h2>
<p>Let \(Y \in \{0, 1\}\) be the true outcome (reoffend/not), \(S\) the classifier&rsquo;s risk
score, \(\hat{Y}\) its binary prediction, and \(A \in \{0, 1\}\) indicate group membership.</p>
<p><strong>Calibration</strong> (predictive parity): for all score values \(s\),</p>
$$P(Y = 1 \mid S = s, A = 0) = P(Y = 1 \mid S = s, A = 1)$$<p>If the model assigns a score of 7 to a defendant, the actual reoffending rate should be the
same regardless of race. This is what COMPAS satisfies.</p>
<p><strong>False positive rate parity</strong>:</p>
$$P(\hat{Y} = 1 \mid Y = 0, A = 0) = P(\hat{Y} = 1 \mid Y = 0, A = 1)$$<p>Among defendants who will not reoffend, the probability of being incorrectly labelled high-risk
should be equal across groups. This is what ProPublica found violated.</p>
<p><strong>False negative rate parity</strong>:</p>
$$P(\hat{Y} = 0 \mid Y = 1, A = 0) = P(\hat{Y} = 0 \mid Y = 1, A = 1)$$<p>Among defendants who will reoffend, the probability of being incorrectly labelled low-risk
should be equal across groups.</p>
<p>All three properties seem like reasonable things to ask of a fair classifier. The impossibility
theorem says you cannot have all three at once — with a precise exception.</p>
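<p>As a concrete illustration, all three criteria can be read straight off a per-group confusion matrix. The counts below are invented for illustration and are not COMPAS estimates:</p>

```python
# Hypothetical per-group confusion-matrix counts (illustrative only).
groups = {
    "group_0": {"tp": 80, "fp": 40, "tn": 160, "fn": 20},
    "group_1": {"tp": 30, "fp": 10, "tn": 240, "fn": 20},
}

def fairness_metrics(c):
    """Return (PPV, FPR, FNR) for one group's confusion counts."""
    ppv = c["tp"] / (c["tp"] + c["fp"])  # P(Y=1 | Yhat=1): predictive parity / calibration
    fpr = c["fp"] / (c["fp"] + c["tn"])  # P(Yhat=1 | Y=0): false positive rate
    fnr = c["fn"] / (c["fn"] + c["tp"])  # P(Yhat=0 | Y=1): false negative rate
    return ppv, fpr, fnr

for name, counts in groups.items():
    ppv, fpr, fnr = fairness_metrics(counts)
    print(f"{name}: PPV={ppv:.2f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

<p>Comparing the three numbers across the two groups makes the question "fair by which criterion?" precise.</p>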
<h2 id="the-impossibility-theorem">The Impossibility Theorem</h2>
<p>Alexandra Chouldechova proved the relevant result in 2017 using Broward County data as her case
study (<a href="#ref-chouldechova2017">Chouldechova, 2017</a>). Jon Kleinberg, Sendhil Mullainathan, and
Manish Raghavan proved an equivalent result independently (<a href="#ref-kleinberg2017">Kleinberg et al., 2017</a>).</p>
<p>The argument is straightforward. Suppose a classifier is calibrated and produces a binary
prediction (high/low risk). Let \(p_0\) and \(p_1\) be the base rates — the actual reoffending
rates — in groups 0 and 1. For a binary classifier with positive predictive value PPV and
negative predictive value NPV:</p>
<ul>
<li>The false positive rate satisfies (via Bayes): \(\text{FPR} = \frac{\text{TPR} \cdot \text{PR} \cdot (1-\text{PPV})}{\text{PPV} \cdot (1-\text{PR})}\) where PR is prevalence and TPR is sensitivity</li>
<li>The false negative rate satisfies (via Bayes): \(\text{FNR} = \frac{\text{TNR} \cdot (1-\text{PR}) \cdot (1-\text{NPV})}{\text{NPV} \cdot \text{PR}}\) where TNR is specificity</li>
</ul>
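<p>The first identity is just Bayes&rsquo; theorem rearranged. By definition,</p>
$$\text{PPV} = P(Y = 1 \mid \hat{Y} = 1) = \frac{\text{TPR} \cdot \text{PR}}{\text{TPR} \cdot \text{PR} + \text{FPR} \cdot (1 - \text{PR})}$$<p>and solving for FPR yields the expression above. The FNR identity follows in the same way from
the definition of NPV.</p>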
<p>If calibration holds — PPV and NPV are equal across groups — and the base rates \(p_0 \neq p_1\),
then the FPR and FNR in each group are functions of that group&rsquo;s specific base rate. They cannot
both be equalized across groups unless either:</p>
<ol>
<li>\(p_0 = p_1\): the base rates are equal, or</li>
<li>The classifier is perfect: FPR = FNR = 0.</li>
</ol>
<p>In the real case — unequal base rates, imperfect classifier — calibration and equalized error
rates are mutually exclusive. You can have one or the other but not both: given the base rates,
fixing any two of PPV, FPR, and FNR determines the third. It is an algebraic constraint, not an
engineering limitation.</p>
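<p>The arithmetic is easy to check numerically. The sketch below holds TPR and PPV fixed across two groups (imposing calibration in the predictive-parity sense) and varies only the base rate; the identity above then forces the false positive rates apart. All numbers are illustrative:</p>

```python
# FPR implied by Bayes when sensitivity (TPR), positive predictive value
# (PPV), and base rate p are held fixed. Illustrative values, not COMPAS.
def fpr_from_calibration(tpr: float, ppv: float, p: float) -> float:
    return tpr * p * (1 - ppv) / (ppv * (1 - p))

tpr, ppv = 0.8, 0.8
for p in (0.5, 0.2):  # unequal base rates in the two groups
    print(f"base rate {p:.1f} -> implied FPR {fpr_from_calibration(tpr, ppv, p):.3f}")
# -> base rate 0.5 -> implied FPR 0.200
# -> base rate 0.2 -> implied FPR 0.050
```

<p>Equalizing the two FPRs would require changing PPV or TPR per group, which is exactly what breaks calibration.</p>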
<h2 id="a-structural-analogy">A Structural Analogy</h2>
<p>The structural similarity to another impossibility result is worth noting.</p>
<p>The Robertson inequality in quantum mechanics (<a href="#ref-robertson1929">Robertson, 1929</a>) states that
for any two observables \(\hat{A}\) and \(\hat{B}\):</p>
$$\Delta A \cdot \Delta B \geq \frac{1}{2} \left| \langle [\hat{A}, \hat{B}] \rangle \right|$$<p>This is not an engineering failure. It is a consequence of the algebraic structure of the theory:
if \([\hat{A}, \hat{B}] \neq 0\), then \(\Delta A\) and \(\Delta B\) cannot simultaneously be
made arbitrarily small. No measurement apparatus, however precise, can violate it. The constraint
is in the mathematics, not the hardware.</p>
<p>The fairness impossibility has the same character. Three desiderata, a structural constraint that
prevents simultaneous satisfaction, and no algorithmic escape route. A better model does not help.
Richer training data does not help. The constraint is in the arithmetic of conditional
probabilities and base rates.</p>
<p>The disanalogy is this: in quantum mechanics, \(\hbar\) is a fundamental constant — you cannot
reduce it. In fairness, the base rates are not constants of nature. They are historical outcomes
of social processes: incarceration rates, policing patterns, economic conditions, educational
access. The theorem does not tell you that unequal base rates are acceptable; it tells you that
given unequal base rates, the three fairness criteria cannot all be satisfied.</p>
<h2 id="gender-bias-in-ai-systems">Gender Bias in AI Systems</h2>
<p>The impossibility theorem applies to any binary classification setting with unequal base rates.
The empirical landscape of AI gender bias gives several concrete instances where one criterion was
satisfied while others were not.</p>
<p>In October 2018, Reuters reported that Amazon had developed and then abandoned an internal
AI-based recruiting tool that systematically downgraded résumés from women
(<a href="#ref-dastin2018">Dastin, 2018</a>). The model had been trained on a decade of hiring decisions,
in which successful hires were predominantly male. The model learned that &ldquo;male&rdquo; features were
associated with success and penalized female indicators accordingly. Calibration to the training
data produced systematic gender bias in output.</p>
<p>Tolga Bolukbasi and colleagues showed in 2016 that word embeddings trained on large text corpora
encoded gender stereotypes in their geometric structure
(<a href="#ref-bolukbasi2016">Bolukbasi et al., 2016</a>). The analogy \(\text{man} : \text{computer
programmer} :: \text{woman} : \text{homemaker}\) could be recovered directly from the vector
arithmetic of the embedding space. The embedding was calibrated to the text corpus, which reflected
the occupational distribution of the time — and perpetuated it.</p>
<p>Jieyu Zhao and colleagues found that image captioning and activity recognition models amplified
existing gender associations (<a href="#ref-zhao2017">Zhao et al., 2017</a>). &ldquo;Cooking&rdquo; was associated with
women in 67% of training images; the models amplified that to 84% at inference.
The amplification is a consequence of models learning the easiest features that predict the label
— and in a world where cooking is disproportionately female, &ldquo;female appearance&rdquo; becomes a
feature that predicts &ldquo;cooking.&rdquo;</p>
<p>Joy Buolamwini and Timnit Gebru&rsquo;s &ldquo;Gender Shades&rdquo; study found error rates of up to 34.7% for
darker-skinned women in commercial facial recognition systems, compared to 0.8% for lighter-skinned
men (<a href="#ref-buolamwini2018">Buolamwini &amp; Gebru, 2018</a>). The classifiers were calibrated on
predominantly light-skinned training data. Calibration on the majority group produced large errors
on the minority group — exactly the pattern the impossibility theorem describes.</p>
<p>Hadas Kotek and colleagues tested four large language models on gender-stereotyped occupational
prompts in 2023 (<a href="#ref-kotek2023">Kotek et al., 2023</a>). The models were three to six times more
likely to choose the gender-stereotyped occupation when responding to ambiguous prompts. The
models were calibrated to human-generated text; human-generated text encodes human stereotypes.</p>
<h2 id="the-solutions-and-their-limits">The Solutions and Their Limits</h2>
<p>Three broad approaches exist to algorithmic debiasing, and all three face the same constraint.</p>
<p><strong>Pre-processing</strong> removes bias from training data before training. Zemel and colleagues proposed
&ldquo;Learning Fair Representations&rdquo; — a latent embedding that encodes the data usefully while
obscuring group membership (<a href="#ref-zemel2013">Zemel et al., 2013</a>). This can reduce bias in the
learned representation, but it cannot simultaneously satisfy all three fairness criteria; it
trades one against another by compressing the group-informative dimensions.</p>
<p><strong>Post-processing</strong> adjusts the classifier&rsquo;s decisions after training. Moritz Hardt, Eric Price,
and Nathan Srebro&rsquo;s equalized odds approach (<a href="#ref-hardt2016">Hardt et al., 2016</a>) adjusts
decision thresholds separately per group to achieve FPR/FNR parity. This satisfies equalized
odds by construction — but only by giving up calibration, a trade-off that Chouldechova&rsquo;s
theorem shows is unavoidable when base rates differ.</p>
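<p>A minimal sketch of the per-group-threshold idea, on synthetic scores. The actual Hardt et al. construction additionally randomizes between thresholds to hit equalized odds exactly; everything below, including the score distributions, is invented for illustration:</p>

```python
import random

random.seed(0)

def synth_group(n, base_rate, noise):
    """Synthetic (score, label) pairs where scores weakly track the label."""
    data = []
    for _ in range(n):
        y = 1 if random.random() < base_rate else 0
        score = min(1.0, 0.5 * y + random.uniform(0.0, 2 * noise))
        data.append((score, y))
    return data

def fpr_at(data, threshold):
    """Share of true negatives pushed over the threshold."""
    neg = [s for s, y in data if y == 0]
    return sum(s >= threshold for s in neg) / len(neg)

def threshold_for_fpr(data, target, grid=100):
    """Lowest threshold on a grid whose FPR does not exceed the target."""
    for t in (i / grid for i in range(grid + 1)):
        if fpr_at(data, t) <= target:
            return t
    return 1.0

# Two groups with different base rates and different score distributions.
group_a = synth_group(2000, base_rate=0.5, noise=0.25)
group_b = synth_group(2000, base_rate=0.2, noise=0.35)

target = 0.10
t_a = threshold_for_fpr(group_a, target)
t_b = threshold_for_fpr(group_b, target)
print(f"thresholds: A={t_a:.2f}  B={t_b:.2f}")
print(f"FPRs:       A={fpr_at(group_a, t_a):.3f}  B={fpr_at(group_b, t_b):.3f}")
```

<p>Both groups land at or just under the target FPR, but at different thresholds. By the theorem, thresholds chosen to equalize error rates will generally not preserve calibration.</p>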
<p><strong>In-processing</strong> incorporates a fairness constraint into the training objective. Agarwal and
colleagues proposed a reductions approach that allows the practitioner to specify which fairness
constraint to impose (<a href="#ref-agarwal2018">Agarwal et al., 2018</a>). But you must choose. The
algorithm can optimize for any one of the three criteria; it cannot optimize for all three
simultaneously when base rates differ.</p>
<p>A 2021 survey by Mitchell and colleagues confirms that all three paradigms face the same
impossibility (<a href="#ref-mitchell2021">Mitchell et al., 2021</a>). The choice of paradigm is a choice
about which criterion to prioritize, and that choice has distributional consequences that fall
differently on different groups.</p>
<h2 id="the-political-choice">The Political Choice</h2>
<p>This is where Arvind Narayanan&rsquo;s framing becomes essential. His 2018 tutorial catalogued 21
distinct definitions of algorithmic fairness and titled it &ldquo;21 Fairness Definitions and Their
Politics&rdquo; (<a href="#ref-narayanan2018">Narayanan, 2018</a>). The title is the argument: the definitions
are not equivalent, choosing among them is not a technical decision, and the choice encodes a
prior about what justice requires.</p>
<p>In the criminal justice context: a false positive (predicting recidivism when the defendant will
not reoffend) imposes a cost on the defendant — higher bail, longer sentence, restricted
conditions of release. A false negative (predicting non-recidivism when the defendant will
reoffend) imposes a cost on potential future victims and on public safety. When we choose to
equalize false positive rates, we are choosing to protect defendants from false accusation. When
we choose to equalize false negative rates, we are choosing to protect the public from missed
offenders. These are both defensible values. They produce different error distributions across
groups.</p>
<p>Choosing overall accuracy as the metric — which is what maximizing predictive performance
typically means — is itself a value choice: it weights every error equally, so errors
concentrated in smaller groups or on rarer outcomes count for relatively little in the
aggregate. When racial disparities in base rates are products of historical injustice, this
choice compounds that injustice.</p>
<p>Solon Barocas, Moritz Hardt, and Arvind Narayanan&rsquo;s textbook <em>Fairness and Machine Learning</em>
(2023) makes explicit that the choice between fairness criteria is a normative, not technical,
decision (<a href="#ref-barocas2023">Barocas et al., 2023</a>). The book does not tell you which criterion
to choose. It tells you that you must choose, that the choice has political content, and that
presenting it as a technical optimization problem conceals that content.</p>
<p>Reuben Binns&rsquo; analysis through political philosophy confirms that different fairness criteria
correspond to different underlying theories of justice: Rawlsian, Dworkinian, luck egalitarian
framings all generate different orderings of the three criteria
(<a href="#ref-binns2018">Binns, 2018</a>). The choice of fairness criterion is the choice of a
theory of justice, whether or not the engineers implementing the system have thought of it in
those terms.</p>
<h2 id="the-theorem-is-not-the-problem">The Theorem Is Not the Problem</h2>
<p>I want to be clear about what the impossibility theorem does and does not say.</p>
<p>It does not say that algorithmic fairness is impossible. It says that you must choose among
competing fairness criteria when base rates differ across groups, and that the choice has
distributional consequences. Systems can be built that satisfy calibration, or equalized odds,
or demographic parity — just not all three at once with unequal base rates.</p>
<p>It does not say that base rate disparities are natural or acceptable. The disparities in
recidivism rates, hiring rates, image training sets, and text corpora are products of social
history. The theorem constrains what a classifier can do <em>given</em> those disparities; it does not
prescribe them.</p>
<p>What it does say is that &ldquo;we built a fair algorithm&rdquo; is not a statement that can be made without
specifying which fairness criterion was satisfied and which was not. It is not a statement that
can be defended on purely technical grounds. And it is not a statement that escapes political
accountability by hiding behind mathematical precision.</p>
<p>The fairness debate in AI is, at its core, a debate about which errors we are willing to make, in
whom, with what consequences. The theorem makes that debate unavoidable. Whether we have the
vocabulary and the will to conduct it in those terms is a different question entirely.</p>
<h2 id="references">References</h2>
<ul>
<li><span id="ref-angwin2016"></span>Angwin, J., Larson, J., Mattu, S., &amp; Kirchner, L. (2016, May 23). Machine bias. <em>ProPublica</em>. <a href="https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing">https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing</a></li>
<li><span id="ref-chouldechova2017"></span>Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. <em>Big Data</em>, 5(2), 153–163. <a href="https://doi.org/10.1089/big.2016.0047">DOI: 10.1089/big.2016.0047</a></li>
<li><span id="ref-kleinberg2017"></span>Kleinberg, J., Mullainathan, S., &amp; Raghavan, M. (2017). Inherent trade-offs in the fair determination of risk scores. In <em>Proceedings of the 8th Innovations in Theoretical Computer Science Conference</em> (ITCS 2017). <a href="https://doi.org/10.4230/LIPIcs.ITCS.2017.43">DOI: 10.4230/LIPIcs.ITCS.2017.43</a></li>
<li><span id="ref-robertson1929"></span>Robertson, H. P. (1929). The uncertainty principle. <em>Physical Review</em>, 34, 163–164. <a href="https://doi.org/10.1103/PhysRev.34.163">DOI: 10.1103/PhysRev.34.163</a></li>
<li><span id="ref-dastin2018"></span>Dastin, J. (2018, October 10). Amazon scraps secret AI recruiting tool that showed bias against women. <em>Reuters</em>. <a href="https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G">https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G</a></li>
<li><span id="ref-bolukbasi2016"></span>Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., &amp; Kalai, A. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In <em>Advances in Neural Information Processing Systems 29</em> (NeurIPS 2016). arXiv:1607.06520</li>
<li><span id="ref-zhao2017"></span>Zhao, J., Wang, T., Yatskar, M., Ordonez, V., &amp; Chang, K.-W. (2017). Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In <em>Proceedings of EMNLP 2017</em>, pp. 2979–2989. <a href="https://aclanthology.org/D17-1323/">ACL Anthology: D17-1323</a></li>
<li><span id="ref-buolamwini2018"></span>Buolamwini, J., &amp; Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. In <em>Proceedings of the 1st Conference on Fairness, Accountability and Transparency</em> (FAT* 2018), PMLR Vol. 81, pp. 77–91. <a href="https://proceedings.mlr.press/v81/buolamwini18a.html">https://proceedings.mlr.press/v81/buolamwini18a.html</a></li>
<li><span id="ref-kotek2023"></span>Kotek, H., Dockum, R., &amp; Sun, D. Q. (2023). Gender bias and stereotypes in large language models. In <em>Proceedings of The ACM Collective Intelligence Conference</em> (CI &lsquo;23), pp. 12–24. <a href="https://doi.org/10.1145/3582269.3615599">DOI: 10.1145/3582269.3615599</a></li>
<li><span id="ref-zemel2013"></span>Zemel, R., Wu, Y., Swersky, K., Pitassi, T., &amp; Dwork, C. (2013). Learning fair representations. In <em>Proceedings of the 30th International Conference on Machine Learning</em> (ICML 2013), PMLR Vol. 28, No. 3, pp. 325–333. <a href="https://proceedings.mlr.press/v28/zemel13.html">https://proceedings.mlr.press/v28/zemel13.html</a></li>
<li><span id="ref-hardt2016"></span>Hardt, M., Price, E., &amp; Srebro, N. (2016). Equality of opportunity in supervised learning. In <em>Advances in Neural Information Processing Systems 29</em> (NeurIPS 2016), pp. 3323–3331. arXiv:1610.02413</li>
<li><span id="ref-agarwal2018"></span>Agarwal, A., Beygelzimer, A., Dudik, M., Langford, J., &amp; Wallach, H. (2018). A reductions approach to fair classification. In <em>Proceedings of the 35th International Conference on Machine Learning</em> (ICML 2018), PMLR Vol. 80, pp. 60–69. arXiv:1803.02453</li>
<li><span id="ref-mitchell2021"></span>Mitchell, S., Potash, E., Barocas, S., D&rsquo;Amour, A., &amp; Lum, K. (2021). Algorithmic fairness: Choices, assumptions, and definitions. <em>Annual Review of Statistics and Its Application</em>, 8, 141–163. <a href="https://doi.org/10.1146/annurev-statistics-042720-125902">DOI: 10.1146/annurev-statistics-042720-125902</a></li>
<li><span id="ref-narayanan2018"></span>Narayanan, A. (2018). <em>21 Fairness Definitions and Their Politics</em>. Tutorial at FAT* 2018. <a href="https://facctconference.org/static/tutorials/narayanan-21defs18.pdf">PDF</a></li>
<li><span id="ref-barocas2023"></span>Barocas, S., Hardt, M., &amp; Narayanan, A. (2023). <em>Fairness and Machine Learning: Limitations and Opportunities</em>. MIT Press. <a href="https://fairmlbook.org">https://fairmlbook.org</a></li>
<li><span id="ref-binns2018"></span>Binns, R. (2018). Fairness in machine learning: Lessons from political philosophy. In <em>Proceedings of the 2018 Conference on Fairness, Accountability, and Transparency</em> (FAT* 2018), PMLR Vol. 81, pp. 149–159. arXiv:1712.03586</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-11-05</strong>: Updated the Zhao et al. (2017) cooking statistics to match the paper: 67% female agents for cooking in the training set (33% was the male share), amplified to 84% female at inference.</li>
</ul>
]]></content:encoded>
    </item>
  </channel>
</rss>
