September 8, 2017
Many definitions
Trade-offs
Algorithms for achieving them
Impossibility results
$X$ features of an individual
$A$ sensitive attribute (race, gender, ...)
$C=C(X,A)$ classifier mapping $X$ and $A$ to some prediction
$Y$ actual outcome
Note: $X, A, C, Y$ are random variables in the same probability space
Notation: $\mathbb{P}_a\{E\}=\mathbb{P}\{E\mid A=a\}.$
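As a concrete illustration (mine, not from the lecture), a minimal sketch of estimating group-conditional probabilities such as $\mathbb{P}_a\{C=1\}$ from samples; the arrays are hypothetical:

```python
import numpy as np

def group_rate(event, A, a):
    """Empirical estimate of P{event | A = a} from samples."""
    return event[A == a].mean()

# Hypothetical predictions C and group labels A for five individuals.
C = np.array([1, 0, 1, 1, 0])
A = np.array(["a", "a", "b", "b", "b"])
print(group_rate(C == 1, A, "a"))  # estimate of P_a{C = 1}, here 0.5
```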
$X$ incorporates all sorts of measurement biases
$A$ often not even known, ill-defined, misreported, inferred
$C$ often not well defined, e.g., large production ML system
$Y$ often poor proxy of actual variable of interest
Observational criteria: anything you can write down as a probability statement involving $X, A, C, Y.$
What can we learn from observational criteria?
How can we achieve them algorithmically?
How do these criteria trade-off?
How do these criteria shape public discourse?
Demographic parity: require $C$ independent of $A.$ When $C$ is a binary $0/1$-variable, this means
$\mathbb{P}_a\{C = 1\} = \mathbb{P}_b\{C = 1\}$ for all groups $a,b.$
Approximate versions, e.g., $|\mathbb{P}_a\{C=1\}-\mathbb{P}_b\{C=1\}|\le\epsilon$ for all groups $a,b$ (see the sketch after this list).
Accept random people in group $a$ but qualified people in group $b.$
Can happen for lack of training data (recall sample size disparity)
Rules out perfect classifier $C=Y$ when base rates differ across groups: $\mathbb{P}_a\{Y=1\}\ne\mathbb{P}_b\{Y=1\}.$
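A minimal sketch of the approximate check above; the helper name and tolerance are my own, not from the lecture:

```python
import numpy as np

def demographic_parity_gap(C, A):
    """Largest gap |P_a{C=1} - P_b{C=1}| over all pairs of groups."""
    rates = [C[A == a].mean() for a in np.unique(A)]
    return max(rates) - min(rates)

# Hypothetical predictions and group labels.
C = np.array([1, 1, 0, 1, 1, 0, 0, 1])
A = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(demographic_parity_gap(C, A) <= 0.25)  # True: gap 0.25, within eps = 0.25
```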
Accuracy parity, $\mathbb{P}_a\{C=Y\}=\mathbb{P}_b\{C=Y\}$ for all groups $a,b,$ discourages laziness by equalizing error rates across groups.
Allows perfect predictor $C=Y.$
Error types matter!
Allows you to make up for rejecting qualified women by accepting unqualified men (see the sketch below).
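To make "Error types matter!" concrete, a small hypothetical example (mine, not the lecture's): both groups have accuracy $0.8$, yet one group's errors are all false negatives and the other's all false positives:

```python
import numpy as np

# Hypothetical outcomes Y and predictions C for two groups of five.
Y_a = np.array([1, 1, 1, 0, 0]); C_a = np.array([1, 1, 0, 0, 0])  # one FN
Y_b = np.array([1, 1, 0, 0, 0]); C_b = np.array([1, 1, 1, 0, 0])  # one FP

for name, Y, C in [("a", Y_a, C_a), ("b", Y_b, C_b)]:
    acc = (Y == C).mean()                  # accuracy parity: 0.8 in both groups
    fn = int(((Y == 1) & (C == 0)).sum())  # qualified but rejected
    fp = int(((Y == 0) & (C == 1)).sum())  # unqualified but accepted
    print(name, acc, fn, fp)
```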
Assume $C$ and $Y$ are binary $0/1$-variables.
True positive parity: $\mathbb{P}_a\{C=1\mid Y=1\}=\mathbb{P}_b\{C=1\mid Y=1\}$ for all groups $a,b.$
Called equal opportunity in Hardt-Price-Srebro (2016).
Suitable when positive outcome ($1$) is desirable.
Equivalently, primary harm is due to false negatives.
Assume $C$ and $Y$ are binary $0/1$-variables.
False positive parity: $\mathbb{P}_a\{C=1\mid Y=0\}=\mathbb{P}_b\{C=1\mid Y=0\}$ for all groups $a,b.$
TPP and FPP together are called equalized odds in Hardt-Price-Srebro (2016). We'll also call it positive rate parity.
In full generality, a random variable $R$ satisfies equalized odds if $R$ is conditionally independent of $A$ given $Y.$
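For binary $C$ and $Y$, equalized odds amounts to equal TPR and FPR across groups; a minimal empirical check (my sketch, with hypothetical names):

```python
import numpy as np

def conditional_rate(C, Y, A, a, y):
    """Empirical P{C = 1 | Y = y, A = a}."""
    return C[(A == a) & (Y == y)].mean()

def equalized_odds_gaps(C, Y, A):
    """Largest TPR gap and largest FPR gap across groups; (0, 0) means exact parity."""
    groups = np.unique(A)
    tpr = [conditional_rate(C, Y, A, a, 1) for a in groups]
    fpr = [conditional_rate(C, Y, A, a, 0) for a in groups]
    return max(tpr) - min(tpr), max(fpr) - min(fpr)
```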
Assume $C$ and $Y$ are binary $0/1$-variables.
Predictive value parity: $\mathbb{P}_a\{Y=1\mid C=1\}=\mathbb{P}_b\{Y=1\mid C=1\}$ (equal PPV) and $\mathbb{P}_a\{Y=0\mid C=0\}=\mathbb{P}_b\{Y=0\mid C=0\}$ (equal NPV) for all groups $a,b.$
Equalizes the chance of success given acceptance (see the sketch below).
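A matching sketch for estimating predictive values per group (helper name is mine, not from the lecture):

```python
import numpy as np

def predictive_values(C, Y, A, a):
    """Empirical (PPV, NPV) for group a: P{Y=1 | C=1} and P{Y=0 | C=0}."""
    g = (A == a)
    ppv = Y[g & (C == 1)].mean()      # chance of success given acceptance
    npv = 1 - Y[g & (C == 0)].mean()  # P{Y = 0 | C = 0}
    return ppv, npv
```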
ProPublica's main charge was observational.
Black defendants experienced a higher false positive rate.
Northpointe's main defense was observational.
Scores satisfy precision parity (equal PPV).
Observed in very similar form by Chouldechova (2017).
Similar trade-off result for score functions due to Kleinberg, Mullainathan, Raghavan (2016).
[Figure: two groups $a$ and $b$ with unequal base rates; each individual's outcome and the predictor's decision are shown.]
So far, predictor is perfect.
Let's introduce an error.
[Figure: the same two groups with unequal base rates; the predictor now makes an error.]
But this doesn't satisfy false positive parity!
Let's fix that!
[Figure: the predictor adjusted so that true and false positive rates match across the two groups.]
Satisfies positive rate parity!
[Figure: the same predictor with negative predictive values computed per group: NPV $2/5$ for group $a$ vs. $1/3$ for group $b$.]
Does not satisfy predictive value parity!
Proof by googling the first Wikipedia entry on this. Write $p_a=\mathbb{P}_a\{Y=1\}$ for the base rate, and suppose equalized odds holds (TPR and FPR are equal across groups) while $p_a\ne p_b.$ Then:
$\mathrm{PPV}_a = \frac{\mathrm{TPR}\,p_a}{\mathrm{TPR}\,p_a+\mathrm{FPR}(1-p_a)}$
Hence, $\mathrm{PPV}_a=\mathrm{PPV}_b$ with $p_a\ne p_b$ implies
either $\mathrm{TPR}=0$ or $\mathrm{FPR}=0.$
(But not both, since $C\ne Y.$)
$\mathrm{NPV}_a = \frac{(1-\mathrm{FPR})(1-p_a)}{(1-\mathrm{TPR})p_a+(1-\mathrm{FPR})(1-p_a)}$
In either case, $\mathrm{NPV}_a\ne \mathrm{NPV}_b.$ Hence predictive value parity fails.
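A quick numeric check of the two formulas with hypothetical rates (my numbers, not the lecture's): equal TPR and FPR across groups plus unequal base rates forces unequal predictive values.

```python
# Equalized odds holds: both groups share TPR = 0.5 and FPR = 0.25.
TPR, FPR = 0.5, 0.25

def ppv(p):  # P{Y = 1 | C = 1} at base rate p
    return TPR * p / (TPR * p + FPR * (1 - p))

def npv(p):  # P{Y = 0 | C = 0} at base rate p
    return (1 - FPR) * (1 - p) / ((1 - TPR) * p + (1 - FPR) * (1 - p))

for p in (0.4, 0.7):  # hypothetical unequal base rates p_a, p_b
    print(p, round(ppv(p), 3), round(npv(p), 3))
# 0.4 -> PPV 0.571, NPV 0.692;  0.7 -> PPV 0.824, NPV 0.391
```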
A score is, formally, any real-valued random variable $R\in[0,1]$ in the same probability space as $(X,A,Y).$
Leads to a family of binary classifiers by thresholding: $C=\mathbb{I}\{R>t\},\ t\in[0,1].$
Different thresholds give different trade-offs between true and false positive rates.
Goal: Find $R$ that minimizes $\mathbb{E}(Y-R(X,A))^2$
Solution: $R=r(X, A),$ where $r(x, a)=\mathbb{E}[Y\mid X=x, A=a].$
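Why the conditional expectation is the minimizer, in one standard step (spelled out here for completeness): for any $R=R(X,A),$
$$\mathbb{E}(Y-R)^2=\mathbb{E}\big(Y-r(X,A)\big)^2+\mathbb{E}\big(r(X,A)-R\big)^2,$$
since the cross term $2\,\mathbb{E}\big[(Y-r(X,A))(r(X,A)-R)\big]$ vanishes by conditioning on $(X,A).$ The second term is minimized (zero) exactly when $R=r(X,A).$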
ROC curve: given score $R,$ plot the (FPR, TPR) pairs achieved by all possible thresholds.
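A minimal sketch of tracing these pairs empirically (helper name is mine):

```python
import numpy as np

def roc_points(R, Y, thresholds):
    """(FPR, TPR) of the thresholded classifiers C = 1{R > t}."""
    points = []
    for t in thresholds:
        C = (R > t).astype(int)
        tpr = C[Y == 1].mean()  # P{C = 1 | Y = 1}
        fpr = C[Y == 0].mean()  # P{C = 1 | Y = 0}
        points.append((fpr, tpr))
    return points
```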
Given score $R$ and costs $c=(c_{\mathrm{fn}}, c_{\mathrm{fp}}),$ derive a classifier $C=F(R, A)$ that minimizes expected cost subject to equalized odds.
Construction on the board. See paper for details (and the simplified sketch below).
An optimal equalized odds classifier can be derived from the Bayes optimal regressor.
Note: an approximate version of the theorem holds as well. See paper.
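A simplified sketch of the post-processing idea, specialized to equal opportunity: pick a group-specific threshold on $R$ so every group hits a common target TPR. The full equalized odds construction randomizes between two thresholds per group; see the paper. All names and the target value below are my illustrative assumptions.

```python
import numpy as np

def group_threshold(R, Y, target_tpr):
    """Smallest threshold t such that C = 1{R > t} has TPR >= target_tpr."""
    pos = np.sort(R[Y == 1])[::-1]           # positives' scores, descending
    k = int(np.ceil(target_tpr * len(pos)))  # accept the top-k positives
    return pos[k - 1] - 1e-9 if k > 0 else np.inf

def equal_opportunity_classifier(R, A, thresholds):
    """Group-specific thresholding: C = 1{R > t_a} whenever A = a."""
    t = np.array([thresholds[a] for a in A])
    return (R > t).astype(int)

# Usage: thresholds = {a: group_threshold(R[A == a], Y[A == a], 0.8)
#                      for a in np.unique(A)}, then equal_opportunity_classifier(R, A, thresholds).
```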
Based on a sample of 301,536 TransUnion TransRisk scores from 2003 (made available by the US Federal Reserve).
$R$ — Credit score ranging from 300 to 850
$Y$ — Default
Fine print: failed to pay a debt for at least $90$ days on at least one account
$A$ — Race (Asian, Black, Hispanic, White)