August 25, 2017
This is a CS grad seminar. Prereqs: Stats, ML, probability, linear algebra
Web site: fairmlclass.github.io
Online discussions on Slack.
Grade: 50% in-class participation, 50% project
Not yet enrolled? Talk to me after class today.
Part I: Sources of unfairness
Part II: Observational fairness criteria
Part III: Beyond observational fairness criteria
Part IV: Measurement and sampling
Part V: Legal and policy perspectives
Develop a better understanding of a complex social problem that will allow us to contribute to a meaningful technical discussion.
You don't mind reading social science papers
You'd still be here even if the chance of getting a paper out of it is 0.
How can machine learning wind up being unfair without any explicit wrongdoing?
Generally, more data means smaller error
By definition, there is less data on minority groups.
This can lead to higher error rates on minority groups.
Example: two classifiers can each have 5% average error yet very different error rates on a minority group.
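A minimal sketch of this point, with made-up group sizes and error rates (not from the lecture): two classifiers whose population-average errors are identical can still differ sharply on the minority group.

```python
# Hypothetical numbers: 90% majority group, 10% minority group.
w_maj, w_min = 0.9, 0.1

# Classifier A: 5% error on both groups.
err_a_maj, err_a_min = 0.05, 0.05
avg_a = w_maj * err_a_maj + w_min * err_a_min

# Classifier B: 4% error on the majority, 14% on the minority.
err_b_maj, err_b_min = 0.04, 0.14
avg_b = w_maj * err_b_maj + w_min * err_b_min

# Both average errors come out to about 5%, yet B's minority error
# is 3.5x its majority error.
print(avg_a, avg_b)
```

The average hides the disparity because the minority group contributes little weight to it.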
Sources of bias in the data: collection, measurement, pre-existing biases.
We'll discuss:
Barocas & Selbst, "Big Data's Disparate Impact" (California Law Review, 2016)
Before we meet:
Skim the whole thing. Choose one part to read very carefully.
Many definitions
Algorithms for achieving them
Trade-offs
Impossibility results
$X$ features of an individual
$A$ sensitive attribute (race, gender, ...)
$C=C(X,A)$ classifier mapping $X$ and $A$ to some prediction
$Y$ actual outcome
Note: these are random variables in the same probability space.
$X$ incorporates all sorts of measurement biases
$A$ often not even known, ill-defined, misreported, inferred
$C$ often not well defined, e.g., large production ML system
$Y$ often a poor proxy for the actual variable of interest
Assume $C$ and $A$ are binary $0/1$-variables.
Definition.
Classifier $C$ satisfies demographic parity if
$\mathbb{P}\{ C = 1 \mid A = 1 \} = \mathbb{P}\{ C = 1 \mid A = 0 \}$.
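A small sketch of how one might check this definition empirically, assuming predictions and group labels are given as parallel lists of 0/1 values (the function names are illustrative, not from the course):

```python
def acceptance_rate(c, a, group):
    """Empirical P(C=1 | A=group) from binary predictions c and attributes a."""
    sel = [ci for ci, ai in zip(c, a) if ai == group]
    return sum(sel) / len(sel)

def demographic_parity_gap(c, a):
    """|P(C=1 | A=1) - P(C=1 | A=0)|; zero iff empirical demographic parity holds."""
    return abs(acceptance_rate(c, a, 1) - acceptance_rate(c, a, 0))

# Toy data: both groups are accepted at rate 1/2, so the gap is 0.
c = [1, 0, 1, 0]
a = [1, 1, 0, 0]
print(demographic_parity_gap(c, a))  # 0.0
```

In practice one tolerates a small nonzero gap rather than demanding exact equality.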
Assume $A$ is a binary $0/1$-variable.
Definition.
Classifier $C$ satisfies accuracy parity if
$\mathbb{P}\{ C = Y \mid A = 1 \} = \mathbb{P}\{ C = Y \mid A = 0 \}$.
Assume $C$, $Y$ and $A$ are binary $0/1$-variables.
Definition.
Classifier $C$ satisfies precision parity if
$\mathbb{P}\{ Y = 1 \mid C=1, A = 1 \} = \mathbb{P}\{ Y = 1\mid C=1, A = 0 \}$.
Assume $C$, $Y$ and $A$ are binary $0/1$-variables.
Definition.
Classifier $C$ satisfies true positive parity if
$\mathbb{P}\{ C = 1 \mid Y=1, A = 1 \} = \mathbb{P}\{ C = 1\mid Y=1, A = 0 \}$.
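All of the criteria above are conditional probabilities, so each can be estimated from samples of $(C, Y, A)$ with one generic helper. A sketch on toy data (the helper and the data are illustrative assumptions):

```python
def cond_prob(event, cond, data):
    """Empirical P(event | cond) over a list of (c, y, a) triples."""
    sel = [t for t in data if cond(t)]
    return sum(event(t) for t in sel) / len(sel)

# Toy samples of (prediction c, outcome y, group a).
data = [(1, 1, 1), (0, 0, 1), (1, 1, 0), (0, 0, 0)]

# True positive parity: compare P(C=1 | Y=1, A=a) across groups.
tpr = {g: cond_prob(lambda t: t[0] == 1,
                    lambda t: t[1] == 1 and t[2] == g, data)
       for g in (0, 1)}

# Precision parity: compare P(Y=1 | C=1, A=a) across groups.
prec = {g: cond_prob(lambda t: t[1] == 1,
                     lambda t: t[0] == 1 and t[2] == g, data)
        for g in (0, 1)}

print(tpr, prec)  # on this toy data both criteria hold exactly
```

Demographic parity and accuracy parity fit the same pattern with different `event`/`cond` arguments.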
Definition. A criterion is called observational if it is a property of the joint distribution of features $X,A$, classifier $C$, and outcome $Y$.
Examples: Everything we just saw, and many others.
What can we learn from observational criteria?
How can we achieve them algorithmically?
How do they trade-off?
How do these criteria shape public discourse?
Key example: COMPAS debate on crime recidivism risk scores
ProPublica's main charge was observational.
Black defendants experienced a higher false positive rate.
Northpointe's main defense was observational.
Scores satisfy precision parity.
A classifier $C$ cannot simultaneously achieve (a) precision parity, (b) true positive parity, and (c) false positive parity unless the base rates are equal across groups, $\mathbb{P}\{Y = 1 \mid A = 0\} = \mathbb{P}\{Y = 1 \mid A = 1\}$, or the classifier is perfect, $C = Y$.
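The tension can be seen numerically via Bayes' rule: precision is determined by the true positive rate, false positive rate, and base rate, so equalizing (b) and (c) across groups with unequal base rates forces precision to differ. A sketch with hypothetical rates:

```python
def precision(tpr, fpr, base_rate):
    """P(Y=1 | C=1) via Bayes' rule: TPR*p / (TPR*p + FPR*(1-p))."""
    p = base_rate
    return tpr * p / (tpr * p + fpr * (1 - p))

tpr, fpr = 0.8, 0.2   # equal across groups: (b) and (c) both hold
p0, p1 = 0.3, 0.6     # unequal base rates (made-up values)

prec0 = precision(tpr, fpr, p0)
prec1 = precision(tpr, fpr, p1)
print(prec0, prec1)   # the precisions differ, so (a) fails
```

The group with the higher base rate gets the higher precision, which mirrors the structure of the COMPAS dispute.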
Due to Kleinberg, Mullainathan, Raghavan (2016), and Chouldechova (2016), although stated somewhat differently.
There are two scenarios with identical joint distributions but completely different interpretations for fairness.
In particular, no observational definition can distinguish the two scenarios.
Due to Hardt, Price, Srebro (2016)
Causality
Deep dive into causal graphs, causal inference, interventions, matching
Develop causal fairness criteria (definitions, algorithms, trade-offs, ...)
Relationship to similarity-based fairness notions ("individual fairness")
Measurement theory, sampling theory
Developing awareness of pitfalls
Understand data-generating processes better
Understand legal challenges technical work faces
Think through possibility of policy recommendations