Ch 13 — Ethics & Bias in AI

Fairness, accountability, transparency, and the societal impact of AI
High Level: Bias → Cases → Fairness → XAI → Regulation → Safety
Sources of AI Bias
Where bias enters the ML pipeline
What Is Algorithmic Bias?
Algorithmic bias occurs when an AI system produces systematically unfair outcomes for certain groups. It’s not a bug in the code — it’s a reflection of biased data, flawed assumptions, or misaligned objectives. ML models learn patterns from historical data, and if that data reflects societal inequalities, the model reproduces and amplifies them.
Sources of Bias in the ML Pipeline:
1. Historical bias: data reflects past discrimination (e.g., fewer women in tech hiring data)
2. Representation bias: training data doesn't represent all groups (e.g., facial recognition trained mostly on light-skinned faces)
3. Measurement bias: proxy variables encode protected attributes (e.g., zip code correlates with race)
4. Aggregation bias: one model for diverse populations (e.g., medical AI trained on one ethnicity)
5. Evaluation bias: benchmarks don't test for fairness (e.g., accuracy is high overall but low for minority subgroups)
6. Deployment bias: system used outside its intended context (e.g., a tool built for research applied to high-stakes decisions)
The Feedback Loop Problem
Biased AI decisions create new biased data. Predictive policing sends more officers to minority neighborhoods → more arrests there → data shows more crime there → model sends even more officers. The system amplifies the original bias rather than correcting it.
Key insight: Removing protected attributes (race, gender) from features doesn’t fix bias. Other features (zip code, name, browsing history) are correlated with protected attributes. This is called redundant encoding — the model finds proxies. Fairness requires active intervention, not just feature removal.
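The proxy effect is easy to reproduce. Here is a minimal sketch with synthetic data and a hypothetical `zip_score` feature: the model never sees the protected attribute, yet selection rates still diverge because a remaining feature is correlated with group membership.

```python
import random

random.seed(0)

# Synthetic applicants: the model never sees `group`, only `zip_score`,
# but zip_score is correlated with group (redundant encoding / a proxy).
applicants = []
for _ in range(10_000):
    group = random.choice([0, 1])
    # Hypothetical proxy: historical segregation makes zip_score group-dependent.
    zip_score = random.gauss(0.6 if group == 0 else 0.4, 0.1)
    applicants.append({"group": group, "zip_score": zip_score})

# "Fair" model: protected attribute removed, decision uses only zip_score.
def model(applicant):
    return applicant["zip_score"] > 0.5

def selection_rate(group):
    members = [a for a in applicants if a["group"] == group]
    return sum(model(a) for a in members) / len(members)

print(f"group 0 selection rate: {selection_rate(0):.2f}")  # ~0.84
print(f"group 1 selection rate: {selection_rate(1):.2f}")  # ~0.16
```

Even though `group` was dropped from the features, the decisions still split sharply along group lines: feature removal alone did nothing.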
Real-World Bias Cases
When AI systems caused measurable harm
COMPAS (Criminal Justice, 2016): predicted recidivism risk used in sentencing.
- ProPublica found that Black defendants were falsely flagged as high-risk at twice the rate of white defendants, while white defendants were falsely flagged as low-risk at higher rates.
- Used in courts across the US.
Amazon Hiring Tool (2014–2018):
- Trained on 10 years of resumes, mostly from men.
- Penalized resumes containing the word "women's" (e.g., "women's chess club captain") and downgraded graduates of all-women's colleges.
- Amazon scrapped the project entirely.
Healthcare Algorithm (2019, Obermeyer et al.):
- Used health spending as a proxy for health need.
- Black patients spend less due to systemic barriers, so the algorithm rated them as healthier than equally sick white patients.
- Affected roughly 200 million patients.
Facial Recognition ("Gender Shades", Buolamwini, 2018):
- The study tested three commercial systems (Microsoft, IBM, Face++).
- Error rates: 0.8% for light-skinned males vs. 34.7% for dark-skinned females, roughly 43x higher for dark-skinned women.
Google Photos (2015):
- Auto-tagged Black users as "gorillas".
- Google's fix was simply to remove the "gorilla" label, and the underlying problem remained unfixed years later.
LLM Bias (ongoing):
- GPT models associate certain professions with specific genders and races ("The doctor... he" vs. "The nurse... she").
- The stereotypes are embedded in the training data.
Pattern: In every case, the system worked well for the majority group and failed for minorities. High overall accuracy masked severe disparities. This is why disaggregated evaluation (measuring performance per subgroup) is essential.
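Disaggregated evaluation takes only a few lines of code. A sketch with made-up predictions: overall accuracy looks fine, but slicing the same records by subgroup exposes the gap.

```python
# Per-subgroup accuracy over (group, y_true, y_pred) triples.
# Counts are synthetic, chosen to mimic the pattern in the cases above.
records = (
    [("majority", 1, 1)] * 880 + [("majority", 1, 0)] * 20 +  # ~98% correct
    [("minority", 1, 1)] * 60  + [("minority", 1, 0)] * 40    # 60% correct
)

def accuracy(rows):
    return sum(y_true == y_pred for _, y_true, y_pred in rows) / len(rows)

overall = accuracy(records)
by_group = {
    g: accuracy([r for r in records if r[0] == g])
    for g in {"majority", "minority"}
}
print(f"overall accuracy: {overall:.1%}")  # 94.0% -- looks fine
print(by_group)                            # majority ~0.98, minority 0.60
```

The headline metric hides a 38-point accuracy gap, which is exactly why per-subgroup reporting is essential.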
Fairness Definitions & Tensions
What does “fair” mean? It depends on who you ask
- Demographic Parity: P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1). Equal selection rates across groups ("hire 50% from each group").
- Equalized Odds: P(Ŷ=1 | Y=1, A=0) = P(Ŷ=1 | Y=1, A=1) and P(Ŷ=1 | Y=0, A=0) = P(Ŷ=1 | Y=0, A=1). Equal true-positive and false-positive rates across groups ("same error rates for everyone").
- Calibration: P(Y=1 | Ŷ=p, A=0) = P(Y=1 | Ŷ=p, A=1) = p. When the model says 70% risk, it means 70% for every group ("scores mean the same thing").
- Individual Fairness: similar individuals receive similar predictions ("treat like cases alike").
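The group-level definitions translate directly into code. A sketch using per-group confusion counts (synthetic numbers) to measure the demographic-parity and equalized-odds gaps:

```python
# Confusion counts per group: tp, fp, fn, tn (synthetic illustration).
groups = {
    "A": {"tp": 40, "fp": 10, "fn": 10, "tn": 40},
    "B": {"tp": 20, "fp": 10, "fn": 30, "tn": 40},
}

def selection_rate(c):  # P(Y_hat = 1)
    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    return (c["tp"] + c["fp"]) / total

def tpr(c):  # P(Y_hat = 1 | Y = 1), true-positive rate
    return c["tp"] / (c["tp"] + c["fn"])

def fpr(c):  # P(Y_hat = 1 | Y = 0), false-positive rate
    return c["fp"] / (c["fp"] + c["tn"])

# Demographic parity compares selection rates; equalized odds compares TPR and FPR.
dp_gap  = abs(selection_rate(groups["A"]) - selection_rate(groups["B"]))  # 0.2
tpr_gap = abs(tpr(groups["A"]) - tpr(groups["B"]))                        # 0.4
fpr_gap = abs(fpr(groups["A"]) - fpr(groups["B"]))                        # 0.0
print(dp_gap, tpr_gap, fpr_gap)
```

Here the false-positive rates match but selection rates and true-positive rates do not, showing how a system can satisfy one fairness criterion while violating another.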
The Impossibility Theorem
Chouldechova (2017) and Kleinberg et al. (2016) proved that except in trivial cases, you cannot satisfy demographic parity, equalized odds, and calibration simultaneously when base rates differ between groups. If Group A reoffends at 30% and Group B at 50%, a calibrated model must flag Group B more often — violating demographic parity.
This means fairness is a choice, not a formula. Different stakeholders have legitimate but incompatible fairness goals. A hiring system can’t simultaneously ensure equal selection rates (demographic parity) and equal accuracy (equalized odds) when qualification rates differ. The choice of fairness metric is a values decision, not a technical one.
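The tension can be shown with a few lines of arithmetic. A deliberately degenerate sketch: assume a perfectly calibrated model that outputs each group's base rate as its risk score, then flag above a fixed threshold.

```python
# Perfectly calibrated scores: the model outputs the true base rate per group
# (every member of a group shares that score in this simplified setup).
base_rate = {"A": 0.30, "B": 0.50}
threshold = 0.40  # flag anyone whose risk score exceeds 0.40

# Flag rate per group is then all-or-nothing:
flag_rate = {g: 1.0 if r > threshold else 0.0 for g, r in base_rate.items()}
print(flag_rate)  # {'A': 0.0, 'B': 1.0}
```

Calibration forced the flag rates to track the differing base rates, so demographic parity is maximally violated: the two criteria cannot both hold.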
Explainability & Transparency
XAI — understanding why AI makes decisions
The Black Box Problem
Deep neural networks with billions of parameters are opaque — even their creators can’t fully explain individual decisions. When an AI denies a loan, rejects a job application, or recommends a prison sentence, people deserve to know why. The EU AI Act and GDPR point toward a “right to explanation,” though the exact legal scope of that right is still debated.
Explainability Methods:
- LIME (Local Interpretable Model-agnostic Explanations): perturb the input, observe how the output changes, and fit a simple model locally to explain one prediction.
- SHAP (SHapley Additive exPlanations): game-theoretic feature attribution in which each feature gets a contribution score, mathematically grounded in Shapley values.
- Attention Visualization: show which tokens or pixels the model attends to. Caution: attention ≠ explanation.
- Concept-Based Explanations (TCAV): test with human-understandable concepts, e.g., “this image was classified as doctor because of the stethoscope concept.”
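SHAP’s core idea fits in a short function. A pure-Python sketch (not the `shap` library) that computes exact Shapley values for a toy three-feature model by averaging each feature’s marginal contribution over all orderings; the model and baseline values are made up for illustration.

```python
from itertools import permutations
from math import factorial, isclose

# Toy scoring model: income helps, debt hurts, shoe_size is irrelevant.
def model(x):
    return 2.0 * x["income"] - 1.0 * x["debt"]

def eval_coalition(present, x, baseline):
    """Model output with absent features replaced by their baseline values."""
    filled = {f: (x[f] if f in present else baseline[f]) for f in x}
    return model(filled)

def shapley_values(x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    features = list(x)
    phi = dict.fromkeys(features, 0.0)
    for order in permutations(features):
        present = set()
        for f in order:
            before = eval_coalition(present, x, baseline)
            present.add(f)
            phi[f] += eval_coalition(present, x, baseline) - before
    return {f: total / factorial(len(features)) for f, total in phi.items()}

x = {"income": 5.0, "debt": 3.0, "shoe_size": 9.0}
baseline = {"income": 0.0, "debt": 0.0, "shoe_size": 0.0}
phi = shapley_values(x, baseline)
print(phi)  # income: 10.0, debt: -3.0, shoe_size: 0.0

# A defining SHAP property: attributions sum to model(x) - model(baseline).
assert isclose(sum(phi.values()), model(x) - model(baseline))
```

The irrelevant feature correctly gets zero attribution. Real SHAP implementations approximate this computation, since exact enumeration grows factorially with the number of features.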
Opaque Models
Deep neural networks, large ensembles. High accuracy but no inherent interpretability. Require post-hoc explanation methods.
Interpretable Models
Decision trees, linear models, rule lists. Inherently transparent. Sometimes competitive accuracy for structured data.
The accuracy-interpretability tradeoff is often overstated. Cynthia Rudin (2019) argues that for high-stakes decisions (criminal justice, healthcare), we should use inherently interpretable models rather than explaining black boxes. Post-hoc explanations can be misleading — they explain a simplified version of the model, not the model itself.
AI Regulation & Governance
EU AI Act, NIST framework, and global approaches
EU AI Act (entered into force Aug 2024, fully effective 2027), a risk-based classification:
- Unacceptable risk (banned): social scoring by governments; real-time facial recognition (most cases); manipulative subliminal techniques; exploitation of vulnerable groups.
- High risk (strict requirements): hiring and employment decisions; credit scoring and insurance; criminal justice and law enforcement; education and access to services; critical infrastructure.
- Limited risk (transparency obligations): chatbots must disclose they are AI; deepfakes must be labeled; emotion recognition must be disclosed.
- Minimal risk (no requirements): spam filters, video games, etc.
NIST AI Risk Management Framework: Govern → Map → Measure → Manage. Voluntary, US-based guidance.
GPAI (General-Purpose AI) rules:
- Providers must publish a training data summary and implement copyright compliance policies.
- Systemic-risk models (training compute above 10²⁵ FLOPs): cybersecurity protections, incident reporting, adversarial testing.
Global Landscape:
- EU: comprehensive regulation
- US: executive orders plus sector-specific rules
- UK: pro-innovation, light-touch approach
- China: algorithm regulation and content rules
- G7: Hiroshima AI Process (voluntary)
The regulatory gap: AI capabilities advance in months; regulations take years. The EU AI Act was proposed in April 2021 and won’t be fully enforced until August 2027. By then, AI systems will be far more capable than what the law was designed for. Adaptive regulation is an unsolved challenge.
AI Safety & Alignment
Ensuring AI systems do what we intend
The Alignment Problem
AI alignment ensures systems pursue intended goals rather than unintended ones. A reward-hacking RL agent finds loopholes. An LLM optimized for helpfulness may be manipulated into harmful outputs. As AI systems become more capable, misalignment becomes more dangerous. This is the central challenge of AI safety.
Current Safety Techniques:
- RLHF / DPO: align outputs with human preferences
- Constitutional AI: self-critique against a written set of rules
- Red teaming: adversarial testing by humans
- Guardrails: input/output filters
- Sandboxing: limit agent capabilities
Open Problems:
- Scalable oversight: how do humans supervise AI systems smarter than they are?
- Deceptive alignment: could an AI appear aligned during training but pursue different goals when deployed?
- Goal stability: do goals remain stable as capabilities grow?
- Emergent capabilities: new abilities appear unpredictably at scale
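Guardrails in their simplest form are input/output filters wrapped around the model. A toy sketch (the blocklist patterns and the `fake_llm` stand-in are hypothetical; production guardrails use trained classifiers and policy models, not a handful of regexes):

```python
import re

# Hypothetical blocklists for illustration only.
BLOCKED_INPUT = [re.compile(p, re.I) for p in (r"\bhow to make a bomb\b",)]
BLOCKED_OUTPUT = [re.compile(p, re.I) for p in (r"\b\d{3}-\d{2}-\d{4}\b",)]  # SSN-like

def guarded(llm, prompt):
    # Input filter: refuse disallowed requests before they reach the model.
    if any(p.search(prompt) for p in BLOCKED_INPUT):
        return "Request declined by input guardrail."
    reply = llm(prompt)
    # Output filter: withhold responses that leak sensitive patterns.
    if any(p.search(reply) for p in BLOCKED_OUTPUT):
        return "[response withheld: output guardrail triggered]"
    return reply

fake_llm = lambda prompt: "Sure! The SSN is 123-45-6789."  # stand-in model
print(guarded(fake_llm, "how to make a bomb"))  # blocked at input
print(guarded(fake_llm, "tell me a secret"))    # blocked at output
```

Note the two distinct checkpoints: filtering the prompt stops known-bad requests cheaply, while filtering the reply catches harmful content the model produces anyway.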
Deepfakes & Misinformation
Generative AI enables photorealistic fake images, videos, and audio. Deepfake detection is an arms race — detectors improve, but so do generators. The societal risk: erosion of trust in all media. If anything can be faked, nothing can be trusted. Content provenance (C2PA) and watermarking are emerging defenses.
The dual-use dilemma: The same AI that diagnoses cancer also generates deepfakes. The same LLM that tutors students also writes phishing emails. There is no technical solution to dual use — it requires a combination of technical safeguards, regulation, social norms, and institutional accountability.
Societal Impact
Jobs, inequality, copyright, and environmental cost
Labor Market Impact:
- At risk: data entry, translation, basic coding, customer service, content moderation, paralegal work, bookkeeping
- Augmented: doctors, lawyers, engineers, designers, researchers, with AI as copilot
- New roles: prompt engineers, AI trainers, alignment researchers, AI auditors
Copyright & IP:
- Models are trained on copyrighted data without consent
- NYT v. OpenAI (2023): landmark lawsuit; Getty v. Stability AI: image copyright
- EU AI Act: providers must disclose a training data summary
- No consensus on ownership of AI-generated content
Environmental Cost:
- GPT-4 training: ~50 GWh (estimated)
- One ChatGPT query: roughly 10x the energy of a Google search
- Data centers: 2–3% of global electricity
- Water usage for cooling: billions of liters
The Inequality Amplifier
AI concentrates power in companies with the most data and compute. Training frontier models costs $100M+, limiting development to a handful of corporations. Open-source models (Llama, Mistral) partially democratize access, but the gap between frontier and open models keeps growing.
The global divide: AI benefits flow disproportionately to wealthy nations. Training data is dominated by English. Annotation labor is outsourced to low-wage countries ($1–2/hour). AI-driven automation may eliminate jobs in developing economies before they industrialize. Equitable AI development is a global justice issue.
Responsible AI in Practice
Principles, frameworks, and what you can do
Responsible AI Principles:
1. Fairness: test for disparate impact across all subgroups
2. Transparency: explain decisions to affected individuals
3. Privacy: minimize data collection; use differential privacy
4. Safety: red team, monitor, and maintain kill switches
5. Accountability: human oversight for high-stakes decisions
6. Inclusivity: diverse teams, diverse data, diverse evaluation
Practical Checklist:
□ Audit training data for representation
□ Disaggregate metrics by subgroup
□ Use fairness toolkits (Fairlearn, AIF360)
□ Document model cards & datasheets
□ Establish human-in-the-loop review for high stakes
□ Monitor for drift and emerging bias
□ Create incident response plans
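The model-card item on the checklist can start as simple structured metadata. A minimal sketch whose fields follow the spirit of the model-card proposal; the model name and every value are hypothetical:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ModelCard:
    """Minimal model card: intended use, data, and disaggregated metrics."""
    name: str
    intended_use: str
    out_of_scope_use: str
    training_data: str
    metrics: dict = field(default_factory=dict)           # overall metrics
    subgroup_metrics: dict = field(default_factory=dict)  # disaggregated!
    known_limitations: list = field(default_factory=list)

# Hypothetical example card.
card = ModelCard(
    name="loan-approver-v2",
    intended_use="Pre-screening consumer loan applications with human review",
    out_of_scope_use="Fully automated denial without human oversight",
    training_data="2015-2023 applications; audited for group representation",
    metrics={"accuracy": 0.91},
    subgroup_metrics={"group_a": {"fpr": 0.08}, "group_b": {"fpr": 0.21}},
    known_limitations=["FPR gap between groups; mitigation in progress"],
)
print(asdict(card)["subgroup_metrics"])
```

Keeping subgroup metrics and known limitations as first-class fields makes the disparities from earlier in the chapter visible at deployment time instead of buried in an appendix.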
The Path Forward
Technology alone won’t solve ethics. It requires interdisciplinary teams (engineers + ethicists + domain experts + affected communities), institutional accountability, regulatory frameworks, and a culture that values fairness alongside accuracy. Every AI practitioner has a responsibility to consider who benefits and who is harmed by the systems they build.
Coming up: Ch 14 covers the AI Landscape Today — agentic AI, multimodal models, reasoning systems, and where the field is heading in 2025 and beyond.