Ch 13 — Ethics & Bias in AI

Fairness, accountability, transparency, and the societal impact of AI
High Level: Bias → Cases → Fairness → XAI → Regulation → Safety
Sources of AI Bias
Where bias enters the ML pipeline
What Is Algorithmic Bias?
Algorithmic bias occurs when an AI system produces systematically unfair outcomes for certain groups. It’s not a bug in the code — it’s a reflection of biased data, flawed assumptions, or misaligned objectives. ML models learn patterns from historical data, and if that data reflects societal inequalities, the model reproduces and amplifies them.
Sources of Bias in the ML Pipeline:
1. Historical bias: data reflects past discrimination (e.g., fewer women in tech hiring data)
2. Representation bias: training data doesn't represent all groups (e.g., facial recognition trained mostly on light-skinned faces)
3. Measurement bias: proxy variables encode protected attributes (e.g., zip code correlates with race)
4. Aggregation bias: one model for diverse populations (e.g., medical AI trained on one ethnicity)
5. Evaluation bias: benchmarks don't test for fairness (e.g., accuracy is high overall but low for minority subgroups)
6. Deployment bias: system used outside its intended context (e.g., a tool built for research applied to high-stakes decisions)
The Feedback Loop Problem
Biased AI decisions create new biased data. Predictive policing sends more officers to minority neighborhoods → more arrests there → data shows more crime there → model sends even more officers. The system amplifies the original bias rather than correcting it.
Key insight: Removing protected attributes (race, gender) from features doesn’t fix bias. Other features (zip code, name, browsing history) are correlated with protected attributes. This is called redundant encoding — the model finds proxies. Fairness requires active intervention, not just feature removal.
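The proxy effect is easy to reproduce. Here is a minimal sketch with synthetic data and a hypothetical `zip_score` feature: the model never sees the protected attribute, yet selection rates still diverge because a remaining feature is correlated with group membership.

```python
import random

random.seed(0)

# Synthetic applicants: the model never sees `group`, only `zip_score`,
# but zip_score is correlated with group (redundant encoding / a proxy).
applicants = []
for _ in range(10_000):
    group = random.choice([0, 1])
    # Hypothetical proxy: historical segregation makes zip_score group-dependent.
    zip_score = random.gauss(0.6 if group == 0 else 0.4, 0.1)
    applicants.append({"group": group, "zip_score": zip_score})

# "Fair" model: protected attribute removed, decision uses only zip_score.
def model(applicant):
    return applicant["zip_score"] > 0.5

def selection_rate(group):
    members = [a for a in applicants if a["group"] == group]
    return sum(model(a) for a in members) / len(members)

print(f"group 0 selection rate: {selection_rate(0):.2f}")  # ~0.84
print(f"group 1 selection rate: {selection_rate(1):.2f}")  # ~0.16
```

Even though `group` was dropped from the features, the decisions still split sharply along group lines: feature removal alone did nothing.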
Real-World Bias Cases
When AI systems caused measurable harm
COMPAS (Criminal Justice, 2016): predicted recidivism risk used in sentencing.
- ProPublica found that Black defendants were falsely flagged as high-risk at twice the rate of white defendants, while white defendants were falsely flagged as low-risk at higher rates.
- Used in courts across the US.
Amazon Hiring Tool (2014–2018):
- Trained on 10 years of resumes, mostly from men.
- Penalized resumes containing the word "women's" (e.g., "women's chess club captain") and downgraded graduates of all-women's colleges.
- Amazon scrapped the project entirely.
Healthcare Algorithm (2019, Obermeyer et al.):
- Used health spending as a proxy for health need.
- Black patients spend less due to systemic barriers, so the algorithm rated them as healthier than equally sick white patients.
- Affected roughly 200 million patients.
Facial Recognition ("Gender Shades", Buolamwini, 2018):
- The study tested three commercial systems (Microsoft, IBM, Face++).
- Error rates: 0.8% for light-skinned males vs. 34.7% for dark-skinned females, roughly 43x higher for dark-skinned women.
Google Photos (2015):
- Auto-tagged Black users as "gorillas".
- Google's fix was simply to remove the "gorilla" label, and the underlying problem remained unfixed years later.
LLM Bias (ongoing):
- GPT models associate certain professions with specific genders and races ("The doctor... he" vs. "The nurse... she").
- The stereotypes are embedded in the training data.
Pattern: In every case, the system worked well for the majority group and failed for minorities. High overall accuracy masked severe disparities. This is why disaggregated evaluation (measuring performance per subgroup) is essential.
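Disaggregated evaluation takes only a few lines of code. A sketch with made-up predictions: overall accuracy looks fine, but slicing the same records by subgroup exposes the gap.

```python
# Per-subgroup accuracy over (group, y_true, y_pred) triples.
# Counts are synthetic, chosen to mimic the pattern in the cases above.
records = (
    [("majority", 1, 1)] * 880 + [("majority", 1, 0)] * 20 +  # ~98% correct
    [("minority", 1, 1)] * 60  + [("minority", 1, 0)] * 40    # 60% correct
)

def accuracy(rows):
    return sum(y_true == y_pred for _, y_true, y_pred in rows) / len(rows)

overall = accuracy(records)
by_group = {
    g: accuracy([r for r in records if r[0] == g])
    for g in {"majority", "minority"}
}
print(f"overall accuracy: {overall:.1%}")  # 94.0% -- looks fine
print(by_group)                            # majority ~0.98, minority 0.60
```

The headline metric hides a 38-point accuracy gap, which is exactly why per-subgroup reporting is essential.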
Fairness Definitions & Tensions
What does “fair” mean? It depends on who you ask
- Demographic Parity: P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1). Equal selection rates across groups ("hire 50% from each group").
- Equalized Odds: P(Ŷ=1 | Y=1, A=0) = P(Ŷ=1 | Y=1, A=1) and P(Ŷ=1 | Y=0, A=0) = P(Ŷ=1 | Y=0, A=1). Equal true-positive and false-positive rates across groups ("same error rates for everyone").
- Calibration: P(Y=1 | Ŷ=p, A=0) = P(Y=1 | Ŷ=p, A=1) = p. When the model says 70% risk, it means 70% for every group ("scores mean the same thing").
- Individual Fairness: similar individuals receive similar predictions ("treat like cases alike").
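The group-level definitions translate directly into code. A sketch using per-group confusion counts (synthetic numbers) to measure the demographic-parity and equalized-odds gaps:

```python
# Confusion counts per group: tp, fp, fn, tn (synthetic illustration).
groups = {
    "A": {"tp": 40, "fp": 10, "fn": 10, "tn": 40},
    "B": {"tp": 20, "fp": 10, "fn": 30, "tn": 40},
}

def selection_rate(c):  # P(Y_hat = 1)
    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    return (c["tp"] + c["fp"]) / total

def tpr(c):  # P(Y_hat = 1 | Y = 1), true-positive rate
    return c["tp"] / (c["tp"] + c["fn"])

def fpr(c):  # P(Y_hat = 1 | Y = 0), false-positive rate
    return c["fp"] / (c["fp"] + c["tn"])

# Demographic parity compares selection rates; equalized odds compares TPR and FPR.
dp_gap  = abs(selection_rate(groups["A"]) - selection_rate(groups["B"]))  # 0.2
tpr_gap = abs(tpr(groups["A"]) - tpr(groups["B"]))                        # 0.4
fpr_gap = abs(fpr(groups["A"]) - fpr(groups["B"]))                        # 0.0
print(dp_gap, tpr_gap, fpr_gap)
```

Here the false-positive rates match but selection rates and true-positive rates do not, showing how a system can satisfy one fairness criterion while violating another.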
The Impossibility Theorem
Chouldechova (2017) and Kleinberg et al. (2016) proved that except in trivial cases, you cannot satisfy demographic parity, equalized odds, and calibration simultaneously when base rates differ between groups. If Group A reoffends at 30% and Group B at 50%, a calibrated model must flag Group B more often — violating demographic parity.
This means fairness is a choice, not a formula. Different stakeholders have legitimate but incompatible fairness goals. A hiring system can’t simultaneously ensure equal selection rates (demographic parity) and equal accuracy (equalized odds) when qualification rates differ. The choice of fairness metric is a values decision, not a technical one.
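The tension can be shown with a few lines of arithmetic. A deliberately degenerate sketch: assume a perfectly calibrated model that outputs each group's base rate as its risk score, then flag above a fixed threshold.

```python
# Perfectly calibrated scores: the model outputs the true base rate per group
# (every member of a group shares that score in this simplified setup).
base_rate = {"A": 0.30, "B": 0.50}
threshold = 0.40  # flag anyone whose risk score exceeds 0.40

# Flag rate per group is then all-or-nothing:
flag_rate = {g: 1.0 if r > threshold else 0.0 for g, r in base_rate.items()}
print(flag_rate)  # {'A': 0.0, 'B': 1.0}
```

Calibration forced the flag rates to track the differing base rates, so demographic parity is maximally violated: the two criteria cannot both hold.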
Explainability & Transparency
XAI — understanding why AI makes decisions
The Black Box Problem
Deep neural networks with billions of parameters are opaque — even their creators can’t fully explain individual decisions. When an AI denies a loan, rejects a job application, or recommends a prison sentence, people deserve to know why. The EU AI Act and GDPR point toward a “right to explanation,” though the exact legal scope of that right is still debated.
Explainability Methods:
- LIME (Local Interpretable Model-agnostic Explanations): perturb the input, observe how the output changes, and fit a simple model locally to explain one prediction.
- SHAP (SHapley Additive exPlanations): game-theoretic feature attribution in which each feature gets a contribution score, mathematically grounded in Shapley values.
- Attention Visualization: show which tokens or pixels the model attends to. Caution: attention ≠ explanation.
- Concept-Based Explanations (TCAV): test with human-understandable concepts, e.g., “this image was classified as doctor because of the stethoscope concept.”
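SHAP’s core idea fits in a short function. A pure-Python sketch (not the `shap` library) that computes exact Shapley values for a toy three-feature model by averaging each feature’s marginal contribution over all orderings; the model and baseline values are made up for illustration.

```python
from itertools import permutations
from math import factorial, isclose

# Toy scoring model: income helps, debt hurts, shoe_size is irrelevant.
def model(x):
    return 2.0 * x["income"] - 1.0 * x["debt"]

def eval_coalition(present, x, baseline):
    """Model output with absent features replaced by their baseline values."""
    filled = {f: (x[f] if f in present else baseline[f]) for f in x}
    return model(filled)

def shapley_values(x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    features = list(x)
    phi = dict.fromkeys(features, 0.0)
    for order in permutations(features):
        present = set()
        for f in order:
            before = eval_coalition(present, x, baseline)
            present.add(f)
            phi[f] += eval_coalition(present, x, baseline) - before
    return {f: total / factorial(len(features)) for f, total in phi.items()}

x = {"income": 5.0, "debt": 3.0, "shoe_size": 9.0}
baseline = {"income": 0.0, "debt": 0.0, "shoe_size": 0.0}
phi = shapley_values(x, baseline)
print(phi)  # income: 10.0, debt: -3.0, shoe_size: 0.0

# A defining SHAP property: attributions sum to model(x) - model(baseline).
assert isclose(sum(phi.values()), model(x) - model(baseline))
```

The irrelevant feature correctly gets zero attribution. Real SHAP implementations approximate this computation, since exact enumeration grows factorially with the number of features.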
Opaque Models
Deep neural networks, large ensembles. High accuracy but no inherent interpretability. Require post-hoc explanation methods.
Interpretable Models
Decision trees, linear models, rule lists. Inherently transparent. Sometimes competitive accuracy for structured data.
The accuracy-interpretability tradeoff is often overstated. Cynthia Rudin (2019) argues that for high-stakes decisions (criminal justice, healthcare), we should use inherently interpretable models rather than explaining black boxes. Post-hoc explanations can be misleading — they explain a simplified version of the model, not the model itself.
AI Regulation & Governance
EU AI Act, NIST framework, and global approaches
EU AI Act (entered into force Aug 2024, fully effective 2027), a risk-based classification:
- Unacceptable risk (banned): social scoring by governments; real-time facial recognition (most cases); manipulative subliminal techniques; exploitation of vulnerable groups.
- High risk (strict requirements): hiring and employment decisions; credit scoring and insurance; criminal justice and law enforcement; education and access to services; critical infrastructure.
- Limited risk (transparency obligations): chatbots must disclose they are AI; deepfakes must be labeled; emotion recognition must be disclosed.
- Minimal risk (no requirements): spam filters, video games, etc.
NIST AI Risk Management Framework: Govern → Map → Measure → Manage. Voluntary, US-based guidance.
GPAI (General-Purpose AI) rules:
- Providers must publish a training data summary and implement copyright compliance policies.
- Systemic-risk models (training compute above 10²⁵ FLOPs): cybersecurity protections, incident reporting, adversarial testing.
Global Landscape:
- EU: comprehensive regulation
- US: executive orders plus sector-specific rules
- UK: pro-innovation, light-touch approach
- China: algorithm regulation and content rules
- G7: Hiroshima AI Process (voluntary)
The regulatory gap: AI capabilities advance in months; regulations take years. The EU AI Act was proposed in April 2021 and won’t be fully enforced until August 2027. By then, AI systems will be far more capable than what the law was designed for. Adaptive regulation is an unsolved challenge.
AI Safety & Alignment
Ensuring AI systems do what we intend
The Alignment Problem
AI alignment ensures systems pursue intended goals rather than unintended ones. A reward-hacking RL agent finds loopholes. An LLM optimized for helpfulness may be manipulated into harmful outputs. As AI systems become more capable, misalignment becomes more dangerous. This is the central challenge of AI safety.
Current Safety Techniques:
- RLHF / DPO: align outputs with human preferences
- Constitutional AI: self-critique against a written set of rules
- Red teaming: adversarial testing by humans
- Guardrails: input/output filters
- Sandboxing: limit agent capabilities
Open Problems:
- Scalable oversight: how do humans supervise AI systems smarter than they are?
- Deceptive alignment: could an AI appear aligned during training but pursue different goals when deployed?
- Goal stability: do goals remain stable as capabilities grow?
- Emergent capabilities: new abilities appear unpredictably at scale
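Guardrails in their simplest form are input/output filters wrapped around the model. A toy sketch (the blocklist patterns and the `fake_llm` stand-in are hypothetical; production guardrails use trained classifiers and policy models, not a handful of regexes):

```python
import re

# Hypothetical blocklists for illustration only.
BLOCKED_INPUT = [re.compile(p, re.I) for p in (r"\bhow to make a bomb\b",)]
BLOCKED_OUTPUT = [re.compile(p, re.I) for p in (r"\b\d{3}-\d{2}-\d{4}\b",)]  # SSN-like

def guarded(llm, prompt):
    # Input filter: refuse disallowed requests before they reach the model.
    if any(p.search(prompt) for p in BLOCKED_INPUT):
        return "Request declined by input guardrail."
    reply = llm(prompt)
    # Output filter: withhold responses that leak sensitive patterns.
    if any(p.search(reply) for p in BLOCKED_OUTPUT):
        return "[response withheld: output guardrail triggered]"
    return reply

fake_llm = lambda prompt: "Sure! The SSN is 123-45-6789."  # stand-in model
print(guarded(fake_llm, "how to make a bomb"))  # blocked at input
print(guarded(fake_llm, "tell me a secret"))    # blocked at output
```

Note the two distinct checkpoints: filtering the prompt stops known-bad requests cheaply, while filtering the reply catches harmful content the model produces anyway.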
Deepfakes & Misinformation
Generative AI enables photorealistic fake images, videos, and audio. Deepfake detection is an arms race — detectors improve, but so do generators. The societal risk: erosion of trust in all media. If anything can be faked, nothing can be trusted. Content provenance (C2PA) and watermarking are emerging defenses.
The dual-use dilemma: The same AI that diagnoses cancer also generates deepfakes. The same LLM that tutors students also writes phishing emails. There is no technical solution to dual use — it requires a combination of technical safeguards, regulation, social norms, and institutional accountability.
Societal Impact
Jobs, inequality, copyright, and environmental cost
Labor Market Impact:
- At risk: data entry, translation, basic coding, customer service, content moderation, paralegal work, bookkeeping
- Augmented: doctors, lawyers, engineers, designers, researchers, with AI as copilot
- New roles: prompt engineers, AI trainers, alignment researchers, AI auditors
Copyright & IP:
- Models are trained on copyrighted data without consent
- NYT v. OpenAI (2023): landmark lawsuit; Getty v. Stability AI: image copyright
- EU AI Act: providers must disclose a training data summary
- No consensus on ownership of AI-generated content
Environmental Cost:
- GPT-4 training: ~50 GWh (estimated)
- One ChatGPT query: roughly 10x the energy of a Google search
- Data centers: 2–3% of global electricity
- Water usage for cooling: billions of liters
The Inequality Amplifier
AI concentrates power in companies with the most data and compute. Training frontier models costs $100M+, limiting development to a handful of corporations. Open-source models (Llama, Mistral) partially democratize access, but the gap between frontier and open models keeps growing.
The global divide: AI benefits flow disproportionately to wealthy nations. Training data is dominated by English. Annotation labor is outsourced to low-wage countries ($1–2/hour). AI-driven automation may eliminate jobs in developing economies before they industrialize. Equitable AI development is a global justice issue.
Responsible AI in Practice
Principles, frameworks, and what you can do
Responsible AI Principles:
1. Fairness: test for disparate impact across all subgroups
2. Transparency: explain decisions to affected individuals
3. Privacy: minimize data collection; use differential privacy
4. Safety: red team, monitor, and maintain kill switches
5. Accountability: human oversight for high-stakes decisions
6. Inclusivity: diverse teams, diverse data, diverse evaluation
Practical Checklist:
□ Audit training data for representation
□ Disaggregate metrics by subgroup
□ Use fairness toolkits (Fairlearn, AIF360)
□ Document model cards & datasheets
□ Establish human-in-the-loop review for high stakes
□ Monitor for drift and emerging bias
□ Create incident response plans
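The model-card item on the checklist can start as simple structured metadata. A minimal sketch whose fields follow the spirit of the model-card proposal; the model name and every value are hypothetical:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ModelCard:
    """Minimal model card: intended use, data, and disaggregated metrics."""
    name: str
    intended_use: str
    out_of_scope_use: str
    training_data: str
    metrics: dict = field(default_factory=dict)           # overall metrics
    subgroup_metrics: dict = field(default_factory=dict)  # disaggregated!
    known_limitations: list = field(default_factory=list)

# Hypothetical example card.
card = ModelCard(
    name="loan-approver-v2",
    intended_use="Pre-screening consumer loan applications with human review",
    out_of_scope_use="Fully automated denial without human oversight",
    training_data="2015-2023 applications; audited for group representation",
    metrics={"accuracy": 0.91},
    subgroup_metrics={"group_a": {"fpr": 0.08}, "group_b": {"fpr": 0.21}},
    known_limitations=["FPR gap between groups; mitigation in progress"],
)
print(asdict(card)["subgroup_metrics"])
```

Keeping subgroup metrics and known limitations as first-class fields makes the disparities from earlier in the chapter visible at deployment time instead of buried in an appendix.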
The Path Forward
Technology alone won’t solve ethics. It requires interdisciplinary teams (engineers + ethicists + domain experts + affected communities), institutional accountability, regulatory frameworks, and a culture that values fairness alongside accuracy. Every AI practitioner has a responsibility to consider who benefits and who is harmed by the systems they build.
Coming up: Ch 14 covers the AI Landscape Today — agentic AI, multimodal models, reasoning systems, and where the field is heading in 2025 and beyond.