Ch 7 — LLM-Specific Ethics

Hallucination, misinformation, deepfakes, copyright, and the unique ethical challenges of generative AI
High Level
LLM → Hallucinate → Misinfo → Deepfake → IP → Mitigate
Hallucination: The Core Problem
When LLMs confidently generate false information
What Is Hallucination?
Hallucination occurs when an LLM generates text that is fluent, coherent, and confident, but factually wrong. This isn’t a bug that can be fixed; it’s a fundamental property of how language models work. LLMs are trained to predict the next token, not to verify truth. They optimize for plausibility, not accuracy. OpenAI’s research (2025) shows hallucinations stem partly from training procedures that reward guessing over admitting uncertainty, much like multiple-choice tests that penalize blank answers.
Types of hallucination:
Intrinsic: contradicts the source material (e.g., summarizing a document incorrectly).
Extrinsic: generates information not present in any source (e.g., fabricating citations).
Factual: states incorrect facts confidently (e.g., wrong dates, fake legal cases).
Hallucination Examples
// Real-world hallucination incidents
Legal:
  Lawyer used ChatGPT for brief
  Model fabricated 6 court cases
  Complete with fake citations
  // Mata v. Avianca (2023)
Medical:
  "What medications interact with X?"
  Model confidently lists interactions that don't exist
  // Life-threatening misinformation
Academic:
  Students cite papers that don't exist
  Complete with fake DOIs, authors, and plausible abstracts
Why It Happens:
  LLMs predict the next token
  Optimize for plausibility, not truth
  Training rewards guessing over "I don't know"
  // Fundamental architectural issue
Mitigation:
  RAG (Retrieval-Augmented Generation)
  Grounding in verified sources
  Confidence calibration
  Human-in-the-loop verification
Key insight: Hallucination is not a bug to be fixed — it’s a fundamental property of how LLMs work. They predict plausible text, not true text. The ethical obligation is to clearly communicate this limitation to users and implement guardrails (RAG, verification) for high-stakes applications.
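The sketch below (Python) shows what RAG-style grounding with abstention might look like in practice. The retrieve() and llm_complete() functions are hypothetical placeholders for a document-store lookup and a model call, not any specific library's API; the point is that the model answers only from retrieved sources and declines when nothing relevant is found.

# Sketch: ground answers in retrieved sources, abstain when evidence is missing.
# retrieve() and llm_complete() are hypothetical stand-ins, not a real API.

def retrieve(query: str, k: int = 3) -> list[str]:
    """Hypothetical: return the top-k passages from a verified document store."""
    raise NotImplementedError("plug in your retriever here")

def llm_complete(prompt: str) -> str:
    """Hypothetical: call the underlying language model."""
    raise NotImplementedError("plug in your model here")

def answer_with_sources(question: str) -> str:
    passages = retrieve(question)
    if not passages:
        # Abstain instead of letting the model guess.
        return "I could not find a verified source for this question."
    context = "\n\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer ONLY from the numbered sources below. "
        "Cite the source number for every claim. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)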
Misinformation at Scale
LLMs as engines of industrialized deception
The Threat
LLMs have created what researchers call “industrialized deception”: the automated production of misleading content at unprecedented scale. Before LLMs, creating convincing misinformation required human effort. Now, a single person can generate thousands of unique, persuasive articles, social media posts, and fake reviews in minutes.
Scale: one person can produce content that previously required hundreds of writers.
Quality: LLM-generated text is often indistinguishable from human writing.
Personalization: content can be tailored to specific demographics, languages, and belief systems.
Cost: generating misinformation costs almost nothing.
The dual-use problem: the same LLMs that generate misinformation can also be used to detect it, creating an arms race between generation and detection.
Misinformation Vectors
// LLM misinformation at scale
Fake News:
  Generate thousands of articles
  Tailored to specific audiences
  Multiple languages simultaneously
  // Cost: ~$0.01 per article
Fake Reviews:
  Product reviews, restaurant reviews
  Unique writing styles per review
  Bypass existing detection systems
Social Media Manipulation:
  Bot networks with unique personas
  Engage in realistic conversations
  Amplify specific narratives
Spear Phishing:
  Personalized phishing emails
  Scraped from social media profiles
  Much more convincing than templates
Academic Fraud:
  Fake papers with plausible results
  Fake peer reviews
  // Already detected in journals
Detection Arms Race:
  AI-generated text detectors
  Watermarking (C2PA, SynthID)
  But: paraphrasing defeats detectors
Key insight: LLMs have reduced the cost of misinformation to near zero. Detection is important but insufficient — the real defense is media literacy, provenance tracking and watermarking (C2PA, SynthID), and platform-level policies that require disclosure of AI-generated content.
Deepfakes & Synthetic Media
AI-generated images, audio, and video
The Deepfake Problem
Deepfakes are AI-generated or AI-manipulated media (images, audio, video) that depict events that never happened. The technology has advanced rapidly:
Face swapping: replace one person’s face with another in video.
Voice cloning: clone someone’s voice from a few seconds of audio.
Image generation: create photorealistic images of people who don’t exist or events that never happened.
Video generation: tools like Sora can generate realistic video from text prompts.
Harms: non-consensual intimate imagery (the most common use), political manipulation, financial fraud (CEO voice cloning for wire transfers), evidence fabrication, and erosion of trust in authentic media (the “liar’s dividend”: real evidence dismissed as fake).
Deepfake Landscape
// Deepfake harms and defenses
Harms:
  Non-consensual imagery: ~96% of deepfakes (Sensity AI report)
  Political manipulation: fake speeches
  Financial fraud: CEO voice cloning
  Evidence fabrication: fake alibis
  Trust erosion: "liar's dividend"
The Liar's Dividend:
  Real evidence → "That's a deepfake!"
  Authentic video dismissed as AI
  Undermines journalism, courts, truth
Defenses:
  C2PA: Coalition for Content Provenance and Authenticity
  // Cryptographic provenance for media
  SynthID: Google's watermarking
  Detection models (but arms race)
  Legislation: deepfake laws in 40+ US states (as of 2025)
Responsible Generation:
  Consent requirements
  Watermarking all generated content
  Refusing harmful requests
  Content provenance metadata
Key insight: The most insidious effect of deepfakes isn’t fake content — it’s the “liar’s dividend”: once deepfakes exist, any authentic evidence can be dismissed as AI-generated. The solution requires both technical measures (C2PA provenance) and legal frameworks (deepfake legislation).
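A minimal sketch of the provenance idea behind C2PA, using only the Python standard library. This is not the real C2PA manifest format or signing scheme (which uses standardized manifests and certificate-based signatures); it only illustrates signing a hash of generated media at creation time so tampering or a missing AI disclosure can be detected later.

# Sketch of cryptographic provenance for generated media (illustrative only;
# real C2PA manifests use a standardized format and certificate-based signatures).
import hashlib, hmac, json

SIGNING_KEY = b"replace-with-a-real-key"  # hypothetical key held by the generator

def create_manifest(media_bytes: bytes, generator: str) -> dict:
    digest = hashlib.sha256(media_bytes).hexdigest()
    claim = {"sha256": digest, "generator": generator, "ai_generated": True}
    payload = json.dumps(claim, sort_keys=True).encode()
    claim["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return claim

def verify_manifest(media_bytes: bytes, manifest: dict) -> bool:
    claim = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claim, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    untampered = hmac.compare_digest(expected, manifest["signature"])
    return untampered and claim["sha256"] == hashlib.sha256(media_bytes).hexdigest()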
Copyright & Intellectual Property
Who owns AI-generated content? Who owns the training data?
The Copyright Debate
LLMs raise two fundamental copyright questions:
Training data: is it legal to train on copyrighted material? LLMs are trained on books, articles, code, and art scraped from the internet, much of it copyrighted. The New York Times v. OpenAI lawsuit (filed Dec 2023) argues this is copyright infringement. OpenAI argues it’s fair use (transformative use).
Output ownership: who owns AI-generated content? The US Copyright Office has ruled that purely AI-generated works cannot be copyrighted, a position upheld in Thaler v. Perlmutter (2023). Works require “human authorship.” But what about AI-assisted works where a human provides significant creative direction?
The legal landscape is evolving rapidly. The EU AI Act requires disclosure of copyrighted training data. Japan has taken a more permissive stance, allowing training on copyrighted data.
Copyright Landscape
// AI copyright issues
Training Data:
  NYT v. OpenAI (2023): infringement?
  OpenAI: "fair use" (transformative)
  NYT: "verbatim reproduction"
  // Outcome will set precedent
Output Ownership:
  US: AI-only works not copyrightable
  Thaler v. Perlmutter (2023)
  Requires "human authorship"
  AI-assisted: depends on human input
Global Approaches:
  US: Fair use defense (uncertain)
  EU: Must disclose training data
  Japan: Permissive (training allowed)
  UK: Proposed text/data mining exception
Code Copyright:
  GitHub Copilot trained on open source
  Can reproduce GPL-licensed code
  Without attribution or license
  // Class action filed (2022)
Best Practices:
  Track training data provenance
  Implement opt-out mechanisms
  Disclose AI involvement in outputs
  Use licensed/permissive datasets
Key insight: The copyright question around AI training data is the most consequential legal battle in AI. The outcome of NYT v. OpenAI will set precedent for the entire industry. Responsible practice: track your training data provenance, implement opt-out mechanisms, and disclose AI involvement.
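A sketch of what an opt-out mechanism and provenance tracking might look like in a training-data pipeline. The OPT_OUT_DOMAINS set, the Document shape, and the license values are illustrative assumptions; a real pipeline would also honor robots.txt and machine-readable opt-out signals.

# Sketch: filter crawled documents against an opt-out list and keep provenance.
from dataclasses import dataclass
from urllib.parse import urlparse

OPT_OUT_DOMAINS = {"example-newspaper.com", "example-artist.net"}  # hypothetical

@dataclass
class Document:
    url: str
    text: str
    license: str  # e.g. "CC-BY", "proprietary", "unknown"

def allowed_for_training(doc: Document) -> bool:
    domain = urlparse(doc.url).netloc.lower()
    if domain in OPT_OUT_DOMAINS:
        return False                      # publisher has opted out
    return doc.license.lower() != "proprietary"

def build_corpus(docs: list[Document]) -> list[dict]:
    # Keep provenance alongside the text so it can be audited and disclosed later.
    return [{"url": d.url, "license": d.license, "text": d.text}
            for d in docs if allowed_for_training(d)]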
Labor & Economic Impact
Displacement, deskilling, and the future of work
Impact on Workers
LLMs are disrupting knowledge work in ways that raise ethical concerns:
Displacement: content writers, translators, customer service agents, and junior programmers face direct competition from AI. Unlike previous automation waves that affected manual labor, LLMs target cognitive tasks.
Deskilling: when AI handles routine tasks, workers may lose the skills they need to handle complex cases. Junior developers who rely on Copilot may never learn to debug or architect systems.
Ghost work: the AI industry relies on millions of low-paid workers for data labeling, content moderation, and RLHF. Time magazine reported Kenyan workers were paid $1.32–$2/hour to label toxic content for ChatGPT.
Power concentration: AI development is concentrated in a few companies with the capital for massive compute. This concentrates economic power.
Economic Disruption
// Labor impact of LLMs
Displacement:
  Content writing: -30% freelance jobs
  Translation: machine + post-edit
  Customer service: chatbot-first
  Junior coding: Copilot competition
  // Knowledge work, not just manual
Deskilling:
  Junior devs → can't debug without AI
  Writers → lose craft of writing
  Analysts → can't reason without AI
  // "Use it or lose it" for skills
Ghost Work:
  Data labeling: $1-2/hour (Kenya)
  Content moderation: PTSD risk
  RLHF: rating harmful outputs
  // The hidden human cost of AI
Power Concentration:
  Training GPT-4: ~$100M+
  Only a few companies can afford it
  Winner-take-all dynamics
  // Concentration of economic power
Ethical Response:
  Transparent impact assessments
  Reskilling programs
  Fair compensation for data workers
  Inclusive AI development
Key insight: The hidden human cost of AI is the millions of low-paid workers who label data, moderate content, and provide RLHF feedback. Ethical AI development requires fair compensation and working conditions for these workers, not just technical safety measures.
Environmental Impact
The carbon footprint of training and running LLMs
The Environmental Cost
Training and running LLMs has a significant environmental footprint:
Training energy: training GPT-3 consumed approximately 1,287 MWh of electricity and emitted ~552 tonnes of CO2 (equivalent to 123 gasoline cars driven for a year). Larger models like GPT-4 are estimated to be 10–100x more.
Inference energy: a single ChatGPT query uses roughly 10x the energy of a Google search. With hundreds of millions of daily queries, inference costs dominate over time.
Water usage: data centers use enormous amounts of water for cooling. Microsoft reported a 34% increase in water consumption (2023), largely attributed to AI workloads.
Hardware lifecycle: GPU manufacturing requires rare earth minerals, and hardware is replaced every 2–3 years.
The ethical question: is the benefit of AI worth its environmental cost?
Environmental Footprint
// Environmental cost of LLMs
Training:
  GPT-3: ~1,287 MWh, ~552t CO2
  // = 123 cars for one year
  GPT-4: estimated 10-100x more
  Llama 3 405B: ~30M GPU-hours
Inference:
  ChatGPT query: ~10x Google search
  Hundreds of millions of queries/day
  Inference > training over time
Water:
  Microsoft: +34% water use (2023)
  Google: +20% water use (2023)
  Data center cooling is thirsty
Mitigation:
  Smaller models (distillation)
  Efficient architectures (MoE)
  Renewable energy data centers
  Model sharing (open source)
  Inference optimization (quantization)
  Carbon-aware scheduling
Reporting:
  Disclose training compute
  Report carbon footprint
  // Few companies do this today
Key insight: Inference energy dominates over time because training happens once but inference runs continuously. The most impactful environmental intervention is making models smaller and more efficient (distillation, quantization, MoE), not just using renewable energy for training.
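A back-of-the-envelope sketch of how such footprint estimates are derived. The GPU power draw, PUE, grid carbon intensity, and per-query energy figures below are placeholder assumptions, not measured values for any specific model.

# Sketch: rough energy/carbon estimate for training and inference.
# All constants are illustrative placeholders; real estimates need measured values.

def training_emissions(gpu_hours: float,
                       gpu_power_kw: float = 0.7,   # ~700 W per accelerator (assumed)
                       pue: float = 1.2,            # data-center overhead (assumed)
                       grid_kg_co2_per_kwh: float = 0.4) -> float:
    """Return estimated tonnes of CO2 for a training run."""
    energy_kwh = gpu_hours * gpu_power_kw * pue
    return energy_kwh * grid_kg_co2_per_kwh / 1000.0

def daily_inference_energy(queries_per_day: float,
                           wh_per_query: float = 3.0) -> float:
    """Return estimated MWh per day of inference (per-query figure assumed)."""
    return queries_per_day * wh_per_query / 1_000_000.0

# Example: a hypothetical 1M GPU-hour run on a 0.4 kg CO2/kWh grid
print(f"training: ~{training_emissions(1_000_000):.0f} t CO2")
# Example: 200M queries/day at ~3 Wh each
print(f"inference: ~{daily_inference_energy(200_000_000):.0f} MWh/day")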
Safety & Alignment
RLHF, Constitutional AI, red teaming, and guardrails
Alignment Techniques
Making LLMs safe and aligned with human values is an active area of research:
RLHF (Reinforcement Learning from Human Feedback): train a reward model from human preferences, then fine-tune the LLM to maximize that reward. Used by OpenAI (ChatGPT), Anthropic (Claude), and others.
Constitutional AI (Anthropic): instead of human feedback for every response, define a set of principles (a “constitution”) and have the AI critique and revise its own outputs. More scalable than pure RLHF.
Red teaming: adversarial testing where humans (or other AI) try to make the model produce harmful outputs. Used to find jailbreaks, biases, and safety failures before deployment.
Guardrails: input/output filters that block harmful content. Tools like NeMo Guardrails, Guardrails AI, and LlamaGuard provide programmable safety layers.
Safety Stack
// LLM safety techniques
RLHF:
  1. Collect human preferences
  2. Train reward model
  3. Fine-tune LLM with PPO (DPO skips the explicit reward model)
  ✓ Effective at reducing harm
  ✗ Expensive, hard to scale
Constitutional AI:
  1. Define principles (constitution)
  2. AI critiques own outputs
  3. AI revises based on principles
  ✓ More scalable than RLHF
  ✓ Principles are transparent
Red Teaming:
  Manual: human adversaries
  Automated: AI attacks AI
  Find: jailbreaks, biases, leaks
  // Do BEFORE deployment
Guardrails:
  Input filters: block harmful prompts
  Output filters: block harmful responses
  Topic rails: stay on-topic
  Fact-checking: verify claims
  PII detection: redact personal data
Tools:
  NeMo Guardrails (NVIDIA)
  LlamaGuard (Meta)
  Guardrails AI (open source)
Key insight: No single technique is sufficient. Effective LLM safety requires defense in depth: RLHF/Constitutional AI for base alignment, red teaming for vulnerability discovery, and guardrails for runtime protection. Safety is a continuous process, not a one-time fix.
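A minimal sketch of a runtime guardrail layer, independent of any specific framework such as NeMo Guardrails or LlamaGuard. The blocked patterns, PII regex, and llm_complete() call are toy placeholders; production systems typically use trained safety classifiers rather than regexes.

# Sketch: defense-in-depth wrapper around an LLM call (illustrative only).
import re

BLOCKED_PATTERNS = [r"(?i)how to make a bomb", r"(?i)ignore previous instructions"]  # toy examples
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # e.g. US SSN-like strings

def llm_complete(prompt: str) -> str:
    """Hypothetical call to the underlying model."""
    raise NotImplementedError("plug in your model here")

def input_rail(prompt: str) -> bool:
    # Block prompts matching known-harmful or injection-style patterns.
    return not any(re.search(p, prompt) for p in BLOCKED_PATTERNS)

def output_rail(text: str) -> str:
    # Redact PII-like strings from the model's response.
    return PII_PATTERN.sub("[REDACTED]", text)

def guarded_complete(prompt: str) -> str:
    if not input_rail(prompt):
        return "I can't help with that request."
    response = llm_complete(prompt)
    return output_rail(response)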
Responsible LLM Deployment
A practical framework for ethical generative AI
Deployment Framework
Deploying LLMs responsibly requires addressing all the challenges in this chapter:
Transparency: clearly disclose that users are interacting with AI. Label AI-generated content. Publish model cards.
Accuracy: implement RAG for factual grounding. Add disclaimers for high-stakes domains (medical, legal, financial). Never present AI output as authoritative.
Safety: implement input/output guardrails. Red team before deployment. Monitor for misuse.
Privacy: PII detection on inputs/outputs. Clear data retention policies. User consent for data use.
Fairness: test for bias across demographics. Implement debiasing techniques. Monitor for disparate impact.
Accountability: maintain audit trails. Have a human escalation path. Incident response plan for failures.
Deployment Checklist
// Responsible LLM deployment
Before Launch:
  □ Model card published
  □ Red teaming completed
  □ Bias testing across demographics
  □ Guardrails implemented
  □ RAG for factual grounding
  □ PII detection active
  □ Data retention policy defined
  □ Human escalation path exists
At Launch:
  □ AI disclosure to users
  □ Disclaimers for high-stakes use
  □ Feedback mechanism available
  □ Monitoring dashboards live
Ongoing:
  □ Monitor for misuse patterns
  □ Track bias metrics over time
  □ Update guardrails for new attacks
  □ Regular red teaming cycles
  □ User feedback → model updates
  □ Incident response drills
Never:
  ✗ Present AI as human
  ✗ Deploy without guardrails
  ✗ Ignore user feedback
  ✗ Skip bias testing
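A sketch of the accountability items on the checklist: an append-only audit record per interaction plus a human-escalation hook. The field names, file path, and notify_reviewer() function are hypothetical.

# Sketch: audit trail and human-escalation path (illustrative field names).
import json, time, uuid

def notify_reviewer(record: dict) -> None:
    """Hypothetical hook into an on-call review queue."""
    pass

def log_interaction(prompt: str, response: str, flagged: bool,
                    path: str = "llm_audit.jsonl") -> dict:
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "prompt": prompt,
        "response": response,
        "flagged": flagged,           # set by guardrails or user feedback
        "model_version": "v1.2.3",    # placeholder version tag
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    if flagged:
        notify_reviewer(record)       # human escalation path
    return record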
Key insight: Responsible LLM deployment is not a checklist you complete once — it’s a continuous process. The threat landscape evolves (new jailbreaks, new misuse patterns), and your safety measures must evolve with it. Build monitoring and iteration into your deployment process.