Ch 15 — Ethics, Deepfakes & Safety

Responsible AI, synthetic media risks, detection, regulation, and building trust
High Level
Risks → Deepfakes → Detect → Regulate → Protect → Trust
The Risk Landscape
Why multimodal AI creates unique ethical challenges
Unique Multimodal Risks
Multimodal AI creates risks that text-only AI doesn’t have:

Deepfakes: Realistic fake images, video, and audio of real people
Non-consensual imagery: Generating intimate images of real people
Misinformation at scale: Fake photos/videos of events that never happened
Identity theft: Cloning someone’s voice and face for fraud
Surveillance: AI-powered facial recognition and tracking
Bias amplification: Visual stereotypes embedded in training data
Scale of the Problem
96% of deepfakes are non-consensual intimate imagery (2023 study)
500% increase in deepfake fraud attempts since 2023
$25B estimated losses from AI-generated fraud by 2027
Election interference: AI-generated images of candidates in fabricated scenarios
Voice cloning scams: 3 seconds of audio is enough to clone a voice convincingly
Key insight: The same technology that enables creative expression, accessibility, and productivity also enables deception, harassment, and fraud. No technical solution preserves all the benefits while eliminating all the harms.
Deepfakes: How They Work
The technology behind synthetic media
Types of Deepfakes
Face swap: Replace one person’s face with another in video. Uses encoder-decoder networks or diffusion models.
Face reenactment: Animate a face with someone else’s expressions and movements. Lip-sync to different audio.
Full body: Generate entire video of a person doing things they never did.
Voice cloning: Generate speech in anyone’s voice from a few seconds of sample audio.
Text-to-video: Generate entirely synthetic video from a text description.
Accessibility Curve
// Deepfake creation difficulty over time
2017  PhD-level skills, days of compute
2019  Technical skills, hours of compute
2021  App-level, minutes on consumer GPU
2023  One-click apps, real-time on phone
2025  Indistinguishable from real, instant
// Quality has increased exponentially
// while barrier to entry has collapsed
// This is the core policy challenge
Key insight: Deepfake technology follows the same democratization curve as all AI: what required a PhD in 2017 requires a smartphone app in 2025. Detection technology must keep pace, but it’s fundamentally an arms race where generation has the advantage.
Detection & Provenance
How to identify AI-generated content
Detection Approaches
Artifact detection: Look for visual artifacts (inconsistent lighting, blurry edges, warped backgrounds). Becoming less reliable as generation improves.
Frequency analysis: AI-generated images have different frequency patterns than real photos. Detectable but bypassable.
Neural network classifiers: Train a classifier to distinguish real vs. AI-generated. Best accuracy (~95%) but arms race with generators.
Metadata analysis: Check EXIF data, compression artifacts, editing history. Easily stripped.
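The frequency-analysis approach above can be sketched as a toy score: measure how much of an image's spectral energy sits at high frequencies. This is a minimal illustration with numpy, not a real detector; the cutoff and the score itself are assumptions chosen for demonstration, and production detectors learn these patterns rather than hand-coding them.

```python
import numpy as np

def high_freq_ratio(img: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above a radial frequency cutoff.

    Toy frequency-analysis heuristic: AI-generated images often show
    spectral statistics that differ from camera photos. Illustrative
    only; the 0.25 cutoff is an arbitrary demo value.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(spectrum) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Radial distance from the spectrum centre, in normalised units
    r = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)
    return float(power[r > cutoff].sum() / power.sum())

rng = np.random.default_rng(0)
smooth = np.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))  # low-frequency gradient
noisy = rng.standard_normal((64, 64))                            # broadband noise
print(high_freq_ratio(smooth) < high_freq_ratio(noisy))          # True: gradient has less high-freq energy
```

The same comparison on real vs. generated photos is far noisier in practice, which is why this family of signals is described above as "detectable but bypassable."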
Content Provenance (C2PA)
C2PA (Coalition for Content Provenance and Authenticity) is the industry standard for content provenance:

Cryptographic signatures: Camera/software signs content at creation time
Edit history: Every modification is recorded in a tamper-evident manifest
AI disclosure: AI-generated content is labeled at creation
Adoption: Adobe, Microsoft, Google, Sony, Leica, Nikon — growing ecosystem
Limitation: Only works if the entire chain uses C2PA. Doesn’t help with content created outside the system.
Key insight: Detection is a losing battle long-term — generation will always be ahead. The more promising approach is provenance: proving content IS real (via C2PA) rather than trying to prove content is fake. “Authenticated real” beats “detected fake.”
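The provenance idea can be made concrete with a toy tamper-evident manifest. Real C2PA uses X.509 certificate chains and a CBOR/JUMBF container, not JSON and HMAC; this sketch only shows the core mechanic, and the signing key is a stand-in for a device or certificate-authority key.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in for a real device/CA private key (assumption)

def sign_manifest(content: bytes, history: list) -> dict:
    """Bind a content hash and its edit history to a signature,
    so any later change to either is detectable."""
    manifest = {
        "content_hash": hashlib.sha256(content).hexdigest(),
        "edit_history": history,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify_manifest(content: bytes, manifest: dict) -> bool:
    claim = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claim, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and claim["content_hash"] == hashlib.sha256(content).hexdigest())

photo = b"fake image bytes"
m = sign_manifest(photo, ["captured:camera-app", "ai-edit:none"])
print(verify_manifest(photo, m))                # True: content and history intact
print(verify_manifest(photo + b"tamper", m))    # False: content no longer matches
```

Note the limitation from the list above still applies: verification only proves the manifest is intact, not anything about content that never carried one.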
Regulation & Policy
How governments and platforms are responding
Regulatory Landscape
EU AI Act (2024): Requires labeling of AI-generated content, bans certain uses (social scoring, real-time biometric surveillance), risk-based classification
US Executive Order (2023): Watermarking requirements for federal AI content, safety testing standards
China (2023): Requires consent for deepfakes, mandatory labeling, real-name registration for AI services
State laws (US): 40+ states have deepfake laws, mostly focused on elections and non-consensual imagery
Platform Policies
OpenAI: DALL-E blocks real people’s faces, adds C2PA metadata, content policy enforcement
Google: SynthID watermarking on all Gemini-generated images, invisible but detectable
Meta: Labels AI-generated content on Facebook/Instagram, blocks political deepfakes
Stability AI: Open-source models with fewer restrictions — controversial but enables research
Key insight: Regulation is converging on three principles: (1) mandatory labeling of AI-generated content, (2) consent requirements for using real people’s likenesses, and (3) liability for harmful uses. The challenge is enforcement, especially with open-source models.
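To see what "invisible but detectable" means in the SynthID bullet above, here is a deliberately naive least-significant-bit watermark. Production watermarks are learned signals designed to survive cropping and compression, which LSB marks do not; this is only the embed/detect loop in miniature.

```python
import numpy as np

def embed_watermark(img: np.ndarray, bits: list) -> np.ndarray:
    """Hide a bit string in the least-significant bits of pixel values.
    Naive stand-in for robust watermarks like SynthID."""
    out = img.copy().ravel()
    for i, b in enumerate(bits):
        out[i] = (out[i] & 0xFE) | b  # overwrite the lowest bit only
    return out.reshape(img.shape)

def read_watermark(img: np.ndarray, n: int) -> list:
    return [int(v & 1) for v in img.ravel()[:n]]

rng = np.random.default_rng(1)
image = rng.integers(0, 256, (8, 8), dtype=np.uint8)
mark = [1, 0, 1, 1, 0, 0, 1, 0]
stamped = embed_watermark(image, mark)
print(read_watermark(stamped, len(mark)) == mark)  # True: watermark recovered
# Per-pixel change is at most 1 out of 255, i.e. visually invisible
print(int(np.max(np.abs(stamped.astype(int) - image.astype(int)))))
```

An LSB mark is destroyed by a single JPEG re-encode, which is exactly why platforms invest in robust, learned watermarks instead.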
Bias & Fairness
Visual stereotypes and representation in AI
Visual Bias in AI
Generation bias: “Generate a CEO” disproportionately produces white male images. “Generate a nurse” disproportionately produces female images.
Recognition bias: Face recognition systems have higher error rates for darker skin tones and women
Cultural bias: “Beautiful landscape” defaults to Western scenery. “Traditional food” defaults to European cuisine.
Representation: Training data over-represents certain demographics, geographies, and cultures
Mitigation Strategies
Diverse training data: Actively curate datasets for demographic and cultural balance
Bias auditing: Systematically test model outputs across demographic groups
Prompt engineering: Add diversity instructions to system prompts
Post-generation filtering: Detect and flag stereotypical outputs
Community feedback: Involve diverse communities in model evaluation
Transparency: Publish model cards documenting known biases and limitations
Key insight: Visual bias is harder to detect than text bias because it’s implicit — you have to look at thousands of generated images to notice patterns. Automated bias auditing (generate 1000 images of “doctor,” count demographics) is essential.
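The automated audit described in the key insight can be sketched as a small harness. `generate` and `classify_demographic` are placeholders for your image model and an attribute classifier; the stubs below simulate a biased generator so the harness has something to flag.

```python
import random
from collections import Counter

def audit_bias(generate, classify_demographic, prompt: str, n: int = 1000) -> dict:
    """Generate n images for a prompt, classify each, and report
    the demographic distribution as fractions."""
    counts = Counter(classify_demographic(generate(prompt)) for _ in range(n))
    return {group: round(c / n, 3) for group, c in counts.items()}

# Stubs for demonstration: a "generator" with a built-in 80/20 skew
# and a trivial classifier that just passes the label through.
random.seed(0)
fake_generate = lambda prompt: random.choices(["man", "woman"], weights=[0.8, 0.2])[0]
fake_classify = lambda img: img

report = audit_bias(fake_generate, fake_classify, "a photo of a CEO", n=1000)
print(report)  # roughly {'man': 0.8, 'woman': 0.2}, flagging the skew
```

In a real audit you would run this per prompt ("doctor", "nurse", "CEO"), compare against a target distribution, and gate releases on the gap.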
Copyright & Consent
Who owns AI-generated content? Who consented to training?
Training Data Rights
The core question: Is training on copyrighted images “fair use” or infringement?
Lawsuits: Getty Images v. Stability AI, artists v. Midjourney, NYT v. OpenAI
Opt-out mechanisms: robots.txt, Spawning.ai, DeviantArt opt-out — but enforcement is weak
Licensed data: Adobe Firefly trained only on licensed/public domain images — a differentiator
Emerging consensus: Training on public data is likely legal; generating copies of specific works is not
Output Ownership
US Copyright Office: AI-generated images cannot be copyrighted (no human authorship). But human-directed AI art with substantial creative input may qualify.
Commercial use: Most AI image services grant commercial rights to users
Style mimicry: Generating images “in the style of [living artist]” is legally gray and ethically questionable
Right of publicity: Using someone’s likeness without consent violates personality rights in most jurisdictions
Key insight: The legal landscape is still forming. The safest approach for commercial use: use models trained on licensed data (Adobe Firefly), avoid generating real people’s likenesses, and document your creative process to support copyright claims.
Building Responsible Systems
Practical guidelines for ethical multimodal AI
Safety Checklist
// Responsible multimodal AI checklist
✓ Content filtering   Block NSFW, violence, real people's faces
✓ Watermarking        C2PA metadata + invisible watermarks
✓ Disclosure          Label AI-generated content clearly
✓ Consent             Never generate real people without consent
✓ Bias auditing       Test outputs across demographics
✓ Rate limiting       Prevent mass generation of harmful content
✓ Logging             Audit trail for abuse investigation
✓ Reporting           User-facing mechanism to report misuse
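One checklist item, rate limiting, can be sketched as a per-user token bucket. The rate and burst capacity below are illustrative assumptions, not recommendations; in production this would sit in front of the generation endpoint, keyed by user or API key.

```python
import time

class TokenBucket:
    """Per-user rate limiter: caps how many generations a user can
    request per window, blunting mass production of harmful content."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(rate=0.5, capacity=3)  # burst of 3, then ~1 request / 2 s
results = [limiter.allow() for _ in range(5)]
print(results)  # the burst passes, then requests are throttled
```

The other checklist items (filtering, logging, reporting) follow the same pattern: a small, auditable gate between the user and the model.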
Organizational Practices
Ethics review: Review multimodal AI features before launch
Red teaming: Adversarial testing for harmful outputs and jailbreaks
Incident response: Plan for when your system is misused
Transparency reports: Publish data on content moderation and safety
Stakeholder engagement: Involve affected communities in design decisions
Continuous monitoring: Safety isn’t a one-time check — monitor ongoing use
Key insight: Responsible AI isn’t just about technology — it’s about organizational practices. The best safety systems fail without a culture that prioritizes responsible use, clear escalation paths, and accountability for harm.
Key Takeaways
Ethics and safety in the age of multimodal AI
Essential Concepts
1. Multimodal AI creates unique risks: Deepfakes, non-consensual imagery, voice cloning fraud

2. Provenance > detection: Proving content IS real (C2PA) is more sustainable than detecting fakes

3. Regulation is converging: Mandatory labeling, consent requirements, liability for harm

4. Visual bias is implicit: Requires systematic auditing across demographics

5. Copyright is unsettled: Use licensed training data for commercial safety
For Practitioners
Implement C2PA watermarking on all AI-generated content
Block real people’s faces in generation unless explicitly authorized
Audit for bias before every major release
Build incident response plans for misuse
Stay current on regulations — the legal landscape is changing fast
Next up: Chapter 16 covers evaluation for multimodal AI — how to measure quality, safety, and reliability of systems that process and generate images, video, and audio.