Ch 15 — Ethics, Deepfakes & Safety

Responsible AI, synthetic media risks, detection, regulation, and building trust
High Level
Risks → Deepfakes → Detect → Regulate → Protect → Trust
The Risk Landscape
Why multimodal AI creates unique ethical challenges
Unique Multimodal Risks
Multimodal AI creates risks that text-only AI doesn’t have:

Deepfakes: Realistic fake images, video, and audio of real people
Non-consensual imagery: Generating intimate images of real people
Misinformation at scale: Fake photos/videos of events that never happened
Identity theft: Cloning someone’s voice and face for fraud
Surveillance: AI-powered facial recognition and tracking
Bias amplification: Visual stereotypes embedded in training data
Scale of the Problem
96% of deepfakes are non-consensual intimate imagery (2023 study)
500% increase in deepfake fraud attempts since 2023
$25B estimated losses from AI-generated fraud by 2027
Election interference: AI-generated images of candidates in fabricated scenarios
Voice cloning scams: 3 seconds of audio is enough to clone a voice convincingly
Key insight: The same technology that enables creative expression, accessibility, and productivity also enables deception, harassment, and fraud. No technical solution preserves all the benefits while eliminating all the harms.
Deepfakes: How They Work
The technology behind synthetic media
Types of Deepfakes
Face swap: Replace one person’s face with another in video. Uses encoder-decoder networks or diffusion models.
Face reenactment: Animate a face with someone else’s expressions and movements. Lip-sync to different audio.
Full body: Generate entire video of a person doing things they never did.
Voice cloning: Generate speech in anyone’s voice from a few seconds of sample audio.
Text-to-video: Generate entirely synthetic video from a text description.
Accessibility Curve
// Deepfake creation difficulty over time
2017  PhD-level skills, days of compute
2019  Technical skills, hours of compute
2021  App-level, minutes on consumer GPU
2023  One-click apps, real-time on phone
2025  Indistinguishable from real, instant
// Quality has increased exponentially
// while barrier to entry has collapsed
// This is the core policy challenge
Key insight: Deepfake technology follows the same democratization curve as all AI: what required a PhD in 2017 requires a smartphone app in 2025. Detection technology must keep pace, but it’s fundamentally an arms race where generation has the advantage.
Detection & Provenance
How to identify AI-generated content
Detection Approaches
Artifact detection: Look for visual artifacts (inconsistent lighting, blurry edges, warped backgrounds). Becoming less reliable as generation improves.
Frequency analysis: AI-generated images have different frequency patterns than real photos. Detectable but bypassable.
Neural network classifiers: Train a classifier to distinguish real vs. AI-generated. Best accuracy (~95%) but arms race with generators.
Metadata analysis: Check EXIF data, compression artifacts, editing history. Easily stripped.
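The frequency-analysis approach above can be sketched as a toy score: measure how much of an image's spectral energy sits at high frequencies. This is a minimal illustration with numpy, not a real detector; the cutoff and the score itself are assumptions chosen for demonstration, and production detectors learn these patterns rather than hand-coding them.

```python
import numpy as np

def high_freq_ratio(img: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above a radial frequency cutoff.

    Toy frequency-analysis heuristic: AI-generated images often show
    spectral statistics that differ from camera photos. Illustrative
    only; the 0.25 cutoff is an arbitrary demo value.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(spectrum) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Radial distance from the spectrum centre, in normalised units
    r = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)
    return float(power[r > cutoff].sum() / power.sum())

rng = np.random.default_rng(0)
smooth = np.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))  # low-frequency gradient
noisy = rng.standard_normal((64, 64))                            # broadband noise
print(high_freq_ratio(smooth) < high_freq_ratio(noisy))          # True: gradient has less high-freq energy
```

The same comparison on real vs. generated photos is far noisier in practice, which is why this family of signals is described above as "detectable but bypassable."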
Content Provenance (C2PA)
C2PA (Coalition for Content Provenance and Authenticity) is the industry standard for content provenance:

Cryptographic signatures: Camera/software signs content at creation time
Edit history: Every modification is recorded in a tamper-evident manifest
AI disclosure: AI-generated content is labeled at creation
Adoption: Adobe, Microsoft, Google, Sony, Leica, Nikon — growing ecosystem
Limitation: Only works if the entire chain uses C2PA. Doesn’t help with content created outside the system.
Key insight: Detection is a losing battle long-term — generation will always be ahead. The more promising approach is provenance: proving content IS real (via C2PA) rather than trying to prove content is fake. “Authenticated real” beats “detected fake.”
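The provenance idea can be made concrete with a toy tamper-evident manifest. Real C2PA uses X.509 certificate chains and a CBOR/JUMBF container, not JSON and HMAC; this sketch only shows the core mechanic, and the signing key is a stand-in for a device or certificate-authority key.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in for a real device/CA private key (assumption)

def sign_manifest(content: bytes, history: list) -> dict:
    """Bind a content hash and its edit history to a signature,
    so any later change to either is detectable."""
    manifest = {
        "content_hash": hashlib.sha256(content).hexdigest(),
        "edit_history": history,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify_manifest(content: bytes, manifest: dict) -> bool:
    claim = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claim, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and claim["content_hash"] == hashlib.sha256(content).hexdigest())

photo = b"fake image bytes"
m = sign_manifest(photo, ["captured:camera-app", "ai-edit:none"])
print(verify_manifest(photo, m))                # True: content and history intact
print(verify_manifest(photo + b"tamper", m))    # False: content no longer matches
```

Note the limitation from the list above still applies: verification only proves the manifest is intact, not anything about content that never carried one.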
Regulation & Policy
How governments and platforms are responding
Regulatory Landscape
EU AI Act (2024): Requires labeling of AI-generated content, bans certain uses (social scoring, real-time biometric surveillance), risk-based classification
US Executive Order (2023): Watermarking requirements for federal AI content, safety testing standards
China (2023): Requires consent for deepfakes, mandatory labeling, real-name registration for AI services
State laws (US): 40+ states have deepfake laws, mostly focused on elections and non-consensual imagery
Platform Policies
OpenAI: DALL-E blocks real people’s faces, adds C2PA metadata, content policy enforcement
Google: SynthID watermarking on all Gemini-generated images, invisible but detectable
Meta: Labels AI-generated content on Facebook/Instagram, blocks political deepfakes
Stability AI: Open-source models with fewer restrictions — controversial but enables research
Key insight: Regulation is converging on three principles: (1) mandatory labeling of AI-generated content, (2) consent requirements for using real people’s likenesses, and (3) liability for harmful uses. The challenge is enforcement, especially with open-source models.
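To see what "invisible but detectable" means in the SynthID bullet above, here is a deliberately naive least-significant-bit watermark. Production watermarks are learned signals designed to survive cropping and compression, which LSB marks do not; this is only the embed/detect loop in miniature.

```python
import numpy as np

def embed_watermark(img: np.ndarray, bits: list) -> np.ndarray:
    """Hide a bit string in the least-significant bits of pixel values.
    Naive stand-in for robust watermarks like SynthID."""
    out = img.copy().ravel()
    for i, b in enumerate(bits):
        out[i] = (out[i] & 0xFE) | b  # overwrite the lowest bit only
    return out.reshape(img.shape)

def read_watermark(img: np.ndarray, n: int) -> list:
    return [int(v & 1) for v in img.ravel()[:n]]

rng = np.random.default_rng(1)
image = rng.integers(0, 256, (8, 8), dtype=np.uint8)
mark = [1, 0, 1, 1, 0, 0, 1, 0]
stamped = embed_watermark(image, mark)
print(read_watermark(stamped, len(mark)) == mark)  # True: watermark recovered
# Per-pixel change is at most 1 out of 255, i.e. visually invisible
print(int(np.max(np.abs(stamped.astype(int) - image.astype(int)))))
```

An LSB mark is destroyed by a single JPEG re-encode, which is exactly why platforms invest in robust, learned watermarks instead.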
Bias & Fairness
Visual stereotypes and representation in AI
Visual Bias in AI
Generation bias: “Generate a CEO” disproportionately produces white male images. “Generate a nurse” disproportionately produces female images.
Recognition bias: Face recognition systems have higher error rates for darker skin tones and women
Cultural bias: “Beautiful landscape” defaults to Western scenery. “Traditional food” defaults to European cuisine.
Representation: Training data over-represents certain demographics, geographies, and cultures
Mitigation Strategies
Diverse training data: Actively curate datasets for demographic and cultural balance
Bias auditing: Systematically test model outputs across demographic groups
Prompt engineering: Add diversity instructions to system prompts
Post-generation filtering: Detect and flag stereotypical outputs
Community feedback: Involve diverse communities in model evaluation
Transparency: Publish model cards documenting known biases and limitations
Key insight: Visual bias is harder to detect than text bias because it’s implicit — you have to look at thousands of generated images to notice patterns. Automated bias auditing (generate 1000 images of “doctor,” count demographics) is essential.
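The automated audit described in the key insight can be sketched as a small harness. `generate` and `classify_demographic` are placeholders for your image model and an attribute classifier; the stubs below simulate a biased generator so the harness has something to flag.

```python
import random
from collections import Counter

def audit_bias(generate, classify_demographic, prompt: str, n: int = 1000) -> dict:
    """Generate n images for a prompt, classify each, and report
    the demographic distribution as fractions."""
    counts = Counter(classify_demographic(generate(prompt)) for _ in range(n))
    return {group: round(c / n, 3) for group, c in counts.items()}

# Stubs for demonstration: a "generator" with a built-in 80/20 skew
# and a trivial classifier that just passes the label through.
random.seed(0)
fake_generate = lambda prompt: random.choices(["man", "woman"], weights=[0.8, 0.2])[0]
fake_classify = lambda img: img

report = audit_bias(fake_generate, fake_classify, "a photo of a CEO", n=1000)
print(report)  # roughly {'man': 0.8, 'woman': 0.2}, flagging the skew
```

In a real audit you would run this per prompt ("doctor", "nurse", "CEO"), compare against a target distribution, and gate releases on the gap.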
Copyright & Consent
Who owns AI-generated content? Who consented to training?
Training Data Rights
The core question: Is training on copyrighted images “fair use” or infringement?
Lawsuits: Getty Images v. Stability AI, artists v. Midjourney, NYT v. OpenAI
Opt-out mechanisms: robots.txt, Spawning.ai, DeviantArt opt-out — but enforcement is weak
Licensed data: Adobe Firefly trained only on licensed/public domain images — a differentiator
Emerging consensus: Training on public data is likely legal; generating copies of specific works is not
Output Ownership
US Copyright Office: AI-generated images cannot be copyrighted (no human authorship). But human-directed AI art with substantial creative input may qualify.
Commercial use: Most AI image services grant commercial rights to users
Style mimicry: Generating images “in the style of [living artist]” is legally gray and ethically questionable
Right of publicity: Using someone’s likeness without consent violates personality rights in most jurisdictions
Key insight: The legal landscape is still forming. The safest approach for commercial use: use models trained on licensed data (Adobe Firefly), avoid generating real people’s likenesses, and document your creative process to support copyright claims.
Building Responsible Systems
Practical guidelines for ethical multimodal AI
Safety Checklist
// Responsible multimodal AI checklist
✓ Content filtering   Block NSFW, violence, real people's faces
✓ Watermarking        C2PA metadata + invisible watermarks
✓ Disclosure          Label AI-generated content clearly
✓ Consent             Never generate real people without consent
✓ Bias auditing       Test outputs across demographics
✓ Rate limiting       Prevent mass generation of harmful content
✓ Logging             Audit trail for abuse investigation
✓ Reporting           User-facing mechanism to report misuse
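One checklist item, rate limiting, can be sketched as a per-user token bucket. The rate and burst capacity below are illustrative assumptions, not recommendations; in production this would sit in front of the generation endpoint, keyed by user or API key.

```python
import time

class TokenBucket:
    """Per-user rate limiter: caps how many generations a user can
    request per window, blunting mass production of harmful content."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(rate=0.5, capacity=3)  # burst of 3, then ~1 request / 2 s
results = [limiter.allow() for _ in range(5)]
print(results)  # the burst passes, then requests are throttled
```

The other checklist items (filtering, logging, reporting) follow the same pattern: a small, auditable gate between the user and the model.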
Organizational Practices
Ethics review: Review multimodal AI features before launch
Red teaming: Adversarial testing for harmful outputs and jailbreaks
Incident response: Plan for when your system is misused
Transparency reports: Publish data on content moderation and safety
Stakeholder engagement: Involve affected communities in design decisions
Continuous monitoring: Safety isn’t a one-time check — monitor ongoing use
Key insight: Responsible AI isn’t just about technology — it’s about organizational practices. The best safety systems fail without a culture that prioritizes responsible use, clear escalation paths, and accountability for harm.
Key Takeaways
Ethics and safety in the age of multimodal AI
Essential Concepts
1. Multimodal AI creates unique risks: Deepfakes, non-consensual imagery, voice cloning fraud

2. Provenance > detection: Proving content IS real (C2PA) is more sustainable than detecting fakes

3. Regulation is converging: Mandatory labeling, consent requirements, liability for harm

4. Visual bias is implicit: Requires systematic auditing across demographics

5. Copyright is unsettled: Use licensed training data for commercial safety
For Practitioners
Implement C2PA watermarking on all AI-generated content
Block real people’s faces in generation unless explicitly authorized
Audit for bias before every major release
Build incident response plans for misuse
Stay current on regulations — the legal landscape is changing fast
Next up: Chapter 16 covers evaluation for multimodal AI — how to measure quality, safety, and reliability of systems that process and generate images, video, and audio.