Ch 8 — Risks, Economics & What Comes Next

The honest assessment — security, costs, legal questions, and the future
High Level: Security → Costs → Lock-In → Legal → ROI → Future
The Expanded Security Surface
Agents introduce new attack vectors that didn’t exist before
New Attack Vectors
- Prompt injection via code: A malicious contributor embeds instructions in code comments or PR descriptions that manipulate the AI reviewer into approving dangerous changes.
- Dependency confusion: An AI fixer suggests importing a malicious package whose name is similar to a legitimate one.
- Data exfiltration: An agent with access to your codebase sends code to an external API for processing — your proprietary code may now sit in a third party's training data.
Mitigation Strategy
Treat agents like any other CI/CD component: least privilege (agents only access what they need), audit logs (every agent action is logged), network isolation (agents can’t reach arbitrary endpoints), and output validation (agent-generated code goes through the same security scanning as human-written code). The human review gate is your last line of defense.
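The output-validation control above can be sketched as a gate that agent-generated diffs must pass before review. This is a deliberately naive illustration: the regex patterns below are invented examples, and real prompt-injection defense needs far more than keyword matching.

```python
import re

# Illustrative patterns only -- a real defense combines scanning,
# sandboxing, and human review, not a regex blocklist.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all|any|previous) instructions", re.I),
    re.compile(r"approve this (pr|change|pull request)", re.I),
    re.compile(r"you are now", re.I),
]

def flag_injection_attempts(text: str) -> list[str]:
    """Return injection-like phrases found in a diff, comment, or PR body."""
    return [m.group(0) for p in SUSPICIOUS_PATTERNS for m in p.finditer(text)]

def validate_agent_output(diff: str) -> bool:
    """Gate: reject agent output containing injection-like instructions."""
    return not flag_injection_attempts(diff)
```

A gate like this sits alongside, not instead of, your normal security scanning; anything it flags goes to a human before the AI reviewer ever sees it.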
Key insight: The security risk of autonomous agents is not theoretical. Prompt injection attacks against AI code reviewers have been demonstrated in research. Treat agent security as seriously as you treat production security.
The Cost Reality
Autonomous pipelines are not free — and costs can surprise you
Where the Money Goes
- API tokens: Each agent task consumes thousands of tokens for context, reasoning, and code generation. A fleet of 10 agents running 50 tasks/day can easily cost $2,000–$5,000/month in API fees.
- Compute: Cloud sandboxes, VMs, and container orchestration add infrastructure costs.
- Review time: Human review of agent output is a real cost — often underestimated.
- Doom loops: Agents that retry failed tasks burn tokens without producing value.
The ROI Equation
The math: agent cost per task vs. developer cost per task. If an agent completes a task for $15 in API costs + 30 minutes of review ($25 at $50/hr), the total is $40. If a developer would spend 4 hours on the same task ($200), the savings are $160 per task. At 50 tasks/month, that’s $8,000 in savings. But this only works if the agent’s success rate is high enough that you’re not spending review time on bad output.
Key insight: The biggest hidden cost is failed tasks. An agent that fails 40% of the time costs you API tokens for the failed attempts AND developer time to do the work manually. Success rate is the most important variable in the cost equation.
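The math above, including the success-rate variable, can be written out directly. The dollar figures are the chapter's own worked example; the formula simply makes explicit that every attempt, failed or not, burns tokens and review time, so per-success cost scales with 1/success_rate.

```python
def expected_cost_per_completed_task(
    api_cost: float,      # API spend per attempt, e.g. $15
    review_cost: float,   # review time per attempt, e.g. 0.5 h at $50/h = $25
    success_rate: float,  # fraction of attempts producing mergeable output
) -> float:
    """Expected total cost per *successful* task.

    Failed attempts still consume API tokens and review time,
    so the per-success cost is (attempt cost) / success_rate.
    """
    return (api_cost + review_cost) / success_rate

def monthly_savings(dev_cost_per_task: float, agent_cost_per_task: float,
                    tasks_per_month: int) -> float:
    """Savings vs. a developer doing the same tasks by hand."""
    return (dev_cost_per_task - agent_cost_per_task) * tasks_per_month

perfect = expected_cost_per_completed_task(15, 25, 1.0)  # $40, as in the text
flaky = expected_cost_per_completed_task(15, 25, 0.6)    # ~$66.67 at 60% success
```

With the chapter's $200 developer baseline, a 100% success rate yields the quoted $8,000/month at 50 tasks; dropping to 60% success erodes roughly a sixth of that before counting the morale cost of reviewing bad output.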
Shadow Agents & Governance
The agents your security team doesn’t know about
The Shadow Agent Problem
Individual developers start using background agents with personal API keys, outside of any organizational governance. They’re sending proprietary code to third-party APIs, generating PRs without security scanning, and bypassing review processes. This is the shadow IT problem applied to AI agents — and it’s already happening in most organizations.
The Governance Response
Don’t ban — channel. If you prohibit agent usage, developers will use them anyway with personal accounts. Instead: provide approved agent tools with organizational accounts, establish clear policies on what can be sent to external APIs, require all agent-generated code to go through standard review, and monitor for unauthorized agent usage in your git logs.
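Monitoring git history for unauthorized agent usage can start as a simple log scan. The marker strings below are hypothetical examples of what agent tooling might leave behind (commit trailers, bot author emails); tune them to what your organization actually observes.

```python
import subprocess

# Hypothetical markers -- adjust to the agent tools your org encounters.
AGENT_MARKERS = ("co-authored-by: copilot", "codex", "devin", "claude")

def find_agent_commits(log_text: str,
                       markers: tuple[str, ...] = AGENT_MARKERS) -> list[str]:
    """Return log lines whose author or message hints at agent involvement."""
    return [
        line for line in log_text.splitlines()
        if any(marker in line.lower() for marker in markers)
    ]

def recent_agent_commits(repo_path: str = ".", days: int = 30) -> list[str]:
    """Scan a repo's recent history for agent-marked commits."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={days} days ago",
         "--pretty=%H %ae %s%n%b"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_agent_commits(log)
```

A scan like this is a detection signal, not an enforcement mechanism; its purpose is to surface shadow usage so you can route it onto the approved path.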
Key insight: The best governance strategy is making the approved path easier than the shadow path. If your official agent setup is harder to use than a personal Codex account, developers will choose the personal account every time.
Vendor Lock-In at the Workflow Level
A new kind of lock-in that’s harder to escape
The New Lock-In
Traditional vendor lock-in is about data formats and APIs. Agent lock-in is about workflows, blueprints, and institutional knowledge. If your entire migration pipeline is built around Codex’s specific capabilities, switching to Devin requires rewriting every blueprint. If your team has spent 6 months learning to write effective Devin task descriptions, that knowledge doesn’t transfer to Claude Code. The lock-in is in the workflow, not the tool.
Mitigation
Abstract the agent layer. Build your orchestration system with a generic agent interface that can be backed by different providers. Your blueprints should define what needs to happen, not how a specific agent does it. This adds complexity upfront but gives you the ability to switch providers or use multiple providers for different task types.
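A generic agent interface of this kind might look as follows. The provider names, method signature, and `TaskResult` fields are illustrative assumptions, not any vendor's actual API; the point is that orchestration code depends only on the protocol.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class TaskResult:
    success: bool
    branch: str    # branch the agent pushed its work to
    summary: str

class CodingAgent(Protocol):
    """What any provider backend must offer; blueprints talk only to this."""
    def run_task(self, description: str, repo: str) -> TaskResult: ...

# Backends are thin adapters around each vendor's real API (stubbed here).
class CodexAgent:
    def run_task(self, description: str, repo: str) -> TaskResult:
        return TaskResult(True, "codex/task", f"Codex ran: {description}")

class ClaudeCodeAgent:
    def run_task(self, description: str, repo: str) -> TaskResult:
        return TaskResult(True, "claude/task", f"Claude Code ran: {description}")

def run_blueprint_step(agent: CodingAgent, description: str,
                       repo: str) -> TaskResult:
    """Orchestration depends on the interface, never on a vendor class."""
    return agent.run_task(description, repo)
```

Swapping providers then means writing one new adapter, not rewriting every blueprint; you can also route different task types to different backends behind the same interface.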
Key insight: The agent market is moving fast. The best tool today may not be the best tool in 6 months. Building provider-agnostic orchestration is an investment in future flexibility.
Legal & IP Questions
Who owns agent-generated code? Who’s liable when it breaks?
Ownership
The legal landscape for AI-generated code is still evolving. Most jurisdictions currently treat AI-generated code as a tool output owned by the person who directed the tool — similar to code generated by a compiler or code generator. But this is not settled law. Your organization should have a clear policy: who owns the IP in agent-generated code? What are the licensing implications of code that may have been influenced by training data?
Liability
If an agent introduces a security vulnerability that leads to a data breach, who is liable? The developer who approved the PR? The team lead who set up the agent pipeline? The agent vendor? Current practice: the organization that deployed the code is liable, regardless of how it was generated. This is why the human review gate isn’t just a quality measure — it’s a liability measure.
Key insight: Treat agent-generated code with the same legal rigor as human-written code. It goes through the same review, the same security scanning, the same compliance checks. The generation method doesn’t change the liability.
Measuring ROI Honestly
Beyond the hype — what autonomous pipelines actually deliver
What to Measure
- Developer time saved: hours of manual work replaced by agent output.
- Throughput increase: features shipped per sprint, before vs. after.
- Quality impact: production incident rate, bug escape rate.
- Backlog velocity: how fast the team is clearing the backlog of deferred work (migrations, tech debt, test coverage).
- Developer satisfaction: are developers happier? Is retention improving?
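The measurement list above can be captured in a simple before/after comparison. The field names here are an illustrative sketch of such a scorecard, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class QuarterMetrics:
    dev_hours_saved: float      # manual work replaced by agent output
    features_shipped: int       # throughput per quarter
    incident_rate: float        # production incidents per month
    backlog_items_cleared: int  # migrations, tech debt, test coverage work

def roi_summary(before: QuarterMetrics, after: QuarterMetrics) -> dict:
    """Quarter-over-quarter deltas on the metrics that matter --
    deliberately excluding lines of code generated."""
    return {
        "hours_saved_delta": after.dev_hours_saved - before.dev_hours_saved,
        "throughput_delta": after.features_shipped - before.features_shipped,
        "incident_rate_delta": after.incident_rate - before.incident_rate,
        "backlog_delta": after.backlog_items_cleared - before.backlog_items_cleared,
    }
```

A negative `incident_rate_delta` is the quality signal you want; a large positive `throughput_delta` paired with a rising incident rate is a warning, not a win.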
What Not to Measure
- Don't measure lines of code generated. An agent that generates 10,000 lines of boilerplate is less valuable than one that generates 100 lines of correct business logic.
- Don't measure PRs merged per day without quality gates.
- Don't compare agent speed to human speed without accounting for review time — the total cycle time (agent work + review) is what matters.
Key insight: The most honest ROI metric: “What work got done this quarter that wouldn’t have gotten done without agents?” Migrations completed, tech debt paid down, test coverage improved — the work that was always important but never prioritized.
The Convergence
Where autonomous software pipelines are heading
Unified Platforms
Today, you assemble an autonomous pipeline from separate tools: Codex for background agents, a separate tool for CI/CD review, another for test generation, and custom orchestration to tie them together. The trend is toward unified platforms that handle the entire pipeline — from task intake to PR delivery to review assistance. GitHub, Anthropic, and others are building toward this vision.
Longer Horizons
Current agents work best on tasks that take 30 minutes to 2 hours. The frontier is pushing toward multi-day tasks — agents that can work on a feature over several days, maintaining context, handling interruptions, and coordinating with other agents and humans. This requires better memory, better planning, and better self-correction — all active areas of research.
Key insight: The convergence toward unified platforms means the orchestration complexity you build today may be handled by platforms tomorrow. Build for today’s needs, but don’t over-invest in custom infrastructure that platforms will commoditize.
Your Action Plan
What to do Monday morning
This Week
- Assess your team's position on the autonomy spectrum (Ch 1).
- Pick one background agent tool and submit one low-risk task (Ch 2).
- Add an AI reviewer to one repository in comment-only mode (Ch 3).

These three actions take less than a day and give you real experience with the technology.
This Month
- Run your first agent-driven migration on a small, well-tested module (Ch 5).
- Set up basic monitoring for agent success rate and cost per task (Ch 6).
- Have a team discussion about how specs and reviews need to evolve (Ch 7).
This Quarter
- Scale to a small fleet of 2–5 agents with proper isolation and task queuing (Ch 6).
- Establish governance policies for agent usage (Ch 8).
- Measure ROI honestly and decide whether to expand.

The goal isn't to automate everything — it's to find the workflows where agents deliver clear, measurable value for your specific team and codebase.
Key insight: The teams that succeed with autonomous pipelines are the ones that start small, measure honestly, and scale based on evidence. The teams that fail are the ones that try to transform everything at once. Start with one agent, one task type, one repository — and grow from there.