How It Works
You submit a task — via a CLI command, a Slack message, or a ticket — and walk away. The agent runs in a cloud sandbox with its own isolated environment, clones your repo, makes changes, runs the test suite, and opens a pull request. You come back to a PR ready for review. Tools at this level: OpenAI Codex (cloud sandboxes with GitHub Action integration), Devin 2.2 (full desktop environment, self-reviewing PRs), and Claude Code subagents (parallel background execution via Ctrl+B).
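The lifecycle above can be sketched as a fixed sequence of stages that either all succeed or stop at the first failure. This is a minimal illustration, not how Codex, Devin, or Claude Code are implemented: the stage names, the `TaskRun` record, and the `execute_step` callback are all hypothetical stand-ins for whatever the real tool does inside its sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskRun:
    """Record of one background run. All fields are illustrative."""
    description: str
    steps_completed: List[str] = field(default_factory=list)
    status: str = "queued"


# The stages named in the text, in order (names are hypothetical).
PIPELINE = [
    "create_sandbox",
    "clone_repo",
    "make_changes",
    "run_tests",
    "open_pull_request",
]


def run_background_task(
    description: str,
    execute_step: Callable[[str, str], bool],
) -> TaskRun:
    """Run each stage in order and stop at the first one that fails."""
    run = TaskRun(description=description, status="running")
    for step in PIPELINE:
        if not execute_step(step, description):
            run.status = f"failed_at:{step}"
            return run
        run.steps_completed.append(step)
    run.status = "pr_ready_for_review"
    return run


def always_succeeds(step: str, description: str) -> bool:
    """Placeholder executor; a real one would call into the sandbox."""
    return True


result = run_background_task("Fix flaky date parsing test", always_succeeds)
print(result.status)
```

The point of the sketch is the shape of the hand-off: you supply only the description, and what comes back is either a PR ready for review or the stage at which the run stopped.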
The Key Difference
The agent is decoupled from your session. It doesn’t need your IDE open, your terminal running, or your attention; it operates asynchronously. That decoupling is what enables parallelism: you can have 5 background agents working on 5 different tasks simultaneously. Stripe’s Minions system operates at this level, producing over 1,300 PRs per week that contain no human-written code, though every one is human-reviewed.
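To make the parallelism concrete, here is a small sketch of dispatching five tasks at once and collecting the results when they finish. It assumes a hypothetical `run_agent` coroutine that stands in for a real background agent (here it only sleeps briefly), and the task strings are invented for illustration. Only `asyncio` from the Python standard library is used.

```python
import asyncio
from typing import List


async def run_agent(task: str) -> str:
    """Hypothetical stand-in for one background agent run."""
    # A real agent would take minutes or hours; we simulate the wait.
    await asyncio.sleep(0.01)
    return f"PR opened for: {task}"


async def dispatch_all(tasks: List[str]) -> List[str]:
    """Start every agent concurrently and wait for all of them."""
    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(run_agent(t) for t in tasks))


tasks = [
    "Upgrade logging library",
    "Add retry to payment client",
    "Remove dead feature flag",
    "Fix flaky date parsing test",
    "Document the export endpoint",
]

results = asyncio.run(dispatch_all(tasks))
for line in results:
    print(line)
```

Because none of the five runs depends on your session, the total wait is roughly the duration of the slowest task rather than the sum of all five.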
Key insight: Level 4 is the frontier for most teams in early 2026. The technology exists, but adoption requires new workflows — you need to learn how to write good task descriptions, set up review processes for agent-generated PRs, and build trust incrementally.
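One way to practice writing better task descriptions is to treat them as structured data rather than a single sentence. The sketch below is an assumption about what a useful description might contain (a goal, acceptance criteria, and an out-of-scope list); none of these field names come from any particular tool, and your team may settle on a different shape.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskDescription:
    """Hypothetical structure for a task handed to a background agent."""
    goal: str
    acceptance_criteria: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)

    def missing_parts(self) -> List[str]:
        """Name the parts a reviewer would likely ask about."""
        missing = []
        if not self.goal.strip():
            missing.append("goal")
        if not self.acceptance_criteria:
            missing.append("acceptance_criteria")
        return missing

    def render(self) -> str:
        """Produce the plain-text prompt to submit to the agent."""
        lines = [f"Goal: {self.goal}", "Done when:"]
        lines += [f"- {c}" for c in self.acceptance_criteria]
        if self.out_of_scope:
            lines.append("Do not touch:")
            lines += [f"- {item}" for item in self.out_of_scope]
        return "\n".join(lines)


vague = TaskDescription(goal="Fix the login bug")
specific = TaskDescription(
    goal="Return HTTP 401 instead of 500 when the session token is expired",
    acceptance_criteria=[
        "Existing auth tests pass",
        "A new test covers the expired-token case",
    ],
    out_of_scope=["Database schema"],
)

print(vague.missing_parts())
print(specific.render())
```

The vague version gives the agent nothing to check its work against, while the specific one tells both the agent and the reviewer what "done" means.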