A five-level framework for classifying AI coding tools by their degree of independence: Autocomplete (Level 1), Interactive Assistant (Level 2), Task Agent (Level 3), Background Agent (Level 4), and Autonomous Pipeline (Level 5).
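The five levels order naturally by autonomy, which makes them easy to encode and compare. A minimal sketch (the enum name is an assumption, the level labels come from the framework itself):

```python
from enum import IntEnum

class AgentAutonomyLevel(IntEnum):
    """Five levels of AI coding-tool independence, ordered by autonomy."""
    AUTOCOMPLETE = 1
    INTERACTIVE_ASSISTANT = 2
    TASK_AGENT = 3
    BACKGROUND_AGENT = 4
    AUTONOMOUS_PIPELINE = 5

# IntEnum makes level comparisons direct: a Background Agent (4) is
# more autonomous than a Task Agent (3).
assert AgentAutonomyLevel.BACKGROUND_AGENT > AgentAutonomyLevel.TASK_AGENT
```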
An emerging role responsible for managing an agent fleet: writing task descriptions, monitoring performance, tuning blueprints, and handling escalations.
A deterministic code transformation tool that operates on the Abstract Syntax Tree rather than raw text. Used in hybrid migration approaches for reliable, mechanical transformations like renaming imports or updating method signatures.
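A toy sketch of the idea using Python’s standard `ast` module, renaming a top-level import (production codemod tools such as libcst also preserve comments and formatting, which `ast.unparse` does not):

```python
import ast

class RenameImport(ast.NodeTransformer):
    """Deterministic AST transform: rewrite `import old` as `import new`.

    Minimal sketch; a full codemod would also rewrite `from old import x`
    and attribute references at use sites.
    """
    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Import(self, node: ast.Import) -> ast.Import:
        for alias in node.names:
            if alias.name == self.old:
                alias.name = self.new
        return node

def rename_import(source: str, old: str, new: str) -> str:
    tree = RenameImport(old, new).visit(ast.parse(source))
    return ast.unparse(tree)
```

Because the transform operates on parsed structure rather than text, it cannot accidentally match the string `requests` inside a comment or string literal, which is what makes codemods reliable for mechanical migrations.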
A coding agent that runs independently of the developer’s IDE session, typically in a cloud sandbox or container. Accepts tasks asynchronously and delivers pull requests without requiring the developer’s attention during execution.
An orchestration approach, pioneered by Stripe’s Minions system, in which workflows are defined in code, combining deterministic steps (linting, formatting, CI) with flexible agent loops (implementation, test writing). Blueprints reduce the agent’s decision space and make output predictable.
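The core idea — a workflow defined in code, mixing deterministic and agent-driven steps — can be sketched in a few lines. This is a hypothetical illustration of the pattern, not Stripe’s actual Minions API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]
    deterministic: bool  # True: lint/format/CI; False: flexible agent loop

def run_blueprint(steps: list[Step], ctx: dict) -> dict:
    """Execute a code-defined workflow; each step reads and extends shared context."""
    for step in steps:
        ctx = step.run(ctx)
    return ctx

# Hypothetical usage: a deterministic lint step followed by an agent step.
lint = Step("lint", lambda ctx: {**ctx, "lint_ok": True}, deterministic=True)
implement = Step("implement", lambda ctx: {**ctx, "patch": "diff --git ..."},
                 deterministic=False)
result = run_blueprint([lint, implement], {"task": "rename config key"})
```

Because the step sequence is fixed in code, the agent only makes decisions inside its own step, which is what shrinks its decision space and makes output predictable.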
A safety mechanism where an agent is allowed a fixed number of attempts to complete a task. If it fails after N rounds, the task is escalated to a human rather than retrying indefinitely (which would cause a doom loop).
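A minimal sketch of bounded retries with escalation (function and field names are assumptions):

```python
def run_with_bounded_retries(attempt, verify, max_rounds: int = 3) -> dict:
    """Give the agent at most max_rounds tries, then escalate to a human.

    `attempt(round_no)` produces a candidate result; `verify` checks it.
    The hard cap is what prevents an unbounded doom loop.
    """
    for round_no in range(1, max_rounds + 1):
        result = attempt(round_no)
        if verify(result):
            return {"status": "done", "rounds": round_no, "result": result}
    return {"status": "escalated", "rounds": max_rounds}  # hand off to a human
```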
AI-powered analysis that identifies untested logic paths in code, going beyond line coverage to detect missing behavioral assertions. Most valuable at the PR level, catching gaps before code merges.
GitHub’s feature that automatically generates fix suggestions for security vulnerabilities detected by CodeQL. Covers 90%+ of alert types in JS/TS/Java/Python and remediates roughly two-thirds of vulnerabilities with little or no manual editing.
A failure mode where an agent retries a failing task indefinitely, consuming tokens and compute without making progress. Prevented by bounded retry rounds and token budgets.
A test that passes sometimes and fails sometimes with no code change, typically caused by timing issues, shared state, or network dependencies. AI agents can detect, classify, fix, and quarantine flaky tests.
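The detection step reduces to re-running a test on unchanged code and looking for mixed outcomes. A minimal classifier sketch (names and rerun count are assumptions):

```python
def classify_test(run_test, reruns: int = 5) -> str:
    """Re-run a test several times with no code change.

    All passes -> "pass"; all failures -> "fail"; mixed outcomes on
    identical code -> "flaky" (a quarantine candidate).
    """
    outcomes = {bool(run_test()) for _ in range(reruns)}
    if outcomes == {True}:
        return "pass"
    if outcomes == {False}:
        return "fail"
    return "flaky"
```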
An orchestration mechanism that prevents two agents from modifying the same file simultaneously, avoiding merge conflicts and context contamination in parallel execution.
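The mechanism amounts to a registry of file claims checked before an agent is dispatched. A minimal in-process sketch (class and method names are assumptions):

```python
class FileScopeLock:
    """Grant each agent an exclusive claim on a set of files.

    A claim that overlaps another agent's files is rejected, so two
    agents never modify the same file in parallel.
    """
    def __init__(self):
        self._claims: dict[str, str] = {}  # file path -> owning agent id

    def try_claim(self, agent: str, files: set[str]) -> bool:
        if any(self._claims.get(f, agent) != agent for f in files):
            return False  # conflict: another agent already holds a file
        for f in files:
            self._claims[f] = agent
        return True

    def release(self, agent: str) -> None:
        self._claims = {f: a for f, a in self._claims.items() if a != agent}
```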
A Git feature that creates separate working directories sharing a single .git object store. The standard isolation primitive for running multiple agents in parallel — each agent gets its own worktree with independent HEAD, index, and files.
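An orchestrator can create one worktree per agent by shelling out to `git worktree add`. A hedged sketch (the helper name and `agent/<id>` branch scheme are assumptions, not a standard convention):

```python
import pathlib
import subprocess

def create_agent_worktree(repo: pathlib.Path, agent_id: str) -> pathlib.Path:
    """Create an isolated worktree for one agent.

    The worktree shares the repo's .git object store but has its own
    HEAD, index, and working files, on a fresh agent/<id> branch.
    """
    worktree = repo.parent / f"{repo.name}-{agent_id}"
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add",
         "-b", f"agent/{agent_id}", str(worktree)],
        check=True, capture_output=True,
    )
    return worktree
```

When the agent’s branch is merged or abandoned, `git worktree remove <path>` cleans up the directory without touching the shared object store.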
A mandatory step in every production autonomous pipeline where a human reviews agent-generated code before it reaches production. A design principle, not a temporary limitation.
A global emergency control that stops all running agents immediately. Essential for production agent fleets to handle model degradation, security incidents, or systemic failures.
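In its simplest form this is a shared flag that every agent loop checks before each step. A minimal sketch using a process-wide `threading.Event` (names are assumptions; a real fleet would use a distributed flag):

```python
import threading

KILL_SWITCH = threading.Event()  # set() halts every agent immediately

def agent_loop(step, max_steps: int = 100):
    """Run an agent's work loop, checking the global kill switch each step."""
    for i in range(max_steps):
        if KILL_SWITCH.is_set():
            return ("stopped", i)  # stopped cleanly at step i
        step()
    return ("completed", max_steps)
```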
Devin 2.2’s feature where a coordinator session breaks down large tasks and delegates to multiple parallel sessions, each in its own isolated VM. Built-in multi-agent orchestration.
A test quality technique that introduces small bugs (mutations) into code and checks if tests catch them. Mutation score is a better indicator of test quality than coverage percentage.
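A toy illustration of one mutation operator (turning `<` into `<=`) and the kill check, using the standard `ast` module; real tools such as mutmut apply many operators and run the full suite per mutant:

```python
import ast

class FlipComparisons(ast.NodeTransformer):
    """One simple mutation operator: replace < with <= (an off-by-one bug)."""
    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op
                    for op in node.ops]
        return node

def mutant_killed(source: str, test) -> bool:
    """Mutate the source, exec the mutant, and see if the test catches it."""
    tree = FlipComparisons().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    ns: dict = {}
    exec(compile(tree, "<mutant>", "exec"), ns)
    try:
        test(ns)
        return False  # mutant survived: the suite missed the injected bug
    except AssertionError:
        return True   # mutant killed: the suite caught it
```

A test that only checks `under(1, 2)` passes against both the original and the mutant (100% line coverage, zero mutants killed), while a test that checks the boundary `under(2, 2)` kills the mutant — exactly why mutation score beats coverage as a quality signal.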
Reason → Act → Observe → Repeat. The fundamental execution pattern for task agents (Level 3+), where the agent plans an action, executes it, observes the result, and adjusts its approach.
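The pattern reduces to a loop over three callables plus a goal check. A minimal sketch (parameter names are assumptions):

```python
def react_loop(reason, act, observe, done, max_steps: int = 10) -> list:
    """Reason -> Act -> Observe -> Repeat until the goal check passes."""
    history = []
    for _ in range(max_steps):
        plan = reason(history)    # decide the next action from prior observations
        result = act(plan)        # execute it (edit a file, run a command, ...)
        obs = observe(result)     # read back the outcome
        history.append((plan, obs))
        if done(obs):             # goal reached: stop iterating
            break
    return history
```

The `history` argument to `reason` is what lets the agent adjust its approach: each new plan is conditioned on every prior action and its observed result.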
The percentage of agent-generated PRs merged without significant rework. The single most important metric for agent fleet health. Target 70%+ before scaling up.
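The metric is a straightforward ratio; a sketch assuming each PR record carries merge and rework flags (field names are hypothetical):

```python
def rework_free_merge_rate(prs: list[dict]) -> float:
    """Share of agent PRs merged without significant rework; target >= 0.70."""
    if not prs:
        return 0.0
    clean = sum(1 for p in prs if p["merged"] and not p["significant_rework"])
    return clean / len(prs)
```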
An AI coding agent used by individual developers outside organizational governance — with personal API keys, bypassing security scanning and review processes. The AI equivalent of shadow IT.
In Claude Code, a child agent spawned from a main session to handle a specific task. Can run in the background (Ctrl+B) and execute in parallel with other subagents when tasks are independent.
An orchestration component that sits between task sources and agents, managing priority, assignment, and file-scope conflict prevention for parallel agent execution.
An autonomous cycle where the agent runs tests, identifies failures, classifies whether the test or the code is wrong, fixes the appropriate side, and re-runs to verify.
The process of determining how much authority to give AI agents in a pipeline: comment-only → suggest → auto-apply with review → auto-merge. Most teams should stay at “suggest” for code changes.
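The ladder orders naturally, so a permission check is a single comparison. A minimal sketch (enum and function names are assumptions):

```python
from enum import IntEnum

class AutonomyGrant(IntEnum):
    """Permission ladder for agent pipelines; most teams stop at SUGGEST."""
    COMMENT_ONLY = 1
    SUGGEST = 2
    AUTO_APPLY_WITH_REVIEW = 3
    AUTO_MERGE = 4

def allowed(required: AutonomyGrant, granted: AutonomyGrant) -> bool:
    """An action is permitted only if its required level is within the grant."""
    return required <= granted
```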
Using Vision Language Models to compare screenshots semantically rather than pixel-by-pixel. Catches meaningful visual changes (missing buttons, layout breaks) while ignoring irrelevant pixel shifts.