Ch 3 — AI Product Roles & Team Structure

Who you need on an AI product team, how it differs from traditional software, and the PM’s evolving role.
The AI Product Manager
How the PM role changes when the product is probabilistic
What Stays the Same
The core PM responsibilities remain: understand users, define problems, prioritize ruthlessly, ship value. You still own the product vision, write specs, manage stakeholders, and make trade-off decisions. Customer empathy, business acumen, and communication skills matter just as much.

The fundamentals of product management don’t change. What changes is the toolkit you apply to those fundamentals.
What Changes
You must become semi-technical in AI. Not to build models, but to:

Evaluate feasibility — Can this problem be solved with current AI? What data do we need?
Read benchmarks — Is a 92% F1 score good enough for our use case?
Challenge engineering — “Why is accuracy plateauing? Would more training data help, or do we need a different architecture?”
Set thresholds — Define the “good enough” bar for precision, recall, and latency
Design for failure — Specify fallback behavior, confidence thresholds, and escalation paths
Manage uncertainty — Communicate non-linear timelines and probabilistic outcomes to stakeholders
The key shift: Traditional PMs define what the product should do and engineers make it happen deterministically. AI PMs define what the product should do and what acceptable performance looks like, because the engineering team can’t guarantee exact behavior. You’re negotiating with probability, not just with stakeholders.
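The “set thresholds” work above can be made concrete. A minimal sketch (the 90% precision / 80% recall bar is hypothetical, as in rule-of-thumb examples later in this chapter) that turns raw prediction counts into a go/no-go check:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw prediction counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

def meets_bar(tp, fp, fn, min_precision=0.90, min_recall=0.80):
    """Gate a release on the product's 'good enough' bar (hypothetical values)."""
    p, r, _ = precision_recall_f1(tp, fp, fn)
    return p >= min_precision and r >= min_recall

# Example: 92 true positives, 8 false positives, 15 false negatives
p, r, f1 = precision_recall_f1(92, 8, 15)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(meets_bar(92, 8, 15))
```

The point of the sketch is that the bar is an explicit, checkable product decision, not a vibe in a spec.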
The Core AI Team
Six essential roles and what each one actually does
ML Engineer
Builds, trains, and deploys models. Manages the pipeline from data preprocessing to production inference. This is your primary technical counterpart — the person who translates your product requirements into model architecture decisions.

What to ask them: “What’s blocking accuracy improvement?” “How long until we can A/B test this model?” “What’s the latency at the 95th percentile?”
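That p95 question has a precise meaning: the latency below which 95% of requests complete. A self-contained sketch with illustrative timings, using a simple nearest-rank percentile so no libraries are needed:

```python
def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct/100 * n) via floor-div trick
    k = max(0, min(len(ordered) - 1, rank - 1))
    return ordered[k]

# 100 synthetic request latencies in ms: mostly fast, with a slow tail
latencies = [120] * 90 + [450] * 9 + [2000]
print(percentile(latencies, 50))  # median: 120
print(percentile(latencies, 95))  # tail latency the SLA should track: 450
```

Note why the question is about p95 rather than the mean: the average here is about 169 ms, which hides the fact that one request in ten takes 450 ms or worse.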
Data Engineer
Builds data pipelines, ensures data quality, manages feature stores. Often the critical bottleneck — 68% of companies report that data infrastructure is their biggest AI challenge, not model quality.

What to ask them: “Do we have enough labeled data for this use case?” “How fresh is the training data?” “What’s our data quality score?”
Data Scientist / Researcher
Explores new approaches, runs experiments, validates hypotheses. More research-oriented than the ML engineer — they figure out if something is possible before the ML engineer figures out how to productionize it.

What to ask them: “What’s the theoretical ceiling for this task?” “Have you seen similar problems solved elsewhere?”
Software Engineer
Builds the APIs, UI, integrations, and infrastructure that wraps the AI. The model is useless without a product around it. Software engineers build that product — the serving layer, the user interface, the monitoring dashboards, the feedback collection mechanisms.

What to ask them: “Can we serve this model within our latency budget?” “How do we collect user feedback on predictions?”
UX Designer
Designs user experiences around probabilistic outputs. This is fundamentally harder than traditional UX because the system’s behavior is non-deterministic. How do you show confidence? How do you handle errors gracefully? How do you set expectations?

What to ask them: “How does the user know when the AI is confident vs. guessing?” “What’s the recovery path when the AI is wrong?”
Domain Expert
Provides subject matter expertise for data labeling, evaluation criteria, and edge case identification. In healthcare AI, this is a physician. In legal AI, a lawyer. In financial AI, a compliance officer.

What to ask them: “Is this output clinically/legally/financially acceptable?” “What edge cases would a practitioner immediately flag?”
Emerging GenAI Roles
New roles that didn’t exist two years ago — and why they matter
Prompt Engineer / Applied AI Engineer
For LLM-based products, prompt engineering is product logic. The prompt engineer designs, tests, and iterates on the instructions that control model behavior. This role sits at the intersection of product design and engineering.

In many teams, the PM and prompt engineer work as closely as the PM and designer do in traditional software. The prompt is the product specification — it defines what the model does, how it responds, what it refuses, and how it handles edge cases.

Key skill: Systematic evaluation. A good prompt engineer doesn’t just write prompts — they build evaluation frameworks to measure prompt quality across hundreds of test cases.
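A sketch of what that evaluation loop can look like. Everything here is illustrative: `call_model` stands in for whatever LLM client the team actually uses, and the scorer is a toy exact-label check rather than a real grading function.

```python
def evaluate_prompt(prompt_template, test_cases, call_model, scorer):
    """Run one prompt variant over a suite of test cases and aggregate scores.

    call_model and scorer are injected so the harness stays model-agnostic:
    call_model(prompt) -> output text, scorer(output, case) -> score in [0, 1].
    """
    results = []
    for case in test_cases:
        output = call_model(prompt_template.format(**case["inputs"]))
        results.append({"case": case, "output": output,
                        "score": scorer(output, case)})
    mean = sum(r["score"] for r in results) / len(results)
    failures = [r for r in results if r["score"] < 0.5]
    return {"mean_score": mean, "failures": failures, "results": results}

# Hypothetical usage with a fake model, purely for illustration
cases = [{"inputs": {"text": "refund please"}, "expected_label": "refund"},
         {"inputs": {"text": "love the app"}, "expected_label": "praise"}]
fake_model = lambda prompt: "refund" if "refund" in prompt else "praise"
scorer = lambda out, case: 1.0 if case["expected_label"] in out else 0.0
report = evaluate_prompt("Classify: {text}", cases, fake_model, scorer)
print(report["mean_score"])  # 1.0 for this toy model
```

Comparing two prompt variants is then just two calls to `evaluate_prompt` over the same cases, which is exactly the systematic iteration the role demands.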
MLOps Engineer
Manages the operational lifecycle of models in production: deployment pipelines, model versioning, A/B testing infrastructure, monitoring, and automated retraining. Think of them as DevOps for machine learning.

Without MLOps, models that work in notebooks never make it to production — or they make it to production and silently degrade.
Evaluation Specialist
An increasingly critical role focused on measuring AI quality. They design evaluation datasets, build automated testing pipelines, define quality rubrics, and run red-team exercises. For LLM products, evaluation is the new QA — but far more complex because you can’t write deterministic test assertions.

Some teams call this role “AI Quality Engineer” or “Eval Lead.”
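One common way around the no-deterministic-assertions problem is soft, rubric-style checks: instead of asserting an exact string, you score each answer against a list of required and forbidden properties. A hypothetical sketch (the rubric fields are invented for illustration):

```python
import re

def rubric_score(output, must_mention, must_not_mention, max_words=150):
    """Score a free-form LLM answer against a soft rubric instead of an
    exact-match assertion. Returns the fraction of checks passed, in [0, 1]."""
    text = output.lower()
    checks = []
    checks.extend(term.lower() in text for term in must_mention)
    checks.extend(term.lower() not in text for term in must_not_mention)
    checks.append(len(re.findall(r"\w+", output)) <= max_words)
    return sum(checks) / len(checks)

answer = "Your refund was approved and will arrive in 5-7 business days."
score = rubric_score(answer,
                     must_mention=["refund", "business days"],
                     must_not_mention=["cannot", "unfortunately"])
print(score)  # 1.0: required terms present, no banned terms, within length
```

Real pipelines layer checks like this with model-graded evaluation, but the shape is the same: a continuous score per test case rather than a pass/fail assertion.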
AI Architect
Designs the end-to-end system architecture: which models to use, how to chain them, where to add RAG, how to handle caching, when to use fine-tuning vs. prompting. Critical for complex systems with multiple AI components.
Hiring reality: Most teams don’t have all these roles as separate hires. In early-stage teams, the ML engineer often covers MLOps, the PM does prompt engineering, and the data scientist handles evaluation. The roles are distinct functions, not necessarily distinct people. Know which functions you need even if one person covers multiple.
Team Topologies
Three ways to organize AI teams — and when each works best
1. Embedded Model
AI specialists sit within product teams. Each product squad has its own ML engineer, data scientist, or applied AI engineer alongside the PM, designer, and software engineers.

Best for: Companies with 1–3 AI features that need fast iteration and tight product context. The AI person understands the user problem deeply because they’re embedded in the product team.

Risk: Duplication of effort. Three product teams might independently build similar data pipelines or evaluation frameworks. No shared learning across teams.
2. Central Platform Team
A dedicated AI team builds shared infrastructure, evaluation tooling, model serving, and governance. Product teams submit requests and the central team delivers AI capabilities.

Best for: Regulated industries (finance, healthcare) where consistency, compliance, and centralized governance matter more than speed. Also good when AI infrastructure is complex and expensive to duplicate.

Risk: Becomes a bottleneck. Product teams wait in a queue. The central team lacks product context and builds technically impressive solutions that miss user needs.
3. Hybrid Model (Most Common)
A small platform team provides shared foundations (model serving, evaluation tools, data pipelines, governance frameworks) while product teams embed AI engineers for delivery.

The platform team handles the “boring but critical” infrastructure. The embedded engineers handle the product-specific AI work. The platform team sets standards; the product teams move fast within those standards.

Best for: Most organizations. Balances speed with consistency. Prevents duplication without creating bottlenecks.
PM implication: Your team topology determines your velocity and your constraints. In an embedded model, you have direct access to AI talent but limited infrastructure. In a central model, you have great infrastructure but compete for AI team bandwidth. In a hybrid, you navigate both. Understand your topology and optimize your process accordingly.
The PM–ML Relationship
How to work effectively with data scientists and ML engineers
The Translation Problem
The biggest dysfunction in AI teams is the translation gap between product and ML. The PM says “make the recommendations better.” The ML engineer asks “better by what metric?” The PM says “users should like them more.” The ML engineer needs a number.

This gap exists because traditional PMs think in user stories and the ML team thinks in loss functions. Neither is wrong — they’re speaking different languages about the same problem.

Your job as PM: Bridge this gap. Learn enough ML vocabulary to have productive conversations. Translate user needs into measurable objectives. Translate model metrics back into business impact.
Five Rules for Working with ML Teams
1. Define success metrics before building. “Precision above 90% with recall above 80%” is actionable. “Make it accurate” is not.

2. Provide evaluation data, not just requirements. Give the ML team 200 examples of “good” and “bad” outputs. This is more useful than a 10-page spec.

3. Expect non-linear progress. The team might go from 70% to 85% in a week, then spend a month going from 85% to 88%. This is normal, not a failure.

4. Ask “what would help most?” Often the answer is “more labeled data” or “cleaner data,” not “more engineering time.” PMs can unblock ML teams by prioritizing data work.

5. Review model outputs together. Weekly “error review” sessions where PM and ML look at the worst predictions together are the highest-ROI meeting in AI product development.
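Rule 2 can be as lightweight as a small labeled file the PM maintains. A sketch of one possible format, serialized as JSONL so the ML team can load it straight into an eval harness (the field names are invented, not a standard schema):

```python
import json

# Rule 2 in practice: labeled examples instead of prose requirements.
examples = [
    {"input": "Cancel my subscription today",
     "good_output": "intent: cancel_subscription",
     "bad_output": "intent: billing_question",
     "note": "Explicit cancellation must never be routed to billing."},
    {"input": "Why was I charged twice?",
     "good_output": "intent: billing_question",
     "bad_output": "intent: cancel_subscription",
     "note": "Complaints about charges are billing, not churn."},
]

# One JSON object per line: trivially diffable, versionable, and loadable
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(len(jsonl.splitlines()), "examples serialized")
```

The `note` field is the part only the PM can write: it encodes why an output is good or bad, which is exactly the product context a 10-page spec struggles to convey.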
The error review ritual: Every week, pull the 20 worst predictions from the past 7 days. PM and ML engineer review them together. The PM provides product context (“this type of error is catastrophic for the user”). The ML engineer provides technical context (“this fails because the training data doesn’t cover this pattern”). Together, you prioritize what to fix. This single practice improves AI product quality faster than any other.
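Pulling the 20 worst predictions is a one-liner once predictions are logged with some per-item quality signal. A sketch, assuming hypothetical log entries that carry a `score` field (a user rating, an eval score, or 1 minus a loss):

```python
def worst_predictions(logged, n=20):
    """Pull the n worst predictions from a batch of logs for error review.

    Each log entry is assumed (hypothetically) to carry a 'score' in [0, 1];
    the lowest-scoring entries come back first.
    """
    return sorted(logged, key=lambda entry: entry["score"])[:n]

logs = [{"id": i, "score": s}
        for i, s in enumerate([0.9, 0.2, 0.7, 0.05, 0.6])]
for entry in worst_predictions(logs, n=2):
    print(entry["id"], entry["score"])  # prints id 3 (0.05), then id 1 (0.2)
```

The hard part is not this query; it is making sure every production prediction is logged with enough context (input, output, score) to make the weekly review possible at all.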
Common Team Mistakes
The most common ways teams get AI structure wrong, and how to fix each one.
Mistake 1: Hiring ML Without Data Infrastructure
The most common mistake: hiring expensive ML engineers before having data pipelines, labeled datasets, or feature stores. The ML engineers spend 80% of their time on data plumbing instead of modeling. They get frustrated and leave.

Fix: Hire a data engineer first. Or at minimum, ensure your data infrastructure can support ML work before hiring ML talent.
Mistake 2: No PM for AI
Letting the ML team self-direct without a PM. Data scientists are researchers by training — they optimize for model performance, not user value. Without a PM, you get technically impressive models that solve the wrong problem or can’t be integrated into a product.

Fix: Every AI initiative needs a PM who owns the “why” and “for whom,” even if the team is small.
Mistake 3: Treating Evaluation as an Afterthought
Building the model first, then figuring out how to evaluate it. This leads to models that “seem to work” but have no rigorous quality measurement. You can’t improve what you can’t measure.

Fix: Build the evaluation framework before or alongside the model. Define what “good” looks like with concrete examples before writing a line of model code.
Mistake 4: No Clear Ownership of Cost & Incidents
In traditional software, engineering owns uptime and the PM owns the roadmap. In AI products, who owns model cost? Who responds when the model starts producing harmful outputs at 2 AM? Who decides when to retrain vs. rollback?

Without explicit ownership of these AI-specific responsibilities, they fall through the cracks.
The ownership matrix: For every AI product, explicitly assign: Who owns model quality? (Usually PM + ML lead jointly.) Who owns model cost? (PM or engineering lead.) Who owns incident response? (MLOps or on-call ML engineer.) Who owns evaluation datasets? (PM provides examples, ML builds the framework.) Ambiguity here causes the most painful failures.
Scaling the AI Team
How team structure evolves as your AI product matures
Stage 1: Exploration (3–5 People)
Team: 1 PM, 1 ML Engineer, 1 Data Engineer, 1 Software Engineer, and access to a domain expert.

Focus: Prove feasibility. Can AI solve this problem with available data? Build a prototype, test with real users, measure baseline performance.

PM role: Hands-on. You’re writing prompts, reviewing outputs, labeling data, and talking to users daily. There’s no buffer between you and the model.
Stage 2: Product-Market Fit (6–10 People)
Added roles: UX Designer, additional ML Engineer, Prompt Engineer (for LLM products), part-time Evaluation Specialist.

Focus: Ship to real users. Improve quality systematically. Build monitoring. Establish feedback loops.

PM role: Shifting from hands-on to orchestration. You’re defining evaluation criteria, prioritizing model improvements vs. product features, and managing stakeholder expectations about AI capabilities.
Stage 3: Scale (10–20+ People)
Added roles: MLOps Engineer, AI Architect, dedicated Evaluation team, additional Data Scientists for experimentation.

Focus: Reliability, cost optimization, multiple model versions, A/B testing at scale, governance and compliance.

PM role: Strategic. You’re managing a portfolio of AI capabilities, negotiating compute budgets, defining the model improvement roadmap, and ensuring the team doesn’t over-optimize for model metrics at the expense of user value.
Scaling principle: Resist the urge to hire ahead of need. Start with the smallest team that can prove feasibility. Add roles as specific bottlenecks emerge. The most common regret is hiring 10 ML engineers before having a clear product direction — not hiring too slowly. Let the product’s needs pull the team size, not the other way around.
The AI PM’s Playbook
Six practices that define effective AI product managers
Practices 1–3
1. Learn to read model metrics. You don’t need to build models, but you must understand precision, recall, F1, AUC, and latency well enough to make product decisions. If the ML team says “we improved recall from 78% to 85%,” you should immediately know what that means for users.

2. Own the evaluation dataset. The PM should curate the “golden set” of examples that define product quality. This is the most important artifact in AI product development — more important than the PRD.

3. Run weekly error reviews. Review the worst model outputs with the ML team every week. This builds shared understanding, surfaces the most impactful improvements, and keeps the team focused on user-facing quality.
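Practice 1 in action: translating the recall example above into user terms is simple arithmetic, and doing it reflexively is the skill.

```python
def missed_per_thousand(recall):
    """Relevant items the model fails to surface, per 1,000 that exist."""
    return round((1 - recall) * 1000)

# The example from practice 1: recall improves from 78% to 85%
before, after = missed_per_thousand(0.78), missed_per_thousand(0.85)
print(before, after, before - after)  # 220 150 70
```

“Recall went from 78% to 85%” lands very differently as “we now surface 70 more of every 1,000 relevant items we used to miss,” and that second framing is the one stakeholders can act on.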
Practices 4–6
4. Communicate uncertainty to stakeholders. “The model is at 87% accuracy and we expect to reach 92% in 4–6 weeks, but there’s a chance we plateau at 90%.” Stakeholders respect honesty about uncertainty far more than false precision about timelines.

5. Prioritize data work. When the ML team says “more labeled data would help more than a new architecture,” take that seriously. Allocating resources to data labeling, cleaning, or collection is often the highest-leverage PM decision.

6. Define the human fallback. For every AI feature, specify exactly what happens when the model fails or isn’t confident. This is a product decision that the PM must own — not something to figure out after launch.
The bottom line: AI product management is a team sport with new positions on the field. Your job isn’t to be the best at any single role — it’s to be the connective tissue that translates between user needs, business goals, and technical reality. The teams that win are the ones where the PM understands enough AI to ask the right questions and the ML team understands enough product to build the right things.