Ch 5 — Problem Framing for AI

The #1 PM skill for AI products. When to use AI, when not to, and how to frame problems correctly.
Should You Even Use AI?
The most important question most teams skip
The AI Hammer Problem
When you have a shiny new hammer, everything looks like a nail. The biggest source of AI product failure isn’t bad models — it’s applying AI to problems that don’t need it.

AI adds complexity, cost, unpredictability, and maintenance burden. Every time you introduce AI, you’re trading deterministic behavior for probabilistic behavior. That trade-off is only worth it when the problem genuinely requires pattern recognition, language understanding, or handling of ambiguity that rules can’t address.

The best AI PMs are the ones who say “we don’t need AI for this” as often as they say “AI can solve this.”
The Decision Hierarchy
Before reaching for AI, work through this hierarchy:

1. Can a simple rule solve it?
If the logic can be expressed as if/then/else statements, use rules. Tax calculations, approval workflows, account balance displays. Rules cost near-zero per decision, are fully auditable, and never drift.

2. Can traditional automation solve it?
If the task is repetitive and structured, use workflow automation (Zapier, scripts, RPA). Moving data between systems, sending scheduled emails, generating reports from templates.

3. Can traditional ML solve it?
If you need pattern recognition on structured data, use classical ML. Churn prediction, demand forecasting, anomaly detection. Cheaper and more predictable than LLMs.

4. Do you need generative AI / LLMs?
Only when the task involves unstructured data, language understanding, content generation, or reasoning across ambiguous inputs.
Cost reality check: A rule engine processes thousands of decisions per second at near-zero cost. Traditional ML costs roughly $0.001 per prediction. An LLM call costs $0.01–$0.50+. Applying an LLM to a problem a rule engine can solve costs orders of magnitude more per decision, introduces latency, and is harder to audit. Always start at the top of the hierarchy.
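The gap compounds at scale. A back-of-envelope sketch using the per-decision figures above (illustrative numbers, not vendor pricing):

```python
# Back-of-envelope monthly cost comparison at 1M decisions/month.
# Per-decision costs are the illustrative figures from the text.
DECISIONS_PER_MONTH = 1_000_000

cost_per_decision = {
    "rules":          0.000001,  # effectively free per decision
    "traditional_ml": 0.001,
    "llm":            0.05,      # mid-range LLM call
}

for approach, unit_cost in cost_per_decision.items():
    monthly = unit_cost * DECISIONS_PER_MONTH
    print(f"{approach:>14}: ${monthly:,.0f}/month")

# At this volume the LLM is 50x the ML cost and ~50,000x the rules cost.
```

The point isn't the exact numbers; it's that the ratio between tiers is large enough that tier choice dominates every other cost decision.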
Rules, ML, or LLM?
A practical decision matrix for choosing the right approach
Use Rules When
• Inputs are structured and well-defined
• Logic is deterministic — same input must always produce same output
• Full auditability and compliance are required
• The decision space is finite and enumerable
• Speed and cost matter (near-zero latency and cost)

Examples: Pricing calculations, eligibility checks, routing rules, data validation, access control, regulatory compliance checks.

Limitations: Can’t handle ambiguity, novel inputs, or patterns too complex to express as rules.
Use Traditional ML When
• You have structured data with clear labels
• The task is classification, regression, or ranking
• You need pattern recognition across many variables
• Predictions need to be fast and cheap at scale

Examples: Fraud detection, churn prediction, demand forecasting, recommendation ranking, credit scoring, spam filtering.

Limitations: Requires labeled training data. Can’t handle unstructured text/images well (use deep learning). Doesn’t generalize to new task types.
Use LLMs / Generative AI When
• Inputs are unstructured (natural language, documents, images)
• The task requires understanding context, nuance, or intent
• Multiple reasonable outputs exist for the same input
• The task would require a human’s judgment and language ability
• You need flexibility across many task types without retraining

Examples: Content generation, summarization, conversational interfaces, document analysis, code generation, translation, complex reasoning tasks.

Limitations: Expensive per query. Non-deterministic. Can hallucinate. Harder to evaluate. Latency is higher.
The hybrid reality: Most production AI systems use all three. A customer service system might use rules to route tickets by category, ML to predict urgency, and an LLM to draft responses. The PM’s job is to assign the right tool to each sub-task, not to pick one approach for everything.
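The customer service example above can be sketched as a pipeline. All three components here are stubs for illustration — the rule logic, urgency model, and LLM call are assumptions, not a real implementation:

```python
# Sketch of the hybrid ticket pipeline described above:
# rules route, ML scores urgency, an LLM drafts a reply.

def route_by_rules(ticket: dict) -> str:
    """Deterministic routing: cheap, auditable, runs first."""
    subject = ticket["subject"].lower()
    if "refund" in subject or "invoice" in subject:
        return "billing"
    if "password" in subject or "login" in subject:
        return "account"
    return "general"

def predict_urgency(ticket: dict) -> float:
    """Stand-in for a traditional ML classifier (e.g. gradient-boosted trees)."""
    return 0.9 if "urgent" in ticket["body"].lower() else 0.2

def draft_reply(ticket: dict, queue: str) -> str:
    """Stand-in for an LLM call -- the only expensive, probabilistic step."""
    return f"[draft for {queue} ticket: {ticket['subject']}]"

def handle(ticket: dict) -> dict:
    queue = route_by_rules(ticket)       # rules: routing
    urgency = predict_urgency(ticket)    # ML: prioritization
    draft = draft_reply(ticket, queue)   # LLM: language generation
    return {"queue": queue, "urgency": urgency, "draft": draft}

result = handle({"subject": "Refund request", "body": "Urgent: charged twice"})
```

Note the ordering: the cheap deterministic step runs first and the expensive probabilistic step runs last, so the LLM is only ever asked to do the one sub-task that genuinely needs it.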
The Problem Framing Canvas
Six questions that transform a vague idea into a well-framed AI problem
Question 1: What Is the User Problem?
Start with the user, not the technology. “We want to use AI” is not a problem statement. “Our support team takes 4 hours to resolve a ticket that could be resolved in 10 minutes with the right information” is.

The problem must be specific, measurable, and tied to user pain. If you can’t articulate the problem without mentioning AI, you don’t have a problem — you have a technology looking for a purpose.
Question 2: What Does “Good” Look Like?
Define success in concrete, measurable terms before building anything:

• “The AI correctly classifies 90% of support tickets into the right category”
• “Generated summaries are rated as ‘accurate’ by domain experts 85% of the time”
• “The recommendation engine increases click-through rate by 15%”

If you can’t define “good,” you can’t evaluate the model, and you can’t ship with confidence.
Question 3: What’s the Current Baseline?
How is this problem solved today? How well?

Human baseline: Humans do this task at 80% accuracy in 10 minutes. An AI at 82% accuracy in 2 seconds is valuable.
Rule-based baseline: Current rules handle 60% of cases. An AI that handles 85% is a clear improvement.
No baseline: Nobody does this today; AI creates an entirely new capability.

Without a baseline, you can’t measure improvement. Without measuring improvement, you can’t justify the investment.
Questions 4–6
Q4: What data exists? — Is there training data? How much? How clean? Where does it live? Who owns it? (Covered in depth in Chapter 6.)

Q5: What are the error costs? — What happens when the AI is wrong? Is a false positive or false negative more expensive? This determines your optimization target.

Q6: What’s the human fallback? — When the AI fails or isn’t confident, what happens? Escalate to a human? Show a default? Do nothing? The fallback design is as important as the AI itself.
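Q6 often reduces to a confidence gate: answer automatically when confident, escalate otherwise. A minimal sketch — the threshold value and the `classify` stub are assumptions for illustration:

```python
# Confidence-gated fallback: the AI acts only above a threshold;
# everything else escalates to a human.
CONFIDENCE_THRESHOLD = 0.85  # tuned from evaluation data, not guessed

def classify(ticket_text: str) -> tuple:
    """Stand-in for a real model returning (label, confidence)."""
    if "invoice" in ticket_text.lower():
        return ("billing", 0.95)
    return ("unknown", 0.40)

def answer_or_escalate(ticket_text: str) -> dict:
    label, confidence = classify(ticket_text)
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"action": "auto_answer", "category": label}
    # The fallback path: this design matters as much as the model itself.
    return {"action": "escalate_to_human", "category": None}
```

Moving the threshold is a product decision, not a modeling one: raise it and more tickets reach humans; lower it and more errors reach users.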
The canvas in practice: Fill in these six questions on a single page. If you can’t answer any of them clearly, that’s a red flag. The most common gap: teams can articulate the user problem (Q1) but can’t define “good” (Q2) or identify available data (Q4). Those gaps must be resolved before engineering begins.
Scoping: Narrow the Problem
The art of making AI problems small enough to solve well
Why Narrow Scope Wins
The #1 scoping mistake in AI products: trying to solve too broad a problem.

Too broad: “Build an AI that handles all customer inquiries.”
Well-scoped: “Build an AI that answers the top 20 most common billing questions, which represent 40% of ticket volume.”

Too broad: “Build an AI that writes marketing content.”
Well-scoped: “Build an AI that generates first drafts of product description pages for our e-commerce catalog.”

Narrow scope means you can collect focused training data, define clear evaluation criteria, and ship a product that works well for a specific use case rather than poorly for everything.
The Scoping Framework
1. Start with the highest-volume, lowest-risk use case.
Find the task that happens most often and where errors are least costly. This gives you the most data for improvement and the safest environment to learn.

2. Constrain the input space.
Limit the types of inputs the AI handles. “English-language billing questions from US customers” is more tractable than “any question in any language from any customer.”

3. Define the boundary explicitly.
What’s in scope and what’s out? When should the AI say “I can’t help with this” and escalate? The boundary is as important as the capability.

4. Plan the expansion path.
Scope narrowly for v1, but design the architecture to expand. Start with billing questions, then add shipping questions, then returns. Each expansion is a new evaluation cycle.
The 80/20 rule for AI: In most products, 20% of use cases represent 80% of volume. Solve those first. An AI that handles 80% of cases brilliantly and escalates the rest to humans is far more valuable than an AI that handles 100% of cases mediocrely. Scope for the 80%.
Reframing: The PM Superpower
How the way you frame the problem determines whether AI can solve it
The Same Problem, Different Frames
The way you frame an AI problem dramatically affects feasibility. Consider “reduce customer churn”:

Frame 1: Predict who will churn.
Classification problem. Input: customer behavior data. Output: churn probability. Well-studied, lots of training data available, traditional ML works well.

Frame 2: Explain why customers churn.
Much harder. Requires causal analysis, not just correlation. ML can identify patterns but explaining causation is a different problem entirely.

Frame 3: Recommend actions to prevent churn.
Hardest. Requires not just prediction but prescriptive recommendations. Needs A/B testing to validate that recommendations actually work.

Same business goal, three very different AI problems with different feasibility, data requirements, and timelines.
Reframing Techniques
Classification vs. Generation: “Write the perfect response” (generation, hard) vs. “Select the best response from these 5 templates” (classification, easier). Can you reduce a generation problem to a selection problem?

Prediction vs. Detection: “Predict which transactions will be fraudulent” (hard, requires future knowledge) vs. “Detect transactions that look anomalous compared to the user’s history” (easier, uses historical patterns).

Full automation vs. Triage: “Automatically resolve all support tickets” (very hard) vs. “Categorize and prioritize tickets so agents handle the most urgent first” (much more tractable).

Open-ended vs. Constrained: “Generate any marketing copy” (open-ended, hard to evaluate) vs. “Generate subject lines for email campaigns given a product and audience” (constrained, evaluable).
PM insight: The best AI PMs don’t just accept the problem as stated. They reframe it into the version that is most feasible with current AI capabilities, most evaluable, and most valuable to users. This reframing skill — finding the tractable version of a hard problem — is the single most valuable skill in AI product management.
When NOT to Use AI
Seven signals that AI is the wrong solution
Red Flags 1–4
1. The problem has a deterministic solution.
If you can write the logic as rules and it covers all cases, use rules. AI adds cost and unpredictability for zero benefit. Tax calculations, unit conversions, and data validation don’t need AI.

2. You don’t have data — and can’t get it.
AI without data is like a factory without raw materials. If the data doesn’t exist, can’t be collected, or is locked behind legal/privacy barriers, AI isn’t viable. (Exception: LLMs with zero-shot capabilities, but even these need evaluation data.)

3. Errors are catastrophic and irreversible.
If a wrong answer causes death, imprisonment, or financial ruin — and there’s no human review step — AI is too risky. AI can assist high-stakes decisions but should not make them autonomously.

4. The problem changes faster than the model can adapt.
If the rules change daily (regulatory environments, rapidly evolving fraud patterns), a model trained on last month’s data may be wrong today. Rules that can be updated instantly may be more appropriate.
Red Flags 5–7
5. Users need 100% consistency.
If the same input must always produce the exact same output (legal documents, financial calculations, compliance reports), probabilistic AI is the wrong tool. Users will not accept “it usually gets it right.”

6. The ROI doesn’t justify the cost.
AI has ongoing costs: compute, monitoring, retraining, specialized talent. If the problem affects 10 users per month and costs $500 in manual labor, spending $200K on an AI solution is irrational. Do the math before building.
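"Do the math" can literally be three lines. Using the hypothetical figures above, and ignoring ongoing compute and monitoring costs entirely:

```python
# Payback period for the example above: $200K build cost
# against $500/month of manual labor replaced.
build_cost = 200_000
monthly_saving = 500
payback_months = build_cost / monthly_saving
print(payback_months)  # 400.0 months -- over 33 years, before any ongoing costs
```

If the payback period exceeds the plausible lifetime of the product, the project fails the ROI check regardless of how well the AI works.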

7. You’re solving a process problem, not a technology problem.
Sometimes the real issue is a broken workflow, unclear ownership, or missing training — not a lack of AI. Adding AI to a broken process automates the brokenness. Fix the process first.
The courage to say no: Saying “this doesn’t need AI” when leadership is excited about AI is one of the hardest things a PM can do. It’s also one of the most valuable. Every AI project that shouldn’t have been started consumes resources that could have gone to a project that should have. The PM who kills bad AI ideas early is more valuable than the PM who ships mediocre AI products.
Real-World Framing Examples
How successful AI products framed their problems
GitHub Copilot
Naive framing: “Build an AI that writes code.”
Actual framing: “Predict the next few lines of code given the current file context, and present them as inline suggestions the developer can accept with Tab or reject by continuing to type.”

Why it works: Narrow scope (next lines, not whole programs). Low error cost (developer reviews every suggestion). Implicit feedback (accept/reject). Constrained output (code in the current language and context).
Spotify Discover Weekly
Naive framing: “Recommend music users will like.”
Actual framing: “Every Monday, generate a playlist of 30 songs the user hasn’t heard that match their listening patterns, weighted toward discovery over familiarity.”

Why it works: Constrained output (exactly 30 songs). Clear cadence (weekly). Measurable success (listen-through rate, saves). Explicit trade-off (discovery over safe picks).
Stripe Radar (Fraud Detection)
Naive framing: “Detect all fraud.”
Actual framing: “Score every transaction with a fraud probability. Block transactions above the high threshold automatically. Flag transactions in the medium range for manual review. Allow transactions below the low threshold.”

Why it works: Three-tier decision (not binary). Explicit thresholds. Human-in-the-loop for ambiguous cases. Merchants can adjust thresholds based on their risk tolerance.
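The three-tier framing maps directly to code. A sketch — the thresholds here are illustrative placeholders, not Stripe's actual values:

```python
# Three-tier decision on a fraud score, as in the Radar framing above.
# Thresholds are illustrative and merchant-adjustable.
HIGH_THRESHOLD = 0.90  # block automatically above this
LOW_THRESHOLD = 0.30   # allow automatically below this

def decide(fraud_score: float) -> str:
    if fraud_score >= HIGH_THRESHOLD:
        return "block"
    if fraud_score <= LOW_THRESHOLD:
        return "allow"
    return "manual_review"  # human-in-the-loop for the ambiguous middle
```

Notice that the model only produces a score; the product decision lives in the two thresholds, which is exactly what makes them adjustable per merchant.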
Grammarly
Naive framing: “Fix all writing errors.”
Actual framing: “Detect grammar, spelling, and style issues in real-time. Present each as an inline suggestion with an explanation. User accepts or dismisses each suggestion individually.”

Why it works: Granular suggestions (not wholesale rewrites). User retains control. Each suggestion is independently evaluable. Implicit feedback (accept/dismiss) improves the model.
The pattern: Every successful AI product shares these traits: narrow scope, clear success metric, low or managed error cost, user control over AI outputs, and a feedback mechanism. None of them tried to solve the entire problem at once.
The Problem Framing Checklist
Run through this before greenlighting any AI initiative
Feasibility Checks
□ Is AI the right tool?
Have you ruled out rules and traditional automation? Can you justify the added complexity and cost?

□ Is the problem well-defined?
Can you describe the input, the desired output, and what “good” looks like in measurable terms?

□ Does data exist?
Is there training data (or can you create it)? Is there evaluation data? Is the data accessible and clean enough?

□ Is the scope narrow enough?
Are you solving one specific problem, not “everything”? Can you describe the boundary of what’s in and out of scope?

□ Is the error cost manageable?
What happens when the AI is wrong? Is there a human fallback? Is the error cost proportional to the value created?
Value Checks
□ Is there a measurable baseline?
How is this problem solved today? How well? Can you measure improvement over the baseline?

□ Is the ROI positive?
Does the value created (time saved, revenue generated, cost reduced) exceed the cost of building and maintaining the AI?

□ Is there a feedback mechanism?
How will the model learn from usage? Is there explicit or implicit feedback that can drive improvement?

□ Can you ship incrementally?
Can you launch a narrow v1, learn from real usage, and expand? Or does the product only work if everything works?

□ Do stakeholders understand the trade-offs?
Does leadership understand that AI is probabilistic, timelines are uncertain, and perfection is not achievable?
The bottom line: Problem framing is the highest-leverage activity in AI product management. A well-framed problem with mediocre AI outperforms a poorly-framed problem with state-of-the-art AI every time. Spend more time on framing than you think you should. The chapters ahead assume you’ve framed the problem correctly — everything that follows depends on it.