Ch 7 — Build vs. Buy vs. API

Foundation model APIs, fine-tuning, custom models, or vertical SaaS? The decision matrix.
The Build-Buy Spectrum
It’s not binary — there are five distinct options with different trade-offs
Five Options, Not Two
The “build vs. buy” framing is misleading. In AI, there’s a spectrum of five options, each with different cost, speed, control, and differentiation profiles:

1. Buy SaaS — Purchase a complete AI product (e.g., Intercom Fin for support, Gong for sales intelligence). Fastest, least differentiated.

2. Use Foundation Model APIs — Call OpenAI, Anthropic, or Google APIs directly. Build the product layer yourself, rent the intelligence.

3. Fine-tune a Foundation Model — Take a pre-trained model and customize it on your domain data. More investment, more differentiation.

4. Deploy Open-Weight Models — Run Llama, Mistral, or similar on your own infrastructure. Full control, no per-query costs, but significant ops burden.

5. Train Custom Models — Build from scratch on your data. Maximum control and differentiation. Maximum cost and timeline.
The Trade-Off Matrix
Speed to market:
SaaS (days) > API (weeks) > Fine-tune (months) > Open-weight (months) > Custom (6–12+ months)

Year 1 cost:
SaaS ($6K–$600K) ≈ API ($10K–$200K) < Fine-tune ($50K–$300K) < Open-weight ($80K–$400K) < Custom ($200K–$600K+)

Differentiation:
Custom > Open-weight > Fine-tune > API > SaaS

Data control:
Custom = Open-weight > Fine-tune > API > SaaS

Maintenance burden:
Custom > Open-weight > Fine-tune > API > SaaS
The key insight: You’re not choosing between “renting intelligence” and “owning intelligence.” You’re choosing between renting model behavior (fast, cheap, dependent) and owning model outcomes (slow, expensive, independent). The right choice depends on whether AI is your core differentiator or a supporting capability.
Option 1: Foundation Model APIs
The fastest path — rent intelligence from OpenAI, Anthropic, Google, and others
How It Works
You call a foundation model API (GPT-4o, Claude, Gemini) with your prompt, context, and parameters. The model returns a response. You build the product layer — UI, business logic, data pipelines, evaluation — around the API.

What you own: The product experience, the prompt engineering, the evaluation framework, the user data, the business logic.

What you rent: The model intelligence. You have no control over model updates, pricing changes, or capability shifts. When OpenAI releases a new version, your product behavior may change overnight.
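The split between what you own and what you rent shows up directly in the code. Below is a minimal sketch of the product layer around a chat-completions-style API; the payload shape mirrors common provider APIs, but the model name is illustrative and the transport is a stub standing in for the real HTTPS call, so the structure runs locally:

```python
import json

def build_request(model: str, system: str, user: str, temperature: float = 0.2) -> dict:
    """Assemble a chat-completions-style payload. The shape is typical of
    major providers, but each vendor's exact fields differ."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": temperature,
    }

def call_model(payload: dict, transport) -> str:
    """'transport' is the part you rent; the request building, parsing,
    and everything around this function is the part you own."""
    raw = transport(json.dumps(payload))
    return json.loads(raw)["choices"][0]["message"]["content"]

# Stub transport for local testing; a real one would POST to the provider.
def fake_transport(body: str) -> str:
    return json.dumps({"choices": [{"message": {"content": "summary: ok"}}]})

if __name__ == "__main__":
    req = build_request("gpt-4o", "You summarize support tickets.", "Ticket: login fails.")
    print(call_model(req, fake_transport))
```

Swapping `fake_transport` for a real HTTP client is the only rented piece; prompts, parsing, and evaluation stay yours.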
When to Choose APIs
• AI is a feature, not the core product (adding summarization to a project management tool)
• You need to ship fast and validate the concept before investing heavily
• The task is general-purpose (writing, summarization, Q&A), not domain-specific
• Your team lacks ML expertise and you need to start somewhere
• Volume is low to moderate (under 100K requests/day)
Risks & Costs
Per-query cost: $0.01–$0.50+ per request depending on model and token count. At 1M requests/day, that’s $10K–$500K/month. Costs scale linearly with usage.

Latency: API calls add 500ms–5s of latency. For real-time applications, this may be unacceptable.

Data privacy: Your data (prompts, context) is sent to a third-party server. For sensitive data (healthcare, finance, legal), this may violate compliance requirements. Some providers offer data processing agreements, but the data still leaves your infrastructure.

Model changes: When the provider updates the model, your product behavior changes without your control. A prompt that worked perfectly on GPT-4 may behave differently on GPT-4o. You must regression-test after every model update.
The API starting strategy: Start with APIs to validate the concept. If the product works and scales, migrate to fine-tuning or open-weight models for cost reduction and control. APIs are the best way to learn what you need before committing to heavier investments.
Option 2: Fine-Tuning
Customizing a foundation model on your domain data
How It Works
Fine-tuning takes a pre-trained foundation model and continues training it on your domain-specific data. The model learns your terminology, style, patterns, and domain knowledge while retaining its general capabilities.

Example: Fine-tuning GPT-4o on 10,000 examples of your company’s customer support conversations. The resulting model understands your products, policies, and tone of voice far better than the base model with prompting alone.

What changes vs. APIs:
• Better performance on domain-specific tasks (often 10–30% improvement)
• Shorter prompts needed (the knowledge is in the weights, not the prompt)
• Lower per-query cost (smaller model, shorter prompts)
• More consistent behavior (less sensitive to prompt variations)
When to Choose Fine-Tuning
• Prompting alone doesn’t achieve your quality threshold
• You have domain-specific data (1,000+ examples) that the base model hasn’t seen
• You need consistent style or format that’s hard to enforce via prompting
• You want to reduce per-query cost (fine-tuned smaller models can match larger models on specific tasks)
• The task is narrow and well-defined (classification, extraction, specific generation patterns)
Costs & Considerations
Training cost: $50–$5,000+ per fine-tuning run depending on model size and data volume. You’ll run multiple iterations.

Data requirement: Minimum 500–1,000 high-quality examples. More is better. The examples must be representative of production use cases.

Ongoing cost: You must re-fine-tune when the base model updates, when your domain changes, or when you add new capabilities. This is recurring, not one-time.

Evaluation burden: You need robust evaluation to ensure fine-tuning improved the right things without degrading others.
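The data requirement above is concrete: most hosted fine-tuning services expect one training example per JSONL line. The sketch below uses the chat-format JSONL that OpenAI-style fine-tuning endpoints accept (the company name and conversation are made up), plus a basic hygiene check worth running before paying for a training run:

```python
import json

# One training example per JSONL line, in the chat format used by
# OpenAI-style fine-tuning endpoints. "Acme" and the dialogue are
# hypothetical placeholders.
examples = [
    {"messages": [
        {"role": "system", "content": "You are Acme's support agent."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security > Reset password."},
    ]},
]

def to_jsonl(rows) -> str:
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in rows)

def validate(rows, min_examples: int = 500) -> None:
    """Cheap sanity checks before a paid training run."""
    for r in rows:
        roles = [m["role"] for m in r["messages"]]
        assert roles[-1] == "assistant", "each example must end with the target output"
    if len(rows) < min_examples:
        print(f"warning: only {len(rows)} examples; aim for {min_examples}+")

validate(examples)
print(to_jsonl(examples)[:60])
```

The warning path reflects the 500–1,000 example floor mentioned above; representative coverage of production cases matters more than raw count.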
The fine-tuning trap: Don’t fine-tune prematurely. Exhaust prompt engineering and RAG first. Fine-tuning is harder to iterate on (hours per experiment vs. minutes for prompt changes). Only fine-tune when you’ve proven the use case with APIs and need the next level of quality or cost efficiency.
Option 3: Open-Weight & Custom Models
Maximum control, maximum responsibility
Open-Weight Models
With open-weight models, such as Meta’s Llama or Mistral’s models, the weights are published for anyone to download. You run the model on your own infrastructure (or with a cloud provider).

Advantages:
• No per-query API cost — You pay for compute, not per request. At high volume, this is dramatically cheaper.
• Full data privacy — Data never leaves your infrastructure. Critical for regulated industries.
• No vendor dependency — The model doesn’t change unless you change it. No surprise updates.
• Full customization — Fine-tune, modify, combine with other models, run any way you want.

Challenges:
• You own the infrastructure (GPUs, serving, scaling, monitoring)
• You own security, patching, and reliability
• Open-weight models are often 6–18 months behind frontier closed models in raw capability
Custom Models from Scratch
Training a model entirely from scratch on your own data. This is the most expensive option and only makes sense in specific scenarios:

• Highly specialized domain where general models fail (scientific research, proprietary data formats, niche languages)
• Massive proprietary dataset that creates a genuine competitive advantage
• Regulatory requirements that prohibit using any external model
• AI is the core product and model quality is the primary differentiator

Cost reality: Training a competitive LLM from scratch costs $1M–$100M+. Training a specialized ML model costs $200K–$2M+. This is not a decision to make lightly.
The cost crossover: At low volume (<10K requests/day), APIs are cheaper. At medium volume (10K–100K), fine-tuning or open-weight models break even. At high volume (>100K requests/day), self-hosted models can be 5–10x cheaper than APIs. Model the economics at your expected scale before deciding.
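The crossover arithmetic can be sketched as a quick model. The GPU rate, throughput, and the fixed monthly ops burden (engineer time to run the serving stack) below are illustrative assumptions, not benchmarks; plug in your own numbers before deciding:

```python
import math

def monthly_cost_api(requests_per_day: int, cost_per_request: float) -> float:
    """API pricing: cost scales linearly with volume."""
    return requests_per_day * 30 * cost_per_request

def monthly_cost_self_hosted(requests_per_day: int,
                             gpu_hourly: float = 2.50,
                             requests_per_gpu_hour: int = 3_600,
                             ops_monthly: float = 15_000.0) -> float:
    """Self-hosting: pay for GPU capacity plus a fixed ops burden.
    All rates here are illustrative assumptions, not quotes."""
    gpus = max(1, math.ceil(requests_per_day / (requests_per_gpu_hour * 24)))
    return gpus * gpu_hourly * 24 * 30 + ops_monthly

for volume in (5_000, 50_000, 500_000):
    api = monthly_cost_api(volume, cost_per_request=0.02)
    hosted = monthly_cost_self_hosted(volume)
    print(f"{volume:>7}/day  API ${api:>9,.0f}/mo  self-hosted ${hosted:>7,.0f}/mo")
```

At low volume the fixed ops cost dominates and APIs win; at high volume the per-request fees dominate and self-hosting wins, which is the crossover behavior described above.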
Option 4: Buy Vertical SaaS
When someone else has already solved your problem
When Buying Makes Sense
Sometimes the best AI product decision is to not build AI at all and buy an existing solution:

• The problem is well-solved — Customer support AI, document processing, fraud detection — mature vendors exist with years of training data and domain expertise.
• AI is not your differentiator — If you’re an e-commerce company, building your own fraud detection from scratch makes no sense when Stripe Radar exists.
• Speed matters more than control — A vendor solution deployed in 2 weeks beats a custom solution deployed in 6 months.
• You lack AI talent — Building AI requires specialized skills. If you don’t have them and can’t hire them, buying is pragmatic.
Vendor Evaluation Criteria
1. Model quality: Request evaluation on your data, not their demo data. Every vendor looks great on cherry-picked examples.

2. Data handling: Where does your data go? Is it used to train their model? Can you get it back? What happens if you leave?

3. Customization depth: Can you adjust thresholds, add domain terminology, fine-tune on your data? Or is it one-size-fits-all?

4. Integration effort: How does it connect to your systems? API? Webhook? Manual export? The integration cost often exceeds the subscription cost.

5. Pricing model: Per-seat, per-query, per-outcome? Model the cost at 2x and 10x your current volume. Some pricing models become prohibitive at scale.

6. Exit strategy: What happens if the vendor raises prices 3x, gets acquired, or shuts down? How portable is your data and configuration?
The buy-then-build pattern: The most common successful pattern is to buy a vendor solution first to validate the use case and learn what matters. Once you understand the requirements deeply, decide whether to build a custom solution or stay with the vendor. Buying first de-risks the investment and accelerates learning.
The Lock-In Problem
How AI vendor dependency sneaks up on you — and how to mitigate it
Where Lock-In Happens
AI lock-in is sneakier than traditional software lock-in. It doesn’t come from the model itself (models are increasingly commoditized). It comes from:

Prompt lock-in: You’ve spent months optimizing prompts for GPT-4. Switching to Claude means rewriting and re-evaluating every prompt. The prompts are your product logic — they’re not portable.

Evaluation lock-in: Your evaluation framework is tuned to one model’s behavior. A new model may fail your tests not because it’s worse, but because it responds differently.

Workflow lock-in: Your product design assumes specific model capabilities (function calling, JSON mode, vision). Switching models means redesigning features.

Data lock-in: Fine-tuning data formatted for one provider’s API. Training data stored in a vendor’s platform. Evaluation datasets tied to vendor-specific tools.
Mitigation Strategies
1. Abstract the model layer.
Build an abstraction layer between your product code and the model API. Switching models should require changing a config file, not rewriting your application.
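An abstraction layer can be as simple as a provider registry behind one function. In the sketch below, the adapters are stubs standing in for real vendor SDK calls; the provider names and registry design are illustrative, not a specific library's API:

```python
from typing import Callable, Dict

# Registry of provider adapters. Each adapter takes (prompt, **params)
# and returns plain text, hiding provider-specific request shapes.
PROVIDERS: Dict[str, Callable[..., str]] = {}

def register(name: str):
    """Decorator that adds an adapter to the registry under 'name'."""
    def wrap(fn):
        PROVIDERS[name] = fn
        return fn
    return wrap

@register("openai-stub")
def _openai(prompt: str, **params) -> str:
    return f"[openai-stub] {prompt}"   # real adapter would call the vendor SDK

@register("anthropic-stub")
def _anthropic(prompt: str, **params) -> str:
    return f"[anthropic-stub] {prompt}"

def complete(prompt: str, provider: str = "openai-stub", **params) -> str:
    """Product code calls this; switching vendors is a config change."""
    return PROVIDERS[provider](prompt, **params)

print(complete("Summarize this ticket."))
print(complete("Summarize this ticket.", provider="anthropic-stub"))
```

Product code never imports a vendor SDK directly, so a provider switch touches the adapters and the default config, not the application.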

2. Multi-model evaluation.
Regularly test your prompts and use cases against 2–3 models. Know which alternatives work and how they compare. This takes hours, not weeks.
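A multi-model check can be a small harness that runs the same prompt suite against each candidate and reports pass rates. The prompts, checks, and stub models below are toy placeholders; real runs would swap the stubs for live model calls:

```python
def eval_models(prompts_and_checks, models):
    """Run the same prompts against several models and score pass rates.
    'models' maps name -> callable(prompt) -> str."""
    results = {}
    for name, model in models.items():
        passed = sum(1 for prompt, check in prompts_and_checks if check(model(prompt)))
        results[name] = passed / len(prompts_and_checks)
    return results

# Toy suite: each entry is (prompt, pass/fail check on the output).
suite = [
    ("Reply YES or NO: is 2+2=4?", lambda out: "YES" in out.upper()),
    ("Summarize: the meeting moved.", lambda out: len(out) > 0),
]

# Stub models standing in for real provider calls.
stubs = {
    "model-a": lambda p: "YES" if "YES or NO" in p else "ok",
    "model-b": lambda p: "maybe",
}

print(eval_models(suite, stubs))
```

Keeping the suite small and automated is what makes "hours, not weeks" realistic: rerun it whenever a provider ships a new model.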

3. Own your evaluation data.
Keep your evaluation datasets, rubrics, and test cases in your own systems, not in a vendor platform. This is your most portable asset.

4. Negotiate data rights.
Ensure your contract explicitly states: your data is yours, it’s not used to train their models (unless you consent), and you can export everything if you leave.

5. Budget for switching costs.
Assume you’ll switch models at least once in the next 18 months. The AI landscape changes too fast for permanent commitments.
The switching reality: Model capabilities improve and prices drop every quarter. The provider that’s best today may not be best in 6 months. Design for portability from day one. The cost of abstracting the model layer is small compared to the cost of being locked into an increasingly uncompetitive or expensive provider.
The Hybrid Approach
Most successful AI products combine multiple options
Real-World Hybrid Architectures
The best AI products rarely use a single approach. They mix and match based on the requirements of each component:

Example: AI Customer Service Platform
• Intent classification: Custom ML model (high volume, needs speed, well-defined task)
• Response generation: Fine-tuned LLM (needs company-specific tone and knowledge)
• Sentiment analysis: Foundation model API (general-purpose, low volume)
• Ticket routing: Rules engine (deterministic, no AI needed)

Example: AI-Powered Legal Research
• Document search: Open-weight embedding model (data privacy, high volume)
• Answer generation: Fine-tuned LLM (domain-specific accuracy critical)
• Citation verification: Custom model (specialized task, no existing solution)
• Summarization: Foundation model API (general-purpose, acceptable quality)
The Progressive Investment Pattern
The most successful teams follow a progressive investment pattern:

Phase 1: Validate with APIs (weeks)
Use foundation model APIs to build a prototype. Test with real users. Validate that the use case works and users want it.

Phase 2: Optimize with fine-tuning (months)
Once validated, fine-tune for quality and cost. Reduce per-query costs. Improve domain-specific performance.

Phase 3: Own with self-hosted models (quarters)
For high-volume, high-value use cases, migrate to open-weight or custom models. Eliminate per-query costs. Gain full control.

Not every component reaches Phase 3. Low-volume, non-critical features may stay on APIs forever. The investment level matches the strategic importance.
The golden rule: Start with the cheapest, fastest option that meets your quality bar. Invest more only when you have evidence that the investment is justified. Every dollar spent on custom AI before validating the use case is a dollar at risk. Validate first, invest second.
The Decision Framework
Five questions that determine your build-buy-API strategy
The Five Questions
1. Is AI your core differentiator?
If yes → lean toward build/fine-tune (you need to own the intelligence).
If no → lean toward buy/API (rent the intelligence, focus on your actual differentiator).

2. How sensitive is your data?
Highly sensitive (healthcare, finance, government) → self-hosted or on-premise.
Moderate sensitivity → API with data processing agreements.
Low sensitivity → any option works.

3. What’s your volume?
<10K requests/day → APIs are cheapest.
10K–100K → fine-tuning or open-weight models break even.
>100K → self-hosted models are 5–10x cheaper.

4. How fast do you need to ship?
Days–weeks → SaaS or API.
Months → fine-tuning or open-weight.
6+ months → custom model (only if justified).

5. Do you have AI talent?
No ML team → buy or API. Don’t build what you can’t maintain.
Small ML team → fine-tune or open-weight.
Strong ML team → any option is viable.
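The five questions can be encoded as a toy heuristic. The thresholds come from the sections above, but the branch order and the exact recommendations are judgment calls for illustration, not a rule:

```python
def recommend(core_differentiator: bool, data_sensitivity: str,
              requests_per_day: int, ship_urgency: str, ml_team: str) -> str:
    """Rough encoding of the five questions; a starting point, not an answer."""
    if data_sensitivity == "high":
        # Question 2 overrides everything: data can't leave your infrastructure.
        return "open-weight (self-hosted / on-premise)"
    if ml_team == "none" or ship_urgency == "days":
        # Questions 4 and 5: don't build what you can't ship or maintain.
        return "SaaS or foundation model API"
    if not core_differentiator and requests_per_day < 10_000:
        # Questions 1 and 3: rent the intelligence, focus elsewhere.
        return "foundation model API"
    if requests_per_day > 100_000 or core_differentiator:
        # High volume or core product: invest in owning behavior.
        return "fine-tune, moving toward open-weight at scale"
    return "foundation model API, revisit as volume grows"

print(recommend(False, "low", 2_000, "weeks", "none"))
```

Running it per component, rather than once for the whole product, matches the portfolio framing below.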
Decision Shortcuts
If in doubt, start with APIs. You can always move further along the spectrum later. You can’t get back the months spent building custom models for a use case that didn’t work.

If data privacy is non-negotiable, skip APIs entirely. Go straight to open-weight models with on-premise deployment.

If cost is the primary concern, calculate the crossover point. At what volume does self-hosting become cheaper than APIs? Build the business case with real numbers.

If you’re a startup, almost always start with APIs. Speed to market and capital efficiency matter more than model ownership in the early stages.

If you’re an enterprise, the hybrid approach is almost always correct. Different use cases warrant different investment levels.
The bottom line: Build vs. buy is not a one-time decision. It’s a portfolio strategy. Each AI capability in your product may sit at a different point on the spectrum. The best PMs evaluate each component independently, start lean, and invest progressively as the use case proves its value. The worst PMs commit to building everything custom before validating anything.