Ch 7 — Build vs. Buy vs. API

Foundation model APIs, fine-tuning, custom models, or vertical SaaS? The decision matrix.
The Build-Buy Spectrum
It’s not binary — there are five distinct options with different trade-offs
Five Options, Not Two
The “build vs. buy” framing is misleading. In AI, there’s a spectrum of five options, each with different cost, speed, control, and differentiation profiles:

1. Buy SaaS — Purchase a complete AI product (e.g., Intercom Fin for support, Gong for sales intelligence). Fastest, least differentiated.

2. Use Foundation Model APIs — Call OpenAI, Anthropic, or Google APIs directly. Build the product layer yourself, rent the intelligence.

3. Fine-tune a Foundation Model — Take a pre-trained model and customize it on your domain data. More investment, more differentiation.

4. Deploy Open-Weight Models — Run Llama, Mistral, or similar on your own infrastructure. Full control, no per-query costs, but significant ops burden.

5. Train Custom Models — Build from scratch on your data. Maximum control and differentiation. Maximum cost and timeline.
The Trade-Off Matrix
Speed to market:
SaaS (days) > API (weeks) > Fine-tune (months) > Open-weight (months) > Custom (6–12+ months)

Year 1 cost:
SaaS ($6K–$600K) ≈ API ($10K–$200K) < Fine-tune ($50K–$300K) < Open-weight ($80K–$400K) < Custom ($200K–$600K+)

Differentiation:
Custom > Open-weight > Fine-tune > API > SaaS

Data control:
Custom = Open-weight > Fine-tune > API > SaaS

Maintenance burden:
Custom > Open-weight > Fine-tune > API > SaaS
The key insight: You’re not choosing between “renting intelligence” and “owning intelligence.” You’re choosing between renting model behavior (fast, cheap, dependent) and owning model outcomes (slow, expensive, independent). The right choice depends on whether AI is your core differentiator or a supporting capability.
Option 1: Foundation Model APIs
The fastest path — rent intelligence from OpenAI, Anthropic, Google, and others
How It Works
You call a foundation model API (GPT-4o, Claude, Gemini) with your prompt, context, and parameters. The model returns a response. You build the product layer — UI, business logic, data pipelines, evaluation — around the API.

What you own: The product experience, the prompt engineering, the evaluation framework, the user data, the business logic.

What you rent: The model intelligence. You have no control over model updates, pricing changes, or capability shifts. When OpenAI releases a new version, your product behavior may change overnight.
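The split between what you own and what you rent shows up directly in the code. Below is a minimal sketch of the product layer around a chat-completions-style API; the payload shape mirrors common provider APIs, but the model name is illustrative and the transport is a stub standing in for the real HTTPS call, so the structure runs locally:

```python
import json

def build_request(model: str, system: str, user: str, temperature: float = 0.2) -> dict:
    """Assemble a chat-completions-style payload. The shape is typical of
    major providers, but each vendor's exact fields differ."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": temperature,
    }

def call_model(payload: dict, transport) -> str:
    """'transport' is the part you rent; the request building, parsing,
    and everything around this function is the part you own."""
    raw = transport(json.dumps(payload))
    return json.loads(raw)["choices"][0]["message"]["content"]

# Stub transport for local testing; a real one would POST to the provider.
def fake_transport(body: str) -> str:
    return json.dumps({"choices": [{"message": {"content": "summary: ok"}}]})

if __name__ == "__main__":
    req = build_request("gpt-4o", "You summarize support tickets.", "Ticket: login fails.")
    print(call_model(req, fake_transport))
```

Swapping `fake_transport` for a real HTTP client is the only rented piece; prompts, parsing, and evaluation stay yours.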
When to Choose APIs
• AI is a feature, not the core product (adding summarization to a project management tool)
• You need to ship fast and validate the concept before investing heavily
• The task is general-purpose (writing, summarization, Q&A), not domain-specific
• Your team lacks ML expertise and you need to start somewhere
• Volume is low to moderate (under 100K requests/day)
Risks & Costs
Per-query cost: $0.01–$0.50+ per request depending on model and token count. At 1M requests/day, that’s $10K–$500K/month. Costs scale linearly with usage.

Latency: API calls add 500ms–5s of latency. For real-time applications, this may be unacceptable.

Data privacy: Your data (prompts, context) is sent to a third-party server. For sensitive data (healthcare, finance, legal), this may violate compliance requirements. Some providers offer data processing agreements, but the data still leaves your infrastructure.

Model changes: When the provider updates the model, your product behavior changes without your control. A prompt that worked perfectly on GPT-4 may behave differently on GPT-4o. You must regression-test after every model update.
The API starting strategy: Start with APIs to validate the concept. If the product works and scales, migrate to fine-tuning or open-weight models for cost reduction and control. APIs are the best way to learn what you need before committing to heavier investments.
Option 2: Fine-Tuning
Customizing a foundation model on your domain data
How It Works
Fine-tuning takes a pre-trained foundation model and continues training it on your domain-specific data. The model learns your terminology, style, patterns, and domain knowledge while retaining its general capabilities.

Example: Fine-tuning GPT-4o on 10,000 examples of your company’s customer support conversations. The resulting model understands your products, policies, and tone of voice far better than the base model with prompting alone.

What changes vs. APIs:
• Better performance on domain-specific tasks (often 10–30% improvement)
• Shorter prompts needed (the knowledge is in the weights, not the prompt)
• Lower per-query cost (smaller model, shorter prompts)
• More consistent behavior (less sensitive to prompt variations)
When to Choose Fine-Tuning
• Prompting alone doesn’t achieve your quality threshold
• You have domain-specific data (1,000+ examples) that the base model hasn’t seen
• You need consistent style or format that’s hard to enforce via prompting
• You want to reduce per-query cost (fine-tuned smaller models can match larger models on specific tasks)
• The task is narrow and well-defined (classification, extraction, specific generation patterns)
Costs & Considerations
Training cost: $50–$5,000+ per fine-tuning run depending on model size and data volume. You’ll run multiple iterations.

Data requirement: Minimum 500–1,000 high-quality examples. More is better. The examples must be representative of production use cases.

Ongoing cost: You must re-fine-tune when the base model updates, when your domain changes, or when you add new capabilities. This is recurring, not one-time.

Evaluation burden: You need robust evaluation to ensure fine-tuning improved the right things without degrading others.
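The data requirement above is concrete: most hosted fine-tuning services expect one training example per JSONL line. The sketch below uses the chat-format JSONL that OpenAI-style fine-tuning endpoints accept (the company name and conversation are made up), plus a basic hygiene check worth running before paying for a training run:

```python
import json

# One training example per JSONL line, in the chat format used by
# OpenAI-style fine-tuning endpoints. "Acme" and the dialogue are
# hypothetical placeholders.
examples = [
    {"messages": [
        {"role": "system", "content": "You are Acme's support agent."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security > Reset password."},
    ]},
]

def to_jsonl(rows) -> str:
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in rows)

def validate(rows, min_examples: int = 500) -> None:
    """Cheap sanity checks before a paid training run."""
    for r in rows:
        roles = [m["role"] for m in r["messages"]]
        assert roles[-1] == "assistant", "each example must end with the target output"
    if len(rows) < min_examples:
        print(f"warning: only {len(rows)} examples; aim for {min_examples}+")

validate(examples)
print(to_jsonl(examples)[:60])
```

The warning path reflects the 500–1,000 example floor mentioned above; representative coverage of production cases matters more than raw count.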
The fine-tuning trap: Don’t fine-tune prematurely. Exhaust prompt engineering and RAG first. Fine-tuning is harder to iterate on (hours per experiment vs. minutes for prompt changes). Only fine-tune when you’ve proven the use case with APIs and need the next level of quality or cost efficiency.
Option 3: Open-Weight & Custom Models
Maximum control, maximum responsibility
Open-Weight Models
With open-weight models, such as Meta’s Llama or Mistral’s models, the weights are published for anyone to download. You run the model on your own infrastructure (or with a cloud provider).

Advantages:
• No per-query API cost — You pay for compute, not per request. At high volume, this is dramatically cheaper.
• Full data privacy — Data never leaves your infrastructure. Critical for regulated industries.
• No vendor dependency — The model doesn’t change unless you change it. No surprise updates.
• Full customization — Fine-tune, modify, combine with other models, run any way you want.

Challenges:
• You own the infrastructure (GPUs, serving, scaling, monitoring)
• You own security, patching, and reliability
• Open-weight models are often 6–18 months behind frontier closed models in raw capability
Custom Models from Scratch
Training a model entirely from scratch on your own data. This is the most expensive option and only makes sense in specific scenarios:

• Highly specialized domain where general models fail (scientific research, proprietary data formats, niche languages)
• Massive proprietary dataset that creates a genuine competitive advantage
• Regulatory requirements that prohibit using any external model
• AI is the core product and model quality is the primary differentiator

Cost reality: Training a competitive LLM from scratch costs $1M–$100M+. Training a specialized ML model costs $200K–$2M+. This is not a decision to make lightly.
The cost crossover: At low volume (<10K requests/day), APIs are cheaper. At medium volume (10K–100K), fine-tuning or open-weight models break even. At high volume (>100K requests/day), self-hosted models can be 5–10x cheaper than APIs. Model the economics at your expected scale before deciding.
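The crossover arithmetic can be sketched as a quick model. The GPU rate, throughput, and the fixed monthly ops burden (engineer time to run the serving stack) below are illustrative assumptions, not benchmarks; plug in your own numbers before deciding:

```python
import math

def monthly_cost_api(requests_per_day: int, cost_per_request: float) -> float:
    """API pricing: cost scales linearly with volume."""
    return requests_per_day * 30 * cost_per_request

def monthly_cost_self_hosted(requests_per_day: int,
                             gpu_hourly: float = 2.50,
                             requests_per_gpu_hour: int = 3_600,
                             ops_monthly: float = 15_000.0) -> float:
    """Self-hosting: pay for GPU capacity plus a fixed ops burden.
    All rates here are illustrative assumptions, not quotes."""
    gpus = max(1, math.ceil(requests_per_day / (requests_per_gpu_hour * 24)))
    return gpus * gpu_hourly * 24 * 30 + ops_monthly

for volume in (5_000, 50_000, 500_000):
    api = monthly_cost_api(volume, cost_per_request=0.02)
    hosted = monthly_cost_self_hosted(volume)
    print(f"{volume:>7}/day  API ${api:>9,.0f}/mo  self-hosted ${hosted:>7,.0f}/mo")
```

At low volume the fixed ops cost dominates and APIs win; at high volume the per-request fees dominate and self-hosting wins, which is the crossover behavior described above.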
Option 4: Buy Vertical SaaS
When someone else has already solved your problem
When Buying Makes Sense
Sometimes the best AI product decision is to not build AI at all and buy an existing solution:

• The problem is well-solved — Customer support AI, document processing, fraud detection — mature vendors exist with years of training data and domain expertise.
• AI is not your differentiator — If you’re an e-commerce company, building your own fraud detection from scratch makes no sense when Stripe Radar exists.
• Speed matters more than control — A vendor solution deployed in 2 weeks beats a custom solution deployed in 6 months.
• You lack AI talent — Building AI requires specialized skills. If you don’t have them and can’t hire them, buying is pragmatic.
Vendor Evaluation Criteria
1. Model quality: Request evaluation on your data, not their demo data. Every vendor looks great on cherry-picked examples.

2. Data handling: Where does your data go? Is it used to train their model? Can you get it back? What happens if you leave?

3. Customization depth: Can you adjust thresholds, add domain terminology, fine-tune on your data? Or is it one-size-fits-all?

4. Integration effort: How does it connect to your systems? API? Webhook? Manual export? The integration cost often exceeds the subscription cost.

5. Pricing model: Per-seat, per-query, per-outcome? Model the cost at 2x and 10x your current volume. Some pricing models become prohibitive at scale.

6. Exit strategy: What happens if the vendor raises prices 3x, gets acquired, or shuts down? How portable is your data and configuration?
The buy-then-build pattern: The most common successful pattern is to buy a vendor solution first to validate the use case and learn what matters. Once you understand the requirements deeply, decide whether to build a custom solution or stay with the vendor. Buying first de-risks the investment and accelerates learning.
The Lock-In Problem
How AI vendor dependency sneaks up on you — and how to mitigate it
Where Lock-In Happens
AI lock-in is sneakier than traditional software lock-in. It doesn’t come from the model itself (models are increasingly commoditized). It comes from:

Prompt lock-in: You’ve spent months optimizing prompts for GPT-4. Switching to Claude means rewriting and re-evaluating every prompt. The prompts are your product logic — they’re not portable.

Evaluation lock-in: Your evaluation framework is tuned to one model’s behavior. A new model may fail your tests not because it’s worse, but because it responds differently.

Workflow lock-in: Your product design assumes specific model capabilities (function calling, JSON mode, vision). Switching models means redesigning features.

Data lock-in: Fine-tuning data formatted for one provider’s API. Training data stored in a vendor’s platform. Evaluation datasets tied to vendor-specific tools.
Mitigation Strategies
1. Abstract the model layer.
Build an abstraction layer between your product code and the model API. Switching models should require changing a config file, not rewriting your application.
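An abstraction layer can be as simple as a provider registry behind one function. In the sketch below, the adapters are stubs standing in for real vendor SDK calls; the provider names and registry design are illustrative, not a specific library's API:

```python
from typing import Callable, Dict

# Registry of provider adapters. Each adapter takes (prompt, **params)
# and returns plain text, hiding provider-specific request shapes.
PROVIDERS: Dict[str, Callable[..., str]] = {}

def register(name: str):
    """Decorator that adds an adapter to the registry under 'name'."""
    def wrap(fn):
        PROVIDERS[name] = fn
        return fn
    return wrap

@register("openai-stub")
def _openai(prompt: str, **params) -> str:
    return f"[openai-stub] {prompt}"   # real adapter would call the vendor SDK

@register("anthropic-stub")
def _anthropic(prompt: str, **params) -> str:
    return f"[anthropic-stub] {prompt}"

def complete(prompt: str, provider: str = "openai-stub", **params) -> str:
    """Product code calls this; switching vendors is a config change."""
    return PROVIDERS[provider](prompt, **params)

print(complete("Summarize this ticket."))
print(complete("Summarize this ticket.", provider="anthropic-stub"))
```

Product code never imports a vendor SDK directly, so a provider switch touches the adapters and the default config, not the application.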

2. Multi-model evaluation.
Regularly test your prompts and use cases against 2–3 models. Know which alternatives work and how they compare. This takes hours, not weeks.
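A multi-model check can be a small harness that runs the same prompt suite against each candidate and reports pass rates. The prompts, checks, and stub models below are toy placeholders; real runs would swap the stubs for live model calls:

```python
def eval_models(prompts_and_checks, models):
    """Run the same prompts against several models and score pass rates.
    'models' maps name -> callable(prompt) -> str."""
    results = {}
    for name, model in models.items():
        passed = sum(1 for prompt, check in prompts_and_checks if check(model(prompt)))
        results[name] = passed / len(prompts_and_checks)
    return results

# Toy suite: each entry is (prompt, pass/fail check on the output).
suite = [
    ("Reply YES or NO: is 2+2=4?", lambda out: "YES" in out.upper()),
    ("Summarize: the meeting moved.", lambda out: len(out) > 0),
]

# Stub models standing in for real provider calls.
stubs = {
    "model-a": lambda p: "YES" if "YES or NO" in p else "ok",
    "model-b": lambda p: "maybe",
}

print(eval_models(suite, stubs))
```

Keeping the suite small and automated is what makes "hours, not weeks" realistic: rerun it whenever a provider ships a new model.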

3. Own your evaluation data.
Keep your evaluation datasets, rubrics, and test cases in your own systems, not in a vendor platform. This is your most portable asset.

4. Negotiate data rights.
Ensure your contract explicitly states: your data is yours, it’s not used to train their models (unless you consent), and you can export everything if you leave.

5. Budget for switching costs.
Assume you’ll switch models at least once in the next 18 months. The AI landscape changes too fast for permanent commitments.
The switching reality: Model capabilities improve and prices drop every quarter. The provider that’s best today may not be best in 6 months. Design for portability from day one. The cost of abstracting the model layer is small compared to the cost of being locked into an increasingly uncompetitive or expensive provider.
The Hybrid Approach
Most successful AI products combine multiple options
Real-World Hybrid Architectures
The best AI products rarely use a single approach. They mix and match based on the requirements of each component:

Example: AI Customer Service Platform
• Intent classification: Custom ML model (high volume, needs speed, well-defined task)
• Response generation: Fine-tuned LLM (needs company-specific tone and knowledge)
• Sentiment analysis: Foundation model API (general-purpose, low volume)
• Ticket routing: Rules engine (deterministic, no AI needed)

Example: AI-Powered Legal Research
• Document search: Open-weight embedding model (data privacy, high volume)
• Answer generation: Fine-tuned LLM (domain-specific accuracy critical)
• Citation verification: Custom model (specialized task, no existing solution)
• Summarization: Foundation model API (general-purpose, acceptable quality)
The Progressive Investment Pattern
The most successful teams follow a progressive investment pattern:

Phase 1: Validate with APIs (weeks)
Use foundation model APIs to build a prototype. Test with real users. Validate that the use case works and users want it.

Phase 2: Optimize with fine-tuning (months)
Once validated, fine-tune for quality and cost. Reduce per-query costs. Improve domain-specific performance.

Phase 3: Own with self-hosted models (quarters)
For high-volume, high-value use cases, migrate to open-weight or custom models. Eliminate per-query costs. Gain full control.

Not every component reaches Phase 3. Low-volume, non-critical features may stay on APIs forever. The investment level matches the strategic importance.
The golden rule: Start with the cheapest, fastest option that meets your quality bar. Invest more only when you have evidence that the investment is justified. Every dollar spent on custom AI before validating the use case is a dollar at risk. Validate first, invest second.
The Decision Framework
Five questions that determine your build-buy-API strategy
The Five Questions
1. Is AI your core differentiator?
If yes → lean toward build/fine-tune (you need to own the intelligence).
If no → lean toward buy/API (rent the intelligence, focus on your actual differentiator).

2. How sensitive is your data?
Highly sensitive (healthcare, finance, government) → self-hosted or on-premise.
Moderate sensitivity → API with data processing agreements.
Low sensitivity → any option works.

3. What’s your volume?
<10K requests/day → APIs are cheapest.
10K–100K → fine-tuning or open-weight models break even.
>100K → self-hosted models are 5–10x cheaper.

4. How fast do you need to ship?
Days–weeks → SaaS or API.
Months → fine-tuning or open-weight.
6+ months → custom model (only if justified).

5. Do you have AI talent?
No ML team → buy or API. Don’t build what you can’t maintain.
Small ML team → fine-tune or open-weight.
Strong ML team → any option is viable.
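The five questions can be encoded as a toy heuristic. The thresholds come from the sections above, but the branch order and the exact recommendations are judgment calls for illustration, not a rule:

```python
def recommend(core_differentiator: bool, data_sensitivity: str,
              requests_per_day: int, ship_urgency: str, ml_team: str) -> str:
    """Rough encoding of the five questions; a starting point, not an answer."""
    if data_sensitivity == "high":
        # Question 2 overrides everything: data can't leave your infrastructure.
        return "open-weight (self-hosted / on-premise)"
    if ml_team == "none" or ship_urgency == "days":
        # Questions 4 and 5: don't build what you can't ship or maintain.
        return "SaaS or foundation model API"
    if not core_differentiator and requests_per_day < 10_000:
        # Questions 1 and 3: rent the intelligence, focus elsewhere.
        return "foundation model API"
    if requests_per_day > 100_000 or core_differentiator:
        # High volume or core product: invest in owning behavior.
        return "fine-tune, moving toward open-weight at scale"
    return "foundation model API, revisit as volume grows"

print(recommend(False, "low", 2_000, "weeks", "none"))
```

Running it per component, rather than once for the whole product, matches the portfolio framing below.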
Decision Shortcuts
If in doubt, start with APIs. You can always move further along the spectrum later. You can’t get back the months spent building custom models for a use case that didn’t work.

If data privacy is non-negotiable, skip APIs entirely. Go straight to open-weight models with on-premise deployment.

If cost is the primary concern, calculate the crossover point. At what volume does self-hosting become cheaper than APIs? Build the business case with real numbers.

If you’re a startup, almost always start with APIs. Speed to market and capital efficiency matter more than model ownership in the early stages.

If you’re an enterprise, the hybrid approach is almost always correct. Different use cases warrant different investment levels.
The bottom line: Build vs. buy is not a one-time decision. It’s a portfolio strategy. Each AI capability in your product may sit at a different point on the spectrum. The best PMs evaluate each component independently, start lean, and invest progressively as the use case proves its value. The worst PMs commit to building everything custom before validating anything.