Ch 10 — The MLOps Stack

End-to-end platforms (SageMaker, Vertex AI, Databricks), open-source stacks, and choosing your architecture
High-level pipeline: Data → Train → Registry → Deploy → Monitor → Platform
The MLOps Stack Layers
Understanding the full picture before choosing tools
Stack Architecture
An MLOps stack spans six layers, each with multiple tool choices:
- Data layer — data versioning (DVC, LakeFS), feature stores (Feast, Tecton), data quality (Great Expectations).
- Training layer — experiment tracking (MLflow, W&B), orchestration (Kubeflow, Airflow, Prefect), compute (cloud GPUs, Kubernetes).
- Registry layer — model registry (MLflow, SageMaker), model cards, artifact storage.
- Deployment layer — serving (Triton, vLLM, BentoML), CI/CD (GitHub Actions, CML), deployment strategies (canary, blue-green).
- Monitoring layer — drift detection (Evidently, NannyML), performance monitoring (Prometheus, Grafana), LLM observability (Langfuse).
- Platform layer — end-to-end platforms that bundle multiple layers (SageMaker, Vertex AI, Databricks).
Stack Overview
// The MLOps stack — 6 layers

Layer 1: Data
  Versioning: DVC, LakeFS, Delta Lake
  Features: Feast, Tecton
  Quality: Great Expectations, Soda

Layer 2: Training
  Tracking: MLflow, Weights & Biases
  Orchestrate: Kubeflow, Airflow, Prefect
  Compute: K8s + GPUs, cloud instances

Layer 3: Registry
  Models: MLflow Registry, SageMaker
  Artifacts: S3/GCS + metadata store

Layer 4: Deployment
  Serving: Triton, vLLM, BentoML
  CI/CD: GitHub Actions, CML
  Strategy: Canary, blue-green, shadow

Layer 5: Monitoring
  Drift: Evidently, NannyML
  Infra: Prometheus + Grafana
  LLM: Langfuse, LangSmith

Layer 6: Platform
  End-to-end: SageMaker, Vertex, Databricks
Key insight: You don’t need to fill every layer on day one. Start with experiment tracking (MLflow) and a model registry. Add layers as your maturity grows. The biggest mistake is over-engineering the stack before you have a model in production.
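To make the "start with tracking and a registry" advice concrete, here is a stdlib-only sketch of what experiment tracking buys you. Everything here (the `runs/` directory, `log_run`, `best_run`) is illustrative naming invented for this sketch; a real setup would use MLflow's tracking API and model registry instead of hand-rolled JSON files.

```python
import json
from pathlib import Path

# Minimal sketch of experiment tracking: every run's params and metrics
# are recorded, so "which model was best?" becomes a query instead of
# an archaeology project. Names are illustrative, not MLflow's API.

RUNS_DIR = Path("runs")

def log_run(run_id: str, params: dict, metrics: dict) -> None:
    """Persist one training run's params and metrics as JSON."""
    RUNS_DIR.mkdir(exist_ok=True)
    record = {"run_id": run_id, "params": params, "metrics": metrics}
    (RUNS_DIR / f"{run_id}.json").write_text(json.dumps(record))

def best_run(metric: str, higher_is_better: bool = True) -> dict:
    """Scan all logged runs and return the one with the best metric."""
    runs = [json.loads(p.read_text()) for p in RUNS_DIR.glob("*.json")]
    sign = 1 if higher_is_better else -1
    return max(runs, key=lambda r: sign * r["metrics"][metric])

log_run("run-001", {"lr": 0.01, "depth": 6}, {"auc": 0.81})
log_run("run-002", {"lr": 0.05, "depth": 8}, {"auc": 0.86})
print(best_run("auc")["run_id"])  # run-002
```

A registry adds one more step on top of this: promoting the chosen run's artifact to a named, versioned "production" slot that deployment pulls from.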
AWS SageMaker
The most comprehensive managed ML platform
SageMaker Overview
Amazon SageMaker is the most mature end-to-end ML platform, covering the entire lifecycle: data labeling (Ground Truth), notebook environments (Studio), training (managed training jobs with spot instances), experiment tracking, model registry, deployment (real-time endpoints, batch transform, serverless inference), and monitoring (Model Monitor).
Strengths: deepest AWS integration (S3, IAM, VPC, CloudWatch), widest compute options (p5 instances with H100 GPUs, Inferentia/Trainium custom chips), a mature feature set (model cards, bias detection with Clarify, feature store), and SageMaker Pipelines for ML workflow orchestration.
Best for: teams already on AWS who want a single-vendor solution with enterprise security (VPC isolation, KMS encryption).
SageMaker Stack
// AWS SageMaker — end-to-end

Data:
  S3 + SageMaker Feature Store
  Ground Truth (labeling)
  Data Wrangler (prep)

Training:
  SageMaker Training Jobs
  Built-in algorithms + custom
  Spot instances (70% savings)
  SageMaker Experiments (tracking)

Registry:
  SageMaker Model Registry
  Model Cards (documentation)
  Clarify (bias detection)

Deployment:
  Real-time endpoints
  Serverless inference
  Batch transform
  Multi-model endpoints

Monitoring:
  SageMaker Model Monitor
  CloudWatch integration

Best for: AWS-native teams
Weakness: Vendor lock-in, complexity
Key insight: SageMaker’s biggest advantage is also its biggest risk: deep AWS integration means high productivity on AWS but significant lock-in. Migrating a SageMaker pipeline to another cloud is a major effort.
Google Vertex AI
Developer-friendly ML platform with BigQuery integration
Vertex AI Overview
Google Vertex AI prioritizes developer ergonomics and data-centric workflows. It integrates tightly with BigQuery (query data directly into training), Vertex AI Feature Store (managed, low-latency feature serving), and Vertex AI Pipelines (based on Kubeflow Pipelines).
Strengths: faster iteration (less boilerplate than SageMaker), strong AutoML (best-in-class for tabular, image, and text), Model Garden (one-click deployment of open models like Gemma and Llama), and Vertex AI Agent Builder (build and deploy AI agents with grounding).
Best for: teams on GCP, data-heavy workloads with BigQuery, and teams that want fast experimentation with lower operational friction.
Vertex AI Stack
// Google Vertex AI — end-to-end

Data:
  BigQuery (warehouse)
  Vertex AI Feature Store
  Dataflow (streaming)

Training:
  Custom training jobs
  AutoML (tabular, vision, text)
  Vertex AI Experiments (tracking)
  TPU v5e / A3 GPU instances

Registry:
  Vertex AI Model Registry
  Model Garden (open models)

Deployment:
  Vertex AI Endpoints
  Batch prediction
  Model Garden one-click deploy

Monitoring:
  Vertex AI Model Monitoring
  Cloud Monitoring integration

Best for: GCP teams, BigQuery users
Weakness: Tighter BigQuery coupling
Key insight: Vertex AI’s Model Garden is a standout feature: deploy Gemma, Llama, Mistral, and other open models with one click, with automatic scaling and GPU provisioning. It’s the fastest way to get an open model into production.
Databricks Mosaic AI
Unified lakehouse for data + ML + AI
Databricks Overview
Databricks Mosaic AI provides a unified environment for data engineering, ML, and AI on top of the lakehouse architecture (Delta Lake).
Strengths: Unity Catalog (unified governance for data, models, and features), MLflow integration (Databricks created MLflow, so integration is seamless), native vector search (for RAG applications), compound AI systems (build agents with models, retrievers, and tools), and multi-cloud support (runs on AWS, Azure, and GCP).
Best for: large enterprises with data-heavy workloads, teams that want data engineering and ML on the same platform, and organizations that need consistent governance across data and models.
Databricks Stack
// Databricks Mosaic AI — end-to-end

Data:
  Delta Lake (lakehouse storage)
  Unity Catalog (governance)
  Feature Engineering (tables)

Training:
  MLflow (native integration)
  AutoML
  Distributed training (Spark)
  GPU clusters

Registry:
  MLflow Model Registry
  Unity Catalog Models
  Model lineage tracking

Deployment:
  Model Serving (real-time)
  Foundation Model APIs
  Vector Search (RAG)

AI:
  Mosaic AI Agent Framework
  Compound AI systems
  Guardrails + evaluation

Best for: Data-heavy enterprises
Weakness: Cost at scale, complexity
Key insight: Databricks’ unique advantage is the lakehouse: data engineers and ML engineers work on the same platform with the same governance. No data copying between warehouses and ML tools. This eliminates a major source of friction and inconsistency.
The Open-Source Stack
MLflow + Kubeflow + Evidently + friends
Open-Source Approach
Instead of a managed platform, you can build an MLOps stack from open-source components:
- MLflow (experiment tracking + model registry — the universal default)
- Kubeflow Pipelines (ML workflow orchestration on Kubernetes, with per-step artifact tracking and GPU scheduling)
- DVC (data versioning, works like Git for data)
- Feast (feature store, offline + online serving)
- Evidently (monitoring + drift detection)
- BentoML or KServe (model serving)
- Great Expectations (data quality)
Advantages: no vendor lock-in, full control, cost-effective (no platform fees).
Disadvantages: integration burden (you glue everything together), operational overhead (you manage the infrastructure), and slower to start.
Open-Source Stack
// Recommended open-source MLOps stack

Experiment Tracking: MLflow
  // Universal default, no lock-in

Orchestration: Kubeflow Pipelines
  // ML-native, GPU-aware, K8s
  // Alt: Airflow (if already using)

Data Versioning: DVC
  // Git for data, S3/GCS backend

Feature Store: Feast
  // Offline + online serving

Model Serving: BentoML or KServe
  // BentoML: simpler, Python-first
  // KServe: K8s-native, more features

Monitoring: Evidently + Prometheus
  // Drift + infra metrics

Data Quality: Great Expectations
  // Schema + distribution checks

LLM Gateway: LiteLLM
  // If using LLMs
Key insight: The open-source stack is best for teams with strong engineering capacity and Kubernetes experience. If you don’t have a platform team, the operational overhead of managing 6+ open-source tools can outweigh the cost savings.
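To demystify the monitoring layer, here is a stdlib-only sketch of the kind of statistic a drift tool computes: the Population Stability Index (PSI) between a reference (training) distribution and current production traffic. The function name, binning scheme, and the 0.2 alert threshold are common conventions, not Evidently's actual API.

```python
import math

# Illustrative PSI drift check: bin the reference data, measure how the
# production distribution's bin fractions diverge from the reference's.
# PSI near 0 means no drift; values above ~0.2 are a common alert level.

def psi(expected: list, actual: list, bins: int = 10) -> float:
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[0] = float("-inf")   # catch production values below the reference min
    edges[-1] = float("inf")   # and above the reference max

    def fractions(values):
        counts = [0] * bins
        for v in values:
            for i in range(bins):
                if edges[i] <= v < edges[i + 1]:
                    counts[i] += 1
                    break
        # Smooth empty bins to avoid log(0) / division by zero.
        return [(c + 1e-6) / (len(values) + bins * 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

reference = [i / 100 for i in range(100)]       # uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]   # mass shifted to [0.5, 1)
print(psi(reference, reference))  # ~0: no drift
print(psi(reference, shifted))    # well above 0.2: drift alert
```

A tool like Evidently wraps this kind of statistic (plus many others) per feature, with reports and dashboards, which is why buying or adopting it beats maintaining your own once you have more than a handful of features.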
Platform Decision Framework
How to choose the right stack for your team
Decision Criteria
The right stack depends on your context, not on feature comparisons. Key decision factors:
- Cloud strategy — single-cloud? Use the native platform (SageMaker/Vertex). Multi-cloud? Use Databricks or open-source.
- Team size and skills — small team? Use a managed platform. Large platform team? Open-source gives more control.
- Data architecture — already on BigQuery? Vertex AI. Lakehouse? Databricks. S3-heavy? SageMaker.
- Compliance requirements — strict audit needs? Managed platforms have built-in compliance controls.
- Budget — platform fees vs. engineering time.
- Speed vs. control — need to ship fast? Managed. Need full customization? Open-source.
The hardest production challenges are typically data contracts and change management, not the platform itself.
Decision Matrix
// Platform decision matrix

Choose SageMaker if:
  ✓ All-in on AWS
  ✓ Need widest compute options
  ✓ Enterprise security (VPC, KMS)
  ✓ Large team, complex workflows

Choose Vertex AI if:
  ✓ GCP / BigQuery user
  ✓ Want fast iteration
  ✓ Need AutoML
  ✓ Deploying open models

Choose Databricks if:
  ✓ Data-heavy workloads
  ✓ Need unified data + ML
  ✓ Multi-cloud requirement
  ✓ Already using Spark/Delta

Choose Open-Source if:
  ✓ Multi-cloud / on-prem
  ✓ Strong K8s team
  ✓ Need full control
  ✓ Cost-sensitive
Key insight: Don’t choose a platform based on feature lists. Choose based on where your data already lives, what cloud you’re on, and how much engineering capacity you have. Migration costs are real — pick the platform that fits your existing infrastructure.
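The decision criteria above can be encoded as a small function, which makes the priority order explicit: multi-cloud trumps everything, then data gravity, then the native platform. This is purely illustrative; the rules are this chapter's rules of thumb, not an authoritative benchmark, and the function name is invented.

```python
# Sketch of the decision matrix as code. Order matters: multi-cloud /
# on-prem constraints are checked first, then data gravity, then the
# cloud-native default. Real decisions add compliance and budget.

def recommend_platform(cloud: str, multi_cloud: bool,
                       strong_k8s_team: bool, data_heavy: bool) -> str:
    if multi_cloud or cloud == "on-prem":
        if strong_k8s_team:
            return "open-source"   # full control, cost-sensitive
        return "databricks"        # managed, runs on AWS/Azure/GCP
    if data_heavy:
        return "databricks"        # unified lakehouse for data + ML
    if cloud == "aws":
        return "sagemaker"         # deepest native integration
    if cloud == "gcp":
        return "vertex-ai"         # BigQuery-centric, fast iteration
    return "open-source"           # no native fit: default to portable

print(recommend_platform("aws", multi_cloud=False,
                         strong_k8s_team=False, data_heavy=False))
# → sagemaker
```

Notice that "where your data lives" fires before the cloud default, mirroring the key insight: data gravity and existing infrastructure dominate the choice.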
Building Your Stack Incrementally
Start simple, add layers as you mature
Incremental Approach
Don’t build the full stack on day one. Start with the minimum viable MLOps and add layers as your needs grow:
Phase 1 (Week 1–2): Experiment tracking (MLflow) + model registry. This alone prevents the “which model was best?” problem.
Phase 2 (Month 1): Add CI/CD for model training and deployment. Automate the path from code to production.
Phase 3 (Month 2–3): Add monitoring (Evidently) and data quality checks (Great Expectations). Catch drift before users do.
Phase 4 (Month 3–6): Add a feature store, advanced serving (canary deployments), and continuous training.
Phase 5 (Month 6+): Add an LLMOps layer (gateway, prompt management, guardrails) if using LLMs.
Each phase should deliver value before moving to the next.
Phased Rollout
// Incremental MLOps stack build

Phase 1 (Week 1-2):
  ✓ MLflow tracking server
  ✓ Model registry
  // "Which model is best?" solved

Phase 2 (Month 1):
  ✓ CI/CD pipeline (GitHub Actions)
  ✓ Automated training + deploy
  // "How do I ship a model?" solved

Phase 3 (Month 2-3):
  ✓ Evidently monitoring
  ✓ Great Expectations data checks
  ✓ Alerting (PagerDuty/Slack)
  // "Is my model still good?" solved

Phase 4 (Month 3-6):
  ✓ Feature store (Feast)
  ✓ Canary deployments
  ✓ Continuous training
  // "How do I scale?" solved

Phase 5 (Month 6+):
  ✓ LLM gateway (LiteLLM)
  ✓ Prompt management
  ✓ Guardrails
  // "How do I manage LLMs?" solved
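Phase 3's data-quality gate can be sketched in miniature: declare expectations for each column, check every incoming batch, and block the pipeline on failure. This is the spirit of Great Expectations, which has its own much richer API; the schema format and function name below are invented for this stdlib sketch.

```python
# Illustrative data-quality gate: each column declares a type and a
# valid range; a batch passes only if every row satisfies every rule.

def check_batch(rows: list, schema: dict) -> list:
    """Return human-readable failures (empty list == batch passes)."""
    failures = []
    for i, row in enumerate(rows):
        for col, (typ, lo, hi) in schema.items():
            if col not in row:
                failures.append(f"row {i}: missing column '{col}'")
            elif not isinstance(row[col], typ):
                failures.append(f"row {i}: '{col}' is not {typ.__name__}")
            elif not (lo <= row[col] <= hi):
                failures.append(f"row {i}: '{col}'={row[col]} outside [{lo}, {hi}]")
    return failures

SCHEMA = {"age": (int, 0, 120), "income": (float, 0.0, 1e7)}

good = [{"age": 34, "income": 52_000.0}]
bad = [{"age": 200, "income": -5.0}]

print(check_batch(good, SCHEMA))  # []: safe to train/serve on
print(check_batch(bad, SCHEMA))   # two failures: block the pipeline
```

In a real Phase 3 rollout these checks run in the pipeline before training and before serving, and a non-empty failure list triggers the alerting channel (PagerDuty/Slack) instead of a print.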
Key insight: Each phase should solve a specific pain point. If you’re not feeling the pain of missing monitoring, don’t add it yet. Premature optimization of your MLOps stack is as wasteful as premature optimization of code.
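Phase 4's canary deployment rests on one small piece of logic worth seeing concretely: a deterministic traffic split, so a given user always hits the same model variant while only a fixed slice of traffic sees the canary. The sketch below is stdlib-only and illustrative; in production the split usually lives in the serving layer (for example SageMaker endpoint variants or a KServe/Istio traffic split), not in application code.

```python
import hashlib

# Deterministic canary routing: hash the user id into 100 buckets and
# send buckets below the canary percentage to the new model. The same
# user always lands in the same bucket, so their experience is stable.

def route(user_id: str, canary_percent: int) -> str:
    """Route a user to 'canary' or 'stable' based on a stable hash."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100   # 0..99, roughly uniform per user
    return "canary" if bucket < canary_percent else "stable"

# Deterministic: the same user always gets the same answer.
assert route("user-42", 10) == route("user-42", 10)

# Roughly canary_percent of a user population lands on the canary.
users = [f"user-{i}" for i in range(10_000)]
share = sum(route(u, 10) == "canary" for u in users) / len(users)
print(f"canary share: {share:.1%}")  # close to 10%
```

Rolling forward is then just raising `canary_percent` while monitoring the canary's metrics; rolling back is setting it to zero.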
Course Recap & What’s Next
Tying it all together
Course Journey
Over 10 chapters, we’ve covered the full MLOps & LLMOps lifecycle:
Ch 1–2: Why MLOps matters and experiment tracking (the foundation).
Ch 3–4: Model registry, data versioning, data pipelines, and feature stores (the data layer).
Ch 5–6: CI/CD for ML and model serving (the deployment layer).
Ch 7–8: LLMOps — gateways, routing, prompt management, evaluation, and guardrails (the LLM layer).
Ch 9: Monitoring and drift detection (the feedback loop).
Ch 10: The full MLOps stack and how to choose and build it incrementally.
The key theme: start simple, automate incrementally, and always close the feedback loop from production back to training.
Key Takeaways
// MLOps & LLMOps — key takeaways

1. Technical debt in ML is real. MLOps is how you manage it.
2. Track everything: experiments, data versions, model versions.
3. Automate the path from code to production (CI/CD for ML).
4. Monitor at 3 levels: infra, model, and business metrics.
5. LLMOps adds new challenges: gateways, prompts, guardrails.
6. Start simple. MLflow + CI/CD covers 80% of needs.
7. The hardest problems are organizational, not technical.
8. Close the feedback loop: production → monitoring → retrain.
Key insight: The most successful MLOps teams aren’t the ones with the most tools — they’re the ones that close the feedback loop fastest. Get a model to production quickly, monitor it, learn from it, and improve. Speed of iteration beats perfection of infrastructure.