Ch 4 — Tiny Model Architectures That Work

Choose architecture families that survive strict memory and compute budgets.
Architecture Families
Compact CNN and depthwise designs remain dominant for TinyML.
Common Families
Depthwise-separable CNN variants and MobileNet-style blocks are common because they reduce multiply cost while preserving useful representational power. For some signal tasks, small residual networks remain competitive when memory allows.
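To make the multiply-cost reduction concrete, here is a minimal sketch comparing multiply-accumulate (MAC) counts for a standard convolution versus a depthwise-separable one. The layer shapes are illustrative assumptions, not values from the text.

```python
# MAC counts for one conv layer, standard vs depthwise-separable.
def standard_conv_macs(h, w, k, c_in, c_out):
    # every output pixel mixes a k*k*c_in patch for each of c_out filters
    return h * w * k * k * c_in * c_out

def depthwise_separable_macs(h, w, k, c_in, c_out):
    depthwise = h * w * k * k * c_in   # one k*k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 conv to mix channels
    return depthwise + pointwise

# Illustrative example: 32x32 feature map, 3x3 kernel, 64 -> 64 channels
std = standard_conv_macs(32, 32, 3, 64, 64)        # 37,748,736 MACs
sep = depthwise_separable_macs(32, 32, 3, 64, 64)  # 4,784,128 MACs
print(f"reduction: {std / sep:.1f}x")              # roughly 7.9x
```

The reduction factor grows with channel count, which is why these blocks dominate on microcontrollers where every MAC costs latency and energy.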
Task Alignment
Vision trigger tasks and keyword spotting often have different receptive-field needs and preprocessing assumptions. Treat architecture choice as task-specific rather than forcing a single model family everywhere.
Practical Pattern
Maintain a small architecture ladder with known budget characteristics and reuse it across projects. Standardized ladders reduce decision noise and speed experimentation.
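One way to keep such a ladder actionable is to store each rung with its known budget characteristics and filter by project limits. A minimal sketch; the model names and numbers below are placeholders, not measured values.

```python
# An ordered architecture ladder with assumed budget characteristics.
LADDER = [
    {"name": "dw-cnn-xs",   "flash_kb": 60,  "latency_ms": 8},
    {"name": "dw-cnn-s",    "flash_kb": 140, "latency_ms": 18},
    {"name": "resnet-tiny", "flash_kb": 320, "latency_ms": 41},
]

def candidates_within_budget(ladder, flash_kb, latency_ms):
    """Return the rungs that fit the raw memory and latency limits."""
    return [m for m in ladder
            if m["flash_kb"] <= flash_kb and m["latency_ms"] <= latency_ms]

print([m["name"] for m in candidates_within_budget(LADDER, 200, 25)])
# -> ['dw-cnn-xs', 'dw-cnn-s']
```

Because the ladder is shared across projects, the same filter call becomes the first step of every new model search.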
Key Point: Prefer architecture families with predictable latency behavior on target hardware.
Task-to-Model Fit
Match model complexity to error-cost profile and operating budget.
Error-Cost Mapping
When false alarms are expensive, prioritize precision and robust negative separation; when misses are costly, optimize recall with calibrated thresholds. Architecture and loss choices should reflect this business tradeoff directly.
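The "false alarms are expensive" case can be encoded directly as a threshold-selection rule: maximize recall subject to a precision floor. A minimal sketch with toy scores and labels; the data is illustrative, the rule is the point.

```python
# Pick the operating threshold that maximizes recall while keeping
# precision at or above a required floor.
def pick_threshold(scores, labels, min_precision):
    best = None
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        fn = sum((not p) and y for p, y in zip(preds, labels))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        if precision >= min_precision and (best is None or recall > best[1]):
            best = (t, recall)
    return best  # (threshold, recall), or None if the floor is unreachable

scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0,   0,   1,    1,   1,   0]
print(pick_threshold(scores, labels, min_precision=0.9))
```

When misses are the costly error instead, the same funnel runs with the roles of precision and recall swapped.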
Complexity Control
Adding layers can improve offline accuracy but may violate latency and energy limits. Keep a strict model-size ladder and promote only when gains justify operational cost.
Failure Pattern
Benchmark-driven architecture selection can ignore deployment constraints and produce models that cannot pass firmware budgets. Always cross-check architecture choices against real device limits.
Key Point: Model fit is a multi-objective optimization problem, not a single accuracy race.
Context and Receptive Field
Receptive field design determines what temporal or spatial context the model sees.
Context Window
Models that cannot observe enough context underperform on events that unfold over time or space. Increasing receptive field must be balanced against compute and memory growth.
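The context a stacked convolutional model can see follows a standard recurrence: each layer grows the receptive field by (kernel - 1) times the cumulative stride. A small sketch, with illustrative layer specs, shows how strides buy context cheaply.

```python
# Receptive field of stacked conv layers via the standard recurrence:
# rf += (kernel - 1) * jump; jump *= stride.
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, input to output."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Three 3x3 convs, the middle one with stride 2
print(receptive_field([(3, 1), (3, 2), (3, 1)]))  # -> 9
```

Note that the strided variant reaches 9 input positions with the same layer count a stride-1 stack needs to reach 7, which is exactly the compute-versus-context tradeoff described above.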
Feature Robustness
Robust front-end features and carefully tuned normalization can reduce the need for larger models. This is often the most efficient way to improve field stability under noisy inputs.
Validation Signal
Evaluate architectural variants with shared edge-case suites and threshold stress tests. This reveals stability differences that aggregate metrics hide.
Key Point: Context sufficiency should be validated with targeted edge-case evaluation, not assumed.
Training for Deployability
Train with deployment constraints in mind from the start.
Constraint-Aware Training
Use input shapes, quantization paths, and preprocessing that mirror deployment settings during training. Late-stage mismatch between training and runtime assumptions is a common source of regressions.
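One common way to mirror the quantization path during training is fake quantization: round inputs through the runtime's int8 grid so the model never sees precision it won't have on-device. A minimal sketch; the scale and zero-point values are assumptions for illustration.

```python
# Simulate the runtime's int8 input path: quantize, clamp, dequantize.
def fake_quantize(x, scale=0.05, zero_point=0):
    """Round x onto the int8 grid and back, clamping to [-128, 127]."""
    q = round(x / scale) + zero_point
    q = max(-128, min(127, q))
    return (q - zero_point) * scale

batch = [0.123, -0.456, 3.0, 9.99]  # raw feature values
mirrored = [fake_quantize(v) for v in batch]
print(mirrored)  # note the last value saturates at 127 * scale
```

Training on `mirrored` rather than `batch` surfaces clipping and rounding regressions during development instead of after firmware integration.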
Stability Signals
Track calibration behavior, confidence distribution, and class-wise confusion during training. These signals help identify models that look accurate but are unstable after thresholding on-device.
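Calibration behavior can be tracked with a simple expected-calibration-error (ECE) check: bin predictions by confidence and compare each bin's accuracy to its mean confidence. A minimal sketch on toy data; bin count and values are illustrative.

```python
# Minimal expected calibration error (ECE) over equal-width bins.
def expected_calibration_error(confidences, correct, n_bins=5):
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece

confs = [0.95, 0.9, 0.6, 0.55, 0.3]
hits  = [1,    1,   0,   1,    0]
print(round(expected_calibration_error(confs, hits), 3))  # -> 0.3
```

A model whose ECE drifts upward across training checkpoints is a likely candidate for the "accurate but unstable after thresholding" failure mode.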
Governance Rule
Require architecture promotion decisions to include both quality deltas and operational cost deltas. This prevents accidental complexity creep.
Key Point: A deployable model is one whose training assumptions match production conditions.
Architecture Selection Playbook
Use a repeatable funnel to avoid random iteration.
Funnel Steps 1-2
Start with a small architecture shortlist that already fits raw memory and latency limits, then run shared evaluation sets across all candidates. This narrows options quickly without over-investing in one design too early.
Funnel Steps 3-4
Apply compression and threshold tuning only to finalists, then validate under device-level load tests and environmental variance. Promote the smallest model that meets quality and reliability gates.
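The endgame of the funnel can be sketched as a single rule: of the finalists that pass every quality and reliability gate, promote the smallest. The field names, gate metrics, and numbers below are illustrative assumptions.

```python
# Promote the smallest finalist that clears every gate floor.
def promote(finalists, gates):
    passing = [m for m in finalists
               if all(m[metric] >= floor for metric, floor in gates.items())]
    return min(passing, key=lambda m: m["flash_kb"]) if passing else None

finalists = [
    {"name": "a", "flash_kb": 90,  "recall": 0.91, "load_test_pass_rate": 0.99},
    {"name": "b", "flash_kb": 140, "recall": 0.94, "load_test_pass_rate": 1.00},
    {"name": "c", "flash_kb": 70,  "recall": 0.85, "load_test_pass_rate": 0.97},
]
gates = {"recall": 0.90, "load_test_pass_rate": 0.98}
print(promote(finalists, gates)["name"])  # prints: a
```

Candidate "c" is smaller but fails the recall gate, so the rule correctly promotes "a" rather than chasing size alone.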
Handoff Artifact
Publish an architecture rationale document with task assumptions and failure boundaries for future model maintainers. Review it at each release checkpoint so assumptions remain current.
Key Point: A disciplined funnel usually outperforms ad-hoc architecture experimentation.
Architecture Misfit Scenarios
Misfit usually appears as latency instability or weak edge-case behavior.
Misfit Symptoms
Symptoms include unstable threshold behavior, sensitivity to minor preprocessing drift, and unacceptable tail latency on target hardware. These indicate architecture-task mismatch even when top-line metrics look acceptable.
Correction Strategy
When misfit is detected, prefer architecture simplification or feature redesign before adding training complexity. Simpler models with robust inputs are often more reliable on constrained devices.
Key Point: Architecture correction is often the fastest path to reliable edge behavior.
Selection Checklist
Use objective checks before committing to architecture freeze.
Checklist Items
Validate memory fit, latency envelope, calibration stability, and hard-negative behavior for every candidate. Require evidence from on-device tests rather than host-only benchmarks.
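The "on-device evidence, not host-only benchmarks" requirement can be enforced mechanically: a check only counts if it both passed and was sourced from the target device. A minimal sketch; the check names and record shape are assumptions.

```python
# A candidate is freeze-eligible only when every required check passed
# with evidence collected on the target device.
REQUIRED_CHECKS = {"memory_fit", "latency_envelope",
                   "calibration_stability", "hard_negative_behavior"}

def freeze_eligible(candidate):
    evidence = candidate.get("evidence", {})
    return all(
        evidence.get(check, {}).get("passed") is True
        and evidence.get(check, {}).get("source") == "on_device"
        for check in REQUIRED_CHECKS
    )

cand = {"evidence": {c: {"passed": True, "source": "on_device"}
                     for c in REQUIRED_CHECKS}}
print(freeze_eligible(cand))  # prints: True

# A host-only benchmark for any check blocks eligibility.
cand["evidence"]["latency_envelope"]["source"] = "host_benchmark"
print(freeze_eligible(cand))  # prints: False
```

Encoding the gate this way makes the freeze decision auditable: the evidence record itself documents which tests ran where.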
Freeze Decision
Freeze only when one candidate is clearly best across quality and operations gates with measurable margin. Clear freeze criteria reduce churn during compression and integration stages.
Key Point: Architecture freeze should be evidence-led, and reversing it should require a documented reason.