Ch 4 — Tiny Model Architectures That Work

Choose architecture families that survive strict memory and compute budgets.
Architecture Families
Compact CNN and depthwise designs remain dominant for TinyML.
Common Families
Depthwise-separable CNN variants and MobileNet-style blocks are common because they reduce multiply cost while preserving useful representational power. For some signal tasks, small residual networks remain competitive when memory allows.
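To make the multiply-cost reduction concrete, here is a minimal sketch comparing multiply-accumulate (MAC) counts for a standard convolution versus a depthwise-separable one. The layer shapes are illustrative assumptions, not values from the text.

```python
# MAC counts for one conv layer, standard vs depthwise-separable.
def standard_conv_macs(h, w, k, c_in, c_out):
    # every output pixel mixes a k*k*c_in patch for each of c_out filters
    return h * w * k * k * c_in * c_out

def depthwise_separable_macs(h, w, k, c_in, c_out):
    depthwise = h * w * k * k * c_in   # one k*k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 conv to mix channels
    return depthwise + pointwise

# Illustrative example: 32x32 feature map, 3x3 kernel, 64 -> 64 channels
std = standard_conv_macs(32, 32, 3, 64, 64)        # 37,748,736 MACs
sep = depthwise_separable_macs(32, 32, 3, 64, 64)  # 4,784,128 MACs
print(f"reduction: {std / sep:.1f}x")              # roughly 7.9x
```

The reduction factor grows with channel count, which is why these blocks dominate on microcontrollers where every MAC costs latency and energy.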
Task Alignment
Vision trigger tasks and keyword spotting often have different receptive-field needs and preprocessing assumptions. Treat architecture choice as task-specific rather than forcing a single model family everywhere.
Practical Pattern
Maintain a small architecture ladder with known budget characteristics and reuse it across projects. Standardized ladders reduce decision noise and speed experimentation.
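One way to keep such a ladder actionable is to store each rung with its known budget characteristics and filter by project limits. A minimal sketch; the model names and numbers below are placeholders, not measured values.

```python
# An ordered architecture ladder with assumed budget characteristics.
LADDER = [
    {"name": "dw-cnn-xs",   "flash_kb": 60,  "latency_ms": 8},
    {"name": "dw-cnn-s",    "flash_kb": 140, "latency_ms": 18},
    {"name": "resnet-tiny", "flash_kb": 320, "latency_ms": 41},
]

def candidates_within_budget(ladder, flash_kb, latency_ms):
    """Return the rungs that fit the raw memory and latency limits."""
    return [m for m in ladder
            if m["flash_kb"] <= flash_kb and m["latency_ms"] <= latency_ms]

print([m["name"] for m in candidates_within_budget(LADDER, 200, 25)])
# -> ['dw-cnn-xs', 'dw-cnn-s']
```

Because the ladder is shared across projects, the same filter call becomes the first step of every new model search.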
Key Point: Prefer architecture families with predictable latency behavior on target hardware.
Task-to-Model Fit
Match model complexity to error-cost profile and operating budget.
Error-Cost Mapping
When false alarms are expensive, prioritize precision and robust negative separation; when misses are costly, optimize recall with calibrated thresholds. Architecture and loss choices should reflect this business tradeoff directly.
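The "false alarms are expensive" case can be encoded directly as a threshold-selection rule: maximize recall subject to a precision floor. A minimal sketch with toy scores and labels; the data is illustrative, the rule is the point.

```python
# Pick the operating threshold that maximizes recall while keeping
# precision at or above a required floor.
def pick_threshold(scores, labels, min_precision):
    best = None
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        fn = sum((not p) and y for p, y in zip(preds, labels))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        if precision >= min_precision and (best is None or recall > best[1]):
            best = (t, recall)
    return best  # (threshold, recall), or None if the floor is unreachable

scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0,   0,   1,    1,   1,   0]
print(pick_threshold(scores, labels, min_precision=0.9))
```

When misses are the costly error instead, the same funnel runs with the roles of precision and recall swapped.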
Complexity Control
Adding layers can improve offline accuracy but may violate latency and energy limits. Keep a strict model-size ladder and promote only when gains justify operational cost.
Failure Pattern
Benchmark-driven architecture selection can ignore deployment constraints and produce models that cannot pass firmware budgets. Always cross-check architecture choices against real device limits.
Key Point: Model fit is a multi-objective optimization problem, not a single accuracy race.
Context and Receptive Field
Receptive field design determines what temporal or spatial context the model sees.
Context Window
Models that cannot observe enough context underperform on events that unfold over time or space. Increasing receptive field must be balanced against compute and memory growth.
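The context a stacked convolutional model can see follows a standard recurrence: each layer grows the receptive field by (kernel - 1) times the cumulative stride. A small sketch, with illustrative layer specs, shows how strides buy context cheaply.

```python
# Receptive field of stacked conv layers via the standard recurrence:
# rf += (kernel - 1) * jump; jump *= stride.
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, input to output."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Three 3x3 convs, the middle one with stride 2
print(receptive_field([(3, 1), (3, 2), (3, 1)]))  # -> 9
```

Note that the strided variant reaches 9 input positions with the same layer count a stride-1 stack needs to reach 7, which is exactly the compute-versus-context tradeoff described above.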
Feature Robustness
Robust front-end features and carefully tuned normalization can reduce the need for larger models. This is often the most efficient way to improve field stability under noisy inputs.
Validation Signal
Evaluate architectural variants with shared edge-case suites and threshold stress tests. This reveals stability differences that aggregate metrics hide.
Key Point: Context sufficiency should be validated with targeted edge-case evaluation, not assumed.
Training for Deployability
Train with deployment constraints in mind from the start.
Constraint-Aware Training
Use input shapes, quantization paths, and preprocessing that mirror deployment settings during training. Late-stage mismatch between training and runtime assumptions is a common source of regressions.
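One common way to mirror the quantization path during training is fake quantization: round inputs through the runtime's int8 grid so the model never sees precision it won't have on-device. A minimal sketch; the scale and zero-point values are assumptions for illustration.

```python
# Simulate the runtime's int8 input path: quantize, clamp, dequantize.
def fake_quantize(x, scale=0.05, zero_point=0):
    """Round x onto the int8 grid and back, clamping to [-128, 127]."""
    q = round(x / scale) + zero_point
    q = max(-128, min(127, q))
    return (q - zero_point) * scale

batch = [0.123, -0.456, 3.0, 9.99]  # raw feature values
mirrored = [fake_quantize(v) for v in batch]
print(mirrored)  # note the last value saturates at 127 * scale
```

Training on `mirrored` rather than `batch` surfaces clipping and rounding regressions during development instead of after firmware integration.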
Stability Signals
Track calibration behavior, confidence distribution, and class-wise confusion during training. These signals help identify models that look accurate but are unstable after thresholding on-device.
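Calibration behavior can be tracked with a simple expected-calibration-error (ECE) check: bin predictions by confidence and compare each bin's accuracy to its mean confidence. A minimal sketch on toy data; bin count and values are illustrative.

```python
# Minimal expected calibration error (ECE) over equal-width bins.
def expected_calibration_error(confidences, correct, n_bins=5):
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece

confs = [0.95, 0.9, 0.6, 0.55, 0.3]
hits  = [1,    1,   0,   1,    0]
print(round(expected_calibration_error(confs, hits), 3))  # -> 0.3
```

A model whose ECE drifts upward across training checkpoints is a likely candidate for the "accurate but unstable after thresholding" failure mode.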
Governance Rule
Require architecture promotion decisions to include both quality deltas and operational cost deltas. This prevents accidental complexity creep.
Key Point: A deployable model is one whose training assumptions match production conditions.
Architecture Selection Playbook
Use a repeatable funnel to avoid random iteration.
Funnel Steps 1-2
Start with a small architecture shortlist that already fits raw memory and latency limits, then run shared evaluation sets across all candidates. This narrows options quickly without over-investing in one design too early.
Funnel Steps 3-4
Apply compression and threshold tuning only to finalists, then validate under device-level load tests and environmental variance. Promote the smallest model that meets quality and reliability gates.
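The endgame of the funnel can be sketched as a single rule: of the finalists that pass every quality and reliability gate, promote the smallest. The field names, gate metrics, and numbers below are illustrative assumptions.

```python
# Promote the smallest finalist that clears every gate floor.
def promote(finalists, gates):
    passing = [m for m in finalists
               if all(m[metric] >= floor for metric, floor in gates.items())]
    return min(passing, key=lambda m: m["flash_kb"]) if passing else None

finalists = [
    {"name": "a", "flash_kb": 90,  "recall": 0.91, "load_test_pass_rate": 0.99},
    {"name": "b", "flash_kb": 140, "recall": 0.94, "load_test_pass_rate": 1.00},
    {"name": "c", "flash_kb": 70,  "recall": 0.85, "load_test_pass_rate": 0.97},
]
gates = {"recall": 0.90, "load_test_pass_rate": 0.98}
print(promote(finalists, gates)["name"])  # prints: a
```

Candidate "c" is smaller but fails the recall gate, so the rule correctly promotes "a" rather than chasing size alone.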
Handoff Artifact
Publish an architecture rationale document with task assumptions and failure boundaries for future model maintainers. Review it at each release checkpoint so assumptions remain current.
Key Point: A disciplined funnel usually outperforms ad-hoc architecture experimentation.
Architecture Misfit Scenarios
Misfit usually appears as latency instability or weak edge-case behavior.
Misfit Symptoms
Symptoms include unstable threshold behavior, sensitivity to minor preprocessing drift, and unacceptable tail latency on target hardware. These indicate architecture-task mismatch even when top-line metrics look acceptable.
Correction Strategy
When misfit is detected, prefer architecture simplification or feature redesign before adding training complexity. Simpler models with robust inputs are often more reliable on constrained devices.
Key Point: Architecture correction is often the fastest path to reliable edge behavior.
Selection Checklist
Use objective checks before committing to architecture freeze.
Checklist Items
Validate memory fit, latency envelope, calibration stability, and hard-negative behavior for every candidate. Require evidence from on-device tests rather than host-only benchmarks.
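The "on-device evidence, not host-only benchmarks" requirement can be enforced mechanically: a check only counts if it both passed and was sourced from the target device. A minimal sketch; the check names and record shape are assumptions.

```python
# A candidate is freeze-eligible only when every required check passed
# with evidence collected on the target device.
REQUIRED_CHECKS = {"memory_fit", "latency_envelope",
                   "calibration_stability", "hard_negative_behavior"}

def freeze_eligible(candidate):
    evidence = candidate.get("evidence", {})
    return all(
        evidence.get(check, {}).get("passed") is True
        and evidence.get(check, {}).get("source") == "on_device"
        for check in REQUIRED_CHECKS
    )

cand = {"evidence": {c: {"passed": True, "source": "on_device"}
                     for c in REQUIRED_CHECKS}}
print(freeze_eligible(cand))  # prints: True

# A host-only benchmark for any check blocks eligibility.
cand["evidence"]["latency_envelope"]["source"] = "host_benchmark"
print(freeze_eligible(cand))  # prints: False
```

Encoding the gate this way makes the freeze decision auditable: the evidence record itself documents which tests ran where.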
Freeze Decision
Freeze only when one candidate is clearly best across quality and operations gates with measurable margin. Clear freeze criteria reduce churn during compression and integration stages.
Key Point: Architecture freeze should be evidence-led, and reversing it should require a documented reason.