Ch 2 — Hardware Budgets: RAM, Flash, Cycles, Power

Design from constraints first, then fit model and runtime choices inside them.
Foundation: Memory → Latency → Power → Thermal → Budget
Memory Budgets
RAM and flash limits are usually the first hard constraint in TinyML.
RAM Reality
Inference memory must include model tensors, activation buffers, input queues, and firmware overhead. A model that appears to fit in isolation can still fail once networking, logging, and safety tasks are included.
Flash Planning
Flash holds firmware, model artifacts, configuration, and rollback images for OTA updates. Reserve explicit space for update safety so release operations do not become blocked by storage exhaustion.
Practical Pattern
Build a single shared budget table for model, firmware, and platform teams so tradeoffs are visible to everyone. Separate private estimates create hidden integration risk.
Note: Key Point: Budget memory for the full device workload, not only the model binary.
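A shared budget table like the one described above can be kept as a small script instead of a spreadsheet. The following sketch uses illustrative, assumed figures for a 256 KB RAM / 1 MB flash MCU; every number is a placeholder, not a measured value.

```python
# Hypothetical memory budget worksheet for a TinyML target.
# All figures are illustrative assumptions, not measured values.

RAM_TOTAL_KB = 256     # assumed Cortex-M-class part
FLASH_TOTAL_KB = 1024

ram_items = {
    "model_tensor_arena": 96,   # activations + scratch buffers
    "input_queue": 16,          # sensor/audio ring buffer
    "rtos_and_stacks": 24,      # task stacks, kernel objects
    "networking": 32,           # radio stack working set
    "logging_and_safety": 12,
}

flash_items = {
    "firmware": 320,
    "model_weights": 280,
    "config_and_calibration": 16,
    "ota_rollback_image": 320,  # reserved so updates stay safe
}

def report(name, items, total_kb, margin_frac=0.15):
    """Print usage vs. a ceiling that keeps an explicit safety margin."""
    used = sum(items.values())
    ceiling = total_kb * (1 - margin_frac)
    status = "PASS" if used <= ceiling else "FAIL"
    print(f"{name}: {used} KB used / {total_kb} KB total "
          f"(ceiling {ceiling:.0f} KB with {margin_frac:.0%} margin) -> {status}")
    return used <= ceiling

report("RAM", ram_items, RAM_TOTAL_KB)
report("Flash", flash_items, FLASH_TOTAL_KB)
```

Note that with these assumed numbers the flash line fails precisely because the OTA rollback reservation is included, which is the kind of system-level cost a model-only estimate hides.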
Compute and Latency Budgets
Cycle budgets define what model complexity is feasible per inference window.
Cycle Envelope
Translate latency targets into per-inference cycle budgets using your MCU clock and scheduling model. This avoids selecting architectures that cannot meet timing once integrated with real firmware tasks.
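The translation from a latency target to a cycle budget is simple arithmetic, sketched below with assumed values for clock speed, the CPU share left for inference, and cycles per MAC; none of these are real measurements.

```python
# Translate a latency target into a per-inference cycle budget.
# Clock speed, CPU share, and cost-per-MAC are illustrative assumptions.

CPU_HZ = 80_000_000        # 80 MHz MCU clock (assumed)
LATENCY_TARGET_MS = 50     # product deadline per inference
CPU_SHARE_FOR_ML = 0.6     # fraction left after RTOS/network tasks (assumed)

cycles_available = CPU_HZ * (LATENCY_TARGET_MS / 1000) * CPU_SHARE_FOR_ML
print(f"Cycle budget per inference: {cycles_available / 1e6:.1f} M cycles")

# Rough feasibility check at an assumed cycles-per-MAC cost.
CYCLES_PER_MAC = 2         # optimistic for a DSP-extended core (assumed)
max_macs = cycles_available / CYCLES_PER_MAC
print(f"Feasible model size: ~{max_macs / 1e6:.1f} M MACs per inference")
```

The `CPU_SHARE_FOR_ML` factor is the part teams most often omit; it is what makes the budget survive integration with real firmware tasks.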
Tail Behavior
Average latency is not enough for real-time systems; tail latency determines missed deadlines and user-visible failures. Measure p95 and p99 timing under realistic input bursts and background load.
Failure Pattern
A common mistake is budgeting to average-case behavior while ignoring burst conditions and startup spikes. Tail conditions should drive pass/fail decisions.
Note: Key Point: A model is acceptable only if worst-case latency stays inside product deadlines.
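Tail analysis of captured timings can be scripted directly. The sketch below uses synthetic samples as a stand-in for on-device logs, with an assumed bimodal shape (nominal runs plus bursty spikes) to show why the average passes while the tail fails.

```python
# Sketch of p95/p99 tail-latency analysis over inference timings.
# The samples here are synthetic stand-ins for on-device logs.
import random
import statistics

random.seed(0)
# Assumed shape: mostly-nominal latencies plus occasional burst spikes.
samples_ms = ([random.gauss(32, 2) for _ in range(950)] +
              [random.gauss(55, 5) for _ in range(50)])

def percentile(data, p):
    """Nearest-rank percentile over a list of samples."""
    s = sorted(data)
    k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[k]

DEADLINE_MS = 50
p95 = percentile(samples_ms, 95)
p99 = percentile(samples_ms, 99)
print(f"mean={statistics.mean(samples_ms):.1f} ms  "
      f"p95={p95:.1f} ms  p99={p99:.1f} ms")
print("PASS" if p99 <= DEADLINE_MS else "FAIL: tail exceeds deadline")
```

With these synthetic samples the mean and p95 sit comfortably under the deadline while p99 blows through it, which is exactly the average-case budgeting trap described above.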
Power and Energy Budgets
Battery life is governed by duty cycle, wake time, and compute intensity.
Duty Cycle Model
Always-on systems often rely on low-power front-end stages and wake higher-cost inference only when needed. Designing this trigger pipeline carefully can reduce energy use without sacrificing detection quality.
Energy per Decision
Track energy per inference and inferences per hour to estimate battery life at the product level. This makes design reviews concrete and exposes hidden costs from aggressive sampling or over-frequent model execution.
Validation Signal
Measure budget usage on production-like firmware images rather than minimal benchmark builds. Supporting services can materially change memory and latency behavior.
Note: Key Point: Power budgeting must be expressed as daily energy consumption, not just peak current.
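A duty-cycle energy model ties these ideas together: sleep power, energy per inference, and trigger rate combine into daily energy and a battery-life estimate. All electrical figures below are illustrative assumptions for a small battery-powered device.

```python
# Duty-cycle energy model: daily energy and battery-life estimate.
# All electrical figures are illustrative assumptions.

BATTERY_MWH = 2400          # usable battery capacity in mWh (assumed)

# Two-stage pipeline: cheap always-on front end, costly wake inference.
SLEEP_MW = 0.05             # baseline sleep + front-end detector (assumed)
INFER_MJ = 1.8              # energy per full inference, in mJ (assumed)
INFERENCES_PER_HOUR = 120   # trigger rate under expected traffic (assumed)

hours = 24
sleep_mwh = SLEEP_MW * hours
infer_mwh = INFER_MJ * INFERENCES_PER_HOUR * hours / 3600  # mJ -> mWh
daily_mwh = sleep_mwh + infer_mwh

print(f"Daily energy: {daily_mwh:.2f} mWh "
      f"(sleep {sleep_mwh:.2f}, inference {infer_mwh:.2f})")
print(f"Estimated battery life: {BATTERY_MWH / daily_mwh:.0f} days")
```

Expressing the budget this way makes the review question concrete: doubling the trigger rate changes only `INFERENCES_PER_HOUR`, and the battery-life impact falls straight out.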
Thermal and Stability Limits
Sustained edge workloads can trigger thermal throttling and unstable timing.
Thermal Drift
As devices heat, clocks may throttle and latency can drift outside previously safe envelopes. Long-duration soak tests reveal these effects better than short bench runs.
Environmental Range
Field deployments face temperature and voltage variations that are absent in lab conditions. Validate performance across expected environmental ranges before declaring production readiness.
Governance Rule
Set explicit safety margins for RAM, flash, and timing rather than targeting exact limits. Margin policies protect reliability when workloads shift after deployment.
Note: Key Point: Thermal reliability is part of model validation in edge deployments.
Budget Worksheet and Gates
Use explicit pass/fail gates before training and before deployment.
Pre-Training Gate
Set target ceilings for RAM, flash, p95 latency, and daily energy before model iteration begins. This prevents teams from optimizing toward models that can never be operationally viable.
Pre-Release Gate
Release only when measured device metrics meet budget with safety margin under realistic traffic and firmware load. A margin policy protects reliability when workloads drift after launch.
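A pre-release gate with an explicit margin policy can be expressed as a short check. The budgets and measurements below are hypothetical placeholders; in practice the measured values would come from a production-like firmware image.

```python
# Sketch of a pre-release budget gate with an explicit margin policy.
# Budgets and measurements are hypothetical placeholder numbers.

BUDGETS = {                      # metric: (ceiling, required margin fraction)
    "ram_kb":           (256, 0.15),
    "flash_kb":         (1024, 0.10),
    "p95_latency_ms":   (50, 0.20),
    "daily_energy_mwh": (3.0, 0.10),
}

measured = {                     # from a production-like firmware image
    "ram_kb": 201,
    "flash_kb": 870,
    "p95_latency_ms": 38,
    "daily_energy_mwh": 2.6,
}

def gate(budgets, measurements):
    """Pass only if every metric clears its ceiling with margin to spare."""
    ok = True
    for metric, (ceiling, margin) in budgets.items():
        limit = ceiling * (1 - margin)
        passed = measurements[metric] <= limit
        ok = ok and passed
        print(f"{metric}: {measurements[metric]} vs limit {limit:.1f} "
              f"-> {'PASS' if passed else 'FAIL'}")
    return ok

print("RELEASE" if gate(BUDGETS, measured) else "BLOCK")
```

Because the margin is part of the gate rather than an informal rule of thumb, a release that "just fits" the raw ceiling is correctly blocked.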
Handoff Artifact
Keep versioned budget snapshots linked to model versions so regressions can be detected quickly when new releases are proposed. Review the snapshot at each release checkpoint so assumptions remain current.
Note: Key Point: Budget gates convert edge constraints into objective engineering decisions.
Budget Overrun Patterns
Most overruns are discovered late because constraints are tracked informally.
Typical Overruns
Frequent overruns include memory fragmentation during long uptime, flash exhaustion after OTA requirements are added, and latency spikes from concurrent tasks. These are system-level issues, not just model issues.
Containment Strategy
Add automated budget regression checks in CI for firmware builds and benchmark harnesses. Catching regressions per commit is far cheaper than debugging integrated failures near release.
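A per-commit regression check can be as simple as comparing a candidate build's metrics against the last approved snapshot. The metric names, numbers, and tolerance below are hypothetical; a real CI job would load them from build artifacts and exit nonzero on failure.

```python
# Sketch of an automated budget regression check suitable for CI.
# Metric names, values, and tolerance are hypothetical placeholders.

TOLERANCE = 0.02   # allow 2% drift before flagging a regression (assumed)

baseline = {"ram_kb": 201, "flash_kb": 870, "p95_latency_ms": 38}
candidate = {"ram_kb": 204, "flash_kb": 902, "p95_latency_ms": 39}

def regressions(base_metrics, cand_metrics, tol=TOLERANCE):
    """Return metrics where the candidate exceeds baseline beyond tolerance."""
    bad = []
    for metric, base in base_metrics.items():
        if cand_metrics[metric] > base * (1 + tol):
            bad.append((metric, base, cand_metrics[metric]))
    return bad

failed = regressions(baseline, candidate)
for metric, base, new in failed:
    print(f"REGRESSION {metric}: {base} -> {new}")
if failed:
    print(f"{len(failed)} budget regression(s); CI would fail this build")
    # In a real pipeline: sys.exit(1), so the build fails like a broken test.
```

Treating these as hard failures per commit is what makes budget regressions behave like functional regressions in the release process.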
Note: Key Point: Budget regressions should be treated like functional regressions in your release process.
Budget Review Rhythm
Regular budget reviews keep decisions grounded as product requirements evolve.
Review Inputs
Review updated model metrics, firmware task loads, and expected traffic scenarios together. Budget accuracy degrades quickly when any one of these changes without cross-team visibility.
Decision Output
Each review should produce either approval, mitigation tasks, or scope reduction decisions. Explicit outcomes keep teams aligned and prevent hidden budget debt from accumulating.
Note: Key Point: Budget governance is a continuous process, not a one-time spreadsheet exercise.