Ch 2 — Hardware Budgets: RAM, Flash, Cycles, Power

Design from constraints first, then fit model and runtime choices inside them.
Foundation: Memory → Latency → Power → Thermal → Budget
Memory Budgets
RAM and flash limits are usually the first hard constraint in TinyML.
RAM Reality
Inference memory must include model tensors, activation buffers, input queues, and firmware overhead. A model that appears to fit in isolation can still fail once networking, logging, and safety tasks are included.
Flash Planning
Flash holds firmware, model artifacts, configuration, and rollback images for OTA updates. Reserve explicit space for update safety so release operations do not become blocked by storage exhaustion.
Practical Pattern
Build a single shared budget table for model, firmware, and platform teams so tradeoffs are visible to everyone. Separate private estimates create hidden integration risk.
Note: Key Point: Budget memory for the full device workload, not only the model binary.
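A shared budget table like the one described above can be kept as a small script instead of a spreadsheet. The following sketch uses illustrative, assumed figures for a 256 KB RAM / 1 MB flash MCU; every number is a placeholder, not a measured value.

```python
# Hypothetical memory budget worksheet for a TinyML target.
# All figures are illustrative assumptions, not measured values.

RAM_TOTAL_KB = 256     # assumed Cortex-M-class part
FLASH_TOTAL_KB = 1024

ram_items = {
    "model_tensor_arena": 96,   # activations + scratch buffers
    "input_queue": 16,          # sensor/audio ring buffer
    "rtos_and_stacks": 24,      # task stacks, kernel objects
    "networking": 32,           # radio stack working set
    "logging_and_safety": 12,
}

flash_items = {
    "firmware": 320,
    "model_weights": 280,
    "config_and_calibration": 16,
    "ota_rollback_image": 320,  # reserved so updates stay safe
}

def report(name, items, total_kb, margin_frac=0.15):
    """Print usage vs. a ceiling that keeps an explicit safety margin."""
    used = sum(items.values())
    ceiling = total_kb * (1 - margin_frac)
    status = "PASS" if used <= ceiling else "FAIL"
    print(f"{name}: {used} KB used / {total_kb} KB total "
          f"(ceiling {ceiling:.0f} KB with {margin_frac:.0%} margin) -> {status}")
    return used <= ceiling

report("RAM", ram_items, RAM_TOTAL_KB)
report("Flash", flash_items, FLASH_TOTAL_KB)
```

Note that with these assumed numbers the flash line fails precisely because the OTA rollback reservation is included, which is the kind of system-level cost a model-only estimate hides.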
Compute and Latency Budgets
Cycle budgets define what model complexity is feasible per inference window.
Cycle Envelope
Translate latency targets into per-inference cycle budgets using your MCU clock and scheduling model. This avoids selecting architectures that cannot meet timing once integrated with real firmware tasks.
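The translation from a latency target to a cycle budget is simple arithmetic, sketched below with assumed values for clock speed, the CPU share left for inference, and cycles per MAC; none of these are real measurements.

```python
# Translate a latency target into a per-inference cycle budget.
# Clock speed, CPU share, and cost-per-MAC are illustrative assumptions.

CPU_HZ = 80_000_000        # 80 MHz MCU clock (assumed)
LATENCY_TARGET_MS = 50     # product deadline per inference
CPU_SHARE_FOR_ML = 0.6     # fraction left after RTOS/network tasks (assumed)

cycles_available = CPU_HZ * (LATENCY_TARGET_MS / 1000) * CPU_SHARE_FOR_ML
print(f"Cycle budget per inference: {cycles_available / 1e6:.1f} M cycles")

# Rough feasibility check at an assumed cycles-per-MAC cost.
CYCLES_PER_MAC = 2         # optimistic for a DSP-extended core (assumed)
max_macs = cycles_available / CYCLES_PER_MAC
print(f"Feasible model size: ~{max_macs / 1e6:.1f} M MACs per inference")
```

The `CPU_SHARE_FOR_ML` factor is the part teams most often omit; it is what makes the budget survive integration with real firmware tasks.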
Tail Behavior
Average latency is not enough for real-time systems; tail latency determines missed deadlines and user-visible failures. Measure p95 and p99 timing under realistic input bursts and background load.
Failure Pattern
A common mistake is budgeting to average-case behavior while ignoring burst conditions and startup spikes. Tail conditions should drive pass/fail decisions.
Note: Key Point: A model is acceptable only if worst-case latency stays inside product deadlines.
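Tail analysis of captured timings can be scripted directly. The sketch below uses synthetic samples as a stand-in for on-device logs, with an assumed bimodal shape (nominal runs plus bursty spikes) to show why the average passes while the tail fails.

```python
# Sketch of p95/p99 tail-latency analysis over inference timings.
# The samples here are synthetic stand-ins for on-device logs.
import random
import statistics

random.seed(0)
# Assumed shape: mostly-nominal latencies plus occasional burst spikes.
samples_ms = ([random.gauss(32, 2) for _ in range(950)] +
              [random.gauss(55, 5) for _ in range(50)])

def percentile(data, p):
    """Nearest-rank percentile over a list of samples."""
    s = sorted(data)
    k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[k]

DEADLINE_MS = 50
p95 = percentile(samples_ms, 95)
p99 = percentile(samples_ms, 99)
print(f"mean={statistics.mean(samples_ms):.1f} ms  "
      f"p95={p95:.1f} ms  p99={p99:.1f} ms")
print("PASS" if p99 <= DEADLINE_MS else "FAIL: tail exceeds deadline")
```

With these synthetic samples the mean and p95 sit comfortably under the deadline while p99 blows through it, which is exactly the average-case budgeting trap described above.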
Power and Energy Budgets
Battery life is governed by duty cycle, wake time, and compute intensity.
Duty Cycle Model
Always-on systems often rely on low-power front-end stages and wake higher-cost inference only when needed. Designing this trigger pipeline carefully can reduce energy use without sacrificing detection quality.
Energy per Decision
Track energy per inference and inferences per hour to estimate battery life at the product level. This makes design reviews concrete and exposes hidden costs from aggressive sampling or over-frequent model execution.
Validation Signal
Measure budget usage on production-like firmware images rather than minimal benchmark builds. Supporting services can materially change memory and latency behavior.
Note: Key Point: Power budgeting must be expressed as daily energy consumption, not just peak current.
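A duty-cycle energy model ties these ideas together: sleep power, energy per inference, and trigger rate combine into daily energy and a battery-life estimate. All electrical figures below are illustrative assumptions for a small battery-powered device.

```python
# Duty-cycle energy model: daily energy and battery-life estimate.
# All electrical figures are illustrative assumptions.

BATTERY_MWH = 2400          # usable battery capacity in mWh (assumed)

# Two-stage pipeline: cheap always-on front end, costly wake inference.
SLEEP_MW = 0.05             # baseline sleep + front-end detector (assumed)
INFER_MJ = 1.8              # energy per full inference, in mJ (assumed)
INFERENCES_PER_HOUR = 120   # trigger rate under expected traffic (assumed)

hours = 24
sleep_mwh = SLEEP_MW * hours
infer_mwh = INFER_MJ * INFERENCES_PER_HOUR * hours / 3600  # mJ -> mWh
daily_mwh = sleep_mwh + infer_mwh

print(f"Daily energy: {daily_mwh:.2f} mWh "
      f"(sleep {sleep_mwh:.2f}, inference {infer_mwh:.2f})")
print(f"Estimated battery life: {BATTERY_MWH / daily_mwh:.0f} days")
```

Expressing the budget this way makes the review question concrete: doubling the trigger rate changes only `INFERENCES_PER_HOUR`, and the battery-life impact falls straight out.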
Thermal and Stability Limits
Sustained edge workloads can trigger thermal throttling and unstable timing.
Thermal Drift
As devices heat, clocks may throttle and latency can drift outside previously safe envelopes. Long-duration soak tests reveal these effects better than short bench runs.
Environmental Range
Field deployments face temperature and voltage variations that are absent in lab conditions. Validate performance across expected environmental ranges before declaring production readiness.
Governance Rule
Set explicit safety margins for RAM, flash, and timing rather than targeting exact limits. Margin policies protect reliability when workloads shift after deployment.
Note: Key Point: Thermal reliability is part of model validation in edge deployments.
Budget Worksheet and Gates
Use explicit pass/fail gates before training and before deployment.
Pre-Training Gate
Set target ceilings for RAM, flash, p95 latency, and daily energy before model iteration begins. This prevents teams from optimizing toward models that can never be operationally viable.
Pre-Release Gate
Release only when measured device metrics meet budget with safety margin under realistic traffic and firmware load. A margin policy protects reliability when workloads drift after launch.
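A pre-release gate with an explicit margin policy can be expressed as a short check. The budgets and measurements below are hypothetical placeholders; in practice the measured values would come from a production-like firmware image.

```python
# Sketch of a pre-release budget gate with an explicit margin policy.
# Budgets and measurements are hypothetical placeholder numbers.

BUDGETS = {                      # metric: (ceiling, required margin fraction)
    "ram_kb":           (256, 0.15),
    "flash_kb":         (1024, 0.10),
    "p95_latency_ms":   (50, 0.20),
    "daily_energy_mwh": (3.0, 0.10),
}

measured = {                     # from a production-like firmware image
    "ram_kb": 201,
    "flash_kb": 870,
    "p95_latency_ms": 38,
    "daily_energy_mwh": 2.6,
}

def gate(budgets, measurements):
    """Pass only if every metric clears its ceiling with margin to spare."""
    ok = True
    for metric, (ceiling, margin) in budgets.items():
        limit = ceiling * (1 - margin)
        passed = measurements[metric] <= limit
        ok = ok and passed
        print(f"{metric}: {measurements[metric]} vs limit {limit:.1f} "
              f"-> {'PASS' if passed else 'FAIL'}")
    return ok

print("RELEASE" if gate(BUDGETS, measured) else "BLOCK")
```

Because the margin is part of the gate rather than an informal rule of thumb, a release that "just fits" the raw ceiling is correctly blocked.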
Handoff Artifact
Keep versioned budget snapshots linked to model versions so regressions can be detected quickly when new releases are proposed. Review the snapshot at each release checkpoint so assumptions remain current.
Note: Key Point: Budget gates convert edge constraints into objective engineering decisions.
Budget Overrun Patterns
Most overruns are discovered late because constraints are tracked informally.
Typical Overruns
Frequent overruns include memory fragmentation during long uptime, flash exhaustion after OTA requirements are added, and latency spikes from concurrent tasks. These are system-level issues, not just model issues.
Containment Strategy
Add automated budget regression checks in CI for firmware builds and benchmark harnesses. Catching regressions per commit is far cheaper than debugging integrated failures near release.
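A per-commit regression check can be as simple as comparing a candidate build's metrics against the last approved snapshot. The metric names, numbers, and tolerance below are hypothetical; a real CI job would load them from build artifacts and exit nonzero on failure.

```python
# Sketch of an automated budget regression check suitable for CI.
# Metric names, values, and tolerance are hypothetical placeholders.

TOLERANCE = 0.02   # allow 2% drift before flagging a regression (assumed)

baseline = {"ram_kb": 201, "flash_kb": 870, "p95_latency_ms": 38}
candidate = {"ram_kb": 204, "flash_kb": 902, "p95_latency_ms": 39}

def regressions(base_metrics, cand_metrics, tol=TOLERANCE):
    """Return metrics where the candidate exceeds baseline beyond tolerance."""
    bad = []
    for metric, base in base_metrics.items():
        if cand_metrics[metric] > base * (1 + tol):
            bad.append((metric, base, cand_metrics[metric]))
    return bad

failed = regressions(baseline, candidate)
for metric, base, new in failed:
    print(f"REGRESSION {metric}: {base} -> {new}")
if failed:
    print(f"{len(failed)} budget regression(s); CI would fail this build")
    # In a real pipeline: sys.exit(1), so the build fails like a broken test.
```

Treating these as hard failures per commit is what makes budget regressions behave like functional regressions in the release process.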
Note: Key Point: Budget regressions should be treated like functional regressions in your release process.
Budget Review Rhythm
Regular budget reviews keep decisions grounded as product requirements evolve.
Review Inputs
Review updated model metrics, firmware task loads, and expected traffic scenarios together. Budget accuracy degrades quickly when any one of these changes without cross-team visibility.
Decision Output
Each review should produce either approval, mitigation tasks, or scope reduction decisions. Explicit outcomes keep teams aligned and prevent hidden budget debt from accumulating.
Note: Key Point: Budget governance is a continuous process, not a one-time spreadsheet exercise.