Ch 3 — Sensor Data Pipelines for Tiny Devices

Sampling, labeling, drift handling, and deployment-safe dataset design.
Pipeline: Capture → Window → Label → Validate → Ship
Sensor Capture Strategy
Data quality starts with the collection protocol, not the model architecture.
Sampling Policy
Define sample rate, channel count, and capture conditions based on the physical signal characteristics of your task. Over- or under-sampling can erase critical signal patterns before training even begins.
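As a concrete starting point for a sampling policy, the rate floor can be derived from the signal bandwidth. A minimal sketch, assuming a 2.5x Nyquist headroom factor and an illustrative 20 Hz gesture bandwidth (both are assumptions, not prescriptions):

```python
def min_sample_rate_hz(max_signal_freq_hz: float, headroom: float = 2.5) -> float:
    """Minimum capture rate for a given signal bandwidth.

    Nyquist requires sampling above 2x the highest frequency of interest;
    headroom above 2.0 leaves margin for the anti-aliasing filter's
    transition band.
    """
    if max_signal_freq_hz <= 0:
        raise ValueError("signal bandwidth must be positive")
    return headroom * max_signal_freq_hz


# Example: if gesture content on an accelerometer sits below ~20 Hz
# (an assumed figure), a ~50 Hz capture rate is a reasonable floor.
print(min_sample_rate_hz(20.0))  # → 50.0
```

Writing the policy as a checked function, rather than a number in a document, lets field-capture tooling reject misconfigured recordings at collection time.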
Collection Diversity
Capture across users, devices, environments, and operating states to prevent brittle models. Tiny deployments fail quickly when training data reflects only ideal lab conditions.
Practical Pattern
Create collection protocols that are easy for field teams to follow and audit. Reproducible capture procedures improve data trust and reduce relabeling overhead.
Key Point: Collection diversity is a core reliability lever for TinyML systems.
Windowing and Segmentation
Window length and stride define what the model can learn from signals.
Window Tradeoff
Long windows add context but increase latency and memory pressure; short windows improve responsiveness but may lose semantic cues. Choose window geometry to match both signal physics and response-time requirements.
Stride Effects
Stride controls compute cost and overlap redundancy across consecutive windows. Poor stride settings can either waste power or miss fast transient events.
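The window/stride tradeoff can be made explicit with a small index generator; a sketch with illustrative numbers (1 s of signal at 10 Hz, 0.4 s windows, 50% overlap):

```python
def window_indices(n_samples: int, window: int, stride: int) -> list[tuple[int, int]]:
    """Return (start, end) index pairs for fixed-length windows over a signal.

    Overlap between consecutive windows is (window - stride) samples:
    stride < window trades extra compute for denser coverage of fast
    transients, while stride > window skips samples entirely.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [(start, start + window)
            for start in range(0, n_samples - window + 1, stride)]


print(window_indices(10, window=4, stride=2))  # → [(0, 4), (2, 6), (4, 8), (6, 10)]
```

Using the same generator for training, evaluation, and on-device inference keeps window geometry consistent across the pipeline, which is exactly the point of treating it as a deployment parameter.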
Failure Pattern
Shortcuts in windowing and labeling often produce attractive offline metrics but unstable field behavior. Evaluation must mirror the true signal timeline.
Key Point: Window and stride decisions are deployment parameters, not just data-prep details.
Labeling Discipline
Ambiguous labels create instability that compression cannot fix later.
Label Taxonomy
Use clear class definitions and escalation rules for uncertain samples to reduce annotation noise. Consistent labeling standards improve calibration and reduce false-trigger volatility in production.
Hard Negatives
Actively curate hard negatives that resemble positive events but should not trigger action. These examples are often the fastest path to reducing false alarms on-device.
Validation Signal
Monitor class-wise confusion and false-trigger trends across environment slices rather than relying on aggregate scores. Slice-level metrics reveal hidden weakness early.
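Slice-level monitoring can be sketched as a per-environment false-trigger tally; the environment names and records below are hypothetical:

```python
from collections import defaultdict


def false_trigger_rate_by_slice(records):
    """Per-slice false-positive rate from (slice, y_true, y_pred) triples.

    y_true / y_pred are binary (1 = trigger). An aggregate score can hide
    a slice where almost every negative sample fires the model.
    """
    fp = defaultdict(int)
    negatives = defaultdict(int)
    for env, y_true, y_pred in records:
        if y_true == 0:
            negatives[env] += 1
            if y_pred == 1:
                fp[env] += 1
    return {env: fp[env] / n for env, n in negatives.items()}


records = [  # hypothetical evaluation records
    ("quiet_room", 0, 0), ("quiet_room", 0, 0), ("quiet_room", 1, 1),
    ("street", 0, 1), ("street", 0, 1), ("street", 0, 0), ("street", 1, 1),
]
print(false_trigger_rate_by_slice(records))  # quiet_room: 0.0, street: 2/3
```

Here the aggregate looks acceptable, but the "street" slice false-triggers on two of three negatives, which is the kind of hidden weakness slice metrics surface early.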
Key Point: Hard negatives and label clarity usually matter more than adding random new data volume.
Drift and Split Strategy
Temporal and environmental drift must be reflected in evaluation splits.
Split Hygiene
Avoid leakage by splitting at session, device, or user level when applicable rather than random sample level. Leakage inflates offline metrics and hides production failure risks.
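One way to enforce session-level splitting is to hash the session ID so every window from a recording lands on the same side of the split. A minimal sketch; the ID format and 20% validation fraction are assumptions:

```python
import hashlib


def split_bucket(session_id: str, val_fraction: float = 0.2) -> str:
    """Deterministically assign an entire session to one split.

    Hashing the session ID, rather than sampling rows at random, keeps all
    windows from one recording together, which prevents near-duplicate
    samples from leaking across the train/validation boundary.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 1000
    return "val" if bucket < val_fraction * 1000 else "train"


# Every window carrying the same session ID lands in the same split:
print(split_bucket("user7/device3/session-041"))  # hypothetical ID
```

The same idea applies at device or user granularity: hash whichever unit of correlation matters for the deployment.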
Drift Monitoring
Track class priors, signal ranges, and feature distributions after deployment to detect gradual drift. A drift trigger should initiate targeted recollection and regression testing.
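One common distribution-drift statistic is the Population Stability Index (PSI) over binned feature histograms. A sketch with illustrative histograms; the ~0.2 alarm threshold is a widely used heuristic, not a universal constant:

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two pre-binned histograms of a feature.

    0 means identical distributions; larger values mean more drift. A
    common heuristic flags scores above ~0.2 for investigation, but the
    cutoff should be tuned per deployment.
    """
    e_total, a_total = sum(expected), sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        e_frac = max(e / e_total, eps)  # clamp to avoid log(0)
        a_frac = max(a / a_total, eps)
        score += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return score


baseline = [50, 30, 15, 5]   # training-time histogram of a feature (assumed)
no_drift = [48, 32, 14, 6]   # field histogram with a similar shape
drifted  = [10, 20, 30, 40]  # mass shifted toward the high bins

print(population_stability_index(baseline, no_drift))  # small, near 0
print(population_stability_index(baseline, drifted))   # large → trigger recollection
```

A PSI crossing the chosen threshold is a natural trigger for the targeted recollection and regression testing described above.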
Governance Rule
Require dataset versioning with lineage to capture conditions and label policy changes. This is essential for reproducibility and post-incident diagnosis.
Key Point: Realistic split design is one of the most important safeguards against false confidence.
Dataset Readiness Gates
Dataset readiness should be explicit before model compression and firmware integration.
Readiness Criteria
Define minimum class balance, annotation agreement, edge-case coverage, and acquisition metadata completeness. These criteria create predictable handoffs between data and model engineering stages.
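Readiness criteria become enforceable when expressed as named thresholds. A minimal sketch; the criteria names and values below are illustrative, not recommended limits:

```python
def readiness_failures(stats: dict, minimums: dict) -> list[str]:
    """Compare dataset stats against minimum readiness thresholds.

    Returns the list of failed criteria; an empty list means the dataset
    clears the gate.
    """
    return [name for name, minimum in minimums.items()
            if stats.get(name, 0) < minimum]


minimums = {  # hypothetical gate thresholds
    "minority_class_fraction": 0.10,  # class-balance floor
    "annotator_agreement": 0.85,      # e.g. inter-annotator kappa
    "edge_case_sessions": 25,         # hard-condition coverage
    "metadata_completeness": 0.99,    # capture-condition fields filled
}
stats = {  # hypothetical measured values for a candidate dataset
    "minority_class_fraction": 0.14,
    "annotator_agreement": 0.81,
    "edge_case_sessions": 31,
    "metadata_completeness": 0.995,
}
print(readiness_failures(stats, minimums))  # → ['annotator_agreement']
```

Because a missing stat defaults to 0 and fails the check, unreported criteria block the gate rather than slipping through silently.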
Operational Alignment
Dataset acceptance should map to product failure cost, especially for false positives and false negatives. This keeps model tuning aligned with business and safety outcomes.
Handoff Artifact
Publish a data quality checklist with minimum edge-case coverage before model compression and deployment work begins. Review it at each release checkpoint so assumptions remain current.
Key Point: Dataset gates reduce downstream churn during quantization, integration, and OTA rollout.
Data Issues That Surface in Production
Field noise and drift often reveal problems hidden by curated datasets.
Common Production Gaps
Teams commonly miss low-frequency edge cases, unusual device states, and non-stationary background conditions. These gaps become visible only after rollout unless explicitly included in validation datasets.
Correction Loop
Create a feedback loop that routes flagged production samples into re-annotation and regression testing. Structured feedback prevents repeated failures across release cycles.
Key Point: Production sample feedback should be a first-class part of your data pipeline.
Dataset Operations Checklist
Turn data quality expectations into repeatable operational checks.
Checklist Items
Confirm capture diversity, annotation consistency, split hygiene, and edge-case density for each release dataset. Every check should map to a measurable threshold agreed by model and product teams.
Release Gate
Block deployment when high-risk classes or environments are underrepresented. This gate prevents avoidable reliability incidents in constrained edge environments.
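The release gate can be sketched as a check over per-(class, environment) sample counts; the class names, environments, counts, and 100-sample threshold are all hypothetical:

```python
def release_blocked(sample_counts: dict, min_samples: int) -> dict:
    """Return underrepresented (class, environment) cells.

    A non-empty result blocks the release until targeted recollection
    fills the gap.
    """
    return {cell: n for cell, n in sample_counts.items() if n < min_samples}


counts = {  # hypothetical coverage table for a release dataset
    ("fall_detected", "indoor"): 420,
    ("fall_detected", "outdoor"): 35,  # high-risk class, thin coverage
    ("no_event", "indoor"): 5100,
    ("no_event", "outdoor"): 1800,
}
gaps = release_blocked(counts, min_samples=100)
print(gaps)  # → {('fall_detected', 'outdoor'): 35}
```

Running this check in CI against each release dataset turns the gate from a review-meeting judgment call into a measurable, auditable threshold.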
Key Point: Dataset operations discipline is a deployment reliability multiplier.