Ch 3 — Sensor Data Pipelines for Tiny Devices

Sampling, labeling, drift handling, and deployment-safe dataset design.
Pipeline: Capture → Window → Label → Validate → Ship
Sensor Capture Strategy
Data quality starts with the collection protocol, not the model architecture.
Sampling Policy
Define sample rate, channel count, and capture conditions based on the physical signal characteristics of your task. Over- or under-sampling can erase critical signal patterns before training even begins.
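As a concrete starting point for a sampling policy, the rate floor can be derived from the signal bandwidth. A minimal sketch, assuming a 2.5x Nyquist headroom factor and an illustrative 20 Hz gesture bandwidth (both are assumptions, not prescriptions):

```python
def min_sample_rate_hz(max_signal_freq_hz: float, headroom: float = 2.5) -> float:
    """Minimum capture rate for a given signal bandwidth.

    Nyquist requires sampling above 2x the highest frequency of interest;
    headroom above 2.0 leaves margin for the anti-aliasing filter's
    transition band.
    """
    if max_signal_freq_hz <= 0:
        raise ValueError("signal bandwidth must be positive")
    return headroom * max_signal_freq_hz


# Example: if gesture content on an accelerometer sits below ~20 Hz
# (an assumed figure), a ~50 Hz capture rate is a reasonable floor.
print(min_sample_rate_hz(20.0))  # → 50.0
```

Writing the policy as a checked function, rather than a number in a document, lets field-capture tooling reject misconfigured recordings at collection time.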
Collection Diversity
Capture across users, devices, environments, and operating states to prevent brittle models. Tiny deployments fail quickly when training data reflects only ideal lab conditions.
Practical Pattern
Create collection protocols that are easy for field teams to follow and audit. Reproducible capture procedures improve data trust and reduce relabeling overhead.
Key Point: Collection diversity is a core reliability lever for TinyML systems.
Windowing and Segmentation
Window length and stride define what the model can learn from signals.
Window Tradeoff
Long windows add context but increase latency and memory pressure; short windows improve responsiveness but may lose semantic cues. Choose window geometry to match both signal physics and response-time requirements.
Stride Effects
Stride controls compute cost and overlap redundancy across consecutive windows. Poor stride settings can either waste power or miss fast transient events.
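The window/stride tradeoff can be made explicit with a small index generator; a sketch with illustrative numbers (1 s of signal at 10 Hz, 0.4 s windows, 50% overlap):

```python
def window_indices(n_samples: int, window: int, stride: int) -> list[tuple[int, int]]:
    """Return (start, end) index pairs for fixed-length windows over a signal.

    Overlap between consecutive windows is (window - stride) samples:
    stride < window trades extra compute for denser coverage of fast
    transients, while stride > window skips samples entirely.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [(start, start + window)
            for start in range(0, n_samples - window + 1, stride)]


print(window_indices(10, window=4, stride=2))  # → [(0, 4), (2, 6), (4, 8), (6, 10)]
```

Using the same generator for training, evaluation, and on-device inference keeps window geometry consistent across the pipeline, which is exactly the point of treating it as a deployment parameter.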
Failure Pattern
Shortcuts in windowing and labeling often produce attractive offline metrics but unstable field behavior. Evaluation must mirror the true signal timeline.
Key Point: Window and stride decisions are deployment parameters, not just data-prep details.
Labeling Discipline
Ambiguous labels create instability that compression cannot fix later.
Label Taxonomy
Use clear class definitions and escalation rules for uncertain samples to reduce annotation noise. Consistent labeling standards improve calibration and reduce false-trigger volatility in production.
Hard Negatives
Actively curate hard negatives that resemble positive events but should not trigger action. These examples are often the fastest path to reducing false alarms on-device.
Validation Signal
Monitor class-wise confusion and false-trigger trends across environment slices rather than relying on aggregate scores. Slice-level metrics reveal hidden weakness early.
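Slice-level monitoring can be sketched as a per-environment false-trigger tally; the environment names and records below are hypothetical:

```python
from collections import defaultdict


def false_trigger_rate_by_slice(records):
    """Per-slice false-positive rate from (slice, y_true, y_pred) triples.

    y_true / y_pred are binary (1 = trigger). An aggregate score can hide
    a slice where almost every negative sample fires the model.
    """
    fp = defaultdict(int)
    negatives = defaultdict(int)
    for env, y_true, y_pred in records:
        if y_true == 0:
            negatives[env] += 1
            if y_pred == 1:
                fp[env] += 1
    return {env: fp[env] / n for env, n in negatives.items()}


records = [  # hypothetical evaluation records
    ("quiet_room", 0, 0), ("quiet_room", 0, 0), ("quiet_room", 1, 1),
    ("street", 0, 1), ("street", 0, 1), ("street", 0, 0), ("street", 1, 1),
]
print(false_trigger_rate_by_slice(records))  # quiet_room: 0.0, street: 2/3
```

Here the aggregate looks acceptable, but the "street" slice false-triggers on two of three negatives, which is the kind of hidden weakness slice metrics surface early.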
Key Point: Hard negatives and label clarity usually matter more than adding random new data volume.
Drift and Split Strategy
Temporal and environmental drift must be reflected in evaluation splits.
Split Hygiene
Avoid leakage by splitting at session, device, or user level when applicable rather than random sample level. Leakage inflates offline metrics and hides production failure risks.
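One way to enforce session-level splitting is to hash the session ID so every window from a recording lands on the same side of the split. A minimal sketch; the ID format and 20% validation fraction are assumptions:

```python
import hashlib


def split_bucket(session_id: str, val_fraction: float = 0.2) -> str:
    """Deterministically assign an entire session to one split.

    Hashing the session ID, rather than sampling rows at random, keeps all
    windows from one recording together, which prevents near-duplicate
    samples from leaking across the train/validation boundary.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 1000
    return "val" if bucket < val_fraction * 1000 else "train"


# Every window carrying the same session ID lands in the same split:
print(split_bucket("user7/device3/session-041"))  # hypothetical ID
```

The same idea applies at device or user granularity: hash whichever unit of correlation matters for the deployment.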
Drift Monitoring
Track class priors, signal ranges, and feature distributions after deployment to detect gradual drift. A drift trigger should initiate targeted recollection and regression testing.
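One common distribution-drift statistic is the Population Stability Index (PSI) over binned feature histograms. A sketch with illustrative histograms; the ~0.2 alarm threshold is a widely used heuristic, not a universal constant:

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two pre-binned histograms of a feature.

    0 means identical distributions; larger values mean more drift. A
    common heuristic flags scores above ~0.2 for investigation, but the
    cutoff should be tuned per deployment.
    """
    e_total, a_total = sum(expected), sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        e_frac = max(e / e_total, eps)  # clamp to avoid log(0)
        a_frac = max(a / a_total, eps)
        score += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return score


baseline = [50, 30, 15, 5]   # training-time histogram of a feature (assumed)
no_drift = [48, 32, 14, 6]   # field histogram with a similar shape
drifted  = [10, 20, 30, 40]  # mass shifted toward the high bins

print(population_stability_index(baseline, no_drift))  # small, near 0
print(population_stability_index(baseline, drifted))   # large → trigger recollection
```

A PSI crossing the chosen threshold is a natural trigger for the targeted recollection and regression testing described above.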
Governance Rule
Require dataset versioning with lineage to capture conditions and label policy changes. This is essential for reproducibility and post-incident diagnosis.
Key Point: Realistic split design is one of the most important safeguards against false confidence.
Dataset Readiness Gates
Dataset readiness should be explicit before model compression and firmware integration.
Readiness Criteria
Define minimum class balance, annotation agreement, edge-case coverage, and acquisition metadata completeness. These criteria create predictable handoffs between data and model engineering stages.
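Readiness criteria become enforceable when expressed as named thresholds. A minimal sketch; the criteria names and values below are illustrative, not recommended limits:

```python
def readiness_failures(stats: dict, minimums: dict) -> list[str]:
    """Compare dataset stats against minimum readiness thresholds.

    Returns the list of failed criteria; an empty list means the dataset
    clears the gate.
    """
    return [name for name, minimum in minimums.items()
            if stats.get(name, 0) < minimum]


minimums = {  # hypothetical gate thresholds
    "minority_class_fraction": 0.10,  # class-balance floor
    "annotator_agreement": 0.85,      # e.g. inter-annotator kappa
    "edge_case_sessions": 25,         # hard-condition coverage
    "metadata_completeness": 0.99,    # capture-condition fields filled
}
stats = {  # hypothetical measured values for a candidate dataset
    "minority_class_fraction": 0.14,
    "annotator_agreement": 0.81,
    "edge_case_sessions": 31,
    "metadata_completeness": 0.995,
}
print(readiness_failures(stats, minimums))  # → ['annotator_agreement']
```

Because a missing stat defaults to 0 and fails the check, unreported criteria block the gate rather than slipping through silently.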
Operational Alignment
Dataset acceptance should map to product failure cost, especially for false positives and false negatives. This keeps model tuning aligned with business and safety outcomes.
Handoff Artifact
Publish a data quality checklist with minimum edge-case coverage before model compression and deployment work begins. Review it at each release checkpoint so assumptions remain current.
Key Point: Dataset gates reduce downstream churn during quantization, integration, and OTA rollout.
Data Issues That Surface in Production
Field noise and drift often reveal problems hidden by curated datasets.
Common Production Gaps
Teams commonly miss low-frequency edge cases, unusual device states, and non-stationary background conditions. These gaps become visible only after rollout unless explicitly included in validation datasets.
Correction Loop
Create a feedback loop that routes flagged production samples into re-annotation and regression testing. Structured feedback prevents repeated failures across release cycles.
Key Point: Production sample feedback should be a first-class part of your data pipeline.
Dataset Operations Checklist
Turn data quality expectations into repeatable operational checks.
Checklist Items
Confirm capture diversity, annotation consistency, split hygiene, and edge-case density for each release dataset. Every check should map to a measurable threshold agreed by model and product teams.
Release Gate
Block deployment when high-risk classes or environments are underrepresented. This gate prevents avoidable reliability incidents in constrained edge environments.
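The release gate can be sketched as a check over per-(class, environment) sample counts; the class names, environments, counts, and 100-sample threshold are all hypothetical:

```python
def release_blocked(sample_counts: dict, min_samples: int) -> dict:
    """Return underrepresented (class, environment) cells.

    A non-empty result blocks the release until targeted recollection
    fills the gap.
    """
    return {cell: n for cell, n in sample_counts.items() if n < min_samples}


counts = {  # hypothetical coverage table for a release dataset
    ("fall_detected", "indoor"): 420,
    ("fall_detected", "outdoor"): 35,  # high-risk class, thin coverage
    ("no_event", "indoor"): 5100,
    ("no_event", "outdoor"): 1800,
}
gaps = release_blocked(counts, min_samples=100)
print(gaps)  # → {('fall_detected', 'outdoor'): 35}
```

Running this check in CI against each release dataset turns the gate from a review-meeting judgment call into a measurable, auditable threshold.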
Key Point: Dataset operations discipline is a deployment reliability multiplier.