Ch 7 — ExecuTorch and ONNX Runtime Mobile

PyTorch and ONNX edge paths for mobile and embedded Linux targets.
Chapter pipeline: Choose Runtime → Export → Package → Optimize → Ship
Runtime Selection Criteria
Runtime choice should reflect platform, model source, and operator profile.
Selection Inputs
Consider source framework, target OS, hardware acceleration options, binary size limits, and observability requirements. This turns runtime selection into a measurable decision rather than ecosystem preference.
Compatibility First
A runtime that nominally supports your platform can still fail due to operator or delegate gaps. Validate compatibility with production-like graphs before committing roadmap timelines.
Practical Pattern
Define a runtime selection scorecard with platform coverage, operator fit, startup cost, and maintenance burden. Structured scoring reduces tooling churn.
Note: Key Point: Runtime fit is determined by graph compatibility and operator behavior, not branding.
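The scorecard pattern above can be sketched as a weighted scoring function. The criteria names, weights, and per-runtime scores below are illustrative assumptions, not vendor data; a real scorecard would be populated from your own platform measurements.

```python
# Hypothetical runtime-selection scorecard: weights and scores (0-5 scale)
# are illustrative assumptions, not benchmarks of the actual runtimes.
WEIGHTS = {
    "platform_coverage": 0.3,
    "operator_fit": 0.3,
    "startup_cost": 0.2,
    "maintenance_burden": 0.2,
}

def score_runtime(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum; higher is better. Missing criteria count as 0."""
    return round(sum(weights[k] * scores.get(k, 0.0) for k in weights), 2)

candidates = {
    "executorch": {"platform_coverage": 4, "operator_fit": 5,
                   "startup_cost": 4, "maintenance_burden": 3},
    "ort_mobile": {"platform_coverage": 5, "operator_fit": 4,
                   "startup_cost": 4, "maintenance_burden": 4},
}

# Rank candidates by total score so the decision is explicit and auditable.
ranked = sorted(candidates, key=lambda r: score_runtime(candidates[r]), reverse=True)
```

Keeping the weights in version control alongside the scores makes the "structured scoring" auditable when the team revisits the runtime decision later.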
ExecuTorch Path
ExecuTorch is designed for edge deployment flows from PyTorch ecosystems.
Export Discipline
Stabilize model interfaces and export process so successive model updates remain integration-compatible. Keep export configuration under version control alongside model and eval artifacts.
Edge Integration
Treat runtime integration as part of application engineering with test harnesses and rollout policies. This reduces surprises when moving from local validation to user-facing mobile environments.
Failure Pattern
Mobile regressions often come from hidden fallback paths and device-class heterogeneity. Validate behavior across representative device tiers before release.
Note: Key Point: A repeatable export contract is the backbone of maintainable edge deployment.
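One way to make the export contract concrete is to commit a small interface manifest next to the model and evaluation artifacts. The field names and manifest layout below are assumptions for illustration; the point is that a new export is accepted only if its input/output interface matches the committed contract.

```python
# Sketch of a version-controlled export contract (hypothetical schema).
# A new export is integration-compatible only if interface fields match
# the contract committed alongside the model and eval artifacts.
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ExportContract:
    model_name: str
    input_shapes: tuple       # e.g. ((1, 3, 224, 224),)
    input_dtypes: tuple       # e.g. ("float32",)
    output_shapes: tuple      # e.g. ((1, 1000),)
    dialect: str              # e.g. "edge" for ExecuTorch, "opset-18" for ONNX

    def fingerprint(self) -> str:
        """Stable hash of the contract, suitable for release records."""
        payload = json.dumps(asdict(self), sort_keys=True, default=list)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]

def check_compatible(old: ExportContract, new: ExportContract) -> bool:
    """Interface fields must match; the dialect may evolve independently."""
    return (old.input_shapes, old.input_dtypes, old.output_shapes) == \
           (new.input_shapes, new.input_dtypes, new.output_shapes)
```

A CI step that fails when `check_compatible` returns False catches interface drift before it reaches mobile integration code.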
ONNX Runtime Mobile Path
ONNX Runtime Mobile supports compact deployment profiles for on-device inference.
Packaging Strategy
Build only required operators and model assets to control application size and startup cost. Packaging discipline is essential when mobile release budgets are strict.
Cross-Platform Benefit
ONNX-based flows can simplify multi-platform deployment when interface parity is maintained across targets. A shared validation suite keeps behavior consistent across OS variants.
Validation Signal
Track startup time, memory usage, and inference latency by runtime and device class in the same report format. Consistent reporting improves decision clarity.
Note: Key Point: Slim packaging and shared validation are core strengths of well-managed ONNX mobile workflows.
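The shared report format described above can be as simple as one schema used for every runtime and device class. The field names and budget values here are illustrative assumptions; what matters is that every runtime/device pair is reported and checked the same way.

```python
# One shared report schema per runtime/device-class pair, so startup,
# memory, and latency numbers are always comparable. Field names and
# budget values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PerfReport:
    runtime: str           # e.g. "executorch" or "ort_mobile"
    device_class: str      # e.g. "low_end_android"
    startup_ms: float
    peak_mem_mb: float
    p95_latency_ms: float

def over_budget(r: PerfReport, budget: dict) -> list:
    """Return the metric names that exceed budget (empty list means pass)."""
    checks = {"startup_ms": r.startup_ms,
              "peak_mem_mb": r.peak_mem_mb,
              "p95_latency_ms": r.p95_latency_ms}
    return [m for m, v in checks.items() if v > budget[m]]
```

Emitting the same `PerfReport` rows for both runtimes turns "decision clarity" into a diff between two tables instead of a debate between two dashboards.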
Operator and Delegate Risk
Unsupported operators and delegate fallbacks are common sources of regressions.
Risk Pattern
A model may pass functional tests but still regress latency if key operations fall back to slower paths. Include delegate and fallback visibility in performance test reports.
Mitigation
Prefer architecture or graph adjustments that preserve accelerated paths instead of patching around fallback behavior late. This usually yields better long-term maintainability and predictability.
Governance Rule
Bundle runtime and model versions together in release records to avoid compatibility ambiguity during incident response. Enforcing this consistently prevents scope drift between releases.
Note: Key Point: Monitor fallback behavior explicitly; hidden fallback can invalidate performance assumptions.
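Fallback visibility can be made testable by summarizing which execution provider each operator was assigned to. The log format and delegate names below are hypothetical assumptions; the technique is to compute a fallback ratio and fail the performance test when it exceeds a budget.

```python
# Hypothetical delegate-assignment summary: given (op_name, provider)
# pairs from a runtime's assignment log, compute the fraction of ops
# that fell back off the accelerated path. Delegate names and the 5%
# budget are illustrative assumptions.
from collections import Counter

ACCELERATED = {"xnnpack", "nnapi", "coreml"}  # assumed delegate names

def fallback_ratio(assignments: list) -> float:
    """Fraction of ops running outside any accelerated delegate."""
    counts = Counter(provider for _, provider in assignments)
    total = sum(counts.values())
    fallback = sum(n for p, n in counts.items() if p not in ACCELERATED)
    return fallback / total if total else 0.0

def check_fallback_budget(assignments: list, max_ratio: float = 0.05) -> bool:
    """True if the fallback ratio stays within the performance budget."""
    return fallback_ratio(assignments) <= max_ratio
```

Running this check in CI makes a silent delegate regression (e.g. one op dropping to CPU after a runtime upgrade) a red build instead of a field report.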
Release Strategy for Mobile Edge
Use staged rollout and observability to control real-world variance.
Staged Promotion
Promote new model-runtime bundles through canary cohorts before broad rollout to catch device-specific regressions. Rollback plans should be tested, not just documented.
Operational Telemetry
Track inference latency, failure counts, model version distribution, and resource usage at runtime. This telemetry closes the loop between model teams and mobile engineering.
Handoff Artifact
Keep platform-specific runbooks for fallback diagnostics and performance triage as part of release documentation. Review them at each release checkpoint so assumptions remain current.
Note: Key Point: Mobile edge deployment success depends on release engineering as much as model quality.
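The canary-to-broad-rollout promotion can be expressed as a gate that compares canary telemetry against the baseline cohort. The metric names and margins below are illustrative assumptions; real thresholds should come from your latency and reliability SLOs.

```python
# Sketch of a canary promotion gate: promote only if the canary cohort's
# p95 latency and failure rate stay within a margin of the baseline.
# Metric names and margins are illustrative assumptions.
def canary_gate(baseline: dict, canary: dict,
                latency_margin: float = 0.10,
                failure_margin: float = 0.002) -> bool:
    """baseline/canary: {"p95_ms": float, "failure_rate": float}."""
    latency_ok = canary["p95_ms"] <= baseline["p95_ms"] * (1 + latency_margin)
    failures_ok = canary["failure_rate"] <= baseline["failure_rate"] + failure_margin
    return latency_ok and failures_ok
```

Wiring this gate into the rollout tool, rather than a manual review, is one way to make the rollback plan "tested, not just documented": a failing gate exercises the same code path an incident would.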
Mobile Deployment Pitfalls
Cross-device variance can invalidate assumptions built on one test phone.
Pitfall Examples
Frequent pitfalls include unsupported delegates on older devices, binary size creep from broad operator inclusion, and startup regressions after runtime upgrades. These issues can pass small internal tests and fail at scale.
Mitigation Pattern
Use tiered device matrices, staged rollout cohorts, and fallback observability in production telemetry. This reduces broad-impact regressions and enables targeted hotfixes.
Note: Key Point: Device-tier testing and phased rollout are essential for stable mobile edge deployment.
Runtime Promotion Checklist
Promote runtime-model bundles using explicit criteria.
Checklist Items
Confirm operator compatibility, delegate behavior, startup budget, runtime memory budget, and p95 latency on each supported device tier. Include rollback feasibility in every approval decision.
Go-Live Gate
Require production telemetry hooks to be live before broad rollout so regressions can be detected quickly. Monitoring readiness is a deployment requirement, not a post-launch task.
Note: Key Point: Runtime promotion should be evidence-based and telemetry-enabled from day one.
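The checklist above can be enforced as a single promotion function: every item must pass on every supported device tier, and telemetry must be live before broad rollout. The item names are illustrative assumptions mirroring the checklist text.

```python
# Evidence-based promotion gate: all checklist items must hold on all
# supported device tiers, and telemetry hooks must already be live.
# Item names are illustrative assumptions based on the checklist.
REQUIRED_ITEMS = ("operator_compat", "delegate_ok", "startup_budget",
                  "memory_budget", "p95_budget", "rollback_tested")

def can_promote(tier_results: dict, telemetry_live: bool) -> bool:
    """tier_results: {device_tier: {item: passed}}. Every item on every
    tier must pass, and monitoring must be ready before approval."""
    if not telemetry_live:
        return False  # monitoring readiness is a deployment requirement
    return all(results.get(item, False)
               for results in tier_results.values()
               for item in REQUIRED_ITEMS)
```

Because missing items default to False, a tier that simply skipped a check blocks promotion, which keeps the approval evidence-based rather than optimistic.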