Ch 9 — Accelerators and Delegates

Edge TPU, Core ML, and delegate behavior under real deployment constraints.
Performance: Targets → Compile → Accelerate → Fallback → Portability
Accelerator Taxonomy
Different accelerators optimize different parts of the inference path.
Device Classes
Edge accelerators include dedicated ASICs, mobile NPUs, and GPU-backed delegate paths with different operator support and tooling models. Performance claims are meaningful only within the exact hardware and runtime pairing.
Integration Cost
Higher acceleration potential can require stricter graph constraints, compile flows, and release complexity. Integration effort should be evaluated alongside raw speed benefits.
Practical Pattern
Treat accelerator selection as an architecture decision with documented fallback expectations. Clear fallback policy protects user experience when acceleration is unavailable.
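A documented fallback policy can be made explicit in code rather than left to ad-hoc runtime behavior. A minimal Python sketch follows; the backend names and the availability set are illustrative assumptions, not a real delegate API:

```python
# Ordered fallback policy: try the fastest acceptable backend first,
# and always terminate at a CPU path that is guaranteed to exist.
# Backend names here are illustrative, not a real runtime API.
FALLBACK_ORDER = ["edgetpu", "gpu_delegate", "nnapi", "cpu"]

def select_backend(available: set) -> str:
    """Return the first backend in the documented fallback order
    that is actually present on this device; CPU is the floor."""
    for backend in FALLBACK_ORDER:
        if backend == "cpu" or backend in available:
            return backend
    return "cpu"  # unreachable: "cpu" terminates the order

# Example: a device exposing only a GPU delegate.
print(select_backend({"gpu_delegate"}))  # gpu_delegate
print(select_backend(set()))             # cpu
```

Keeping the order as data makes the fallback expectation reviewable in the same way as any other architecture decision.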
Key Point: Accelerator choice is a systems decision, not just a benchmark comparison.
Edge TPU Constraints
Compiler compatibility and quantization requirements shape model design.
Model Preparation
Edge TPU deployment typically requires compatible quantized model paths and compiler-accepted operator patterns. Architectures that ignore these constraints can fail late in deployment.
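Operator compatibility can be checked as a pre-flight step long before architecture lock. A sketch of such a check, assuming a placeholder supported-op set (the real, version-specific list comes from the Edge TPU compiler documentation):

```python
# Pre-flight check: flag ops the accelerator compiler is likely to
# reject. SUPPORTED_OPS is an illustrative placeholder, not the
# actual Edge TPU compiler operator list.
SUPPORTED_OPS = {"CONV_2D", "DEPTHWISE_CONV_2D", "ADD", "RESHAPE",
                 "FULLY_CONNECTED", "AVERAGE_POOL_2D"}

def unsupported_ops(graph_ops: list) -> list:
    """Return ops that would force a CPU-fallback partition."""
    return [op for op in graph_ops if op not in SUPPORTED_OPS]

graph = ["CONV_2D", "LEAKY_RELU", "ADD", "GATHER_ND"]
print(unsupported_ops(graph))  # ['LEAKY_RELU', 'GATHER_ND']
```

Running this kind of check in CI against candidate architectures surfaces compiler incompatibilities early instead of late in deployment.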
Throughput Planning
Even with hardware acceleration, end-to-end throughput depends on preprocessing, IO, and queueing behavior. Measure full pipeline latency rather than isolated inference kernel time.
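Full-pipeline measurement can be as simple as timing each named stage with a wall clock. A stdlib-only sketch, with stub stages standing in for real preprocessing, inference, and postprocessing:

```python
import time

def timed_pipeline(stages):
    """Run named stages in order; return per-stage wall-clock latency
    in milliseconds, so the accelerated kernel can be compared against
    preprocessing and IO cost in the same report."""
    breakdown = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        breakdown[name] = (time.perf_counter() - start) * 1000.0
    return breakdown

# Stub stages standing in for a real pipeline (illustrative only).
stages = [
    ("preprocess",  lambda: time.sleep(0.005)),
    ("inference",   lambda: time.sleep(0.002)),  # the accelerated kernel
    ("postprocess", lambda: time.sleep(0.003)),
]
report = timed_pipeline(stages)
total = sum(report.values())
# Even a fast kernel can be a minority of end-to-end latency.
print({k: round(v, 1) for k, v in report.items()}, round(total, 1))
```

In this stub, inference is only a fraction of the total; that is exactly the case where kernel-only benchmarks mislead.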
Failure Pattern
Performance regressions often trace to partial acceleration and hidden fallback on unsupported ops. Profile full graphs, not only accelerated segments.
Key Point: Accelerator-compatible model design must be planned before final architecture lock.
Core ML and Mobile NPUs
Mobile acceleration paths vary by device generation and runtime policy.
Device Variance
Mobile acceleration behavior can differ across chip generations and OS versions even within one platform family. Maintain device-tier test matrices to avoid release surprises.
Delegate Strategy
Use delegates where they provide stable gains, but keep CPU fallback performance acceptable for unsupported paths. Products should remain functional even when acceleration coverage is partial.
Validation Signal
Measure delegate coverage, fallback frequency, and end-to-end latency under real traffic. Coverage metrics alone can be misleading without workload context.
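Why coverage alone misleads can be shown with two counters. A sketch computing both metrics from hypothetical runtime counters (the counter names are assumptions; real values would come from your profiling hooks):

```python
# Delegate health metrics from runtime counters. Counter names are
# illustrative; real numbers come from profiling instrumentation.
def delegate_metrics(total_ops, delegated_ops, inferences, fallback_inferences):
    coverage = delegated_ops / total_ops if total_ops else 0.0
    fallback_rate = fallback_inferences / inferences if inferences else 0.0
    return {"op_coverage": coverage, "fallback_rate": fallback_rate}

# High op coverage can hide a high per-inference fallback rate:
# 97% of ops are delegated, yet every inference crosses the
# CPU-fallback partition created by the remaining 3%.
m = delegate_metrics(total_ops=100, delegated_ops=97,
                     inferences=1000, fallback_inferences=1000)
print(m)  # {'op_coverage': 0.97, 'fallback_rate': 1.0}
```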
Key Point: Plan for heterogeneity; mobile acceleration is rarely uniform across the installed base.
Fallback Behavior
Fallback paths can dominate latency if not monitored and controlled.
Hidden Cost
A single unsupported operation can trigger frequent fallback to slower execution paths and erase expected gains. Profiling should explicitly identify where and how often fallback occurs.
Mitigation Path
Mitigate fallback by graph rewrites, supported op substitutions, or architecture adjustments validated against quality requirements. Keep mitigation choices documented for future model upgrades.
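A supported-op substitution pass can be sketched as a mapping plus a change log, which doubles as the documentation trail for future model upgrades. The substitutions below are hypothetical examples; each real substitution must be validated against quality requirements:

```python
# Graph-rewrite sketch: substitute unsupported ops with supported
# equivalents before compilation. The mapping is illustrative, and
# every entry must be validated against model-quality requirements.
SUBSTITUTIONS = {
    "LEAKY_RELU": "RELU",   # only if accuracy impact is validated
    "GELU": "RELU6",        # hypothetical example substitution
}

def rewrite_graph(graph_ops):
    """Return (rewritten ops, change log) so mitigation choices
    stay documented alongside the rewritten graph."""
    rewritten, log = [], []
    for op in graph_ops:
        new_op = SUBSTITUTIONS.get(op, op)
        if new_op != op:
            log.append(f"{op} -> {new_op}")
        rewritten.append(new_op)
    return rewritten, log

ops, changes = rewrite_graph(["CONV_2D", "LEAKY_RELU", "ADD"])
print(ops, changes)  # ['CONV_2D', 'RELU', 'ADD'] ['LEAKY_RELU -> RELU']
```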
Governance Rule
Maintain portability baselines and accelerated variants as separate release tracks when hardware diversity is high. Enforcing this consistently prevents scope drift between releases.
Key Point: Fallback observability is required for honest accelerator performance claims.
Portability vs Peak Speed
Maximizing one hardware target can reduce portability across the fleet.
Portability Budget
Define whether your product prioritizes one flagship target or broad cross-device coverage before optimization begins. This decision determines acceptable levels of target-specific tuning.
Balanced Approach
A common strategy is portable baseline models plus optional accelerated variants for capable devices. This preserves broad functionality while still capturing top-end performance gains.
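The tiered strategy reduces to selecting a model variant by device capability, with the portable baseline as the guaranteed floor. A sketch, where the tier names and bundle filenames are illustrative assumptions:

```python
# Tiered model selection: a portable CPU baseline plus optional
# accelerated variants for capable devices. Tier names and bundle
# filenames are illustrative assumptions.
BUNDLES = {
    "edgetpu":  "model_int8_edgetpu.tflite",  # accelerated variant
    "gpu":      "model_fp16.tflite",          # delegate-friendly variant
    "baseline": "model_fp32.tflite",          # portable CPU baseline
}

def pick_bundle(device_capabilities: set) -> str:
    """Return the best available bundle; every device gets a model."""
    for tier in ("edgetpu", "gpu"):
        if tier in device_capabilities:
            return BUNDLES[tier]
    return BUNDLES["baseline"]

print(pick_bundle({"gpu"}))  # model_fp16.tflite
print(pick_bundle(set()))    # model_fp32.tflite
```

The baseline branch is what preserves broad functionality; the tier loop is what captures top-end performance where available.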
Handoff Artifact
Publish a target-device support matrix with expected accelerator behavior for each release bundle. Review it at each release checkpoint so assumptions remain current.
Key Point: A tiered deployment strategy often balances portability and performance better than single-target optimization.
Acceleration Illusions
Headline speedups can hide poor end-to-end performance under real workloads.
Illusion Pattern
Microbenchmarks may show large gains while preprocessing, data movement, or fallback paths dominate end-to-end latency. Product decisions should use full-pipeline benchmarks.
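The gap between kernel speedup and product speedup follows an Amdahl-style bound: if the kernel is only a fraction of end-to-end latency, most of a headline speedup evaporates. A small sketch:

```python
def end_to_end_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl-style bound on overall speedup when only the inference
    kernel (kernel_fraction of total latency) is accelerated."""
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)

# A 10x kernel speedup on a kernel that is 30% of pipeline latency
# yields only ~1.37x end to end; the other 70% is untouched.
print(round(end_to_end_speedup(0.3, 10.0), 2))  # 1.37
```

This is why full-pipeline benchmarks, not microbenchmarks, should drive product decisions: the untouched fraction dominates the result.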
Reality Check
Require comparative reports that include both accelerated and fallback scenarios across representative devices. This prevents over-promising performance to product stakeholders.
Key Point: End-to-end measurements are the only reliable basis for acceleration decisions.
Delegate Release Checklist
Promote acceleration paths only when fallback behavior is operationally acceptable.
Checklist Items
Validate compiler compatibility, delegate coverage, fallback latency, device-tier variance, and rollback options. Each criterion should be linked to measured benchmark evidence.
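The checklist can be enforced as a release gate over measured evidence. A sketch, where the metric names and thresholds are illustrative placeholders, not recommended values:

```python
# Release-gate sketch: promote an accelerated path only when every
# checklist criterion passes on measured evidence. Metric names and
# thresholds are illustrative placeholders, not recommended values.
GATES = {
    "compiler_compatible":  lambda m: m["compile_ok"],
    "delegate_coverage":    lambda m: m["op_coverage"] >= 0.90,
    "fallback_latency":     lambda m: m["fallback_p95_ms"] <= 250,
    "device_tier_variance": lambda m: m["tier_latency_spread"] <= 2.0,
    "rollback_ready":       lambda m: m["rollback_tested"],
}

def release_gate(measurements: dict) -> list:
    """Return names of failed criteria; an empty list means promote."""
    return [name for name, check in GATES.items() if not check(measurements)]

failures = release_gate({
    "compile_ok": True, "op_coverage": 0.95, "fallback_p95_ms": 400,
    "tier_latency_spread": 1.6, "rollback_tested": True,
})
print(failures)  # ['fallback_latency']
```

Linking each gate to a named metric is what ties the checklist to benchmark evidence rather than to assertion.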
Deployment Rule
Ship accelerated paths with explicit safeguards for unsupported environments. Stable fallback behavior is mandatory to preserve functionality across the fleet.
Key Point: Delegates should improve performance without making functionality fragile.