edgecompiler: Instructions
A native compiler toolchain for edge AI on macOS Apple Silicon and Google Coral USB.
This document is the main getting-started guide for the edgecompiler repository.
It covers everything from installation through running production inference on
a Coral USB Accelerator connected to a MacBook with an M1 Pro (or any Apple
Silicon Mac).
For the detailed macOS Coral USB setup walkthrough, see
docs/coral_macos_setup.md.
Table of Contents
- Overview
- Quick Start
- Installation (macOS M1 Pro)
- CLI Usage
- Python API Usage
- Model Preparation
- Running Classification, Detection, and Benchmark Examples
- Edge TPU Operation Compatibility and CPU Fallback
- Model Size Constraints (8 MB Cache Limit)
- Troubleshooting Common Issues
- Sloth Integration in This Repository
1. Overview
edgecompiler replaces the official edgetpu_compiler binary and extends
compilation support to Apple Silicon GPU (Metal / MPS / Neural Engine) alongside
Google Coral USB Accelerator (Edge TPU). The toolchain:
- Ingests models from PyTorch (.pt/.pth), TensorFlow Lite (.tflite), ONNX (.onnx), and TensorFlow SavedModel directories.
- Converts them to a unified intermediate representation (IR): a directed acyclic graph of typed tensors and operations.
- Applies optimisation passes: constant folding, op fusion, dead code elimination, and layout transformations.
- Quantises the graph to INT8 using Post-Training Quantisation (PTQ), Quantisation-Aware Training (QAT), or dynamic-range quantisation.
- Compiles the quantised IR for the target backend:
  - Coral USB (Edge TPU): Produces a *_edgetpu.tflite file with embedded custom-op binary segments for the Edge TPU coprocessor.
  - Apple Silicon (Metal): Produces a .mlpackage Core ML model or MPSGraph Objective-C++ source for direct GPU/ANE execution.
- Runs inference on the compiled model through the CoralUSBRuntime or MetalInferenceSession Python classes.
The entire pipeline runs natively on macOS ARM64 — no Docker, no Rosetta 2, no x86-64 emulation.
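To give a feel for what a graph-shaped IR and one optimisation pass look like, here is a minimal Python sketch of dead code elimination over a small operation graph. It is an illustration only: the Op class and the eliminate_dead_code function are hypothetical names and do not reflect edgecompiler's actual IR types or pass API.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One node in a toy operation graph (hypothetical, for illustration)."""
    name: str
    kind: str
    inputs: list = field(default_factory=list)  # names of producer ops


def eliminate_dead_code(ops, outputs):
    """Keep only ops reachable by walking backwards from the graph outputs."""
    by_name = {op.name: op for op in ops}
    live, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name in live or name not in by_name:
            continue
        live.add(name)
        stack.extend(by_name[name].inputs)
    # Preserve the original (topological) order of the surviving ops.
    return [op for op in ops if op.name in live]


graph = [
    Op("input", "Placeholder"),
    Op("conv", "Conv2D", ["input"]),
    Op("relu", "ReLU", ["conv"]),
    Op("unused", "Mul", ["conv"]),  # its result never reaches the output
    Op("logits", "FullyConnected", ["relu"]),
]
pruned = eliminate_dead_code(graph, outputs=["logits"])
print([op.name for op in pruned])  # ['input', 'conv', 'relu', 'logits']
```

The same backwards-reachability idea applies regardless of how the real IR stores its nodes; only the bookkeeping changes.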
2. Quick Start
The fastest way to get running with a Coral USB Accelerator:
from edgecompiler.runtime.coral_usb import CoralUSBRuntime
import numpy as np
with CoralUSBRuntime() as rt:                          # 1. Create runtime
    rt.load_model("mobilenet_v1_edgetpu.tflite")       # 2. Load compiled model
    img = np.zeros((1, 224, 224, 3), dtype=np.uint8)   # 3. Prepare input
    result = rt.infer(img, top_k=5)                    # 4. Run inference
    print(f"Latency: {result.latency_ms:.1f} ms")      # 5. Read results
    for cls, score in result.top_classes:
        print(f"  Class {cls}: {score:.4f}")
CLI equivalent:
# Compile a model for Coral USB
edgecompiler compile model.onnx --target coral --quantize -o model_edgetpu.tflite
# Run inference
edgecompiler run model_edgetpu.tflite --input image.jpg --target coral
# Benchmark
edgecompiler benchmark model_edgetpu.tflite --iterations 100 --target coral
3. Installation (macOS M1 Pro)
Step 1: Install edgecompiler
# Basic install
pip install edgecompiler
# With Coral runtime extras (TensorFlow Lite interpreter path)
pip install "edgecompiler[coral]"
# With all frontends and backends
pip install "edgecompiler[all]"
# Development install from source
git clone https://github.com/rotsl/edgecompiler.git
cd edgecompiler
make install # pip install -e ".[dev]"
edgecompiler[coral] installs Python-side Coral runtime extras. You still need
the Edge TPU shared library (libedgetpu) from Step 2 below.
Step 2: Install the Coral Edge TPU runtime
The libedgetpu shared library is required to communicate with the Coral USB
Accelerator. Google does not ship official ARM64 macOS builds, so use the
community-maintained build or the helper script:
# Option A: Use the helper script (recommended)
bash scripts/install_coral_runtime.sh
# Option B: Build from source for ARM64
git clone https://github.com/feranick/libedgetpu.git
cd libedgetpu
CPU=darwin_arm64 make
sudo cp out/darwin_arm64/libedgetpu.1.dylib /usr/local/lib/
sudo ln -sf /usr/local/lib/libedgetpu.1.dylib /usr/local/lib/libedgetpu.dylib
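Before moving on, it can help to confirm that the library is where you expect and that the loader accepts it. The following is a small standard-library sketch, not part of edgecompiler; the function names are hypothetical, and /opt/homebrew/lib is included only as an assumed alternative location (the steps above install to /usr/local/lib).

```python
import ctypes
import glob
import os


def find_libedgetpu(search_dirs=("/usr/local/lib", "/opt/homebrew/lib")):
    """Return libedgetpu dylib paths found in the given directories."""
    found = []
    for directory in search_dirs:
        pattern = os.path.join(directory, "libedgetpu*.dylib")
        found.extend(sorted(glob.glob(pattern)))
    return found


def try_load(path):
    """Return None if the library loads, else the loader's error message."""
    try:
        ctypes.CDLL(path)
        return None
    except OSError as exc:
        return str(exc)


for path in find_libedgetpu():
    error = try_load(path)
    print(path, "OK" if error is None else f"failed: {error}")
```

If nothing is printed, no matching dylib was found in the searched directories. A load failure on an existing file often points to a wrong-architecture build, which the Troubleshooting section covers.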
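Step 3: Set the library path
Make sure the dynamic loader can find the library by adding /usr/local/lib to DYLD_LIBRARY_PATH, for example in your shell profile:
export DYLD_LIBRARY_PATH="/usr/local/lib:${DYLD_LIBRARY_PATH:-}"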
Step 4: Verify
python3 -c "
from edgecompiler.runtime.coral_usb import CoralUSBRuntime
rt = CoralUSBRuntime()
print(f'libedgetpu: {rt._lib_available}')
devices = rt.detect_devices()
print(f'Devices: {devices}')
"
Step 5: Download hardware test models
Step 6: Review benchmark report
See benchmarks.md for the latest with/without hardware test-runner results.
For the complete step-by-step setup guide with troubleshooting, see
docs/coral_macos_setup.md.
4. CLI Usage
The edgecompiler CLI (invoked as edgecompiler or edgecompile) provides
several subcommands:
compile
Compile a model for a target backend:
| Option | Default | Description |
|---|---|---|
| --target, -t | (required) | Target backend: coral or metal |
| --quantize, -q | False | Apply post-training INT8 quantisation |
| --output, -o | Auto-named | Output path for the compiled model |
| --calibration-data | None | Path to calibration data (.npy, .npz, or directory) |
| --per-channel | True | Use per-channel (per-axis) weight quantisation |
| --symmetric | True | Use symmetric quantisation |
| --input-shape | None | Input shape for PyTorch models (e.g., 1,3,224,224) |
| --simplify | True | Apply onnx-simplifier for ONNX models |
| --min-runtime-version | 14 | Minimum Edge TPU runtime version (Coral only) |
| --verbose, -v | 0 | Increase verbosity (-v for INFO, -vv for DEBUG) |
Examples:
# ONNX → Coral with PTQ
edgecompiler compile model.onnx --target coral --quantize --calibration-data calib.npy
# TFLite → Metal
edgecompiler compile model.tflite --target metal -o model.mlpackage
# PyTorch → Coral (must specify input shape)
edgecompiler compile model.pt --target coral --input-shape 1,3,224,224 --quantize
quantize
Apply a quantisation strategy without compiling:
| Option | Default | Description |
|---|---|---|
| --mode, -m | ptq | Quantisation mode: ptq, qat, or dynamic |
| --calibration-data | None | Calibration data for PTQ |
| --per-channel | True | Per-channel weight quantisation |
| --symmetric | True | Symmetric quantisation |
| --output, -o | Auto-named | Output path for the quantised IR JSON |
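To make the --symmetric and --per-channel options more concrete, here is a minimal sketch of symmetric INT8 quantisation in plain Python. It shows the general technique only and is not edgecompiler's implementation; quantize_symmetric and dequantize are hypothetical helper names.

```python
def quantize_symmetric(values, num_bits=8):
    """Symmetric quantisation: zero point fixed at 0, one scale per call."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate real values."""
    return [qi * scale for qi in q]


weights = [[0.5, -1.0, 0.25], [0.02, -0.01, 0.04]]  # two output channels

# Per-tensor: a single scale shared by every weight.
flat = [w for row in weights for w in row]
q_tensor, scale_tensor = quantize_symmetric(flat)

# Per-channel: one scale per output channel (here, per row).
per_channel = [quantize_symmetric(row) for row in weights]

print("per-tensor :", q_tensor, scale_tensor)
for q_row, scale_row in per_channel:
    print("per-channel:", q_row, scale_row)
```

In this toy example the second row has much smaller weights than the first, so a shared per-tensor scale maps it to only a few integer levels, while a per-channel scale lets it use the full INT8 range.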
convert
Convert a model to IR JSON without quantising or compiling:
edgecompiler convert model.onnx -o model_ir.json
inspect
Inspect a model's structure, operations, and quantisation info:
edgecompiler inspect model_edgetpu.tflite
run
Run a single inference on a compiled model:
| Option | Description |
|---|---|
| --input | Path to input data: .npy, .npz, or image file (.jpg, .png) |
| --target | Runtime target: coral or metal (auto-detected from filename) |
| --output | Save output tensors to .npz file |
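The table notes that --target can be auto-detected from the filename. The exact rule is not documented here, so the following is only a plausible sketch based on the output naming described in the Overview (*_edgetpu.tflite for Coral, .mlpackage for Metal). The guess_target function is hypothetical.

```python
from pathlib import Path
from typing import Optional


def guess_target(model_path: str) -> Optional[str]:
    """Guess the runtime target from a compiled model's filename.

    Assumed rule: *_edgetpu.tflite means Coral, .mlpackage means Metal.
    """
    name = Path(model_path).name.lower()
    if name.endswith("_edgetpu.tflite"):
        return "coral"
    if name.endswith(".mlpackage"):
        return "metal"
    return None  # Ambiguous name: pass --target explicitly.


for path in ("model_edgetpu.tflite", "model.mlpackage", "model_quant.tflite"):
    print(path, "->", guess_target(path))
```

When a filename does not follow either convention, passing --target explicitly avoids any ambiguity.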
benchmark
Benchmark a compiled model and report latency statistics:
| Option | Default | Description |
|---|---|---|
| --iterations | 100 | Number of inference iterations |
| --target | Auto-detected | Runtime target |
| --input | Random data | Input data (.npy, .npz, or image) |
5. Python API Usage
CoralUSBRuntime
The CoralUSBRuntime class provides a high-level Python API for Coral USB
inference:
from edgecompiler.runtime.coral_usb import CoralUSBRuntime
import numpy as np
# Create runtime (auto-detects libedgetpu)
runtime = CoralUSBRuntime()
# Or specify the library path explicitly
runtime = CoralUSBRuntime(libedgetpu_path="/usr/local/lib/libedgetpu.1.dylib")
# Detect devices
devices = runtime.detect_devices()
# → [CoralDevice(path=':0', type='USB', name='Coral USB Accelerator')]
# Load a compiled Edge TPU model
runtime.load_model("model_edgetpu.tflite", device=":0")
# Get expected input size
h, w = runtime.get_input_size() # e.g., (224, 224)
# Run inference
input_data = np.zeros((1, 224, 224, 3), dtype=np.uint8)
result = runtime.infer(input_data, top_k=5)
# Access results
print(result.output) # Primary output tensor (numpy.ndarray)
print(result.all_outputs) # Dict of all output tensors
print(result.latency_ms) # Inference latency in milliseconds
print(result.top_classes) # List of (class_id, score) tuples
# Run classification with labels
with open("imagenet_labels.txt") as f:
    labels = [line.strip() for line in f]
labelled = runtime.classify(input_data, top_k=5, labels=labels)
# → [("great grey owl", 0.92), ("owl", 0.04), ...]
# Benchmark
stats = runtime.benchmark(num_runs=100, warmup_runs=5)
print(f"Mean: {stats['mean_latency_ms']:.2f} ms")
print(f"P95: {stats['p95_latency_ms']:.2f} ms")
print(f"FPS: {stats['throughput_fps']:.1f}")
# Clean up
runtime.close()
# Or use as a context manager
with CoralUSBRuntime() as rt:
    rt.load_model("model_edgetpu.tflite")
    result = rt.infer(input_data)
Programmatic compilation
from edgecompiler import compile
# Compile for Coral USB
result = compile(
    "mobilenet_v2.onnx",
    target="coral",
    quantize="ptq",
    calibration_data="calibration_images.npy",
    output="mobilenet_v2_coral.tflite",
)
print(result)
# CompileResult(output_path='mobilenet_v2_coral.tflite', ops_on_target=140, ops_fallback=12)
# Compile for Metal
result = compile(
    "mobilenet_v2.tflite",
    target="metal",
    output="mobilenet_v2.mlpackage",
)
6. Model Preparation
Requirements for Edge TPU compilation
For a model to execute on the Edge TPU, it must satisfy these conditions:
- INT8 quantised. The Edge TPU only supports INT8 inference. Use edgecompiler quantize or pass --quantize to edgecompiler compile.
- Fully compiled for Edge TPU. A plain quantised TFLite model (model_quant.tflite) will not use the Edge TPU. It must be compiled into a *_edgetpu.tflite file that embeds the Edge TPU binary segments.
- Compatible operations. Only Edge TPU-supported operations are mapped to the accelerator. Unsupported operations fall back to CPU.
Quantisation workflow
# Step 1: Convert model to IR (auto-detected format)
edgecompiler convert model.onnx -o model_ir.json
# Step 2: Quantise to INT8 with calibration data
edgecompiler quantize model.onnx --mode ptq \
--calibration-data calibration_samples/ \
-o model_quant_ir.json
# Step 3: Compile for Edge TPU
edgecompiler compile model_quant_ir.json --target coral -o model_edgetpu.tflite
Or as a single command:
edgecompiler compile model.onnx --target coral --quantize \
--calibration-data calibration_samples/ \
-o model_edgetpu.tflite
Preparing calibration data
Calibration data should be representative of your inference inputs. Collect
100–500 samples and save them as .npy files:
import os
import numpy as np
# training_image_paths and preprocess_image are placeholders for your own data pipeline.
# Collect calibration images (preprocessed to model input shape)
samples = []
for image_path in training_image_paths[:200]:
    img = preprocess_image(image_path)  # → np.ndarray, shape (1, 224, 224, 3)
    samples.append(img)
# Save as a directory of .npy files
os.makedirs("calibration_samples", exist_ok=True)
for i, sample in enumerate(samples):
    np.save(f"calibration_samples/sample_{i:04d}.npy", sample)
# Or as a single .npz file
np.savez("calibration.npz", *samples)
Using pre-quantised models
Google provides a set of pre-quantised and pre-compiled Edge TPU models in the Coral model zoo. Download them with the included script:
This downloads MobileNetV1, MobileNetV2, EfficientNet-EdgeTPU-S, and SSD MobileNetV2 COCO — all ready to run on the Edge TPU without further compilation.
7. Running Classification, Detection, and Benchmark Examples
The examples/ directory contains ready-to-run scripts for common workflows.
Image classification
# Classify an image using MobileNetV1 on Coral USB
python examples/coral_usb_classify.py \
--model models/mobilenet_v1_1.0_224_quant_edgetpu.tflite \
--labels models/imagenet_labels.txt \
--input models/parrot.jpg
Example output:
---- Classification Results ----
1. Ara macaw (Scarlet Macaw): 0.8594
2. African grey: 0.0469
3. macaw: 0.0312
4. lorikeet: 0.0156
5. toucan: 0.0078
Inference latency: 3.2 ms (Edge TPU)
Object detection
# Detect objects using SSD MobileNetV2 (COCO) on Coral USB
python examples/coral_usb_detect.py \
--model models/ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite \
--labels models/coco_labels.txt \
--input test_image.jpg \
--output detection_result.jpg
Benchmarking
# Benchmark a model on Coral USB
python examples/coral_usb_benchmark.py \
--model models/mobilenet_v2_1.0_224_quant_edgetpu.tflite \
--iterations 200
# Or using the CLI
edgecompiler benchmark models/mobilenet_v2_1.0_224_quant_edgetpu.tflite \
--iterations 200 --target coral
Typical benchmark results on an M1 Pro with the Coral connected over USB 3.0:
| Model | Mean (ms) | P95 (ms) | P99 (ms) | FPS |
|---|---|---|---|---|
| MobileNetV1 INT8 | 2.8 | 3.5 | 4.1 | 357 |
| MobileNetV2 INT8 | 3.1 | 3.9 | 4.6 | 323 |
| EfficientNet-EdgeTPU-S INT8 | 4.5 | 5.8 | 6.7 | 222 |
| SSD MobileNetV2 COCO INT8 | 6.8 | 8.9 | 10.2 | 147 |
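The FPS column follows from the mean latency if inferences run one after another: FPS = 1000 / mean latency in ms. The sketch below reproduces that column and shows one way the mean, P95, and throughput figures could be derived from raw latencies. The dictionary keys mirror those returned by runtime.benchmark() in Section 5, but the summarize function itself is a hypothetical helper, and the nearest-rank percentile method is an assumption rather than what the benchmark necessarily uses.

```python
import statistics


def summarize(latencies_ms):
    """Summarise per-inference latencies given in milliseconds."""
    ordered = sorted(latencies_ms)
    mean = statistics.fmean(ordered)

    def percentile(p):
        # Nearest-rank method: the value at rank ceil(n * p / 100).
        rank = (len(ordered) * p + 99) // 100
        return ordered[max(rank, 1) - 1]

    return {
        "mean_latency_ms": mean,
        "p95_latency_ms": percentile(95),
        "throughput_fps": 1000.0 / mean,  # assumes sequential inference
    }


# Mean latencies copied from the table above.
table_means_ms = {
    "MobileNetV1 INT8": 2.8,
    "MobileNetV2 INT8": 3.1,
    "EfficientNet-EdgeTPU-S INT8": 4.5,
    "SSD MobileNetV2 COCO INT8": 6.8,
}
for model, mean_ms in table_means_ms.items():
    print(f"{model}: {1000.0 / mean_ms:.0f} FPS")  # 357, 323, 222, 147

# Made-up sample latencies, only to exercise the helper.
print(summarize([2.5, 2.7, 2.8, 2.9, 3.6]))
```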
Full demo
One-command test runner (auto hardware detection)
Use the built-in test runner to separate simulation and hardware suites automatically:
# Auto mode: always runs simulation tests; runs hardware tests only if Coral is detected
edge-test
# Same behavior via Makefile
make test-auto
# Force simulation-only or hardware-only runs
edge-test --mode simulation
edge-test --mode hardware
8. Edge TPU Operation Compatibility and CPU Fallback
The Edge TPU supports a specific set of INT8 operations. When a model contains operations that the Edge TPU cannot execute, those operations are automatically routed to the CPU.
Supported operations (Edge TPU INT8)
| Operation | Supported | Notes |
|---|---|---|
| Conv2D | ✅ | Standard + depthwise |
| DepthwiseConv2D | ✅ | |
| FullyConnected | ✅ | |
| MaxPool2D | ✅ | |
| AveragePool2D | ✅ | |
| ReLU / ReLU6 / ReLUN1To1 | ✅ | |
| Softmax | ✅ | |
| Sigmoid / Tanh | ✅ | |
| Add / Sub / Mul | ✅ | Element-wise |
| Concatenation | ✅ | |
| Reshape / Transpose | ✅ | |
| Pad | ✅ | |
| ReduceMin / ReduceMax / Mean | ✅ | |
| ExpandDims / Squeeze | ✅ | |
| Split / Slice | ✅ | |
| ResizeBilinear / ResizeNearestNeighbor | ✅ | |
| Logistic / L2Normalization | ✅ | |
| BatchToSpaceND / SpaceToBatchND | ✅ | |
| Gather | ⚠️ | Fallback on some dimensions |
| StridedSlice | ⚠️ | Limited mask support |
| LSTM | ❌ | Falls back to CPU |
| Einsum | ❌ | Not supported |
| ScatterND | ❌ | Not supported |
How CPU fallback works
When the compiler partitions the model, it creates clusters of consecutive TPU-compatible operations. At the boundaries between TPU and CPU clusters, dequantise/quantise operations are inserted to convert between INT8 (Edge TPU) and float32 (CPU) tensor formats:
TPU Cluster 0: Conv2D → BN → ReLU
↓ (Dequantize: INT8 → FP32)
CPU: Reshape (dynamic shape)
↓ (Quantize: FP32 → INT8)
TPU Cluster 1: Conv2D → ReLU → Pool
↓ (Dequantize: INT8 → FP32)
CPU: Softmax
Each TPU↔CPU transition adds approximately 0.1 ms of overhead. For best performance, minimise the number of transitions by choosing models that are fully compatible with the Edge TPU (e.g., MobileNet-family models).
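The clustering and its cost can be sketched in a few lines of Python. This is a simplified illustration, not the compiler's partitioner: each op is tagged by hand with whether it runs on the Edge TPU (mirroring the diagram above), and the 0.1 ms per-transition figure is the approximate value quoted in this section.

```python
from itertools import groupby

# (op name, runs on Edge TPU?) pairs mirroring the diagram above.
ops = [
    ("Conv2D", True), ("BN", True), ("ReLU", True),
    ("Reshape", False),
    ("Conv2D", True), ("ReLU", True), ("Pool", True),
    ("Softmax", False),
]

TRANSITION_OVERHEAD_MS = 0.1  # approximate cost per TPU<->CPU boundary


def partition(tagged_ops):
    """Group consecutive ops that share a device into clusters."""
    return [
        ("TPU" if on_tpu else "CPU", [name for name, _ in group])
        for on_tpu, group in groupby(tagged_ops, key=lambda op: op[1])
    ]


clusters = partition(ops)
transitions = len(clusters) - 1
overhead_ms = transitions * TRANSITION_OVERHEAD_MS

for device, names in clusters:
    print(f"{device}: {' → '.join(names)}")
print(f"{transitions} transitions, about {overhead_ms:.1f} ms overhead")
```

For the example graph this yields four clusters and three transitions, so roughly 0.3 ms of the total latency is spent crossing between devices.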
Checking operation mapping
After compilation, use edgecompiler inspect to see which operations are
mapped to the Edge TPU and which fall back to CPU:
edgecompiler inspect model_edgetpu.tflite
The output shows the number of TPU-mapped operations vs. CPU-fallback operations, along with a breakdown by operation type.
9. Model Size Constraints (8 MB Cache Limit)
The Coral USB Accelerator's Edge TPU coprocessor has approximately 8 MB of on-chip cache for parameter data (weights and biases). This is a hard limit for models that need to execute entirely on the Edge TPU without off-chip memory access.
Practical implications
| Model Category | Parameter Size | Fits in Cache? | Performance |
|---|---|---|---|
| MobileNetV1 1.0 224 | ~4.2 MB | ✅ Yes | Full-speed TPU execution |
| MobileNetV2 1.0 224 | ~3.5 MB | ✅ Yes | Full-speed TPU execution |
| EfficientNet-EdgeTPU-S | ~6.8 MB | ✅ Yes | Full-speed TPU execution |
| SSD MobileNetV2 COCO | ~6.6 MB | ✅ Yes | Full-speed TPU execution |
| EfficientNet-Lite2 | ~8.5 MB | ⚠️ Marginal | Some off-chip spills |
| EfficientNet-Lite4 | ~13 MB | ❌ No | Significant off-chip access |
| ResNet50 | ~25 MB | ❌ No | Most weights off-chip |
When a model exceeds the cache
If the model's parameter data exceeds 8 MB, the Edge TPU runtime automatically spills excess data to off-chip DRAM. This works correctly but incurs a significant performance penalty (2–5× slower for the spilled layers).
To check the model size before compilation:
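One rough way to do this from Python is to compare the file size against the cache limit. This is a stand-in sketch using only the standard library, not an edgecompiler command: fits_in_cache is a hypothetical helper, and file size only approximates parameter size because the file also holds graph structure and metadata.

```python
import os
import tempfile

CACHE_LIMIT_BYTES = 8 * 1024 * 1024  # ~8 MB, the figure quoted above


def fits_in_cache(model_path, limit=CACHE_LIMIT_BYTES):
    """Return (size_mb, fits), using file size as a proxy for parameter size."""
    size = os.path.getsize(model_path)
    return size / (1024 * 1024), size <= limit


# Demo with a placeholder 4 MiB file standing in for a real model.
with tempfile.NamedTemporaryFile(suffix=".tflite", delete=False) as f:
    f.write(b"\0" * (4 * 1024 * 1024))
size_mb, fits = fits_in_cache(f.name)
print(f"{size_mb:.1f} MB, fits in cache: {fits}")
os.remove(f.name)
```

In practice you would call fits_in_cache on your quantised .tflite file and treat a result near the limit as a prompt to inspect the compiled model more closely.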
Strategies for staying within the cache limit
- Use efficient architectures. MobileNet, EfficientNet-Lite, and SSDLite variants are designed to be small and Edge TPU-friendly.
- Reduce the input resolution. A 192×192 input produces smaller intermediate activations than 224×224, allowing the compiler to keep more of the model on-chip.
- Reduce the depth multiplier. For MobileNet, use a depth multiplier of 0.5 or 0.35 instead of 1.0.
- Prune the model. Structured pruning can remove entire channels, reducing the parameter count.
- Split the model. If only some layers exceed the cache, split the model into sub-models and execute them sequentially.
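As a quick worked example of the input-resolution point, activation size scales with the spatial area of the input. The sketch below compares element counts for 192×192 and 224×224 inputs with three channels; it is plain arithmetic, and the assumption that every intermediate feature map shrinks by the same ratio holds only when the layer strides are unchanged.

```python
def activation_elements(height, width, channels):
    """Number of elements in a single activation tensor (batch size 1)."""
    return height * width * channels


base = activation_elements(224, 224, 3)
small = activation_elements(192, 192, 3)
ratio = small / base

print(f"224x224: {base} elements")
print(f"192x192: {small} elements")
print(f"ratio: {ratio:.1%}")  # 73.5%
```

So dropping from 224×224 to 192×192 cuts activation size by roughly a quarter.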
10. Troubleshooting Common Issues
"No Coral USB device detected"
- Check the physical connection: the LED on the device should be lit.
- Verify with system_profiler SPUSBDataType on macOS or lsusb on Linux.
- Avoid USB-C hubs; use a direct USB-A port with an adapter.
- Re-plug the device after installing libedgetpu.
"libedgetpu not found"
- Verify the dylib exists: ls /usr/local/lib/libedgetpu*.dylib
- Set DYLD_LIBRARY_PATH="/usr/local/lib:${DYLD_LIBRARY_PATH:-}"
- Reinstall: bash scripts/install_coral_runtime.sh --force
"Architecture mismatch (x86_64 dylib on ARM64)"
- Check the architecture: file /usr/local/lib/libedgetpu.1.dylib
- If it says x86_64, build the ARM64 version from feranick/libedgetpu (see Section 3).
"Model not fully mapped to Edge TPU"
- Ensure the model file is named *_edgetpu.tflite (compiled for Edge TPU).
- Run edgecompiler inspect model_edgetpu.tflite to see which ops fall back to CPU.
- Avoid unsupported ops (LSTM, Einsum, ScatterND) in your model.
High latency or unexpected performance
- USB 2.0 port: The Coral needs USB 3.0 for full throughput. Check that the device is connected to a USB 3.0 port (5 Gbps).
- CPU fallback: Check if ops are falling back to CPU (see above).
- Thermal throttling: The Edge TPU may throttle under sustained load. Ensure adequate ventilation.
- Model too large: If the model exceeds the 8 MB cache, off-chip spills slow down inference significantly.
"Permission denied on USB device" (Linux)
Add a udev rule that grants access to the device:
sudo bash -c 'echo "SUBSYSTEM==\"usb\", ATTR{idVendor}==\"18d1\", ATTR{idProduct}==\"9302\", MODE=\"0666\"" > /etc/udev/rules.d/99-edgetpu-accelerator.rules'
sudo udevadm control --reload-rules
Python import errors
If from edgecompiler.runtime.coral_usb import CoralUSBRuntime fails:
- Ensure edgecompiler is installed: pip install edgecompiler
- Ensure tensorflow is installed (or install edgecompiler[coral])
- Ensure numpy is installed: pip install numpy
- Check that Python is ARM64-native: python3 -c "import platform; print(platform.machine())" → arm64
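The checks above can be run in one go with a short standard-library script. This is a convenience sketch, not something shipped with edgecompiler; check_environment is a hypothetical helper that only reports whether each module can be located, without importing it.

```python
import importlib.util
import platform


def check_environment(modules=("edgecompiler", "numpy", "tensorflow")):
    """Report the interpreter architecture and which modules can be found."""
    report = {"machine": platform.machine()}
    for name in modules:
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

On an Apple Silicon Mac with a native Python, machine should read arm64. Any module reported as False needs installing with the matching pip command from the list above.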
Getting help
- Detailed setup guide: docs/coral_macos_setup.md
- Architecture documentation: docs/architecture.md
- Coral backend internals: docs/coral_backend.md
- Examples: docs/examples.md
- Issues: GitHub Issues
11. Sloth Integration in This Repository
edgecompiler now includes sloth_integration in the same monorepo under
sloth-integration/src/sloth_integration.
Use this when you need an end-to-end text workflow:
- Fine-tune small language models with unsloth-compatible adapters
- Export to ONNX/TFLite through SlothConverter
- Quantize and compile with edgecompiler
- Run Coral USB inference via SlothCoralRuntime
Quick commands:
# Install root project with dev dependencies
pip install -e ".[dev]"
# Run sloth integration tests
pytest sloth-integration/tests -v
# Run sloth benchmark path
python sloth-integration/examples/benchmark_coral.py \
--model sloth-integration/test_models/synthetic_text_classifier.tflite \
--iterations 200 \
--warmup 20
Reference docs:
- sloth-integration/instructions_sloth_integration.md
- sloth-integration/docs/benchmarks_sloth.md