edgecompiler: Instructions

A native compiler toolchain for edge AI on macOS Apple Silicon and Google Coral USB.

This document is the main getting-started guide for the edgecompiler repository. It covers everything from installation through running production inference on a Coral USB Accelerator connected to a MacBook with an M1 Pro (or any Apple Silicon Mac).

For the detailed macOS Coral USB setup walkthrough, see docs/coral_macos_setup.md.

Table of Contents

Overview
Quick Start
Installation (macOS M1 Pro)
CLI Usage
Python API Usage
Model Preparation
Running Classification, Detection, and Benchmark Examples
Edge TPU Operation Compatibility and CPU Fallback
Model Size Constraints (8 MB Cache Limit)
Troubleshooting Common Issues
Sloth Integration in This Repository

1. Overview

edgecompiler replaces the official edgetpu_compiler binary and extends compilation support to Apple Silicon (the GPU via Metal / MPS, and the Neural Engine) alongside the Google Coral USB Accelerator (Edge TPU). The toolchain:

Ingests models from PyTorch (.pt/.pth), TensorFlow Lite (.tflite), ONNX (.onnx), and TensorFlow SavedModel directories.
Converts them to a unified intermediate representation (IR) — a directed acyclic graph of typed tensors and operations.
Applies optimisation passes — constant folding, op fusion, dead code elimination, and layout transformations.
Quantises the graph to INT8 using Post-Training Quantisation (PTQ), Quantisation-Aware Training (QAT), or dynamic-range quantisation.
Compiles the quantised IR for the target backend:
Coral USB (Edge TPU): Produces a *_edgetpu.tflite file with embedded custom-op binary segments for the Edge TPU coprocessor.
Apple Silicon (Metal): Produces a .mlpackage Core ML model or MPSGraph Objective-C++ source for direct GPU/ANE execution.
Runs inference on the compiled model through the CoralUSBRuntime or MetalInferenceSession Python classes.
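
To make the IR and optimisation steps above more concrete, here is a minimal sketch of dead code elimination on a toy graph. The `Op` class and `eliminate_dead_ops` function are hypothetical names chosen for illustration; they are not edgecompiler's actual IR classes, and the real pass may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One node of a toy IR graph (hypothetical, not edgecompiler's IR)."""
    name: str
    kind: str
    inputs: list = field(default_factory=list)  # names of producer ops


def eliminate_dead_ops(ops, outputs):
    """Keep only the ops reachable backwards from the graph outputs."""
    by_name = {op.name: op for op in ops}
    live, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        stack.extend(by_name[name].inputs)
    # Preserve the original (topological) order of the surviving ops.
    return [op for op in ops if op.name in live]


graph = [
    Op("input", "Placeholder"),
    Op("conv", "Conv2D", ["input"]),
    Op("relu", "ReLU", ["conv"]),
    Op("debug_mean", "Mean", ["conv"]),  # never consumed by an output
    Op("logits", "FullyConnected", ["relu"]),
]
pruned = eliminate_dead_ops(graph, outputs=["logits"])
print([op.name for op in pruned])
```

The `debug_mean` op is dropped because nothing on the path to `logits` consumes it.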

The entire pipeline runs natively on macOS ARM64 — no Docker, no Rosetta 2, no x86-64 emulation.

2. Quick Start

The fastest way to get running with a Coral USB Accelerator:

from edgecompiler.runtime.coral_usb import CoralUSBRuntime
import numpy as np

with CoralUSBRuntime() as rt:                          # 1. Create runtime
    rt.load_model("mobilenet_v1_edgetpu.tflite")       # 2. Load compiled model
    img = np.zeros((1, 224, 224, 3), dtype=np.uint8)   # 3. Prepare input
    result = rt.infer(img, top_k=5)                     # 4. Run inference
    print(f"Latency: {result.latency_ms:.1f} ms")       # 5. Read results
    for cls, score in result.top_classes:
        print(f"  Class {cls}: {score:.4f}")

CLI equivalent:

# Compile a model for Coral USB
edgecompiler compile model.onnx --target coral --quantize -o model_edgetpu.tflite

# Run inference
edgecompiler run model_edgetpu.tflite --input image.jpg --target coral

# Benchmark
edgecompiler benchmark model_edgetpu.tflite --iterations 100 --target coral

3. Installation (macOS M1 Pro)

Step 1: Install edgecompiler

# Basic install
pip install edgecompiler

# With Coral runtime extras (TensorFlow Lite interpreter path)
pip install "edgecompiler[coral]"

# With all frontends and backends
pip install "edgecompiler[all]"

# Development install from source
git clone https://github.com/rotsl/edgecompiler.git
cd edgecompiler
make install   # pip install -e ".[dev]"

edgecompiler[coral] installs Python-side Coral runtime extras. You still need the Edge TPU shared library (libedgetpu) from Step 2 below.

Step 2: Install the Coral Edge TPU runtime

The libedgetpu shared library is required to communicate with the Coral USB Accelerator. Google does not ship official ARM64 macOS builds, so use the community-maintained build or the helper script:

# Option A: Use the helper script (recommended)
bash scripts/install_coral_runtime.sh

# Option B: Build from source for ARM64
git clone https://github.com/feranick/libedgetpu.git
cd libedgetpu
CPU=darwin_arm64 make
sudo cp out/darwin_arm64/libedgetpu.1.dylib /usr/local/lib/
sudo ln -sf /usr/local/lib/libedgetpu.1.dylib /usr/local/lib/libedgetpu.dylib

Step 3: Set the library path

echo 'export DYLD_LIBRARY_PATH="/usr/local/lib:${DYLD_LIBRARY_PATH:-}"' >> ~/.zshrc
source ~/.zshrc
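
Before moving on to the full verification in Step 4, it can help to confirm that the dylib is discoverable at all. The following is a minimal sketch using only the standard library. The `find_libedgetpu` helper is hypothetical and not part of edgecompiler, and `/opt/homebrew/lib` is included only as an assumed extra location (the default Homebrew prefix on Apple Silicon).

```python
import glob
import os


def find_libedgetpu(extra_dirs=()):
    """Return paths of libedgetpu dylibs found in common library directories."""
    # /usr/local/lib is the install location used in Step 2;
    # /opt/homebrew/lib is an assumption, not something this guide installs to.
    search_dirs = list(extra_dirs) + ["/usr/local/lib", "/opt/homebrew/lib"]
    env_path = os.environ.get("DYLD_LIBRARY_PATH", "")
    search_dirs += [d for d in env_path.split(":") if d]
    found = []
    for directory in dict.fromkeys(search_dirs):  # de-duplicate, keep order
        pattern = os.path.join(directory, "libedgetpu*.dylib")
        found.extend(sorted(glob.glob(pattern)))
    return found


print(find_libedgetpu() or "libedgetpu not found")
```

An empty result suggests that Step 2 did not complete or that the library was installed somewhere else.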

Step 4: Verify

python3 -c "
from edgecompiler.runtime.coral_usb import CoralUSBRuntime
rt = CoralUSBRuntime()
print(f'libedgetpu: {rt._lib_available}')
devices = rt.detect_devices()
print(f'Devices: {devices}')
"

Step 5: Download hardware test models

bash scripts/download_models.sh --output-dir tests/hardware/test_models

Step 6: Review benchmark report

See benchmarks.md for the latest test-runner results, both with and without Coral hardware attached.

For the complete step-by-step setup guide with troubleshooting, see docs/coral_macos_setup.md.

4. CLI Usage

The edgecompiler CLI (invoked as edgecompiler or edgecompile) provides several subcommands:

compile

Compile a model for a target backend:

edgecompiler compile MODEL --target {coral,metal} [OPTIONS]

Option	Default	Description
`--target`, `-t`	(required)	Target backend: `coral` or `metal`
`--quantize`, `-q`	False	Apply post-training INT8 quantisation
`--output`, `-o`	Auto-named	Output path for the compiled model
`--calibration-data`	None	Path to calibration data (.npy, .npz, or directory)
`--per-channel`	True	Use per-channel (per-axis) weight quantisation
`--symmetric`	True	Use symmetric quantisation
`--input-shape`	None	Input shape for PyTorch models (e.g., `1,3,224,224`)
`--simplify`	True	Apply onnx-simplifier for ONNX models
`--min-runtime-version`	14	Minimum Edge TPU runtime version (Coral only)
`--verbose`, `-v`	0	Increase verbosity (`-v` for INFO, `-vv` for DEBUG)

Examples:

# ONNX → Coral with PTQ
edgecompiler compile model.onnx --target coral --quantize --calibration-data calib.npy

# TFLite → Metal
edgecompiler compile model.tflite --target metal -o model.mlpackage

# PyTorch → Coral (must specify input shape)
edgecompiler compile model.pt --target coral --input-shape 1,3,224,224 --quantize
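
PyTorch models use the NCHW layout (hence `1,3,224,224`), while the TFLite inputs elsewhere in this guide are NHWC (`(1, 224, 224, 3)`). The sketch below shows how such a shape string could be parsed and reordered. `parse_input_shape` and `nchw_to_nhwc` are hypothetical helpers for illustration, not functions exposed by the CLI.

```python
def parse_input_shape(text):
    """Parse a comma-separated shape string such as '1,3,224,224'."""
    dims = tuple(int(part) for part in text.split(","))
    if any(d <= 0 for d in dims):
        raise ValueError(f"shape dimensions must be positive: {text!r}")
    return dims


def nchw_to_nhwc(shape):
    """Reorder a 4-D (N, C, H, W) shape to (N, H, W, C)."""
    n, c, h, w = shape
    return (n, h, w, c)


shape = parse_input_shape("1,3,224,224")
print(shape, "->", nchw_to_nhwc(shape))
```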

quantize

Apply a quantisation strategy without compiling:

edgecompiler quantize MODEL --mode {ptq,qat,dynamic} [OPTIONS]

Option	Default	Description
`--mode`, `-m`	`ptq`	Quantisation mode: `ptq`, `qat`, or `dynamic`
`--calibration-data`	None	Calibration data for PTQ
`--per-channel`	True	Per-channel weight quantisation
`--symmetric`	True	Symmetric quantisation
`--output`, `-o`	Auto-named	Output path for the quantised IR JSON
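
The `--per-channel` and `--symmetric` options correspond to standard INT8 weight-quantisation schemes. As a minimal sketch under the usual convention (symmetric means a zero-point of 0 and a scale of max|w| / 127), the example below compares one shared scale with one scale per output channel. The function names are hypothetical, and edgecompiler's own quantiser may differ in detail.

```python
def symmetric_scale(values, qmax=127):
    """Symmetric INT8 scale: zero-point is 0, range is [-qmax, qmax]."""
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def quantise(values, scale, qmax=127):
    """Round to the nearest integer step and clamp to the INT8 range."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


# Toy weights for two output channels with very different ranges.
weights = {"ch0": [0.02, -0.012, 0.015], "ch1": [1.5, -2.0, 0.7]}

# Per-tensor: one scale shared by all channels.
tensor_scale = symmetric_scale([v for ch in weights.values() for v in ch])
# Per-channel: one scale per output channel.
channel_scales = {name: symmetric_scale(ch) for name, ch in weights.items()}

print("per-tensor :", quantise(weights["ch0"], tensor_scale))
print("per-channel:", quantise(weights["ch0"], channel_scales["ch0"]))
```

With a shared scale, the small-valued channel collapses to just a few integer levels, whereas a per-channel scale uses the full INT8 range. This is the usual argument for leaving `--per-channel` enabled.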

convert

Convert a model to IR JSON without quantising or compiling:

edgecompiler convert MODEL --output model_ir.json

inspect

Inspect a model's structure, operations, and quantisation info:

edgecompiler inspect MODEL

run

Run a single inference on a compiled model:

edgecompiler run MODEL --input INPUT --target {coral,metal} [OPTIONS]

Option	Description
`--input`	Path to input data: `.npy`, `.npz`, or image file (`.jpg`, `.png`)
`--target`	Runtime target: `coral` or `metal` (auto-detected from the filename if omitted)
`--output`	Save output tensors to `.npz` file
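
The guide does not spell out how the target is auto-detected. A plausible sketch, assuming it relies on the output naming conventions used throughout this document (`*_edgetpu.tflite` for Coral, `.mlpackage` for Metal), is shown below. `guess_target` is a hypothetical function; the actual detection logic may differ.

```python
from pathlib import Path


def guess_target(model_path):
    """Guess the runtime target from a compiled model's filename."""
    name = Path(model_path).name.lower()
    if name.endswith("_edgetpu.tflite"):
        return "coral"
    if name.endswith(".mlpackage"):
        return "metal"
    return None  # ambiguous: the caller should pass --target explicitly


for path in ["models/mobilenet_v1_edgetpu.tflite", "model.mlpackage", "model.tflite"]:
    print(path, "->", guess_target(path))
```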

benchmark

Benchmark a compiled model and report latency statistics:

edgecompiler benchmark MODEL --iterations N --target {coral,metal}

Option	Default	Description
`--iterations`	100	Number of inference iterations
`--target`	Auto-detected	Runtime target
`--input`	Random data	Input data (`.npy`, `.npz`, or image)

5. Python API Usage

CoralUSBRuntime

The CoralUSBRuntime class provides a high-level Python API for Coral USB inference:

from edgecompiler.runtime.coral_usb import CoralUSBRuntime
import numpy as np

# Create runtime (auto-detects libedgetpu)
runtime = CoralUSBRuntime()

# Or specify the library path explicitly
runtime = CoralUSBRuntime(libedgetpu_path="/usr/local/lib/libedgetpu.1.dylib")

# Detect devices
devices = runtime.detect_devices()
# → [CoralDevice(path=':0', type='USB', name='Coral USB Accelerator')]

# Load a compiled Edge TPU model
runtime.load_model("model_edgetpu.tflite", device=":0")

# Get expected input size
h, w = runtime.get_input_size()  # e.g., (224, 224)

# Run inference
input_data = np.zeros((1, 224, 224, 3), dtype=np.uint8)
result = runtime.infer(input_data, top_k=5)

# Access results
print(result.output)         # Primary output tensor (numpy.ndarray)
print(result.all_outputs)    # Dict of all output tensors
print(result.latency_ms)     # Inference latency in milliseconds
print(result.top_classes)    # List of (class_id, score) tuples

# Run classification with labels
with open("imagenet_labels.txt") as f:
    labels = [line.strip() for line in f]
labelled = runtime.classify(input_data, top_k=5, labels=labels)
# → [("great grey owl", 0.92), ("owl", 0.04), ...]

# Benchmark
stats = runtime.benchmark(num_runs=100, warmup_runs=5)
print(f"Mean: {stats['mean_latency_ms']:.2f} ms")
print(f"P95:  {stats['p95_latency_ms']:.2f} ms")
print(f"FPS:  {stats['throughput_fps']:.1f}")

# Clean up
runtime.close()

# Or use as a context manager
with CoralUSBRuntime() as rt:
    rt.load_model("model_edgetpu.tflite")
    result = rt.infer(input_data)
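
Conceptually, `top_classes` and `classify` amount to ranking the output scores and attaching labels. The sketch below shows that step in isolation on plain Python lists. `top_k_labelled` is a hypothetical helper, not the runtime's implementation.

```python
def top_k_labelled(scores, labels, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [(labels[idx], score) for idx, score in ranked[:k]]


scores = [0.01, 0.92, 0.04, 0.03]
labels = ["background", "great grey owl", "owl", "hawk"]
print(top_k_labelled(scores, labels, k=2))
```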

Programmatic compilation

from edgecompiler import compile

# Compile for Coral USB
result = compile(
    "mobilenet_v2.onnx",
    target="coral",
    quantize="ptq",
    calibration_data="calibration_images.npy",
    output="mobilenet_v2_edgetpu.tflite",
)
print(result)
# CompileResult(output_path='mobilenet_v2_edgetpu.tflite', ops_on_target=140, ops_fallback=12)

# Compile for Metal
result = compile(
    "mobilenet_v2.tflite",
    target="metal",
    output="mobilenet_v2.mlpackage",
)

6. Model Preparation

Requirements for Edge TPU compilation

For a model to execute on the Edge TPU, it must satisfy these conditions:

INT8 quantised. The Edge TPU only supports INT8 inference. Use edgecompiler quantize or pass --quantize to edgecompiler compile.
Fully-compiled for Edge TPU. A plain quantised TFLite model (model_quant.tflite) will not use the Edge TPU. It must be compiled into a *_edgetpu.tflite file that embeds the Edge TPU binary segments.
Compatible operations. Only Edge TPU-supported operations will be mapped to the accelerator. Unsupported operations fall back to CPU.

Quantisation workflow

# Step 1: Convert model to IR (auto-detected format)
edgecompiler convert model.onnx --output model_ir.json

# Step 2: Quantise the IR to INT8 with calibration data
edgecompiler quantize model_ir.json --mode ptq \
    --calibration-data calibration_samples/ \
    -o model_quant_ir.json

# Step 3: Compile for Edge TPU
edgecompiler compile model_quant_ir.json --target coral -o model_edgetpu.tflite

Or as a single command:

edgecompiler compile model.onnx --target coral --quantize \
    --calibration-data calibration_samples/ \
    -o model_edgetpu.tflite

Preparing calibration data

Calibration data should be representative of your inference inputs. Collect 100–500 samples and save them as a directory of .npy files or as a single .npz archive:

import os

import numpy as np

# training_image_paths and preprocess_image are placeholders for your own
# image list and preprocessing function.
samples = []
for image_path in training_image_paths[:200]:
    img = preprocess_image(image_path)  # → np.ndarray, shape (1, 224, 224, 3)
    samples.append(img)

# Save as a directory of .npy files
os.makedirs("calibration_samples", exist_ok=True)
for i, sample in enumerate(samples):
    np.save(f"calibration_samples/sample_{i:04d}.npy", sample)

# Or as a single .npz file
np.savez("calibration.npz", *[s for s in samples])
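
Post-training quantisation uses these samples to estimate the range of each activation tensor. The sketch below shows the standard affine (asymmetric) mapping from an observed range to an INT8 scale and zero-point. It uses plain Python lists and hypothetical function names, and it is not necessarily the calibration method edgecompiler applies.

```python
def observe_range(batches):
    """Find the global min/max across all calibration batches."""
    lo = min(min(batch) for batch in batches)
    hi = max(max(batch) for batch in batches)
    return lo, hi


def affine_params(lo, hi, qmin=-128, qmax=127):
    """Derive (scale, zero_point) so that [lo, hi] maps onto [qmin, qmax]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


# Toy activations recorded from three calibration samples.
batches = [[0.0, 1.2, 3.1], [0.4, 5.1, 2.2], [0.0, 0.9, 4.8]]
lo, hi = observe_range(batches)
scale, zero_point = affine_params(lo, hi)
print(f"range=({lo}, {hi}) scale={scale:.4f} zero_point={zero_point}")
```

Unrepresentative samples give a range that is too narrow (clipping real inputs) or too wide (wasting integer levels), which is why the calibration set should resemble real inference data.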

Using pre-quantised models

Google provides a set of pre-quantised and pre-compiled Edge TPU models in the Coral model zoo. Download them with the included script:

bash scripts/download_models.sh --output-dir models/

This downloads MobileNetV1, MobileNetV2, EfficientNet-EdgeTPU-S, and SSD MobileNetV2 COCO — all ready to run on the Edge TPU without further compilation.

7. Running Classification, Detection, and Benchmark Examples

The examples/ directory contains ready-to-run scripts for common workflows.

Image classification

# Classify an image using MobileNetV1 on Coral USB
python examples/coral_usb_classify.py \
    --model models/mobilenet_v1_1.0_224_quant_edgetpu.tflite \
    --labels models/imagenet_labels.txt \
    --input models/parrot.jpg

Example output:

---- Classification Results ----
1. Ara macaw (Scarlet Macaw): 0.8594
2. African grey: 0.0469
3. macaw: 0.0312
4. lorikeet: 0.0156
5. toucan: 0.0078

Inference latency: 3.2 ms (Edge TPU)

Object detection

# Detect objects using SSD MobileNetV2 (COCO) on Coral USB
python examples/coral_usb_detect.py \
    --model models/ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite \
    --labels models/coco_labels.txt \
    --input test_image.jpg \
    --output detection_result.jpg

Benchmarking

# Benchmark a model on Coral USB
python examples/coral_usb_benchmark.py \
    --model models/mobilenet_v2_1.0_224_quant_edgetpu.tflite \
    --iterations 200

# Or using the CLI
edgecompiler benchmark models/mobilenet_v2_1.0_224_quant_edgetpu.tflite \
    --iterations 200 --target coral

Typical benchmark results on M1 Pro with Coral USB 3.0:

Model	Mean (ms)	P95 (ms)	P99 (ms)	FPS
MobileNetV1 INT8	2.8	3.5	4.1	357
MobileNetV2 INT8	3.1	3.9	4.6	323
EfficientNet-EdgeTPU-S INT8	4.5	5.8	6.7	222
SSD MobileNetV2 COCO INT8	6.8	8.9	10.2	147
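
The FPS column in the table is consistent with 1000 / mean latency (for example, 1000 / 2.8 ≈ 357). The sketch below shows how such statistics could be derived from raw per-run latencies. It uses a nearest-rank percentile, which is an assumption: the benchmark command may use a different percentile definition. The `p99_latency_ms` key is likewise assumed by analogy with the keys shown in the Python API section.

```python
import statistics


def summarise(latencies_ms):
    """Compute mean, P95, P99 and throughput from per-run latencies (ms)."""
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank: the smallest value with at least p% of runs at or below it.
        rank = max(1, -(-len(ordered) * p // 100))  # ceil(n * p / 100)
        return ordered[rank - 1]

    mean = statistics.fmean(ordered)
    return {
        "mean_latency_ms": mean,
        "p95_latency_ms": percentile(95),
        "p99_latency_ms": percentile(99),
        "throughput_fps": 1000.0 / mean,  # single-stream throughput
    }


stats = summarise([2.8] * 100)
print(f"FPS: {stats['throughput_fps']:.0f}")
```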

Full demo

# Interactive demo that classifies, detects, and benchmarks
python examples/coral_usb_demo.py

One-command test runner (auto hardware detection)

Use the built-in test runner to separate simulation and hardware suites automatically:

# Auto mode: always runs simulation tests; runs hardware tests only if Coral is detected
edge-test

# Same behavior via Makefile
make test-auto

# Force simulation-only or hardware-only runs
edge-test --mode simulation
edge-test --mode hardware
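
The mode selection described in the comments above can be summarised as a small decision function. This is only an illustrative sketch: `select_suites` is a hypothetical name and does not come from the edge-test source.

```python
def select_suites(mode, coral_detected):
    """Decide which test suites to run for a given mode."""
    if mode == "simulation":
        return ["simulation"]
    if mode == "hardware":
        return ["hardware"]
    if mode == "auto":
        # Simulation always runs; hardware only when a device is present.
        return ["simulation", "hardware"] if coral_detected else ["simulation"]
    raise ValueError(f"unknown mode: {mode!r}")


print(select_suites("auto", coral_detected=False))
print(select_suites("auto", coral_detected=True))
```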

8. Edge TPU Operation Compatibility and CPU Fallback

The Edge TPU supports a specific set of INT8 operations. When a model contains operations that the Edge TPU cannot execute, those operations are automatically routed to the CPU.

Supported operations (Edge TPU INT8)

Operation	Supported	Notes
Conv2D	✅	Standard + depthwise
DepthwiseConv2D	✅
FullyConnected	✅
MaxPool2D	✅
AveragePool2D	✅
ReLU / ReLU6 / ReLUN1To1	✅
Softmax	✅
Sigmoid (Logistic) / Tanh	✅
Add / Sub / Mul	✅	Element-wise
Concatenation	✅
Reshape / Transpose	✅
Pad	✅
ReduceMin / ReduceMax / Mean	✅
ExpandDims / Squeeze	✅
Split / Slice	✅
ResizeBilinear / ResizeNearestNeighbor	✅
L2Normalization	✅
BatchToSpaceND / SpaceToBatchND	✅
Gather	⚠️	Fallback on some dimensions
StridedSlice	⚠️	Limited mask support
LSTM	❌	Falls back to CPU
Einsum	❌	Not supported
ScatterND	❌	Not supported

How CPU fallback works

When the compiler partitions the model, it creates clusters of consecutive TPU-compatible operations. At the boundaries between TPU and CPU clusters, dequantise/quantise operations are inserted to convert between INT8 (Edge TPU) and float32 (CPU) tensor formats:

TPU Cluster 0: Conv2D → BN → ReLU
    ↓ (Dequantize: INT8 → FP32)
CPU: Reshape (dynamic shape)
    ↓ (Quantize: FP32 → INT8)
TPU Cluster 1: Conv2D → ReLU → Pool
    ↓ (Dequantize: INT8 → FP32)
CPU: Softmax

Each TPU↔CPU transition adds approximately 0.1 ms of overhead. For best performance, minimise the number of transitions by choosing models that are fully compatible with the Edge TPU (e.g., MobileNet-family models).
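
The partitioning described above can be sketched as grouping consecutive operations by the device that can run them, then counting the boundaries. The `TPU_OPS` set below is a small illustrative subset of the compatibility table, and `partition` is a hypothetical function rather than the compiler's actual partitioner.

```python
from itertools import groupby

# Illustrative subset of the supported-operations table above.
TPU_OPS = {"Conv2D", "DepthwiseConv2D", "ReLU", "MaxPool2D", "FullyConnected", "Softmax"}
TRANSITION_MS = 0.1  # approximate per-transition overhead quoted above


def device_of(op, supported=TPU_OPS):
    """Return the device that would execute this op."""
    return "TPU" if op in supported else "CPU"


def partition(ops):
    """Group consecutive ops by device: [('TPU', [...]), ('CPU', [...]), ...]."""
    return [(dev, list(group)) for dev, group in groupby(ops, key=device_of)]


model = ["Conv2D", "ReLU", "LSTM", "Conv2D", "ReLU", "MaxPool2D", "Einsum"]
clusters = partition(model)
transitions = len(clusters) - 1
for dev, group in clusters:
    print(f"{dev}: {' -> '.join(group)}")
print(f"{transitions} transitions, about {transitions * TRANSITION_MS:.1f} ms overhead")
```

A single unsupported operation in the middle of a model costs two transitions, one on each side of it. Unsupported operations at the very start or end of a graph therefore cost less than those in the middle.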

Checking operation mapping

After compilation, use edgecompiler inspect to see which operations are mapped to the Edge TPU and which fall back to CPU:

edgecompiler inspect model_edgetpu.tflite --verbose

The output shows the number of TPU-mapped operations vs. CPU-fallback operations, along with a breakdown by operation type.

9. Model Size Constraints (8 MB Cache Limit)

The Coral USB Accelerator's Edge TPU coprocessor has approximately 8 MB of on-chip cache for parameter data (weights and biases). This is a hard limit for models that need to execute entirely on the Edge TPU without off-chip memory access.

Practical implications

Model Category	Parameter Size	Fits in Cache?	Performance
MobileNetV1 1.0 224	~4.2 MB	✅ Yes	Full-speed TPU execution
MobileNetV2 1.0 224	~3.5 MB	✅ Yes	Full-speed TPU execution
EfficientNet-EdgeTPU-S	~6.8 MB	✅ Yes	Full-speed TPU execution
SSD MobileNetV2 COCO	~6.6 MB	✅ Yes	Full-speed TPU execution
EfficientNet-Lite2	~8.5 MB	⚠️ Marginal	Some off-chip spills
EfficientNet-Lite4	~13 MB	❌ No	Significant off-chip access
ResNet50	~25 MB	❌ No	Most weights off-chip

When a model exceeds the cache

If the model's parameter data exceeds 8 MB, the Edge TPU runtime automatically spills the excess to off-chip memory (on the USB Accelerator, this is host memory reached over USB). This works correctly but incurs a significant performance penalty (2–5× slower for the spilled layers).

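To estimate the parameter size before compilation, check the size of the INT8-quantised model file (a float32 model is roughly four times larger than its INT8 equivalent, so its file size overstates the footprint):

ls -lh model_quant.tflite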
# If the file is larger than ~8 MB, expect some off-chip spills
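
The same check can be scripted. The sketch below compares a file's size with the approximate 8 MB limit. `cache_report` is a hypothetical helper and `model_quant.tflite` is only a placeholder file name. The file size is an upper bound on the parameter data, since the file also holds graph metadata.

```python
import os

CACHE_LIMIT_BYTES = 8 * 1024 * 1024  # approximate on-chip parameter cache


def cache_report(model_path, limit=CACHE_LIMIT_BYTES):
    """Compare a model file's size with the approximate cache limit."""
    size = os.path.getsize(model_path)
    return {
        "size_mb": size / (1024 * 1024),
        "likely_fits": size <= limit,
    }


if os.path.exists("model_quant.tflite"):  # placeholder file name
    print(cache_report("model_quant.tflite"))
```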

Strategies for staying within the cache limit

Use efficient architectures. MobileNet, EfficientNet-Lite, and SSDLite variants are designed to be small and Edge TPU-friendly.
Reduce the input resolution. A 192×192 input produces smaller intermediate activations than 224×224, allowing the compiler to keep more of the model on-chip.
Reduce the depth multiplier. For MobileNet, use a depth multiplier of 0.5 or 0.35 instead of 1.0.
Prune the model. Structured pruning can remove entire channels, reducing the parameter count.
Split the model. If only some layers exceed the cache, split the model into sub-models and execute them sequentially.
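
To see why the depth multiplier is so effective, note that it scales both the input and the output channel count of a layer, so the weight count of a 1×1 convolution shrinks roughly with the square of the multiplier. The sketch below works through one layer. It ignores the rounding of channel counts to multiples of 8 that real MobileNet implementations apply, and `pointwise_conv_params` is a hypothetical helper.

```python
def pointwise_conv_params(c_in, c_out, multiplier=1.0):
    """Weight count of a 1x1 convolution after scaling both channel counts."""
    scaled_in = max(1, int(c_in * multiplier))
    scaled_out = max(1, int(c_out * multiplier))
    return scaled_in * scaled_out


base = pointwise_conv_params(512, 512)
for alpha in (1.0, 0.5, 0.35):
    params = pointwise_conv_params(512, 512, alpha)
    print(f"multiplier {alpha}: {params} weights ({params / base:.0%} of full size)")
```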

10. Troubleshooting Common Issues

"No Coral USB device detected"

Check the physical connection — the LED on the device should be lit.
Verify with system_profiler SPUSBDataType on macOS or lsusb on Linux.
Avoid USB-C hubs; connect the device directly to a port on the Mac (using a plain USB-A-to-USB-C adapter or a USB-C cable if the Mac has no USB-A port).
Re-plug the device after installing libedgetpu.

"libedgetpu not found"

Verify the dylib exists: ls /usr/local/lib/libedgetpu*.dylib
Set DYLD_LIBRARY_PATH="/usr/local/lib:${DYLD_LIBRARY_PATH:-}"
Reinstall: bash scripts/install_coral_runtime.sh --force

"Architecture mismatch (x86_64 dylib on ARM64)"

Check the architecture: file /usr/local/lib/libedgetpu.1.dylib
If it says x86_64, build the ARM64 version from feranick/libedgetpu (see Section 3).

"Model not fully mapped to Edge TPU"

Ensure the model was compiled for the Edge TPU (the output is named *_edgetpu.tflite); a plain quantised .tflite file will not use the Edge TPU.
Run edgecompiler inspect model_edgetpu.tflite to see which ops fall back to CPU.
Avoid unsupported ops (LSTM, Einsum, ScatterND) in your model.

High latency or unexpected performance

USB 2.0 port: The Coral needs USB 3.0 for full throughput. Check that the device is connected to a USB 3.0 port (5 Gbps).
CPU fallback: Check if ops are falling back to CPU (see above).
Thermal throttling: The Edge TPU may throttle under sustained load. Ensure adequate ventilation.
Model too large: If the model exceeds the 8 MB cache, off-chip spills slow down inference significantly.

"Permission denied on USB device" (Linux)

sudo usermod -aG plugdev $USER
# Log out and back in

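Or add udev rules. The accelerator enumerates as 1a6e:089a before its firmware is loaded and as 18d1:9302 afterwards, so cover both IDs, then re-plug the device:

sudo bash -c 'cat > /etc/udev/rules.d/99-edgetpu-accelerator.rules <<EOF
SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
EOF'
sudo udevadm control --reload-rules && sudo udevadm trigger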

Python import errors

If from edgecompiler.runtime.coral_usb import CoralUSBRuntime fails:

Ensure edgecompiler is installed: pip install edgecompiler
Ensure tensorflow is installed (or install edgecompiler[coral])
Ensure numpy is installed: pip install numpy
Check that Python is ARM64-native: python3 -c "import platform; print(platform.machine())" → arm64

Getting help

Detailed setup guide: docs/coral_macos_setup.md
Architecture documentation: docs/architecture.md
Coral backend internals: docs/coral_backend.md
Examples: docs/examples.md
Issues: GitHub Issues

11. Sloth Integration in This Repository

edgecompiler now includes sloth_integration in the same monorepo under sloth-integration/src/sloth_integration.

Use this when you need an end-to-end text workflow:

Fine-tune small language models with unsloth-compatible adapters
Export to ONNX/TFLite through SlothConverter
Quantise and compile with edgecompiler
Run Coral USB inference via SlothCoralRuntime

Quick commands:

# Install root project with dev dependencies
pip install -e ".[dev]"

# Run sloth integration tests
pytest sloth-integration/tests -v

# Run the sloth benchmark
python sloth-integration/examples/benchmark_coral.py \
   --model sloth-integration/test_models/synthetic_text_classifier.tflite \
   --iterations 200 \
   --warmup 20

Reference docs:

sloth-integration/instructions_sloth_integration.md
sloth-integration/docs/benchmarks_sloth.md