edgecompiler — Instructions

A native compiler toolchain for edge AI on macOS Apple Silicon and Google Coral USB.

This document is the main getting-started guide for the edgecompiler repository. It covers everything from installation through running production inference on a Coral USB Accelerator connected to a MacBook with an M1 Pro (or any Apple Silicon Mac).

For the detailed macOS Coral USB setup walkthrough, see docs/coral_macos_setup.md.


Table of Contents

  1. Overview
  2. Quick Start
  3. Installation (macOS M1 Pro)
  4. CLI Usage
  5. Python API Usage
  6. Model Preparation
  7. Running Classification, Detection, and Benchmark Examples
  8. Edge TPU Operation Compatibility and CPU Fallback
  9. Model Size Constraints (8 MB Cache Limit)
  10. Troubleshooting Common Issues
  11. Sloth Integration in This Repository

1. Overview

edgecompiler replaces the official edgetpu_compiler binary and extends compilation support to Apple Silicon (the GPU via Metal / MPS, and the Neural Engine) alongside the Google Coral USB Accelerator (Edge TPU). The toolchain:

  1. Ingests models from PyTorch (.pt/.pth), TensorFlow Lite (.tflite), ONNX (.onnx), and TensorFlow SavedModel directories.
  2. Converts them to a unified intermediate representation (IR) — a directed acyclic graph of typed tensors and operations.
  3. Applies optimisation passes — constant folding, op fusion, dead code elimination, and layout transformations.
  4. Quantises the graph to INT8 using Post-Training Quantisation (PTQ), Quantisation-Aware Training (QAT), or dynamic-range quantisation.
  5. Compiles the quantised IR for the target backend:
       • Coral USB (Edge TPU): Produces a *_edgetpu.tflite file with embedded custom-op binary segments for the Edge TPU coprocessor.
       • Apple Silicon (Metal): Produces a .mlpackage Core ML model or MPSGraph Objective-C++ source for direct GPU/ANE execution.
  6. Runs inference on the compiled model through the CoralUSBRuntime or MetalInferenceSession Python classes.

The entire pipeline runs natively on macOS ARM64 — no Docker, no Rosetta 2, no x86-64 emulation.
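
To make the IR and one of the optimisation passes concrete, here is a minimal sketch of dead code elimination on a toy graph. The Node class and eliminate_dead_code function are hypothetical illustrations, not edgecompiler's actual IR types; the real IR also carries tensor types and quantisation parameters.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One operation in a toy IR graph (hypothetical structure)."""
    name: str
    op: str
    inputs: list = field(default_factory=list)  # names of producer nodes


def eliminate_dead_code(nodes, outputs):
    """Keep only nodes reachable backwards from the graph outputs."""
    by_name = {n.name: n for n in nodes}
    live, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        stack.extend(by_name[name].inputs)
    # Preserve the original (topological) order of the surviving nodes.
    return [n for n in nodes if n.name in live]


graph = [
    Node("input", "Placeholder"),
    Node("conv", "Conv2D", ["input"]),
    Node("debug", "Identity", ["conv"]),  # never consumed by an output
    Node("relu", "ReLU", ["conv"]),
]
pruned = eliminate_dead_code(graph, outputs=["relu"])
print([n.name for n in pruned])
```

The "debug" node is dropped because nothing on the path to the "relu" output consumes it.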


2. Quick Start

The fastest way to get running with a Coral USB Accelerator:

from edgecompiler.runtime.coral_usb import CoralUSBRuntime
import numpy as np

with CoralUSBRuntime() as rt:                          # 1. Create runtime
    rt.load_model("mobilenet_v1_edgetpu.tflite")       # 2. Load compiled model
    img = np.zeros((1, 224, 224, 3), dtype=np.uint8)   # 3. Prepare input
    result = rt.infer(img, top_k=5)                     # 4. Run inference
    print(f"Latency: {result.latency_ms:.1f} ms")       # 5. Read results
    for cls, score in result.top_classes:
        print(f"  Class {cls}: {score:.4f}")

CLI equivalent:

# Compile a model for Coral USB
edgecompiler compile model.onnx --target coral --quantize -o model_edgetpu.tflite

# Run inference
edgecompiler run model_edgetpu.tflite --input image.jpg --target coral

# Benchmark
edgecompiler benchmark model_edgetpu.tflite --iterations 100 --target coral

3. Installation (macOS M1 Pro)

Step 1: Install edgecompiler

# Basic install
pip install edgecompiler

# With Coral runtime extras (TensorFlow Lite interpreter path)
pip install "edgecompiler[coral]"

# With all frontends and backends
pip install "edgecompiler[all]"

# Development install from source
git clone https://github.com/rotsl/edgecompiler.git
cd edgecompiler
make install   # pip install -e ".[dev]"

edgecompiler[coral] installs Python-side Coral runtime extras. You still need the Edge TPU shared library (libedgetpu) from Step 2 below.

Step 2: Install the Coral Edge TPU runtime

The libedgetpu shared library is required to communicate with the Coral USB Accelerator. Google does not ship official ARM64 macOS builds, so use the community-maintained build or the helper script:

# Option A: Use the helper script (recommended)
bash scripts/install_coral_runtime.sh

# Option B: Build from source for ARM64
git clone https://github.com/feranick/libedgetpu.git
cd libedgetpu
CPU=darwin_arm64 make
sudo mkdir -p /usr/local/lib   # may not exist on a fresh Apple Silicon install
sudo cp out/darwin_arm64/libedgetpu.1.dylib /usr/local/lib/
sudo ln -sf /usr/local/lib/libedgetpu.1.dylib /usr/local/lib/libedgetpu.dylib

Step 3: Set the library path

echo 'export DYLD_LIBRARY_PATH="/usr/local/lib:${DYLD_LIBRARY_PATH:-}"' >> ~/.zshrc
source ~/.zshrc

Step 4: Verify

python3 -c "
from edgecompiler.runtime.coral_usb import CoralUSBRuntime
rt = CoralUSBRuntime()
print(f'libedgetpu: {rt._lib_available}')
devices = rt.detect_devices()
print(f'Devices: {devices}')
"

Step 5: Download hardware test models

bash scripts/download_models.sh --output-dir tests/hardware/test_models

Step 6: Review benchmark report

See benchmarks.md for the latest test-runner results, both with and without Coral hardware attached.

For the complete step-by-step setup guide with troubleshooting, see docs/coral_macos_setup.md.


4. CLI Usage

The edgecompiler CLI (invoked as edgecompiler or edgecompile) provides several subcommands:

compile

Compile a model for a target backend:

edgecompiler compile MODEL --target {coral,metal} [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --target, -t | (required) | Target backend: coral or metal |
| --quantize, -q | False | Apply post-training INT8 quantisation |
| --output, -o | Auto-named | Output path for the compiled model |
| --calibration-data | None | Path to calibration data (.npy, .npz, or directory) |
| --per-channel | True | Use per-channel (per-axis) weight quantisation |
| --symmetric | True | Use symmetric quantisation |
| --input-shape | None | Input shape for PyTorch models (e.g., 1,3,224,224) |
| --simplify | True | Apply onnx-simplifier for ONNX models |
| --min-runtime-version | 14 | Minimum Edge TPU runtime version (Coral only) |
| --verbose, -v | 0 | Increase verbosity (-v for INFO, -vv for DEBUG) |

Examples:

# ONNX → Coral with PTQ
edgecompiler compile model.onnx --target coral --quantize --calibration-data calib.npy

# TFLite → Metal
edgecompiler compile model.tflite --target metal -o model.mlpackage

# PyTorch → Coral (must specify input shape)
edgecompiler compile model.pt --target coral --input-shape 1,3,224,224 --quantize
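
For scripted builds it can help to assemble the command line in Python. The sketch below uses only the flags documented in the table above; the helper name build_compile_cmd is hypothetical and not part of edgecompiler.

```python
import shlex


def build_compile_cmd(model, target, *, quantize=False, calibration_data=None,
                      input_shape=None, output=None):
    """Assemble an `edgecompiler compile` argv list from the documented flags."""
    if target not in ("coral", "metal"):
        raise ValueError(f"unknown target: {target!r}")
    cmd = ["edgecompiler", "compile", str(model), "--target", target]
    if quantize:
        cmd.append("--quantize")
    if calibration_data is not None:
        cmd += ["--calibration-data", str(calibration_data)]
    if input_shape is not None:
        cmd += ["--input-shape", ",".join(str(d) for d in input_shape)]
    if output is not None:
        cmd += ["-o", str(output)]
    return cmd


cmd = build_compile_cmd("model.pt", "coral", quantize=True,
                        input_shape=(1, 3, 224, 224))
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run rather than a shell string avoids quoting problems with paths that contain spaces.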

quantize

Apply a quantisation strategy without compiling:

edgecompiler quantize MODEL --mode {ptq,qat,dynamic} [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --mode, -m | ptq | Quantisation mode: ptq, qat, or dynamic |
| --calibration-data | None | Calibration data for PTQ |
| --per-channel | True | Per-channel weight quantisation |
| --symmetric | True | Symmetric quantisation |
| --output, -o | Auto-named | Output path for the quantised IR JSON |

convert

Convert a model to IR JSON without quantising or compiling:

edgecompiler convert MODEL --output model_ir.json

inspect

Inspect a model's structure, operations, and quantisation info:

edgecompiler inspect MODEL

run

Run a single inference on a compiled model:

edgecompiler run MODEL --input INPUT --target {coral,metal} [OPTIONS]

| Option | Description |
| --- | --- |
| --input | Path to input data: .npy, .npz, or image file (.jpg, .png) |
| --target | Runtime target: coral or metal (auto-detected from filename) |
| --output | Save output tensors to .npz file |
benchmark

Benchmark a compiled model and report latency statistics:

edgecompiler benchmark MODEL --iterations N --target {coral,metal}

| Option | Default | Description |
| --- | --- | --- |
| --iterations | 100 | Number of inference iterations |
| --target | Auto-detected | Runtime target |
| --input | Random data | Input data (.npy, .npz, or image) |

5. Python API Usage

CoralUSBRuntime

The CoralUSBRuntime class provides a high-level Python API for Coral USB inference:

from edgecompiler.runtime.coral_usb import CoralUSBRuntime
import numpy as np

# Create runtime (auto-detects libedgetpu)
runtime = CoralUSBRuntime()

# Or specify the library path explicitly
runtime = CoralUSBRuntime(libedgetpu_path="/usr/local/lib/libedgetpu.1.dylib")

# Detect devices
devices = runtime.detect_devices()
# → [CoralDevice(path=':0', type='USB', name='Coral USB Accelerator')]

# Load a compiled Edge TPU model
runtime.load_model("model_edgetpu.tflite", device=":0")

# Get expected input size
h, w = runtime.get_input_size()  # e.g., (224, 224)

# Run inference
input_data = np.zeros((1, 224, 224, 3), dtype=np.uint8)
result = runtime.infer(input_data, top_k=5)

# Access results
print(result.output)         # Primary output tensor (numpy.ndarray)
print(result.all_outputs)    # Dict of all output tensors
print(result.latency_ms)     # Inference latency in milliseconds
print(result.top_classes)    # List of (class_id, score) tuples

# Run classification with labels
with open("imagenet_labels.txt") as f:
    labels = [line.strip() for line in f]
labelled = runtime.classify(input_data, top_k=5, labels=labels)
# → [("great grey owl", 0.92), ("owl", 0.04), ...]

# Benchmark
stats = runtime.benchmark(num_runs=100, warmup_runs=5)
print(f"Mean: {stats['mean_latency_ms']:.2f} ms")
print(f"P95:  {stats['p95_latency_ms']:.2f} ms")
print(f"FPS:  {stats['throughput_fps']:.1f}")

# Clean up
runtime.close()

# Or use as a context manager
with CoralUSBRuntime() as rt:
    rt.load_model("model_edgetpu.tflite")
    result = rt.infer(input_data)

Programmatic compilation

from edgecompiler import compile

# Compile for Coral USB
result = compile(
    "mobilenet_v2.onnx",
    target="coral",
    quantize="ptq",
    calibration_data="calibration_images.npy",
    output="mobilenet_v2_coral.tflite",
)
print(result)
# CompileResult(output_path='mobilenet_v2_coral.tflite', ops_on_target=140, ops_fallback=12)

# Compile for Metal
result = compile(
    "mobilenet_v2.tflite",
    target="metal",
    output="mobilenet_v2.mlpackage",
)

6. Model Preparation

Requirements for Edge TPU compilation

For a model to execute on the Edge TPU, it must satisfy these conditions:

  1. INT8 quantised. The Edge TPU only supports INT8 inference. Use edgecompiler quantize or pass --quantize to edgecompiler compile.

  2. Fully compiled for Edge TPU. A plain quantised TFLite model (model_quant.tflite) will not use the Edge TPU. It must be compiled into a *_edgetpu.tflite file that embeds the Edge TPU binary segments.

  3. Compatible operations. Only Edge TPU-supported operations will be mapped to the accelerator. Unsupported operations fall back to CPU.
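
As a rough illustration of what INT8 quantisation does to a tensor, the sketch below implements symmetric per-tensor quantisation in plain Python. It is a simplified model, not edgecompiler's quantiser: the function names are hypothetical, and with --per-channel the same formula would be applied separately to each output channel.

```python
def quantise_symmetric(values, num_bits=8):
    """Symmetric per-tensor quantisation: the zero point is fixed at 0."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantise(q, scale):
    """Map integer codes back to approximate real values."""
    return [qi * scale for qi in q]


weights = [-0.40, -0.10, 0.0, 0.25, 1.00]
q, scale = quantise_symmetric(weights)
restored = dequantise(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"scale={scale:.6f}", f"max_err={max_err:.6f}")
```

The round-trip error is bounded by half the scale, which is why calibration data that captures the true value range matters: an outlier inflates the scale and costs precision for every other value.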

Quantisation workflow

# Step 1: Convert model to IR (auto-detected format)
edgecompiler convert model.onnx -o model_ir.json

# Step 2: Quantise to INT8 with calibration data
edgecompiler quantize model.onnx --mode ptq \
    --calibration-data calibration_samples/ \
    -o model_quant_ir.json

# Step 3: Compile for Edge TPU
edgecompiler compile model_quant_ir.json --target coral -o model_edgetpu.tflite

Or as a single command:

edgecompiler compile model.onnx --target coral --quantize \
    --calibration-data calibration_samples/ \
    -o model_edgetpu.tflite

Preparing calibration data

Calibration data should be representative of your inference inputs. Collect 100–500 samples and save them as .npy files:

import os

import numpy as np

# training_image_paths and preprocess_image are placeholders for your own
# dataset listing and preprocessing function.

# Collect calibration images (preprocessed to model input shape)
samples = []
for image_path in training_image_paths[:200]:
    img = preprocess_image(image_path)  # → np.ndarray, shape (1, 224, 224, 3)
    samples.append(img)

# Save as a directory of .npy files
os.makedirs("calibration_samples", exist_ok=True)
for i, sample in enumerate(samples):
    np.save(f"calibration_samples/sample_{i:04d}.npy", sample)

# Or as a single .npz file
np.savez("calibration.npz", *samples)
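
Taking the first 200 paths can bias calibration if the dataset is ordered by class. One way to get a more representative subset is a seeded random sample, sketched below with a hypothetical helper and made-up file names.

```python
import random


def pick_calibration_paths(paths, n=200, seed=0):
    """Return up to n paths sampled uniformly without replacement."""
    paths = sorted(paths)                 # stable order regardless of listing
    if len(paths) <= n:
        return paths
    return sorted(random.Random(seed).sample(paths, n))


# Made-up file names standing in for a real training set listing.
all_paths = [f"train/img_{i:05d}.jpg" for i in range(10_000)]
subset = pick_calibration_paths(all_paths, n=200)
print(len(subset), "calibration images selected")
```

A fixed seed keeps the calibration set, and therefore the quantised model, reproducible between runs.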

Using pre-quantised models

Google provides a set of pre-quantised and pre-compiled Edge TPU models in the Coral model zoo. Download them with the included script:

bash scripts/download_models.sh --output-dir models/

This downloads MobileNetV1, MobileNetV2, EfficientNet-EdgeTPU-S, and SSD MobileNetV2 COCO — all ready to run on the Edge TPU without further compilation.


7. Running Classification, Detection, and Benchmark Examples

The examples/ directory contains ready-to-run scripts for common workflows.

Image classification

# Classify an image using MobileNetV1 on Coral USB
python examples/coral_usb_classify.py \
    --model models/mobilenet_v1_1.0_224_quant_edgetpu.tflite \
    --labels models/imagenet_labels.txt \
    --input models/parrot.jpg

Example output:

---- Classification Results ----
1. Ara macaw (Scarlet Macaw): 0.8594
2. African grey: 0.0469
3. macaw: 0.0312
4. lorikeet: 0.0156
5. toucan: 0.0078

Inference latency: 3.2 ms (Edge TPU)
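
The scores in the sample output agree, to four decimal places, with multiples of 1/256 (for example 220/256 ≈ 0.8594), which is what a uint8 output tensor with a quantisation scale of 1/256 would produce. The sketch below shows how such raw outputs could be dequantised and ranked. The scale, zero point, and raw values are assumptions chosen to match the example, and top_k_scores is a hypothetical helper rather than the script's actual code.

```python
import heapq


def top_k_scores(raw_scores, labels, k=5, scale=1 / 256, zero_point=0):
    """Dequantise uint8 class scores and return the k best (label, score)."""
    best = heapq.nlargest(k, enumerate(raw_scores), key=lambda item: item[1])
    return [(labels[i], scale * (q - zero_point)) for i, q in best]


# Hypothetical raw uint8 outputs chosen to reproduce the scores above.
labels = ["toucan", "macaw", "Ara macaw (Scarlet Macaw)", "lorikeet",
          "African grey", "hummingbird"]
raw = [2, 8, 220, 4, 12, 0]
for rank, (label, score) in enumerate(top_k_scores(raw, labels), start=1):
    print(f"{rank}. {label}: {score:.4f}")
```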

Object detection

# Detect objects using SSD MobileNetV2 (COCO) on Coral USB
python examples/coral_usb_detect.py \
    --model models/ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite \
    --labels models/coco_labels.txt \
    --input test_image.jpg \
    --output detection_result.jpg

Benchmarking

# Benchmark a model on Coral USB
python examples/coral_usb_benchmark.py \
    --model models/mobilenet_v2_1.0_224_quant_edgetpu.tflite \
    --iterations 200

# Or using the CLI
edgecompiler benchmark models/mobilenet_v2_1.0_224_quant_edgetpu.tflite \
    --iterations 200 --target coral

Typical benchmark results on M1 Pro with Coral USB 3.0:

| Model | Mean (ms) | P95 (ms) | P99 (ms) | FPS |
| --- | --- | --- | --- | --- |
| MobileNetV1 INT8 | 2.8 | 3.5 | 4.1 | 357 |
| MobileNetV2 INT8 | 3.1 | 3.9 | 4.6 | 323 |
| EfficientNet-EdgeTPU-S INT8 | 4.5 | 5.8 | 6.7 | 222 |
| SSD MobileNetV2 COCO INT8 | 6.8 | 8.9 | 10.2 | 147 |
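
The FPS column is consistent with 1000 divided by the mean latency. The sketch below shows one way such statistics could be derived from raw per-iteration latencies, using nearest-rank percentiles. The dictionary keys mirror those returned by runtime.benchmark() in Section 5, except p99_latency_ms, which is an assumed name; the runtime may compute percentiles differently.

```python
import math
import statistics


def summarise(latencies_ms):
    """Mean, nearest-rank percentiles, and throughput for a latency list."""
    ordered = sorted(latencies_ms)

    def percentile(p):
        rank = max(1, math.ceil(p * len(ordered) / 100))  # nearest-rank
        return ordered[rank - 1]

    mean = statistics.fmean(ordered)
    return {
        "mean_latency_ms": mean,
        "p95_latency_ms": percentile(95),
        "p99_latency_ms": percentile(99),  # assumed key name
        "throughput_fps": 1000.0 / mean,
    }


# FPS implied by the mean latencies in the table above.
for model, mean_ms in [("MobileNetV1", 2.8), ("MobileNetV2", 3.1),
                       ("EfficientNet-EdgeTPU-S", 4.5), ("SSD MobileNetV2", 6.8)]:
    print(f"{model}: {round(1000 / mean_ms)} FPS")
```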

Full demo

# Interactive demo that classifies, detects, and benchmarks
python examples/coral_usb_demo.py

One-command test runner (auto hardware detection)

Use the built-in test runner to separate simulation and hardware suites automatically:

# Auto mode: always runs simulation tests; runs hardware tests only if Coral is detected
edge-test

# Same behavior via Makefile
make test-auto

# Force simulation-only or hardware-only runs
edge-test --mode simulation
edge-test --mode hardware

8. Edge TPU Operation Compatibility and CPU Fallback

The Edge TPU supports a specific set of INT8 operations. When a model contains operations that the Edge TPU cannot execute, those operations are automatically routed to the CPU.

Supported operations (Edge TPU INT8)

| Operation | Supported | Notes |
| --- | --- | --- |
| Conv2D | ✅ | Standard + depthwise |
| DepthwiseConv2D | ✅ | |
| FullyConnected | ✅ | |
| MaxPool2D | ✅ | |
| AveragePool2D | ✅ | |
| ReLU / ReLU6 / ReLUN1To1 | ✅ | |
| Softmax | ✅ | |
| Sigmoid / Tanh | ✅ | |
| Add / Sub / Mul | ✅ | Element-wise |
| Concatenation | ✅ | |
| Reshape / Transpose | ✅ | |
| Pad | ✅ | |
| ReduceMin / ReduceMax / Mean | ✅ | |
| ExpandDims / Squeeze | ✅ | |
| Split / Slice | ✅ | |
| ResizeBilinear / ResizeNearestNeighbor | ✅ | |
| Logistic / L2Normalization | ✅ | |
| BatchToSpaceND / SpaceToBatchND | ✅ | |
| Gather | ⚠️ | Fallback on some dimensions |
| StridedSlice | ⚠️ | Limited mask support |
| LSTM | ❌ | Falls back to CPU |
| Einsum | ❌ | Not supported |
| ScatterND | ❌ | Not supported |

How CPU fallback works

When the compiler partitions the model, it creates clusters of consecutive TPU-compatible operations. At the boundaries between TPU and CPU clusters, dequantise/quantise operations are inserted to convert between INT8 (Edge TPU) and float32 (CPU) tensor formats:

TPU Cluster 0: Conv2D → BN → ReLU
    ↓ (Dequantize: INT8 → FP32)
CPU: Reshape (dynamic shape)
    ↓ (Quantize: FP32 → INT8)
TPU Cluster 1: Conv2D → ReLU → Pool
    ↓ (Dequantize: INT8 → FP32)
CPU: Softmax

Each TPU↔CPU transition adds approximately 0.1 ms of overhead. For best performance, minimise the number of transitions by choosing models that are fully compatible with the Edge TPU (e.g., MobileNet-family models).
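
The partitioning described above can be sketched in a few lines: group consecutive operations by device, then count the boundaries. This is a simplified model, not the compiler's partitioner; the op list reproduces the diagram above, and the 0.1 ms figure is the approximate per-transition cost quoted in the text.

```python
from itertools import groupby

TRANSITION_OVERHEAD_MS = 0.1   # approximate per-transition cost quoted above


def partition(ops):
    """Group consecutive ops by device; ops is a list of (name, on_tpu)."""
    return [("TPU" if on_tpu else "CPU", [name for name, _ in group])
            for on_tpu, group in groupby(ops, key=lambda op: op[1])]


ops = [("Conv2D", True), ("BN", True), ("ReLU", True),
       ("Reshape", False),
       ("Conv2D", True), ("ReLU", True), ("Pool", True),
       ("Softmax", False)]
clusters = partition(ops)
transitions = len(clusters) - 1
for device, names in clusters:
    print(device, names)
print(f"{transitions} transitions, ~{transitions * TRANSITION_OVERHEAD_MS:.1f} ms overhead")
```

For this graph there are four clusters and three transitions, so roughly 0.3 ms of boundary overhead on top of the compute time.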

Checking operation mapping

After compilation, use edgecompiler inspect to see which operations are mapped to the Edge TPU and which fall back to CPU:

edgecompiler inspect model_edgetpu.tflite --verbose

The output shows the number of TPU-mapped operations vs. CPU-fallback operations, along with a breakdown by operation type.


9. Model Size Constraints (8 MB Cache Limit)

The Coral USB Accelerator's Edge TPU coprocessor has approximately 8 MB of on-chip cache for parameter data (weights and biases). This is a hard limit for models that need to execute entirely on the Edge TPU without off-chip memory access.

Practical implications

| Model | Parameter Size | Fits in Cache? | Performance |
| --- | --- | --- | --- |
| MobileNetV1 1.0 224 | ~4.2 MB | ✅ Yes | Full-speed TPU execution |
| MobileNetV2 1.0 224 | ~3.5 MB | ✅ Yes | Full-speed TPU execution |
| EfficientNet-EdgeTPU-S | ~6.8 MB | ✅ Yes | Full-speed TPU execution |
| SSD MobileNetV2 COCO | ~6.6 MB | ✅ Yes | Full-speed TPU execution |
| EfficientNet-Lite2 | ~8.5 MB | ⚠️ Marginal | Some off-chip spills |
| EfficientNet-Lite4 | ~13 MB | ❌ No | Significant off-chip access |
| ResNet50 | ~25 MB | ❌ No | Most weights off-chip |

When a model exceeds the cache

If the model's parameter data exceeds 8 MB, the Edge TPU runtime automatically spills excess data to off-chip DRAM. This works correctly but incurs a significant performance penalty (2–5× slower for the spilled layers).

To check the model size before compilation:

ls -lh model.tflite
# If the file is larger than ~8 MB, expect some off-chip spills
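
The same check can be scripted. The sketch below compares a file's size with the approximate 8 MB figure; cache_report is a hypothetical helper, and file size is only a proxy for parameter size because the file also contains graph metadata.

```python
import os
import tempfile

CACHE_BYTES = 8 * 1024 * 1024   # ~8 MB parameter cache quoted above


def cache_report(model_path, cache_bytes=CACHE_BYTES):
    """Compare a model file's size with the approximate on-chip cache size."""
    size = os.path.getsize(model_path)
    return {
        "size_mb": size / (1024 * 1024),
        "fits": size <= cache_bytes,
        "excess_mb": max(0, size - cache_bytes) / (1024 * 1024),
    }


# Demonstrate with a 9 MB stand-in file instead of a real model.
with tempfile.NamedTemporaryFile(suffix=".tflite", delete=False) as f:
    f.write(b"\0" * (9 * 1024 * 1024))
report = cache_report(f.name)
os.remove(f.name)
print(report)
```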

Strategies for staying within the cache limit

  1. Use efficient architectures. MobileNet, EfficientNet-Lite, and SSDLite variants are designed to be small and Edge TPU-friendly.

  2. Reduce the input resolution. A 192×192 input produces smaller intermediate activations than 224×224, allowing the compiler to keep more of the model on-chip.

  3. Reduce the depth multiplier. For MobileNet, use a depth multiplier of 0.5 or 0.35 instead of 1.0.

  4. Prune the model. Structured pruning can remove entire channels, reducing the parameter count.

  5. Split the model. If only some layers exceed the cache, split the model into sub-models and execute them sequentially.
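
Strategies 2 and 3 can be sized with back-of-the-envelope arithmetic. Activation tensors scale with the number of input pixels, and most convolution weights scale roughly with the square of the depth multiplier because both input and output channel counts shrink. The sketch below is a rough estimate under those assumptions, not a measurement; it tends to underestimate, since layers such as the final classifier scale only linearly. The 4.2 MB base is the MobileNetV1 figure from the table above.

```python
def activation_ratio(new_side, old_side=224):
    """Activation tensors scale with the number of input pixels."""
    return (new_side / old_side) ** 2


def rough_param_mb(base_mb, depth_multiplier):
    """Very rough estimate: conv weights scale ~ multiplier squared."""
    return base_mb * depth_multiplier ** 2


print(f"192x192 activations: {activation_ratio(192):.0%} of 224x224")
for alpha in (1.0, 0.5, 0.35):
    print(f"depth multiplier {alpha}: ~{rough_param_mb(4.2, alpha):.1f} MB")
```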


10. Troubleshooting Common Issues

"No Coral USB device detected"

  1. Check the physical connection — the LED on the device should be lit.
  2. Verify with system_profiler SPUSBDataType on macOS or lsusb on Linux.
  3. Avoid USB-C hubs; connect the accelerator directly to one of the Mac's ports, using a simple USB-A to USB-C adapter if your cable needs one.
  4. Re-plug the device after installing libedgetpu.

"libedgetpu not found"

  1. Verify the dylib exists: ls /usr/local/lib/libedgetpu*.dylib
  2. Set DYLD_LIBRARY_PATH="/usr/local/lib:${DYLD_LIBRARY_PATH:-}"
  3. Reinstall: bash scripts/install_coral_runtime.sh --force
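
The first check can also be run from Python, which confirms that the interpreter you actually use can see the library. The helper below is a hypothetical sketch; /usr/local/lib is the install location from Section 3, while /opt/homebrew/lib is an assumed extra location where a Homebrew-based install might place the library.

```python
import glob
import os
import platform


def find_libedgetpu(search_dirs=("/usr/local/lib", "/opt/homebrew/lib")):
    """Return libedgetpu dylib paths found in the given directories."""
    hits = []
    for directory in search_dirs:
        hits += sorted(glob.glob(os.path.join(directory, "libedgetpu*.dylib")))
    return hits


print("Python architecture:", platform.machine())
print("libedgetpu candidates:", find_libedgetpu() or "none found")
```

If a path is found, it can be passed explicitly as CoralUSBRuntime(libedgetpu_path=...), as shown in Section 5.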

"Architecture mismatch (x86_64 dylib on ARM64)"

  1. Check the architecture: file /usr/local/lib/libedgetpu.1.dylib
  2. If it says x86_64, build the ARM64 version from feranick/libedgetpu (see Section 3).

"Model not fully mapped to Edge TPU"

  1. Ensure the model file is named *_edgetpu.tflite (compiled for Edge TPU).
  2. Run edgecompiler inspect model_edgetpu.tflite to see which ops fall back to CPU.
  3. Avoid unsupported ops (LSTM, Einsum, ScatterND) in your model.

High latency or unexpected performance

  1. USB 2.0 port: The Coral needs USB 3.0 for full throughput. Check that the device is connected to a USB 3.0 port (5 Gbps).
  2. CPU fallback: Check if ops are falling back to CPU (see above).
  3. Thermal throttling: The Edge TPU may throttle under sustained load. Ensure adequate ventilation.
  4. Model too large: If the model exceeds the 8 MB cache, off-chip spills slow down inference significantly.

"Permission denied on USB device" (Linux)

sudo usermod -aG plugdev $USER
# Log out and back in

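Or add a udev rule. The accelerator enumerates as 1a6e:089a until the runtime loads its firmware and as 18d1:9302 afterwards, so the rule should cover both IDs:

sudo bash -c 'printf "%s\n" "SUBSYSTEM==\"usb\", ATTR{idVendor}==\"1a6e\", ATTR{idProduct}==\"089a\", MODE=\"0666\"" "SUBSYSTEM==\"usb\", ATTR{idVendor}==\"18d1\", ATTR{idProduct}==\"9302\", MODE=\"0666\"" > /etc/udev/rules.d/99-edgetpu-accelerator.rules'
sudo udevadm control --reload-rules && sudo udevadm trigger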

Python import errors

If from edgecompiler.runtime.coral_usb import CoralUSBRuntime fails:

  1. Ensure edgecompiler is installed: pip install edgecompiler
  2. Ensure tensorflow is installed (or install edgecompiler[coral])
  3. Ensure numpy is installed: pip install numpy
  4. Check that Python is ARM64-native: python3 -c "import platform; print(platform.machine())" should print arm64.

Getting help


11. Sloth Integration in This Repository

edgecompiler now includes sloth_integration in the same monorepo under sloth-integration/src/sloth_integration.

Use this when you need an end-to-end text workflow:

  1. Fine-tune small language models with unsloth-compatible adapters
  2. Export to ONNX/TFLite through SlothConverter
  3. Quantise and compile with edgecompiler
  4. Run Coral USB inference via SlothCoralRuntime

Quick commands:

# Install root project with dev dependencies
pip install -e ".[dev]"

# Run sloth integration tests
pytest sloth-integration/tests -v

# Run sloth benchmark path
python sloth-integration/examples/benchmark_coral.py \
   --model sloth-integration/test_models/synthetic_text_classifier.tflite \
   --iterations 200 \
   --warmup 20

Reference docs:

  • sloth-integration/instructions_sloth_integration.md
  • sloth-integration/docs/benchmarks_sloth.md