Examples¶
This document provides detailed walkthroughs for each example script in the
examples/ directory. Each example demonstrates a different workflow with
edgecompiler.
Table of Contents¶
- Example 1: PyTorch MobileNetV2 → Coral
- Example 2: TFLite → Apple Silicon Metal
- Example 3: ONNX → Coral with PTQ
- Example 4: Benchmark Both Backends
- Example 5: Coral USB Classification
- Example 6: Coral USB Object Detection
- Example 7: Coral USB Benchmark
- Example 8: Sloth Integration Pipeline
Prerequisites¶
Before running the examples, ensure you have:
# Install edgecompiler with all frontends
pip install "edgecompiler[all]"
# Or install only what you need for each example:
pip install "edgecompiler[pytorch]" # Example 1
pip install "edgecompiler[coreml]" # Example 2
pip install "edgecompiler[onnx,coral]" # Example 3
pip install "edgecompiler[all]" # Example 4
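Before running an example, it can help to confirm that the packages it imports are actually available. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module lists are taken from the import statements in the walkthroughs below rather than from any documented edgecompiler API.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module lists mirror the imports used in Examples 1 and 3 of this guide.
    requirements = {
        "Example 1": ["numpy", "torch", "torchvision"],
        "Example 3": ["numpy", "torch", "torchvision", "onnx"],
    }
    for example, modules in requirements.items():
        missing = missing_modules(modules)
        status = "ok" if not missing else f"missing: {', '.join(missing)}"
        print(f"{example}: {status}")
```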
For Coral examples, also install the Coral runtime (libedgetpu); see docs/coral_macos_setup.md for setup instructions.
Example 1: PyTorch MobileNetV2 → Coral¶
File: examples/pytorch_mobilenet.py
This example demonstrates compiling a PyTorch MobileNetV2 model for the Google Coral USB Accelerator with INT8 post-training quantisation.
What It Does¶
- Downloads or creates a MobileNetV2 model in PyTorch
- Generates calibration data (random images in this example)
- Compiles the model for Coral using INT8 PTQ
- Verifies the compiled model can load and run inference
- Reports compilation statistics (ops on TPU vs CPU fallback)
Code Walkthrough¶
#!/usr/bin/env python3
"""Example: Compile a PyTorch MobileNetV2 model for Google Coral USB."""
import numpy as np
from edgecompiler import compile
from edgecompiler.runtime import InferenceSession
# --- Step 1: Create or load a PyTorch model ---
import torch
import torchvision.models as models
# Load a pre-trained MobileNetV2
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
model.eval()
# Export to TorchScript
example_input = torch.randn(1, 3, 224, 224)
scripted_model = torch.jit.trace(model, example_input)
scripted_model.save("mobilenet_v2.pt")
print("✓ Saved mobilenet_v2.pt")
# --- Step 2: Generate calibration data ---
# In a real scenario, you would use representative images from your dataset.
# Here we use random data for demonstration purposes.
num_calibration_samples = 200
calibration_data = np.random.rand(num_calibration_samples, 3, 224, 224).astype(
    np.float32
)
np.save("calibration_data.npy", calibration_data)
print(f"✓ Generated {num_calibration_samples} calibration samples")
# --- Step 3: Compile for Coral ---
result = compile(
    "mobilenet_v2.pt",
    target="coral",
    quantize="ptq",
    calibration_data="calibration_data.npy",
    num_calibration_samples=100,  # Use first 100 samples
    output="mobilenet_v2_coral.tflite",
)
print(f"✓ Compilation complete:")
print(f" Output: {result.output_path}")
print(f" Ops on TPU: {result.ops_on_target}")
print(f" Ops on CPU: {result.ops_fallback}")
print(f" Backend: {result.backend}")
# --- Step 4: Run inference ---
try:
    session = InferenceSession("mobilenet_v2_coral.tflite", target="coral")
    # Prepare input (NHWC format for TFLite)
    test_input = np.random.rand(1, 224, 224, 3).astype(np.float32)
    output = session.run({"input": test_input})
    print(f"✓ Inference successful:")
    print(f" Output shape: {output['output'].shape}")
    print(f" Top-5 classes: {np.argsort(output['output'].flatten())[-5:][::-1]}")
    session.close()
except RuntimeError as e:
    print(f"⚠ Inference skipped (Coral device not available): {e}")
CLI Equivalent¶
edgecompile mobilenet_v2.pt \
--target coral \
--quantize ptq \
--calibration-data calibration_data.npy \
--num-calibration-samples 100 \
--output mobilenet_v2_coral.tflite \
--verbose
Expected Output¶
✓ Saved mobilenet_v2.pt
✓ Generated 200 calibration samples
[INFO] Frontend: PyTorch (.pt)
[INFO] Detected 152 operations
[INFO] Applying optimisation passes...
[INFO] ConstantFoldingPass: 0 ops removed
[INFO] OpFusionPass: 12 ops fused
[INFO] DeadCodeElimPass: 3 ops removed
[INFO] LayoutTransformPass: NCHW → NHWC
[INFO] Quantising with PTQ (100 calibration samples)...
[INFO] Compiling for Coral (Edge TPU)...
[INFO] 140 ops → Edge TPU
[INFO] 12 ops → CPU fallback
[INFO] 3 TPU clusters, 4 transitions
✓ Compilation complete:
Output: mobilenet_v2_coral.tflite
Ops on TPU: 140
Ops on CPU: 12
Backend: coral
✓ Inference successful:
Output shape: (1, 1000)
Top-5 classes: [954 940 941 942 939]
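The op counts reported above can be condensed into a single coverage figure, which is handy when comparing compilation runs. This is a small sketch, not part of edgecompiler; `tpu_coverage` is a hypothetical helper, and the numbers are the ones from the expected output.

```python
def tpu_coverage(ops_on_target, ops_fallback):
    """Fraction of ops placed on the accelerator (0.0 when there are no ops)."""
    total = ops_on_target + ops_fallback
    return ops_on_target / total if total else 0.0


# Values from the expected output: 140 ops on the Edge TPU, 12 on the CPU.
coverage = tpu_coverage(140, 12)
print(f"TPU coverage: {coverage:.1%}")  # prints "TPU coverage: 92.1%"
```

In the example script, the same helper could be called with `result.ops_on_target` and `result.ops_fallback`.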
Example 2: TFLite → Apple Silicon Metal¶
File: examples/tflite_mobilenet.py
This example demonstrates compiling an existing TFLite model for Apple Silicon's GPU and Neural Engine via Core ML.
What It Does¶
- Downloads a pre-quantised MobileNetV2 TFLite model
- Compiles it for Apple Silicon (Metal / Neural Engine)
- Assigns compute units (ANE for convolutions, GPU for remaining ops)
- Runs inference and benchmarks latency
Code Walkthrough¶
#!/usr/bin/env python3
"""Example: Compile a TFLite model for Apple Silicon Metal/GPU."""
import numpy as np
from edgecompiler import compile
from edgecompiler.runtime import InferenceSession
# --- Step 1: Obtain a TFLite model ---
# Option A: Use a model you already have
tflite_path = "mobilenet_v2.tflite"
# Option B: Download from TensorFlow Hub
# (or use one from the edgecompiler test fixtures)
import urllib.request
url = "https://tfhub.dev/tensorflow/lite-model/mobilenet_v2_1.0_224/int8/1?lite-format=tflite"
urllib.request.urlretrieve(url, tflite_path)
print(f"✓ Downloaded {tflite_path}")
# --- Step 2: Compile for Apple Silicon ---
result = compile(
    tflite_path,
    target="metal",
    quantize="ptq",  # Re-quantise if the model is float32
    compute_unit="all",  # Use ANE + GPU + CPU
    output="mobilenet_v2_ml.mlpackage",
)
print(f"✓ Compilation complete:")
print(f" Output: {result.output_path}")
print(f" Backend: {result.backend}")
# --- Step 3: Run inference ---
session = InferenceSession("mobilenet_v2_ml.mlpackage", target="metal")
# Prepare input (Core ML handles format conversion internally)
test_input = np.random.rand(1, 224, 224, 3).astype(np.float32)
output = session.run({"image": test_input})
print(f"✓ Inference successful:")
print(f" Output shape: {output['output'].shape}")
print(f" Top-5 classes: {np.argsort(output['output'].flatten())[-5:][::-1]}")
# --- Step 4: Benchmark ---
stats = session.benchmark(iterations=50)
print(f"✓ Benchmark results (50 iterations):")
print(f" Mean latency: {stats.mean_ms:.2f} ms")
print(f" Median latency: {stats.median_ms:.2f} ms")
print(f" P95 latency: {stats.p95_ms:.2f} ms")
print(f" P99 latency: {stats.p99_ms:.2f} ms")
print(f" Throughput: {stats.throughput_fps:.1f} FPS")
session.close()
CLI Equivalent¶
edgecompile mobilenet_v2.tflite \
--target metal \
--quantize ptq \
--compute-unit all \
--output mobilenet_v2_ml.mlpackage \
--verbose
Expected Output¶
✓ Downloaded mobilenet_v2.tflite
[INFO] Frontend: TFLite (.tflite)
[INFO] Detected 140 operations (pre-quantised INT8)
[INFO] Applying optimisation passes...
[INFO] Compiling for Metal (Apple Silicon)...
[INFO] 132 ops → Neural Engine
[INFO] 8 ops → GPU (Metal)
[INFO] 0 ops → CPU
[INFO] Layer fusion: 12 groups fused
✓ Compilation complete:
Output: mobilenet_v2_ml.mlpackage
Backend: metal
✓ Inference successful:
Output shape: (1, 1000)
Top-5 classes: [12 435 835 723 134]
✓ Benchmark results (50 iterations):
Mean latency: 1.24 ms
Median latency: 1.18 ms
P95 latency: 1.52 ms
P99 latency: 1.78 ms
Throughput: 806.5 FPS
Example 3: ONNX → Coral with PTQ¶
File: examples/onnx_resnet.py
This example demonstrates compiling an ONNX model for Coral with post-training quantisation, including how to handle models that require calibration.
What It Does¶
- Exports a PyTorch model to ONNX format
- Simplifies the ONNX graph with onnxsim
- Compiles for Coral with INT8 PTQ
- Handles the case where some ops are not Edge TPU compatible
Code Walkthrough¶
#!/usr/bin/env python3
"""Example: Compile an ONNX model for Google Coral with INT8 PTQ."""
import numpy as np
import torch
import torchvision.models as models
from edgecompiler import compile
from edgecompiler.runtime import InferenceSession
# --- Step 1: Export a model to ONNX ---
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)
onnx_path = "mobilenet_v2.onnx"
torch.onnx.export(
    model,
    dummy_input,
    onnx_path,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes=None,  # Fixed shapes for Coral compatibility
    opset_version=17,
)
print(f"✓ Exported {onnx_path}")
# --- Step 2: Simplify the ONNX graph ---
# edgecompiler runs onnxsim internally, but you can do it manually too:
try:
    import onnxsim
    simplified_model, check = onnxsim.simplify(onnx_path)
    if check:
        import onnx
        onnx.save(simplified_model, onnx_path)
        print(f"✓ Simplified ONNX graph")
except ImportError:
    print("ℹ onnxsim not available; using original ONNX graph")
# --- Step 3: Generate calibration data ---
# For image models, use representative images from your training set
calibration_data = np.random.rand(200, 3, 224, 224).astype(np.float32)
np.save("calib_onnx.npy", calibration_data)
print(f"✓ Generated calibration data")
# --- Step 4: Compile for Coral ---
result = compile(
    onnx_path,
    target="coral",
    quantize="ptq",
    calibration_data="calib_onnx.npy",
    num_calibration_samples=100,
    output="mobilenet_v2_onnx_coral.tflite",
)
print(f"✓ Compilation complete:")
print(f" Output: {result.output_path}")
print(f" Ops on TPU: {result.ops_on_target}")
print(f" Ops on CPU: {result.ops_fallback}")
print(f" TPU clusters: {result.metadata.get('tpu_clusters', 'N/A')}")
# --- Step 5: Verify accuracy ---
try:
    session = InferenceSession("mobilenet_v2_onnx_coral.tflite", target="coral")
    # Compare against PyTorch reference
    test_input = np.random.rand(1, 224, 224, 3).astype(np.float32)
    coral_output = session.run({"input": test_input})
    # PyTorch reference (NCHW format)
    with torch.no_grad():
        pt_input = torch.from_numpy(test_input.transpose(0, 3, 1, 2))
        pt_output = model(pt_input).numpy()
    # Compute similarity
    mse = np.mean((coral_output["output"] - pt_output) ** 2)
    max_diff = np.max(np.abs(coral_output["output"] - pt_output))
    cos_sim = np.dot(
        coral_output["output"].flatten(), pt_output.flatten()
    ) / (
        np.linalg.norm(coral_output["output"].flatten())
        * np.linalg.norm(pt_output.flatten())
    )
    print(f"✓ Accuracy comparison:")
    print(f" MSE: {mse:.6f}")
    print(f" Max diff: {max_diff:.6f}")
    print(f" Cosine sim: {cos_sim:.6f}")
    session.close()
except RuntimeError as e:
    print(f"⚠ Accuracy check skipped (Coral device not available): {e}")
Handling Unsupported Ops¶
When the ONNX model contains ops not supported by Edge TPU, edgecompiler
provides several strategies:
# Strategy 1: Automatic CPU fallback (default)
result = compile(onnx_path, target="coral", quantize="ptq")
# Strategy 2: Decompose unsupported ops into supported primitives
result = compile(
    onnx_path,
    target="coral",
    quantize="ptq",
    decompose_unsupported=True,  # Try to decompose before falling back
)
# Strategy 3: Replace specific ops
result = compile(
    onnx_path,
    target="coral",
    quantize="ptq",
    op_replacements={
        "Resize": "ResizeNearestNeighbor",  # Force nearest-neighbor resize
    },
)
Example 4: Benchmark Both Backends¶
File: examples/coral_usb_benchmark.py
This example demonstrates benchmarking the same model on both Coral and Metal backends, comparing latency, throughput, and accuracy.
What It Does¶
- Compiles a model for both Coral and Metal
- Runs N inference iterations on each backend
- Reports detailed latency statistics
- Compares output accuracy across backends
Code Walkthrough¶
#!/usr/bin/env python3
"""Example: Benchmark a model on both Coral and Metal backends."""
import argparse
import time
import numpy as np
from edgecompiler import compile
from edgecompiler.runtime import InferenceSession
parser = argparse.ArgumentParser(description="Benchmark edgecompiler backends")
parser.add_argument("--model", default="mobilenet_v2.pt", help="Model path")
parser.add_argument("--iterations", type=int, default=100, help="Inference iterations")
parser.add_argument("--skip-compile", action="store_true", help="Skip compilation")
args = parser.parse_args()
# --- Step 1: Compile for both backends ---
if not args.skip_compile:
    print("=" * 60)
    print("Compiling for Coral (Edge TPU)...")
    print("=" * 60)
    coral_result = compile(
        args.model,
        target="coral",
        quantize="ptq",
        calibration_data="calibration_data.npy",
        output="bench_coral.tflite",
    )
    print(f" Output: {coral_result.output_path}")
    print(f" TPU ops: {coral_result.ops_on_target}")
    print()
    print("=" * 60)
    print("Compiling for Metal (Apple Silicon)...")
    print("=" * 60)
    metal_result = compile(
        args.model,
        target="metal",
        quantize="ptq",
        calibration_data="calibration_data.npy",
        output="bench_metal.mlpackage",
    )
    print(f" Output: {metal_result.output_path}")
# --- Step 2: Benchmark both backends ---
test_input = np.random.rand(1, 224, 224, 3).astype(np.float32)
results = {}
for name, path, target in [
    ("Coral (Edge TPU)", "bench_coral.tflite", "coral"),
    ("Metal (Apple Silicon)", "bench_metal.mlpackage", "metal"),
]:
    print()
    print("=" * 60)
    print(f"Benchmarking: {name}")
    print("=" * 60)
    try:
        session = InferenceSession(path, target=target)
        # Warm up (5 iterations)
        for _ in range(5):
            session.run({"input": test_input})
        # Benchmark
        latencies = []
        for i in range(args.iterations):
            start = time.perf_counter()
            output = session.run({"input": test_input})
            latency_ms = (time.perf_counter() - start) * 1000
            latencies.append(latency_ms)
            if (i + 1) % 25 == 0:
                print(f" Progress: {i + 1}/{args.iterations}")
        latencies = np.array(latencies)
        stats = {
            "mean_ms": np.mean(latencies),
            "median_ms": np.median(latencies),
            "p95_ms": np.percentile(latencies, 95),
            "p99_ms": np.percentile(latencies, 99),
            "min_ms": np.min(latencies),
            "max_ms": np.max(latencies),
            "std_ms": np.std(latencies),
            "throughput_fps": 1000.0 / np.mean(latencies),
            "output": output,
        }
        results[name] = stats
        session.close()
    except RuntimeError as e:
        print(f" ⚠ Skipped: {e}")
# --- Step 3: Report results ---
print()
print("=" * 60)
print("BENCHMARK RESULTS")
print("=" * 60)
print(f"Model: {args.model}")
print(f"Iterations: {args.iterations}")
print()
print(f"{'Metric':<20} ", end="")
for name in results:
    print(f"{name:>22} ", end="")
print()
print("-" * (20 + 23 * len(results)))
for metric in [
    "mean_ms",
    "median_ms",
    "p95_ms",
    "p99_ms",
    "min_ms",
    "max_ms",
    "std_ms",
    "throughput_fps",
]:
    label = metric.replace("_", " ").replace("ms", "(ms)").replace("fps", "(FPS)")
    print(f"{label:<20} ", end="")
    for name in results:
        value = results[name][metric]
        print(f"{value:>22.2f} ", end="")
    print()
# --- Step 4: Compare accuracy ---
if len(results) == 2:
    names = list(results.keys())
    out_a = results[names[0]]["output"]
    out_b = results[names[1]]["output"]
    # Get the first output value
    key_a = list(out_a.keys())[0]
    key_b = list(out_b.keys())[0]
    arr_a = out_a[key_a].flatten()
    arr_b = out_b[key_b].flatten()
    cos_sim = np.dot(arr_a, arr_b) / (np.linalg.norm(arr_a) * np.linalg.norm(arr_b))
    max_diff = np.max(np.abs(arr_a - arr_b))
    print()
    print(f"Cross-backend accuracy:")
    print(f" Cosine similarity: {cos_sim:.6f}")
    print(f" Max absolute diff: {max_diff:.6f}")
Usage¶
# Full benchmark (compile + run)
python examples/coral_usb_benchmark.py --iterations 100
# Skip compilation (use previously compiled models)
python examples/coral_usb_benchmark.py --iterations 100 --skip-compile
# Quick test
python examples/coral_usb_benchmark.py --iterations 10
Expected Output¶
============================================================
Compiling for Coral (Edge TPU)...
============================================================
Output: bench_coral.tflite
TPU ops: 140
============================================================
Compiling for Metal (Apple Silicon)...
============================================================
Output: bench_metal.mlpackage
============================================================
Benchmarking: Coral (Edge TPU)
============================================================
Progress: 25/100
Progress: 50/100
Progress: 75/100
Progress: 100/100
============================================================
Benchmarking: Metal (Apple Silicon)
============================================================
Progress: 25/100
Progress: 50/100
Progress: 75/100
Progress: 100/100
============================================================
BENCHMARK RESULTS
============================================================
Model: mobilenet_v2.pt
Iterations: 100
Metric                     Coral (Edge TPU)  Metal (Apple Silicon)
--------------------------------------------------------------------------------
mean (ms)                              1.82                   1.24
median (ms)                            1.76                   1.18
p95 (ms)                               2.31                   1.52
p99 (ms)                               2.89                   1.78
min (ms)                               1.52                   1.05
max (ms)                               4.12                   3.21
std (ms)                               0.34                   0.28
throughput (FPS)                     549.45                 806.45
Cross-backend accuracy:
Cosine similarity: 0.999847
Max absolute diff: 0.023410
Common Patterns¶
Using the Python API¶
from edgecompiler import compile
# Simple compile
result = compile("model.pt", target="coral")
# With all options
result = compile(
    "model.pt",
    target="metal",
    quantize="ptq",
    calibration_data="calib.npy",
    num_calibration_samples=200,
    compute_unit="all",
    weight_compression="int8",
    output="model_compiled.mlpackage",
    verbose=True,
    dump_ir="model_ir.json",  # Dump intermediate IR for debugging
)
Using the CLI¶
# Basic usage
edgecompile model.pt --target coral
# With quantisation
edgecompile model.onnx --target coral --quantize ptq \
--calibration-data calib.npy --num-calibration-samples 100
# With verbose output
edgecompile model.tflite --target metal --verbose
# Dump intermediate IR
edgecompile model.pt --target metal --dump-ir model_ir.json
# Specify output format
edgecompile model.pt --target metal --output model.mlpackage
Batch Compilation¶
# Compile multiple models
for model in models/*.pt; do
  edgecompile "$model" --target coral --output "compiled/$(basename "$model" .pt)_coral.tflite"
done
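The same batch loop can be driven from Python. The sketch below only plans the input and output paths so that it runs without edgecompiler installed; `plan_batch` is a hypothetical helper, and the `models/` and `compiled/` directories mirror the shell loop above.

```python
from pathlib import Path


def plan_batch(model_dir, out_dir, suffix="_coral.tflite"):
    """Pair every .pt file in model_dir with its output path in out_dir."""
    out = Path(out_dir)
    sources = sorted(Path(model_dir).glob("*.pt"))
    return [(src, out / f"{src.stem}{suffix}") for src in sources]


if __name__ == "__main__":
    for src, dst in plan_batch("models", "compiled"):
        dst.parent.mkdir(parents=True, exist_ok=True)
        # With edgecompiler installed, the compile call used in this guide goes here:
        # compile(str(src), target="coral", output=str(dst))
        print(f"{src} -> {dst}")
```

Keeping the path planning separate from the compile call makes it easy to dry-run a batch before spending time on compilation.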
Custom Calibration¶
import numpy as np
from edgecompiler import compile
# Load real calibration data from your dataset
calibration_images = np.load("imagenet_calib_subset.npy") # Shape: (N, 3, 224, 224)
result = compile(
    "resnet50.pt",
    target="coral",
    quantize="ptq",
    calibration_data=calibration_images,
    num_calibration_samples=500,
    output="resnet50_coral.tflite",
)
print(f"Quantisation error estimate: {result.metadata.get('quant_error_estimate', 'N/A')}")
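One way to produce a calibration array with the `(N, 3, 224, 224)` shape loaded above is sketched below. It assumes the images are already decoded into HWC `uint8` NumPy arrays (decoding, for instance with Pillow, is left out), uses a simple nearest-neighbour resize, and scales pixels to `[0, 1]`. Both helper names are hypothetical, and the preprocessing should be adjusted to match whatever the model saw during training.

```python
import numpy as np


def resize_nearest(image, size):
    """Nearest-neighbour resize of an HWC image to (size, size)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return image[rows][:, cols]


def build_calibration_batch(images, size=224):
    """Stack HWC uint8 images into an (N, 3, size, size) float32 array in [0, 1]."""
    batch = np.stack([resize_nearest(img, size) for img in images])  # N, H, W, C
    batch = batch.astype(np.float32) / 255.0
    # NHWC -> NCHW, matching the calibration layout used in this guide.
    return np.ascontiguousarray(batch.transpose(0, 3, 1, 2))
```

The result can be written with `np.save("imagenet_calib_subset.npy", batch)` or passed directly as `calibration_data`, as in the snippet above.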
Example 5: Coral USB Classification¶
File: examples/coral_usb_classify.py
This example demonstrates the full pipeline: load a model from any supported format, quantise to INT8, compile for Edge TPU, and run live inference on a Google Coral USB Accelerator connected to a MacBook M1 Pro.
Usage¶
# Using a pre-compiled Edge TPU model
python examples/coral_usb_classify.py mobilenet_v1_edgetpu.tflite parrot.jpg \
--labels imagenet_labels.txt
# Using a PyTorch model (auto-quantises and compiles)
python examples/coral_usb_classify.py mobilenet_v1.pt parrot.jpg
# Using an ONNX model
python examples/coral_usb_classify.py model.onnx parrot.jpg --labels labels.txt
# CLI one-liner (quantise + compile + infer)
edgecompiler coral-usb model.tflite --image parrot.jpg --labels labels.txt
What It Does¶
- Detects the input model format (.pt, .onnx, .tflite, .h5, SavedModel)
- If not already an *_edgetpu.tflite, converts to IR, quantises to INT8, and compiles for Edge TPU
- Detects the Coral USB Accelerator device
- Loads the compiled model onto the device
- Preprocesses the input image (resize, quantise)
- Runs inference and prints top-k predictions with confidence scores and latency
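The first two steps and the final top-k step can be sketched with the standard library alone. This is an illustration of the logic described above, not the code in `examples/coral_usb_classify.py`; the helper names and the format labels (`"pytorch"`, `"keras"`, and so on) are hypothetical, and treating any directory as a SavedModel is a simplification.

```python
import heapq
from pathlib import Path

# Extensions from the list above; a SavedModel is a directory, not a file.
FORMATS = {".pt": "pytorch", ".onnx": "onnx", ".tflite": "tflite", ".h5": "keras"}


def detect_format(path):
    """Map a model path to a format label, raising on unknown extensions."""
    p = Path(path)
    if p.is_dir():
        return "saved_model"
    try:
        return FORMATS[p.suffix.lower()]
    except KeyError:
        raise ValueError(f"Unsupported model format: {path}") from None


def needs_compilation(path):
    """True unless the file already follows the *_edgetpu.tflite naming."""
    return not Path(path).name.endswith("_edgetpu.tflite")


def top_k(scores, labels=None, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    best = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    return [(labels[i] if labels else i, score) for i, score in best]
```

For example, `top_k([0.1, 0.7, 0.2], labels=["cat", "parrot", "dog"], k=2)` returns `[("parrot", 0.7), ("dog", 0.2)]`.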
Prerequisites¶
- Google Coral USB Accelerator plugged in
- libedgetpu runtime installed (see docs/coral_macos_setup.md)
- tflite-runtime or tensorflow installed
- Pillow for image loading
Example 6: Coral USB Object Detection¶
File: examples/coral_usb_detect.py
This example runs SSDLite MobileNetV2 object detection on a Coral USB Accelerator and draws bounding boxes on the output image.
Usage¶
# Run detection with a pre-compiled model
python examples/coral_usb_detect.py ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite photo.jpg \
--output detected.jpg
# With a custom threshold
python examples/coral_usb_detect.py ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite photo.jpg \
--output detected.jpg --threshold 0.7
What It Does¶
- Loads a compiled SSDLite MobileNetV2 Edge TPU model
- Detects the Coral USB device
- Runs inference on the input image
- Parses the SSDLite output (bounding boxes, class IDs, scores)
- Draws bounding boxes with COCO class labels
- Saves the annotated image to disk
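The parsing step can be illustrated as follows. The sketch assumes the layout commonly produced by TFLite SSD models with a built-in post-processing op: boxes normalised to `[0, 1]` in `[ymin, xmin, ymax, xmax]` order, with parallel lists of class IDs and scores. Whether `examples/coral_usb_detect.py` uses exactly this layout is an assumption, and `parse_detections` and its default threshold are hypothetical.

```python
def parse_detections(boxes, class_ids, scores, image_size, threshold=0.5, labels=None):
    """Keep detections scoring at least `threshold` and scale boxes to pixels.

    boxes: normalised [ymin, xmin, ymax, xmax] rows (assumed layout).
    image_size: (width, height) of the original image in pixels.
    labels: optional dict mapping class ID to a class name.
    """
    width, height = image_size
    detections = []
    for box, class_id, score in zip(boxes, class_ids, scores):
        if score < threshold:
            continue
        # Clamp to the image so slightly out-of-range boxes stay drawable.
        ymin, xmin, ymax, xmax = (min(max(float(v), 0.0), 1.0) for v in box)
        class_id = int(class_id)
        detections.append({
            "label": labels.get(class_id, str(class_id)) if labels else class_id,
            "score": float(score),
            # (left, top, right, bottom) in pixels
            "box": (round(xmin * width), round(ymin * height),
                    round(xmax * width), round(ymax * height)),
        })
    return detections
```

The `(left, top, right, bottom)` pixel tuple is in the order most drawing APIs expect for rectangles.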
Output¶
The script prints detected objects with their class name, confidence score, and bounding box coordinates, then saves an annotated image file.
Example 7: Coral USB Benchmark¶
File: examples/coral_usb_benchmark.py
This example loads a compiled Edge TPU model, runs 1000 inferences by default, and prints detailed latency statistics (p50, p95, p99), throughput (FPS), and a warning if high latency suggests thermal throttling.
Usage¶
# Full benchmark (1000 runs)
python examples/coral_usb_benchmark.py mobilenet_v1_edgetpu.tflite
# Custom number of runs
python examples/coral_usb_benchmark.py mobilenet_v1_edgetpu.tflite \
--num-runs 500 --warmup 20
What It Does¶
- Detects the Coral USB Accelerator
- Loads the compiled model
- Runs warmup iterations (default: 10)
- Runs N inference iterations (default: 1000)
- Computes latency statistics: mean, median, min, max, p50, p95, p99, std dev
- Reports throughput in FPS
- Warns if high latency suggests thermal throttling
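The measurement steps above can be sketched independently of the Coral runtime. Here `run_once` stands in for a single inference call, percentiles use the nearest-rank method (so P50 may differ slightly from the interpolated median), and the throttling check, which compares the last tenth of runs against the first tenth, is an assumed heuristic rather than the script's actual rule.

```python
import math
import statistics
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(len(sorted_values) * pct / 100))
    return sorted_values[rank - 1]


def benchmark(run_once, num_runs=1000, warmup=10, throttle_factor=1.5):
    """Time `run_once` and return latency statistics in milliseconds."""
    for _ in range(warmup):  # warmup runs are not timed
        run_once()
    latencies = []
    for _ in range(num_runs):
        start = time.perf_counter()
        run_once()
        latencies.append((time.perf_counter() - start) * 1000.0)
    ordered = sorted(latencies)
    mean_ms = statistics.fmean(latencies)
    # Assumed heuristic: a much slower final tenth hints at thermal throttling.
    tenth = max(1, num_runs // 10)
    head = statistics.fmean(latencies[:tenth])
    tail = statistics.fmean(latencies[-tenth:])
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies),
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        "std_ms": statistics.pstdev(latencies),
        "throughput_fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
        "possible_throttling": tail > throttle_factor * head,
    }
```

Passing the inference call as a function keeps the timing loop testable without a device attached.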
Expected Output¶
=======================================================
Benchmark Results (Coral Edge TPU)
=======================================================
Model: mobilenet_v1_edgetpu.tflite
Device: Coral USB Accelerator
Iterations: 1000
Warmup: 10
Latency Statistics
----------------------------------------
Mean: 1.82 ms
Median: 1.76 ms
Min: 1.52 ms
Max: 4.12 ms
P50: 1.76 ms
P95: 2.31 ms
P99: 2.89 ms
Std Dev: 0.34 ms
Throughput
----------------------------------------
Total time: 1.820 s
Throughput: 549.5 FPS
Platform: Darwin (arm64)
Example 8: Sloth Integration Pipeline¶
Files:
- sloth-integration/examples/finetune_and_deploy.py
- sloth-integration/examples/classify_on_coral.py
- sloth-integration/examples/embed_on_coral.py
- sloth-integration/examples/benchmark_coral.py
These examples cover the integrated text-model workflow from checkpoint export to
Coral deployment using sloth_integration inside this repository.
Usage¶
# Run sloth integration tests
pytest sloth-integration/tests -v
# Classification benchmark path (Edge TPU)
python sloth-integration/examples/benchmark_coral.py \
--model sloth-integration/test_models/synthetic_text_classifier.tflite \
--iterations 200 \
--warmup 20
See Also¶
- sloth-integration/docs/benchmarks_sloth.md for measured benchmark tables
- docs/sloth_integration.md for setup and workflow details