sloth-integration Benchmarks and Validation
This document captures the benchmark, test, and example execution results from the latest sloth-integration validation run.
- Date: 2026-05-29
- Host: macOS (Apple Silicon)
- Python env used: ../.venv-arm64 (Python 3.10.19)
- Coral USB: Detected during runs
1. Test Suite Results
1.1 Full test suite
Command:
Result:
- Collected: 66
- Passed: 66
- Failed: 0
- Duration: ~0.15s
Module-level summary:
- tests/test_adapter.py: passed
- tests/test_converter.py: passed
- tests/test_distiller.py: passed
- tests/test_quantizer.py: passed
- tests/test_runtime.py: passed
1.2 Previously failing subset
Command:
cd /Users/wot25kir/coraledgecompiler/sloth-integration
../.venv-arm64/bin/python -m pytest -q \
tests/test_adapter.py \
tests/test_converter.py \
tests/test_quantizer.py \
tests/test_distiller.py
Result:
- Collected: 52
- Passed: 52
- Failed: 0
2. Model Artifacts Used During Example Runs
Two synthetic local models were generated in sloth-integration for deterministic example validation:
- test_models/synthetic_text_classifier.tflite
- test_models/synthetic_text_embedder.tflite
Generator script:
- scripts/_gen_synthetic_models.py
Generation command:
cd /Users/wot25kir/coraledgecompiler/sloth-integration
../.venv-arm64/bin/python scripts/_gen_synthetic_models.py
Output: "created synthetic models"
3. Benchmark Tables (Actual Run Data)
Model used for all benchmark rows: test_models/synthetic_text_classifier.tflite
Iterations: 200
Warmup: 20
3.1 Benchmark matrix requested
| Scenario | Runner | Runtime | Backend | Compute Device | Mean (ms) | P95 (ms) | P99 (ms) | Min (ms) | Max (ms) | Throughput (FPS) | Exit |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Baseline without hardware | Pure TensorFlow Lite interpreter (no sloth, no edgecompiler Coral runtime path) | tensorflow.lite | cpu_tflite | CPU (XNNPACK) | 0.0010 | 0.0010 | 0.0011 | 0.0009 | 0.0014 | 1040970.13 | 0 |
| 2. Hardware without sloth | examples/benchmark_coral.py --use-low-level | edgecompiler | coral_edgetpu | Coral USB Accelerator | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 974418.7 | 0 |
| 3. Hardware with sloth | examples/benchmark_coral.py (default path) | sloth_runtime | coral_edgetpu | edge_tpu | 0.01 | 0.01 | 0.01 | 0.00 | 0.01 | 200000.0 | 0 |
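The columns in the matrix above (mean, P95, P99, min, max, throughput) can all be derived from a list of per-iteration latencies. The following is a minimal standard-library sketch of that derivation; the helper names `percentile` and `summarize_latencies` are hypothetical and are not taken from `examples/benchmark_coral.py`. It assumes linear-interpolation percentiles and throughput computed as 1000 divided by the mean latency in milliseconds, which matches the baseline script in the reproducibility checklist.

```python
import math


def percentile(sorted_vals, q):
    """Percentile with linear interpolation between the two closest ranks."""
    if not sorted_vals:
        raise ValueError("no samples")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize_latencies(latencies_ms):
    """Summarize per-iteration latencies given in milliseconds."""
    vals = sorted(float(v) for v in latencies_ms)
    mean_ms = sum(vals) / len(vals)
    return {
        "mean_ms": mean_ms,
        "p95_ms": percentile(vals, 95),
        "p99_ms": percentile(vals, 99),
        "min_ms": vals[0],
        "max_ms": vals[-1],
        # Throughput derived from the mean: 1000 ms per second / mean latency.
        "fps": 1000.0 / mean_ms,
    }


# Illustrative input only; these are not measured values.
print(summarize_latencies([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Because throughput is derived from the unrounded mean, a row can show a mean that rounds to 0.00 ms while still reporting a finite FPS value.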
3.2 Commands used for each row
| Scenario | Command |
|---|---|
| 1. Baseline without hardware | ../.venv-arm64/bin/python - <<'PY' ... tf.lite.Interpreter(...) ... PY |
| 2. Hardware without sloth | ../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --use-low-level --iterations 200 --warmup 20 |
| 3. Hardware with sloth | ../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --iterations 200 --warmup 20 |
3.3 Notes on interpretation
| Observation | Detail |
|---|---|
| Baseline row | Measured directly with the TensorFlow Lite CPU interpreter, so the no-hardware path is represented explicitly. |
| Hardware without sloth | Uses low-level edgecompiler runtime directly; reports Coral USB Accelerator. |
| Hardware with sloth | Uses SlothCoralRuntime benchmark path; reports coral_edgetpu backend and edge_tpu compute. |
| Absolute values | Synthetic micro-model latencies are extremely small; use these as plumbing validation, not production SLA targets. |
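When a single call takes well under a microsecond, one common way to get a steadier estimate is to time a batch of calls and divide by the batch size. The sketch below illustrates that idea with a no-op stand-in; `time_per_call_ms` is a hypothetical helper, and the run documented here did not necessarily measure this way. The trade-off is that batching averages away per-call tail behavior, so P95/P99 computed from batch samples describe batches rather than individual calls.

```python
import time


def time_per_call_ms(fn, calls_per_sample=1000, samples=20):
    """Estimate per-call latency (ms) by timing batches of calls."""
    per_call = []
    for _ in range(samples):
        start = time.perf_counter()
        for _ in range(calls_per_sample):
            fn()
        elapsed_s = time.perf_counter() - start
        per_call.append(elapsed_s * 1000.0 / calls_per_sample)
    return per_call


# Resolution of the clock used above, as reported by the Python runtime.
resolution_s = time.get_clock_info("perf_counter").resolution
print("perf_counter resolution (s):", resolution_s)

# Stand-in workload; a real run would pass a callable that invokes the model.
estimates = time_per_call_ms(lambda: None, calls_per_sample=1000, samples=5)
print("per-call estimates (ms):", estimates)
```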
4. Example Script Results
| Example | Command (abridged) | Result |
|---|---|---|
| classify_on_coral.py | ... classify_on_coral.py --model test_models/synthetic_text_classifier.tflite ... | Pass (exit 0), classification returned with confidence 1.0000 |
| embed_on_coral.py | ... embed_on_coral.py --model test_models/synthetic_text_embedder.tflite ... | Pass (exit 0), embedding returned |
| hybrid_inference.py | ... hybrid_inference.py --checkpoint test_models --coral-model test_models/synthetic_text_classifier.tflite ... | Pass (exit 0), host+coral timing breakdown printed |
| finetune_and_deploy.py | ... finetune_and_deploy.py --skip-finetune --skip-compile ... | Pass (exit 0), inference completed |
5. Edge TPU Delegate and Hardware Notes
Observed behavior across run modes:
- Coral USB device detection succeeded in runtime logs.
- Some downloaded Edge TPU model files triggered delegate preflight failures in low-level load paths.
- Example workflows were validated successfully using synthetic local models; low-level benchmark mode completed with Coral USB Accelerator reported.
Practical interpretation:
- sloth-integration test and example paths are now operational.
- Runtime now supports robust fallback behavior when delegate preflight fails for specific model artifacts.
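The fallback behavior described above can be illustrated with a generic pattern: try each backend loader in order and keep the first one that loads. This is only a sketch under stated assumptions, not the actual runtime implementation. The function `load_with_fallback` and both loader functions are hypothetical, and falling back to a CPU backend is an assumption, since this document does not say what the runtime falls back to. The backend names reuse the `coral_edgetpu` and `cpu_tflite` labels from the benchmark matrix.

```python
def load_with_fallback(loaders):
    """Try each (name, loader) pair in order; return the first that loads.

    Returns (backend_name, handle, errors), where errors lists the
    backends that failed before the successful one.
    """
    errors = []
    for name, loader in loaders:
        try:
            return name, loader(), errors
        except Exception as exc:  # a preflight failure surfaces as an exception
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all backends failed: {errors}")


def load_edgetpu():
    # Hypothetical loader that simulates a delegate preflight failure.
    raise ValueError("delegate preflight failed")


def load_cpu():
    # Hypothetical loader standing in for a CPU interpreter.
    return "cpu-interpreter"


backend, handle, errors = load_with_fallback(
    [("coral_edgetpu", load_edgetpu), ("cpu_tflite", load_cpu)]
)
print(backend, handle, errors)
```

Returning the collected errors alongside the chosen backend keeps the failure visible, so a benchmark row can record which backend actually ran.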
6. Reproducibility Checklist
Use these steps to reproduce the exact validation profile:
- Install package in editable mode:
cd /Users/wot25kir/coraledgecompiler/sloth-integration
../.venv-arm64/bin/python -m pip install -e .
- Generate synthetic test models (the generation command is listed in section 2):
- Run tests (the subset command is listed in section 1.2):
- Run benchmark scenarios from the matrix:
# 1) Baseline without hardware (CPU-only TensorFlow Lite)
../.venv-arm64/bin/python - <<'PY'
import time, numpy as np
import tensorflow as tf
interpreter = tf.lite.Interpreter(model_path='test_models/synthetic_text_classifier.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
x = np.random.randint(0, 32000, size=inp['shape'], dtype=inp['dtype'])
# Warmup: 20 untimed invocations
for _ in range(20):
    interpreter.set_tensor(inp['index'], x)
    interpreter.invoke()
# Timed run: 200 invocations, latency recorded in milliseconds
lat = []
for _ in range(200):
    t = time.perf_counter()
    interpreter.set_tensor(inp['index'], x)
    interpreter.invoke()
    lat.append((time.perf_counter() - t) * 1000)
print('mean_ms', float(np.mean(lat)))
print('p95_ms', float(np.percentile(lat, 95)))
print('p99_ms', float(np.percentile(lat, 99)))
print('fps', float(1000 / np.mean(lat)))
PY
# 2) Hardware without sloth
../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --use-low-level --iterations 200 --warmup 20
# 3) Hardware with sloth
../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --iterations 200 --warmup 20
- Run all examples with synthetic models:
../.venv-arm64/bin/python examples/classify_on_coral.py --model test_models/synthetic_text_classifier.tflite --text "A quick test sentence" --labels negative,positive
../.venv-arm64/bin/python examples/embed_on_coral.py --model test_models/synthetic_text_embedder.tflite --text "A quick test sentence"
../.venv-arm64/bin/python examples/hybrid_inference.py --checkpoint test_models --coral-model test_models/synthetic_text_classifier.tflite --text "A quick test sentence"
../.venv-arm64/bin/python examples/finetune_and_deploy.py --skip-finetune --checkpoint test_models --skip-compile --compiled-model test_models/synthetic_text_classifier.tflite --test-texts "A quick test sentence"
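To check the exit codes of several example commands in one pass, a small driver along these lines may help. It is a sketch, not part of the repository: `run_all` is a hypothetical helper, and the two commands below are stand-ins that only exercise the driver. In a real run, each entry would be one of the example invocations listed above, split into an argument list.

```python
import subprocess
import sys


def run_all(commands):
    """Run each command (an argv list) and collect its exit code."""
    results = []
    for argv in commands:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((argv, proc.returncode))
    return results


# Stand-in commands: one succeeds, one exits with a non-zero status.
commands = [
    [sys.executable, "-c", "print('ok')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
]

results = run_all(commands)
failed = [argv for argv, code in results if code != 0]
for argv, code in results:
    print(code, " ".join(argv))
```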