sloth-integration Benchmarks and Validation

This document captures the benchmark, test, and example execution results from the latest sloth-integration validation run.

Date: 2026-05-29
Host: macOS (Apple Silicon)
Python env used: ../.venv-arm64 (Python 3.10.19)
Coral USB: Detected during runs

1. Test Suite Results

1.1 Full test suite

Command:

cd /Users/wot25kir/coraledgecompiler/sloth-integration
../.venv-arm64/bin/python -m pytest -q tests

Result:

Collected: 66
Passed: 66
Failed: 0
Duration: ~0.15s

Module-level summary:

tests/test_adapter.py: passed
tests/test_converter.py: passed
tests/test_distiller.py: passed
tests/test_quantizer.py: passed
tests/test_runtime.py: passed

1.2 Previously failing subset

Command:

cd /Users/wot25kir/coraledgecompiler/sloth-integration
../.venv-arm64/bin/python -m pytest -q \
  tests/test_adapter.py \
  tests/test_converter.py \
  tests/test_quantizer.py \
  tests/test_distiller.py

Result:

Collected: 52
Passed: 52
Failed: 0

2. Model Artifacts Used During Example Runs

Two synthetic local models were generated in sloth-integration for deterministic example validation:

test_models/synthetic_text_classifier.tflite
test_models/synthetic_text_embedder.tflite

Generator script:

scripts/_gen_synthetic_models.py

Generation command:

cd /Users/wot25kir/coraledgecompiler/sloth-integration
../.venv-arm64/bin/python scripts/_gen_synthetic_models.py

Output: "created synthetic models"

3. Benchmark Tables (Actual Run Data)

Model used for all benchmark rows: test_models/synthetic_text_classifier.tflite
Iterations: 200
Warmup: 20

3.1 Benchmark matrix requested

| Scenario | Runner | Runtime | Backend | Compute Device | Mean (ms) | P95 (ms) | P99 (ms) | Min (ms) | Max (ms) | Throughput (FPS) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Baseline without hardware | Pure TensorFlow Lite interpreter (no sloth, no edgecompiler Coral runtime path) | tensorflow.lite | cpu_tflite | CPU (XNNPACK) | 0.0010 | 0.0010 | 0.0011 | 0.0009 | 0.0014 | 1040970.13 |
| 2. Hardware without sloth | `examples/benchmark_coral.py --use-low-level` | edgecompiler | coral_edgetpu | Coral USB Accelerator | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 974418.7 |
| 3. Hardware with sloth | `examples/benchmark_coral.py` (default path) | sloth_runtime | coral_edgetpu | edge_tpu | 0.01 | 0.01 | 0.01 | 0.00 | 0.01 | 200000.0 |
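The columns in this matrix can all be derived from a list of per-iteration latencies. The following is a minimal sketch, assuming throughput is computed as 1000 divided by the mean latency in milliseconds (as the baseline script in the Reproducibility Checklist does) and that percentiles use linear interpolation between samples. `summarize_latencies` and `percentile` are hypothetical helpers for illustration, not functions from sloth-integration or edgecompiler.

```python
import math
import statistics


def percentile(sorted_ms, q):
    """Linearly interpolated percentile (q in [0, 100]) of an ascending list."""
    if not sorted_ms:
        raise ValueError("no samples")
    rank = (len(sorted_ms) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    # Interpolate between the two neighbouring samples.
    return sorted_ms[lo] + (sorted_ms[hi] - sorted_ms[lo]) * (rank - lo)


def summarize_latencies(latencies_ms):
    """Return the benchmark-matrix columns for a list of latencies in ms."""
    ordered = sorted(latencies_ms)
    mean_ms = statistics.fmean(ordered)
    return {
        "mean_ms": mean_ms,
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        # Assumption: throughput is the reciprocal of the mean latency.
        "fps": 1000.0 / mean_ms,
    }


if __name__ == "__main__":
    for key, value in summarize_latencies([1.0, 2.0, 3.0, 4.0, 5.0]).items():
        print(f"{key}: {value:.4f}")
```

Using only the standard library keeps the sketch runnable in environments where NumPy or TensorFlow are not installed.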

3.2 Commands used for each row

| Scenario | Command |
| --- | --- |
| 1. Baseline without hardware | `../.venv-arm64/bin/python - <<'PY' ... tf.lite.Interpreter(...) ... PY` |
| 2. Hardware without sloth | `../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --use-low-level --iterations 200 --warmup 20` |
| 3. Hardware with sloth | `../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --iterations 200 --warmup 20` |

3.3 Notes on interpretation

| Observation | Detail |
| --- | --- |
| Baseline row | Measured directly with the TensorFlow Lite CPU interpreter, so the no-hardware path is represented explicitly. |
| Hardware without sloth | Uses the low-level edgecompiler runtime directly; reports Coral USB Accelerator. |
| Hardware with sloth | Uses the SlothCoralRuntime benchmark path; reports the `coral_edgetpu` backend and `edge_tpu` compute. |
| Absolute values | Synthetic micro-model latencies are extremely small: about 0.001 ms mean in the baseline row, and at or below the two-decimal reporting resolution in the hardware rows. Treat these numbers as plumbing validation, not production SLA targets. |
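Because the hardware rows are reported to two decimal places, a mean of 0.00 ms does not mean zero latency. The throughput column can be used to recover an approximate mean, as in the sketch below. It assumes `examples/benchmark_coral.py` computes throughput as 1000 / mean latency in ms, which the source does not confirm; `implied_mean_ms` is a hypothetical helper.

```python
def implied_mean_ms(fps):
    """Mean latency in ms implied by a throughput figure.

    Assumption: fps = 1000 / mean_ms, as in the CPU baseline script.
    """
    return 1000.0 / fps


# Throughput values copied from the benchmark matrix.
reported_fps = {
    "baseline_without_hardware": 1040970.13,
    "hardware_without_sloth": 974418.7,
    "hardware_with_sloth": 200000.0,
}

if __name__ == "__main__":
    for scenario, fps in reported_fps.items():
        mean_ms = implied_mean_ms(fps)
        print(f"{scenario}: {mean_ms:.6f} ms ({mean_ms * 1000:.3f} us)")
```

Under that assumption, the "hardware without sloth" row implies a mean of roughly one microsecond, which rounds to 0.00 ms at two decimals and is consistent with the table.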

4. Example Script Results

| Example | Command (abridged) | Result |
| --- | --- | --- |
| classify_on_coral.py | `... classify_on_coral.py --model test_models/synthetic_text_classifier.tflite ...` | Pass (exit 0), classification returned with confidence 1.0000 |
| embed_on_coral.py | `... embed_on_coral.py --model test_models/synthetic_text_embedder.tflite ...` | Pass (exit 0), embedding returned |
| hybrid_inference.py | `... hybrid_inference.py --checkpoint test_models --coral-model test_models/synthetic_text_classifier.tflite ...` | Pass (exit 0), host+coral timing breakdown printed |
| finetune_and_deploy.py | `... finetune_and_deploy.py --skip-finetune --skip-compile ...` | Pass (exit 0), inference completed |
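The pass criterion in this table is the process exit code. A check of that kind could be scripted roughly as follows. This is a sketch, not the harness used for the run above; `run_example` is a hypothetical helper, and the command in the usage section is a stand-in so the block runs anywhere.

```python
import subprocess
import sys


def run_example(argv, timeout_s=300):
    """Run one example command.

    Returns (passed, exit_code, last_stdout_line); passed means exit code 0.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    lines = proc.stdout.strip().splitlines()
    last_line = lines[-1] if lines else ""
    return proc.returncode == 0, proc.returncode, last_line


if __name__ == "__main__":
    # Stand-in command; substitute one of the example invocations from the
    # Reproducibility Checklist to check a real script.
    passed, code, last_line = run_example(
        [sys.executable, "-c", "print('inference completed')"]
    )
    print("Pass" if passed else "Fail", f"(exit {code})", last_line)
```

Capturing the last line of standard output gives a short summary for the Result column without storing the full log.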

5. Edge TPU Delegate and Hardware Notes

Observed behavior across run modes:

Coral USB device detection succeeded in runtime logs.
Some downloaded Edge TPU model files triggered delegate preflight failures in low-level load paths.
Example workflows were validated successfully using synthetic local models; low-level benchmark mode completed with Coral USB Accelerator reported.

Practical interpretation:

sloth-integration test and example paths are now operational.
The runtime now falls back instead of aborting when delegate preflight fails for a specific model artifact.
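The fallback described above follows a common pattern: attempt the accelerator path first and switch to another loader if preflight fails. The sketch below illustrates that pattern only. It assumes the fallback target is the CPU TensorFlow Lite path, which the source does not state; `load_with_fallback`, `PreflightError`, and the injected loader callables are hypothetical and are not the sloth-integration or edgecompiler API. The backend labels are taken from the benchmark matrix.

```python
import logging

logger = logging.getLogger("fallback_sketch")


class PreflightError(RuntimeError):
    """Hypothetical error for a delegate that rejects a model artifact."""


def load_with_fallback(model_path, load_edgetpu, load_cpu):
    """Try the Edge TPU loader first; use the CPU loader if it fails.

    Both loaders are callables taking a model path and returning an
    interpreter-like object. Returns (interpreter, backend_name).
    """
    try:
        return load_edgetpu(model_path), "coral_edgetpu"
    except (PreflightError, ValueError, OSError) as exc:
        # Exception types are chosen for illustration only.
        logger.warning(
            "Edge TPU preflight failed for %s (%s); falling back to CPU",
            model_path,
            exc,
        )
        return load_cpu(model_path), "cpu_tflite"


if __name__ == "__main__":
    def failing_loader(path):
        raise PreflightError("model not accepted by delegate")

    def cpu_loader(path):
        return f"cpu-interpreter:{path}"

    print(load_with_fallback("model.tflite", failing_loader, cpu_loader))
```

Injecting the loaders keeps the decision logic testable without a Coral device attached, and returning the backend name lets a benchmark report which path actually ran.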

6. Reproducibility Checklist

Use these steps to reproduce the exact validation profile:

Install package in editable mode:

cd /Users/wot25kir/coraledgecompiler/sloth-integration
../.venv-arm64/bin/python -m pip install -e .

Generate synthetic test models:

../.venv-arm64/bin/python scripts/_gen_synthetic_models.py

Run tests:

../.venv-arm64/bin/python -m pytest -q tests

Run benchmark scenarios from the matrix:

# 1) Baseline without hardware (CPU-only TensorFlow Lite)
../.venv-arm64/bin/python - <<'PY'
import time

import numpy as np
import tensorflow as tf

# Load the model on the CPU-only TensorFlow Lite interpreter (no delegate).
interpreter = tf.lite.Interpreter(model_path='test_models/synthetic_text_classifier.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Random token IDs; assumes the model takes an integer input tensor.
x = np.random.randint(0, 32000, size=inp['shape'], dtype=inp['dtype'])

# Warmup: 20 untimed iterations.
for _ in range(20):
    interpreter.set_tensor(inp['index'], x)
    interpreter.invoke()

# Timed run: 200 iterations, each latency recorded in milliseconds.
lat = []
for _ in range(200):
    t = time.perf_counter()
    interpreter.set_tensor(inp['index'], x)
    interpreter.invoke()
    lat.append((time.perf_counter() - t) * 1000)

print('mean_ms', float(np.mean(lat)))
print('p95_ms', float(np.percentile(lat, 95)))
print('p99_ms', float(np.percentile(lat, 99)))
print('min_ms', float(np.min(lat)))
print('max_ms', float(np.max(lat)))
print('fps', float(1000 / np.mean(lat)))
PY

# 2) Hardware without sloth
../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --use-low-level --iterations 200 --warmup 20

# 3) Hardware with sloth
../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --iterations 200 --warmup 20

Run all examples with synthetic models:

../.venv-arm64/bin/python examples/classify_on_coral.py --model test_models/synthetic_text_classifier.tflite --text "A quick test sentence" --labels negative,positive
../.venv-arm64/bin/python examples/embed_on_coral.py --model test_models/synthetic_text_embedder.tflite --text "A quick test sentence"
../.venv-arm64/bin/python examples/hybrid_inference.py --checkpoint test_models --coral-model test_models/synthetic_text_classifier.tflite --text "A quick test sentence"
../.venv-arm64/bin/python examples/finetune_and_deploy.py --skip-finetune --checkpoint test_models --skip-compile --compiled-model test_models/synthetic_text_classifier.tflite --test-texts "A quick test sentence"