
sloth-integration Benchmarks and Validation

This document captures the benchmark, test, and example execution results from the latest sloth-integration validation run.

  • Date: 2026-05-29
  • Host: macOS (Apple Silicon)
  • Python env used: ../.venv-arm64 (Python 3.10.19)
  • Coral USB: Detected during runs


1. Test Suite Results

1.1 Full test suite

Command:

cd /Users/wot25kir/coraledgecompiler/sloth-integration
../.venv-arm64/bin/python -m pytest -q tests

Result:

  • Collected: 66
  • Passed: 66
  • Failed: 0
  • Duration: ~0.15s

Module-level summary:

  • tests/test_adapter.py: passed
  • tests/test_converter.py: passed
  • tests/test_distiller.py: passed
  • tests/test_quantizer.py: passed
  • tests/test_runtime.py: passed

1.2 Previously failing subset

Command:

cd /Users/wot25kir/coraledgecompiler/sloth-integration
../.venv-arm64/bin/python -m pytest -q \
  tests/test_adapter.py \
  tests/test_converter.py \
  tests/test_quantizer.py \
  tests/test_distiller.py

Result:

  • Collected: 52
  • Passed: 52
  • Failed: 0

2. Model Artifacts Used During Example Runs

Two synthetic local models were generated in sloth-integration for deterministic example validation:

  • test_models/synthetic_text_classifier.tflite
  • test_models/synthetic_text_embedder.tflite

Generator script:

  • scripts/_gen_synthetic_models.py

Generation command:

cd /Users/wot25kir/coraledgecompiler/sloth-integration
../.venv-arm64/bin/python scripts/_gen_synthetic_models.py

Output: "created synthetic models"
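
A quick sanity check on the generated files can catch a missing or truncated artifact before the benchmarks run. The sketch below is not part of scripts/_gen_synthetic_models.py; `looks_like_tflite` is a hypothetical helper, and the check assumes only that a TensorFlow Lite FlatBuffer carries the file identifier `TFL3` at byte offset 4.

```python
from pathlib import Path

# TensorFlow Lite FlatBuffers carry the file identifier "TFL3" at bytes 4..8.
TFLITE_IDENTIFIER = b"TFL3"


def looks_like_tflite(path):
    """Return True if the file exists and carries the TFLite identifier."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        header = fh.read(8)
    return len(header) == 8 and header[4:8] == TFLITE_IDENTIFIER


if __name__ == "__main__":
    # File names taken from the list above; run from the sloth-integration root.
    for name in ("synthetic_text_classifier.tflite", "synthetic_text_embedder.tflite"):
        path = Path("test_models") / name
        print(path, "ok" if looks_like_tflite(path) else "missing or not TFLite")
```

This only confirms the container format. It says nothing about whether the model is compiled for, or loadable by, the Edge TPU delegate.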


3. Benchmark Tables (Actual Run Data)

  • Model used for all benchmark rows: test_models/synthetic_text_classifier.tflite
  • Iterations: 200
  • Warmup: 20

3.1 Benchmark matrix requested

| Scenario | Runner | Runtime | Backend | Compute Device | Mean (ms) | P95 (ms) | P99 (ms) | Min (ms) | Max (ms) | Throughput (FPS) | Exit code |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Baseline without hardware | Pure TensorFlow Lite interpreter (no sloth, no edgecompiler Coral runtime path) | tensorflow.lite | cpu_tflite | CPU (XNNPACK) | 0.0010 | 0.0010 | 0.0011 | 0.0009 | 0.0014 | 1040970.13 | 0 |
| 2. Hardware without sloth | examples/benchmark_coral.py --use-low-level | edgecompiler | coral_edgetpu | Coral USB Accelerator | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 974418.7 | 0 |
| 3. Hardware with sloth | examples/benchmark_coral.py (default path) | sloth_runtime | coral_edgetpu | edge_tpu | 0.01 | 0.01 | 0.01 | 0.00 | 0.01 | 200000.0 | 0 |
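
The columns in this matrix can be derived from a list of per-iteration latencies. The sketch below mirrors the statistics printed by the baseline script in section 6 (mean, P95, P99, and FPS as 1000 divided by the mean in milliseconds) using only the standard library. Whether examples/benchmark_coral.py computes its figures the same way is an assumption, and `percentile` and `summarize` are hypothetical helper names.

```python
import math
import statistics


def percentile(samples, q):
    """Linear-interpolation percentile (q in 0..100) over a list of numbers."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    rank = (len(ordered) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    # Interpolate between the two nearest ranks.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def summarize(latencies_ms):
    """Return the benchmark-matrix statistics for latencies given in ms."""
    mean_ms = statistics.fmean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        "fps": 1000.0 / mean_ms,  # iterations per second implied by the mean
    }
```

Under this definition a mean of 0.0010 ms corresponds to roughly 1,000,000 FPS, which is the order of magnitude shown in row 1.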

3.2 Commands used for each row

| Scenario | Command |
|---|---|
| 1. Baseline without hardware | ../.venv-arm64/bin/python - <<'PY' ... tf.lite.Interpreter(...) ... PY |
| 2. Hardware without sloth | ../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --use-low-level --iterations 200 --warmup 20 |
| 3. Hardware with sloth | ../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --iterations 200 --warmup 20 |

3.3 Notes on interpretation

| Observation | Detail |
|---|---|
| Baseline row | Measured with the TensorFlow Lite CPU interpreter directly, so the no-hardware path is represented explicitly. |
| Hardware without sloth | Uses the low-level edgecompiler runtime directly; reports Coral USB Accelerator as the compute device. |
| Hardware with sloth | Uses the SlothCoralRuntime benchmark path; reports the coral_edgetpu backend and edge_tpu compute device. |
| Absolute values | Synthetic micro-model latencies are extremely small; use them as plumbing validation, not as production SLA targets. The hardware rows are printed to two decimal places, so a mean of 0.00 ms means "below 0.005 ms", not zero. |
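
One way to judge how close these latencies sit to the measurement floor is to time an empty region with the same clock. The sketch below assumes the benchmarks time each iteration with `time.perf_counter`, as the baseline script in section 6 does; `timer_floor_ms` is a hypothetical helper, not part of sloth-integration.

```python
import time


def timer_floor_ms(samples=10000):
    """Estimate the smallest interval the timing loop itself can report.

    Times an empty region the same way a benchmark loop times invoke(),
    so the result approximates the per-iteration measurement overhead.
    """
    deltas = []
    for _ in range(samples):
        start = time.perf_counter()
        deltas.append((time.perf_counter() - start) * 1000.0)
    deltas.sort()
    return {
        # Clock resolution as reported by the interpreter, in milliseconds.
        "resolution_ms": time.get_clock_info("perf_counter").resolution * 1000.0,
        # Median cost of two back-to-back clock reads, in milliseconds.
        "median_overhead_ms": deltas[len(deltas) // 2],
    }


if __name__ == "__main__":
    print(timer_floor_ms())
```

If the median overhead is within an order of magnitude of the reported means (around 0.001 ms), differences between rows are likely dominated by timer noise rather than by the runtime path.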

4. Example Script Results

| Example | Command (abridged) | Result |
|---|---|---|
| classify_on_coral.py | ... classify_on_coral.py --model test_models/synthetic_text_classifier.tflite ... | Pass (exit 0), classification returned with confidence 1.0000 |
| embed_on_coral.py | ... embed_on_coral.py --model test_models/synthetic_text_embedder.tflite ... | Pass (exit 0), embedding returned |
| hybrid_inference.py | ... hybrid_inference.py --checkpoint test_models --coral-model test_models/synthetic_text_classifier.tflite ... | Pass (exit 0), host+coral timing breakdown printed |
| finetune_and_deploy.py | ... finetune_and_deploy.py --skip-finetune --skip-compile ... | Pass (exit 0), inference completed |
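
The pass criterion in this table is the process exit code. A minimal sketch of collecting exit codes for a batch of commands is shown below; `run_all` and `all_passed` are hypothetical helpers, and the demo command is a placeholder to be replaced with the full example invocations from section 6.

```python
import subprocess
import sys


def run_all(commands):
    """Run each command (an argv list) and return (argv, exit_code) pairs."""
    results = []
    for argv in commands:
        completed = subprocess.run(argv, capture_output=True, text=True)
        results.append((argv, completed.returncode))
    return results


def all_passed(results):
    """True when every command exited with code 0."""
    return all(code == 0 for _, code in results)


if __name__ == "__main__":
    # Placeholder command; substitute the example invocations from section 6.
    demo = [[sys.executable, "-c", "print('ok')"]]
    for argv, code in run_all(demo):
        print("exit", code, " ".join(argv))
```

An exit code of 0 shows that the script ran to completion, not that its output is correct, so the printed classification or embedding still needs a manual look.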

5. Edge TPU Delegate and Hardware Notes

Observed behavior across run modes:

  • Coral USB device detection succeeded in runtime logs.
  • Some downloaded Edge TPU model files triggered delegate preflight failures in low-level load paths.
  • Example workflows were validated successfully using synthetic local models; low-level benchmark mode completed with Coral USB Accelerator reported.

Practical interpretation:

  • The sloth-integration test and example paths are now operational.
  • The runtime now supports fallback when delegate preflight fails for a specific model artifact.
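
The fallback pattern described above can be sketched generically. This document does not show how the sloth-integration runtime implements its fallback or which backend it falls back to, so the code below is an illustration under assumptions: `load_with_fallback` and both factory functions are hypothetical, and only the backend names coral_edgetpu and cpu_tflite are taken from the benchmark matrix.

```python
def load_with_fallback(loaders):
    """Try each (name, factory) pair in order; return the first that succeeds.

    Returns (name, runtime, errors), where errors lists the backends that
    failed before the selected one. Raises RuntimeError if all of them fail.
    """
    errors = []
    for name, factory in loaders:
        try:
            return name, factory(), errors
        except Exception as exc:  # failure types vary by backend
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all backends failed: {errors}")


# Stand-in factories for illustration only. Real ones would construct an
# interpreter with or without the Edge TPU delegate.
def make_edgetpu_runtime():
    raise ValueError("delegate preflight failed")


def make_cpu_runtime():
    return object()


if __name__ == "__main__":
    name, runtime, errors = load_with_fallback(
        [("coral_edgetpu", make_edgetpu_runtime), ("cpu_tflite", make_cpu_runtime)]
    )
    print("selected backend:", name, "skipped:", errors)
```

Reporting the selected backend next to the timings keeps a silent fallback from being mistaken for an Edge TPU result.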

6. Reproducibility Checklist

Use these steps to reproduce the exact validation profile:

  1. Install package in editable mode:
cd /Users/wot25kir/coraledgecompiler/sloth-integration
../.venv-arm64/bin/python -m pip install -e .
  2. Generate synthetic test models:
../.venv-arm64/bin/python scripts/_gen_synthetic_models.py
  3. Run tests:
../.venv-arm64/bin/python -m pytest -q tests
  4. Run benchmark scenarios from the matrix:
# 1) Baseline without hardware (CPU-only TensorFlow Lite)
../.venv-arm64/bin/python - <<'PY'
import time
import numpy as np
import tensorflow as tf

# Load the synthetic classifier on the CPU interpreter (no delegate).
interpreter = tf.lite.Interpreter(model_path='test_models/synthetic_text_classifier.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
# Random token IDs matching the model's input shape and dtype.
x = np.random.randint(0, 32000, size=inp['shape'], dtype=inp['dtype'])
# Warmup: 20 untimed iterations.
for _ in range(20):
  interpreter.set_tensor(inp['index'], x)
  interpreter.invoke()
# Timed run: 200 iterations, latency recorded in milliseconds.
lat = []
for _ in range(200):
  t = time.perf_counter()
  interpreter.set_tensor(inp['index'], x)
  interpreter.invoke()
  lat.append((time.perf_counter() - t) * 1000)
print('mean_ms', float(np.mean(lat)))
print('p95_ms', float(np.percentile(lat, 95)))
print('p99_ms', float(np.percentile(lat, 99)))
print('min_ms', float(np.min(lat)))
print('max_ms', float(np.max(lat)))
print('fps', float(1000 / np.mean(lat)))
PY

# 2) Hardware without sloth
../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --use-low-level --iterations 200 --warmup 20

# 3) Hardware with sloth
../.venv-arm64/bin/python examples/benchmark_coral.py --model test_models/synthetic_text_classifier.tflite --iterations 200 --warmup 20
  5. Run all examples with synthetic models:
../.venv-arm64/bin/python examples/classify_on_coral.py --model test_models/synthetic_text_classifier.tflite --text "A quick test sentence" --labels negative,positive
../.venv-arm64/bin/python examples/embed_on_coral.py --model test_models/synthetic_text_embedder.tflite --text "A quick test sentence"
../.venv-arm64/bin/python examples/hybrid_inference.py --checkpoint test_models --coral-model test_models/synthetic_text_classifier.tflite --text "A quick test sentence"
../.venv-arm64/bin/python examples/finetune_and_deploy.py --skip-finetune --checkpoint test_models --skip-compile --compiled-model test_models/synthetic_text_classifier.tflite --test-texts "A quick test sentence"