Architecture¶
This document describes the internal architecture of edgecompiler, including its
data flow, intermediate representation, compilation passes, and extension points.
Table of Contents¶
- Compiler Design and Data Flow
- Unified Intermediate Representation (IR)
- Front-End Converter Pipeline
- Quantisation Pipeline
- Back-End Code Generation
- Runtime Architecture
- Extension Points
Compiler Design and Data Flow¶
edgecompiler follows a classic three-phase compiler architecture: front-end →
middle-end (IR + passes) → back-end. This separation allows any supported model
format to be compiled for any supported target hardware.
┌──────────────────────────────────────────────────────────────────────┐
│                        edgecompiler Pipeline                         │
│                                                                      │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐        │
│  │ Frontend │───▶│ Unified  │───▶│ Optimise │───▶│ Quantise │        │
│  │ Convert  │    │ IR       │    │ Passes   │    │ Pipeline │        │
│  └──────────┘    └──────────┘    └──────────┘    └─────┬────┘        │
│                                                        │             │
│                                        ┌───────────────┤             │
│                                  ┌─────▼─────┐   ┌─────▼─────┐       │
│                                  │   Coral   │   │   Metal   │       │
│                                  │  Backend  │   │  Backend  │       │
│                                  └─────┬─────┘   └─────┬─────┘       │
│                                        │               │             │
│                                  ┌─────▼─────┐   ┌─────▼─────┐       │
│                                  │  .tflite  │   │ .mlpackage│       │
│                                  │ (Edge TPU)│   │ (Core ML) │       │
│                                  └─────┬─────┘   └─────┬─────┘       │
│                                        │               │             │
│                                  ┌─────▼─────┐   ┌─────▼─────┐       │
│                                  │   Coral   │   │   Metal   │       │
│                                  │  Runtime  │   │  Runtime  │       │
│                                  └───────────┘   └───────────┘       │
└──────────────────────────────────────────────────────────────────────┘
Compilation Phases¶
| Phase | Description | Input | Output |
|---|---|---|---|
| 1. Front-end conversion | Parse model format → IR | .pt, .tflite, .onnx, SavedModel | IRGraph |
| 2. Optimisation passes | Constant folding, fusion, DCE | IRGraph | IRGraph (optimised) |
| 3. Quantisation | INT8 PTQ, QAT, or dynamic range | IRGraph + optional calibration | IRGraph (quantised) |
| 4. Backend code generation | Lower IR → target binary | IRGraph + target config | .tflite / .mlpackage |
| 5. Runtime loading | Load and execute compiled model | Target binary | Inference results |
Detailed Data Flow¶
Model File Frontend Converter Unified IR
┌──────────┐ ┌──────────────────┐ ┌──────────────┐
│ model.pt │───────────▶│ PyTorchFrontend │───────▶│ │
└──────────┘ │ .trace() │ │ │
┌──────────┐ │ .convert() │ │ IRGraph │
│model.tflite│─────────▶│ TFLiteFrontend │───────▶│ .ops[] │
└──────────┘ │ .parse() │ │ .tensors[] │
┌──────────┐ │ .convert() │ │ .inputs[] │
│model.onnx │──────────▶│ ONNXFrontend │───────▶│ .outputs[] │
└──────────┘ │ .parse() │ │ │
┌──────────┐ │ .convert() │ │ IRTensor │
│saved_model│──────────▶│ TFSavedModelFE │───────▶│ .name │
└──────────┘ │ .parse() │ │ .dtype │
│ .convert() │ │ .shape │
└──────────────────┘ │ .quant │
└──────┬───────┘
│
┌────────────▼────────────┐
│ Optimisation Passes │
│ │
│ 1. ConstantFoldingPass │
│ 2. OpFusionPass │
│ (Conv+BN+ReLU) │
│ 3. DeadCodeElimPass │
│ 4. LayoutTransformPass │
│ (NCHW → NHWC) │
│ 5. CommonSubexprPass │
└────────────┬────────────┘
│
┌────────────▼────────────┐
│ Quantisation Pipeline │
│ │
│ 1. Analyse graph dtype │
│ 2. Insert quant/dequant│
│ 3. Calibrate (PTQ) │
│ or apply QAT params │
│ 4. Propagate scales │
│ 5. Verify integrity │
└────────────┬────────────┘
│
┌───────────────────┼─────────────────--──┐
│ │
┌─────────▼────────---┐ ┌────────────▼──────────┐
│ Coral Backend │ │ Metal Backend │
│ │ │ │
│ 1. Legalise ops │ │ 1. Legalise ops │
│ 2. Partition graph │ │ 2. Assign compute │
│ (TPU vs CPU) │ │ units (ANE/GPU) │
│ 3. Build TFLite FB │ │ 3. Build Core ML │
│ 4. Embed custom ops│ │ model spec │
│ 5. Serialise params│ │ 4. Configure MPS │
│ 6. Write .tflite │ │ 5. Write .mlpackage │
└─────────┬──────────-┘ └────────────┬──────────┘
│ │
┌─────────▼──────────┐ ┌────────────▼──────────┐
│ model_coral.tflite │ │ model_ml.mlpackage │
└────────────────────┘ └───────────────────────┘
Unified Intermediate Representation (IR)¶
The unified IR is the central data structure of edgecompiler. Every frontend
converts to this IR, and every backend consumes it. This decoupling is what allows
any model format to target any hardware.
IRGraph¶
class IRGraph:
    """Top-level container for a compiled model graph."""

    name: str                        # Model name
    version: str                     # IR version (e.g., "1.0")
    ops: list[IROperation]           # Ordered list of operations
    tensors: dict[str, IRTensor]     # Tensor name → tensor definition
    inputs: list[IRTensor]           # Model input tensors
    outputs: list[IRTensor]          # Model output tensors
    metadata: dict[str, Any]         # Arbitrary metadata (source format, etc.)

    def add_op(self, op: IROperation) -> None: ...
    def get_tensor(self, name: str) -> IRTensor: ...
    def topological_sort(self) -> None: ...
    def validate(self) -> list[str]: ...  # Returns list of issues
IROperation¶
class IROperation:
    """A single operation in the graph."""

    op_type: str                          # E.g., "Conv2D", "MatMul", "Add"
    inputs: list[str]                     # Input tensor names
    outputs: list[str]                    # Output tensor names
    attributes: dict[str, Any]            # Op-specific attributes
    quantization: OpQuantization | None   # Per-op quantisation info

    @property
    def is_quantized(self) -> bool: ...
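Because operations refer to tensors by name, the dependency order behind `IRGraph.topological_sort()` can be derived from the `inputs` and `outputs` lists alone. The following is a minimal sketch of one way to do that (Kahn's algorithm), not the project's actual implementation. The `Op` dataclass is a hypothetical stand-in that carries only the fields the algorithm needs.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical stand-in for IROperation (only the fields used here)."""
    op_type: str
    inputs: list[str]
    outputs: list[str]


def topological_sort(ops: list[Op]) -> list[Op]:
    """Order ops so each one runs after the ops that produce its inputs."""
    # Map every tensor name to the index of the op that produces it.
    producer = {name: i for i, op in enumerate(ops) for name in op.outputs}
    # Tensors without a producer are graph inputs or constants: no edge.
    deps = [{producer[t] for t in op.inputs if t in producer} for op in ops]
    consumers: list[list[int]] = [[] for _ in ops]
    for i, d in enumerate(deps):
        for p in d:
            consumers[p].append(i)

    remaining = [len(d) for d in deps]
    ready = deque(i for i, n in enumerate(remaining) if n == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(ops[i])
        for c in consumers[i]:
            remaining[c] -= 1
            if remaining[c] == 0:
                ready.append(c)
    if len(order) != len(ops):
        raise ValueError("graph contains a cycle")
    return order
```

The SSA-like naming rule (each tensor has exactly one producer) is what makes the `producer` map unambiguous.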
IRTensor¶
class IRTensor:
    """A tensor in the graph (activations, weights, or constants)."""

    name: str                                 # Unique tensor name
    dtype: IRDType                            # Data type
    shape: list[int | None]                   # Shape (None = dynamic dim)
    data: numpy.ndarray | None                # Constant data (None for activations)
    quantization: TensorQuantization | None   # Scale + zero_point


class IRDType(enum.Enum):
    FLOAT32 = "float32"
    FLOAT16 = "float16"
    INT32 = "int32"
    INT8 = "int8"
    UINT8 = "uint8"
    BOOL = "bool"
Quantisation Metadata¶
class TensorQuantization:
    """Quantisation parameters for a tensor."""

    scale: float                      # Quantisation scale factor
    zero_point: int                   # Zero point offset
    quantized_dtype: IRDType          # Target quantised dtype (INT8/UINT8)
    axis: int | None                  # Per-channel axis (None = per-tensor)
    scales: list[float] | None        # Per-channel scales
    zero_points: list[int] | None     # Per-channel zero points


class OpQuantization:
    """Quantisation configuration for an operation."""

    input_scales: list[float]
    input_zero_points: list[int]
    output_scale: float
    output_zero_point: int
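To make the role of `scale` and `zero_point` concrete, here is a minimal sketch of the usual affine mapping between real values and quantised integers, in both per-tensor and per-channel form. The function names are illustrative and not part of edgecompiler. Plain Python lists are used instead of NumPy arrays to keep the example dependency-free.

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Real -> integer: q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Integer -> approximate real: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


def quantize_per_channel(rows, scales, zero_points):
    """Per-channel variant along axis 0: row c uses scales[c], zero_points[c]."""
    return [quantize(row, s, z) for row, s, z in zip(rows, scales, zero_points)]


# Values outside the representable range saturate at qmin/qmax.
print(quantize([0.0, 0.5, -0.5, 100.0], scale=0.5, zero_point=0))
print(dequantize([0, 1, -1], scale=0.5, zero_point=0))
```

In terms of `TensorQuantization`, the per-tensor case corresponds to `axis = None`, and the per-channel case to `axis` plus the `scales` and `zero_points` lists.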
Design Principles¶
- Format-agnostic: The IR does not encode any format-specific concepts (e.g., TFLite operator codes or Core ML layer types). These are injected during backend code generation.
- Explicit quantisation: Quantisation is represented as metadata attached to tensors and operations, not as separate quantise/dequantise ops (though these can be materialised when needed).
- SSA-like naming: Every tensor has a unique name. Operations produce new tensors rather than mutating existing ones. This simplifies analysis and transformation.
- Mutable during compilation, immutable after: The IR is freely modified during the optimisation and quantisation passes. Once it reaches the backend, it should be treated as frozen.
- Metadata pass-through: Front-ends can attach arbitrary metadata (e.g., original op names, training configuration) that backends may optionally consume.
Front-End Converter Pipeline¶
Each front-end follows a two-phase process: parse the source format into a format-specific representation, then convert it to the unified IR.
PyTorch Frontend¶
model.pt / model.pth
│
▼
┌─────────────────┐
│ torch.jit.trace │─── Trace model with example input
│ or .script() │─── Fall back to script if trace fails
└────────┬────────┘
│
▼
┌────────────────-─┐
│ TorchScript Graph│─── Extract nodes and tensors
└────────┬────────-┘
│
▼
┌─────────────────-┐
│ Map PyTorch ops │─── aten::conv2d → Conv2D, etc.
│ to IR op types │─── Handle prim:: ops (Tuple, List)
└────────┬───────-─┘─── Decompose complex ops
│
▼
┌────────────────-─┐
│ Convert tensors │─── Map torch.dtype → IRDType
│ and attributes │─── Convert parameter shapes
└────────┬────────-┘─── Handle padding, stride, dilation
│
▼
IRGraph
Key considerations:
- Models are traced with torch.jit.trace() using a dummy input matching the declared input shape. If tracing fails (for example, because of dynamic control flow), torch.jit.script() is used.
- aten:: operations are mapped to IR operations via a lookup table. Unmapped ops raise an UnsupportedOpError with suggestions for workarounds.
- PyTorch uses NCHW layout by default. A LayoutTransformPass in the optimisation pipeline converts to NHWC for TFLite/Coral compatibility.
- Quantisation-aware training (QAT) parameters embedded in torch.quantization observers are extracted and propagated to the IR.
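The lookup-table mapping described above can be sketched as follows. This is an illustration under stated assumptions, not the actual table: the four entries and the suggestion text are hypothetical, and `UnsupportedOpError` is redefined locally (with a simplified constructor) so the snippet is self-contained.

```python
class UnsupportedOpError(Exception):
    """Local stand-in for edgecompiler's UnsupportedOpError."""

    def __init__(self, op_type, suggestion=None):
        self.op_type = op_type
        self.suggestion = suggestion
        hint = f" Suggestion: {suggestion}" if suggestion else ""
        super().__init__(f"Unsupported op '{op_type}'.{hint}")


# Hypothetical excerpt of the aten:: -> IR op type table.
ATEN_TO_IR = {
    "aten::conv2d": "Conv2D",
    "aten::matmul": "MatMul",
    "aten::add": "Add",
    "aten::relu": "ReLU",
}

# Hypothetical workaround hints for ops that have no mapping.
SUGGESTIONS = {
    "aten::einsum": "rewrite the expression with matmul and transpose",
}


def map_aten_op(aten_name: str) -> str:
    """Return the IR op type for a TorchScript node kind, or raise."""
    try:
        return ATEN_TO_IR[aten_name]
    except KeyError:
        raise UnsupportedOpError(aten_name, SUGGESTIONS.get(aten_name)) from None
```

Keeping the suggestions in a separate table means a new workaround hint can be added without touching the mapping itself.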
TFLite Frontend¶
model.tflite
│
▼
┌────────────────-─┐
│ FlatBuffer parse │─── Parse .tflite binary
└────────┬───────-─┘
│
▼
┌────────────────-─┐
│ Map TFLite ops │─── tflite::OperatorType → IR op type
└────────┬───────-─┘─── Handle custom ops (delegates)
│
▼
┌────────────────-─┐
│ Extract tensors │─── Read tensor shapes, types, data
└────────┬────────-┘─── Handle quantisation parameters
│
▼
┌─────────────────-┐
│ Reconstruct graph│─── Build op dependency ordering
└────────┬───────-─┘─── Handle subgraphs (if present)
│
▼
IRGraph
Key considerations:
- TFLite models may already be quantised. The front-end preserves quantisation parameters and marks the IR as pre-quantised, skipping the quantisation pipeline.
- Custom ops (including Edge TPU delegate ops) are mapped to their corresponding IR operations when possible, or flagged as unknown.
- The TFLite format uses NHWC layout natively, so no layout transformation is needed.
ONNX Frontend¶
model.onnx
│
▼
┌────────────────--┐
│ onnx.load() │─── Load ONNX protobuf
└────────┬───────-─┘
│
▼
┌────────────────-─┐
│ onnxsim.simplify │─── Simplify graph (optional)
└────────┬────────-┘
│
▼
┌─────────────────-┐
│ Map ONNX ops │─── onnx::Conv → Conv2D, etc.
└────────┬───────-─┘─── Handle opset version differences
│
▼
┌────────────────-─┐
│ Convert tensors │─── Map onnx.TensorProto → IRTensor
└────────┬────────-┘─── Handle external data (large models)
│
▼
IRGraph
Key considerations:
- ONNX opset versions 11–20 are supported. Older opsets are auto-upgraded when possible using onnx.version_converter.
- onnxsim is used to simplify the graph before conversion, eliminating identity ops, folding constants, and removing other redundancies.
- External data files (for models > 2 GB) are supported via ONNX's external data convention.
TensorFlow SavedModel / Keras Frontend¶
saved_model/ or model.h5 / model.keras
│
▼
┌─────────────────┐
│ TF loader │─── Load SavedModel or Keras model
└────────┬────────┘─── Extract signature and concrete functions
│
▼
┌────────────────------------─┐
│ Convert to ConcreteFunction │
└────────┬──────------------──┘─── Get GraphDef
│
▼
┌─────────────────┐
│ Map TF ops │─── tf.nn.conv2d → Conv2D, etc.
└────────┬────────┘─── Handle tf.function tracing
│
▼
┌─────────────────-┐
│ Extract variables│─── Read weights, biases, batch norm params
└────────┬───────--┘─── Convert tf.Variable → constant tensors
│
▼
IRGraph
Key considerations:
- The SavedModel is loaded using tf.saved_model.load(), and its concrete functions are extracted. The first serving signature is used by default.
- Keras models (.h5/.keras) are loaded via tf.keras.models.load_model() and converted to a concrete function before processing.
- TensorFlow resource variables (e.g., in LSTM/GRU layers) are materialised as constant tensors during conversion.
Quantisation Pipeline¶
The quantisation pipeline converts a floating-point IR graph to INT8 (or UINT8) for efficient inference on edge hardware.
Pipeline Stages¶
Float IRGraph
│
▼
┌─────────────────-─┐
│ 1. Analyse graph │─── Identify float tensors and ops
└────────┬────────-─┘─── Determine quantisable vs non-quantisable
│
▼
┌─────────────────--─┐
│ 2. Choose strategy │
│ │─── PTQ: Use calibration data
│ │─── QAT: Use embedded observer params
│ │─── Dynamic: Compute ranges at runtime
└────────┬─────────--┘
│
▼
┌──────────────────┐
│ 3. Insert q/dq │─── Add Quantize/Dequantize at boundaries
│ nodes │─── Between quantisable and non-quantisable ops
└────────┬─────────┘─── Handle mixed-precision boundaries
│
▼
┌──────────────────--┐
│ 4. Calibrate │─── PTQ: Run calibration data through graph
│ │─── Collect min/max per tensor
│ │─── Compute scale = (max - min) / 255
│ │─── Compute zero_point = round(-min / scale)
│ │─── QAT: Extract scales from observers
│ │─── Dynamic: Mark as dynamic (no calibration)
└────────┬─────────--┘
│
▼
┌─────────────────-─┐
│ 5. Propagate │─── Ensure output scale of op A matches
│ scales │─── input scale of op B (when both quantised)
│ │─── Insert Requantize ops where needed
│ │─── Fuse quantised BatchNorm into Conv
└────────┬───────--─┘
│
▼
┌──────────────────┐
│ 6. Verify │─── Check all quantised ops have valid params
│ │─── Validate scale > 0, zero_point in range
│ │─── Warn about large dynamic range tensors
│ │─── Report quantisation error estimates
└────────┬─────────┘
│
▼
Quantised IRGraph
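The calibration formulas in stage 4, scale = (max - min) / 255 and zero_point = round(-min / scale), correspond to an unsigned 0..255 integer range. For signed INT8 the zero point is shifted by the lower bound of the range (-128). A minimal sketch of the general form follows. The function name is illustrative, and widening the range to include 0.0 is an assumption borrowed from common practice rather than something this document specifies.

```python
def affine_params(t_min: float, t_max: float, qmin: int = -128, qmax: int = 127):
    """Compute (scale, zero_point) mapping [t_min, t_max] onto [qmin, qmax]."""
    # Assumption: include 0.0 in the range so that zero is exactly representable.
    t_min, t_max = min(t_min, 0.0), max(t_max, 0.0)
    if t_max == t_min:
        return 1.0, 0  # all-zero tensor: any positive scale works
    scale = (t_max - t_min) / (qmax - qmin)
    zero_point = round(qmin - t_min / scale)
    return scale, max(qmin, min(qmax, zero_point))


# UINT8: qmin = 0, so the zero point reduces to round(-min / scale).
print(affine_params(0.0, 255.0, qmin=0, qmax=255))
# INT8: the same tensor range maps to a zero point at the bottom of the range.
print(affine_params(0.0, 255.0))
```

Both results satisfy the stage 6 checks: the scale is positive and the zero point lies inside [qmin, qmax].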
Post-Training Quantisation (PTQ)¶
PTQ is the simplest quantisation method and works well for most models:
from edgecompiler import compile

result = compile(
    "mobilenet_v2.pt",
    target="coral",
    quantize="ptq",
    calibration_data="calib.npy",  # Shape: (N, C, H, W), float32
    num_calibration_samples=100,   # Use first 100 samples
)
How it works:
- The calibration dataset is passed through the model in float32.
- Min/max statistics are collected for each activation tensor.
- Scales and zero points are computed from these statistics.
- Weight tensors are quantised using their own min/max (per-channel for depthwise convolutions, per-tensor for everything else).
- The quantised model is verified for numerical consistency.
Best practices:
- Use 100–500 representative calibration samples.
- Ensure calibration data matches the expected input distribution.
- Avoid outlier samples that skew the min/max ranges.
- Consider using --calibration-percentile 99.9 to reduce outlier impact.
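To illustrate why percentile calibration helps, the sketch below derives a clipped range from observed activation values using the nearest-rank method. How --calibration-percentile is implemented internally is not documented here. Clipping both tails symmetrically, and the function name itself, are assumptions made for this example.

```python
import math


def percentile_range(samples, percentile=99.9):
    """Return (low, high) bounds that discard the extreme tail on each side."""
    if not samples:
        raise ValueError("no calibration samples")
    ordered = sorted(samples)
    # Nearest-rank: index of the smallest value covering `percentile` percent.
    hi_idx = min(len(ordered) - 1, math.ceil(percentile / 100 * len(ordered)) - 1)
    lo_idx = len(ordered) - 1 - hi_idx
    return ordered[lo_idx], ordered[hi_idx]


# 1000 well-behaved activations plus one extreme outlier.
activations = [float(i) for i in range(1000)] + [1e6]
print(percentile_range(activations, 100.0))  # plain min/max includes the outlier
print(percentile_range(activations, 99.9))   # the outlier is clipped away
```

With plain min/max, the single outlier would stretch the range by three orders of magnitude and leave almost all of the 256 integer levels unused.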
Quantisation-Aware Training (QAT)¶
QAT produces the most accurate quantised models by simulating quantisation during training:
from edgecompiler import compile

result = compile(
    "mobilenet_v2_qat.pt",  # Model with embedded QAT observers
    target="coral",
    quantize="qat",
)
How it works:
- The front-end detects torch.quantization observers in the model.
- Observer statistics (scale, zero_point) are extracted from the model state dict.
- These parameters are directly applied to the IR without additional calibration.
- Fake-quantise ops are removed, and real quantise/dequantise boundaries are established.
Dynamic Range Quantisation¶
Dynamic range quantisation computes activation scales at runtime, separately for each input.
How it works:
- Weight tensors are quantised statically (per-channel or per-tensor).
- Activation scales are computed at runtime from the input min/max.
- This avoids the need for calibration data but may be slower due to runtime overhead.
When to use:
- When you don't have representative calibration data.
- For models with highly variable input distributions.
- When inference speed is less critical than ease of deployment.
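The split between static weights and runtime activation scales can be sketched with a single dot product. This is a simplified illustration, not the runtime's actual kernel: it uses symmetric quantisation (zero point fixed at 0) for both operands, which is an assumption made to keep the arithmetic short, and all names are hypothetical.

```python
def quantize_symmetric(values, qmax=127):
    """Symmetric quantisation (zero_point = 0); returns (ints, scale)."""
    scale = max(abs(v) for v in values) / qmax or 1.0  # 1.0 for all-zero input
    return [round(v / scale) for v in values], scale


# Weights are quantised once, ahead of time.
WEIGHTS_Q, WEIGHT_SCALE = quantize_symmetric([1.0, -1.0, 0.0])


def dynamic_dot(activations):
    """Dot product with static int8 weights and a per-call activation scale."""
    acts_q, act_scale = quantize_symmetric(activations)  # range measured per input
    acc = sum(a * w for a, w in zip(acts_q, WEIGHTS_Q))  # integer accumulation
    return acc * act_scale * WEIGHT_SCALE                # rescale to float


# The float result would be 2.0 * 1.0 + 1.0 * -1.0 + 4.0 * 0.0 = 1.0.
print(dynamic_dot([2.0, 1.0, 4.0]))
```

The extra pass over the activations to find their range on every call is the runtime overhead mentioned above.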
Back-End Code Generation¶
Each backend takes a quantised (or float) IR graph and produces a target-specific binary.
Common Backend Interface¶
class Backend(Protocol):
    """Interface that all backends must implement."""

    @property
    def name(self) -> str: ...

    def supports_op(self, op: IROperation) -> bool: ...

    def legalise_ops(self, graph: IRGraph) -> IRGraph:
        """Transform IR ops into backend-supported forms."""
        ...

    def compile(self, graph: IRGraph, config: CompileConfig) -> CompileResult:
        """Generate target binary from IR graph."""
        ...

    def estimate_performance(self, graph: IRGraph) -> PerfEstimate:
        """Estimate latency/throughput without hardware."""
        ...
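One way a backend might use `supports_op` is to partition the graph into runs of consecutive operations that either execute on the accelerator or fall back to the CPU. The sketch below shows this idea only. It assumes the ops are already in execution order, and `Op` and `partition` are hypothetical names, not the Coral backend's real partitioner.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical stand-in for IROperation."""
    op_type: str


def partition(ops, supports_op):
    """Group consecutive ops into (on_target, [ops]) segments."""
    segments = []
    for op in ops:
        on_target = supports_op(op)
        if segments and segments[-1][0] == on_target:
            segments[-1][1].append(op)  # extend the current run
        else:
            segments.append((on_target, [op]))  # start a new run
    return segments


SUPPORTED = {"Conv2D", "ReLU", "Add"}
ops = [Op("Conv2D"), Op("ReLU"), Op("Custom"), Op("Add")]
for on_target, group in partition(ops, lambda op: op.op_type in SUPPORTED):
    print("target" if on_target else "fallback", [op.op_type for op in group])
```

Each boundary between segments implies a transfer between the accelerator and the host, so fewer segments generally mean less overhead.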
Backend Selection¶
CompileConfig.target
│
├── "coral" ─────▶ CoralBackend
│ ├─ Legalise ops to TFLite set
│ ├─ Partition for Edge TPU
│ └─ Generate .tflite FlatBuffer
│
├── "metal" ─────▶ MetalBackend
│ ├─ Legalise ops to Core ML set
│ ├─ Assign compute units
│ └─ Generate .mlpackage
│
└── "auto" ──────▶ Select based on available hardware
├─ If Coral USB detected → coral
└─ If Apple Silicon detected → metal
See Coral Backend and Metal Backend for detailed documentation on each backend.
Runtime Architecture¶
edgecompiler includes lightweight runtime wrappers for both backends, providing a
uniform inference API.
┌────────────────────────────────────────────────┐
│              edgecompiler.runtime              │
│                                                │
│  ┌──────────────────────────────────────────┐  │
│  │ InferenceSession                         │  │
│  │                                          │  │
│  │  .load(path, target="coral"|"metal")     │  │
│  │  .run(inputs: dict) → dict               │  │
│  │  .benchmark(iterations) → Stats          │  │
│  │  .close()                                │  │
│  └────────────────────┬─────────────────────┘  │
│                       │                        │
│            ┌──────────┴───────────┐            │
│            │                      │            │
│  ┌─────────▼─────────┐  ┌─────────▼─────────┐  │
│  │ CoralRuntime      │  │ MetalRuntime      │  │
│  │                   │  │                   │  │
│  │ Uses:             │  │ Uses:             │  │
│  │  libedgetpu       │  │  coremltools      │  │
│  │  tflite_runtime   │  │  Core ML framework│  │
│  │  pycoral          │  │                   │  │
│  └───────────────────┘  └───────────────────┘  │
└────────────────────────────────────────────────┘
InferenceSession API¶
from edgecompiler.runtime import InferenceSession
# Load a compiled model
session = InferenceSession("model_coral.tflite", target="coral")
# Run inference
result = session.run({"input": input_array})
# Benchmark
stats = session.benchmark(iterations=100)
print(f"Mean latency: {stats.mean_ms:.2f} ms")
print(f"P95 latency: {stats.p95_ms:.2f} ms")
print(f"Throughput: {stats.throughput_fps:.1f} FPS")
# Clean up
session.close()
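The statistics printed above (mean latency, P95 latency, throughput) can be derived from per-iteration timings. The following is a minimal sketch of how such a summary might be computed. The `Stats` fields mirror the attribute names used above, but the warm-up count, the nearest-rank percentile method, and the helper names are assumptions rather than documented behaviour.

```python
import statistics
import time
from dataclasses import dataclass


@dataclass
class Stats:
    mean_ms: float
    p95_ms: float
    throughput_fps: float


def summarise(latencies_ms):
    """Reduce per-iteration latencies (ms) to mean, nearest-rank P95, and FPS."""
    ordered = sorted(latencies_ms)
    mean_ms = statistics.fmean(ordered)
    p95_idx = max(0, -(-95 * len(ordered) // 100) - 1)  # ceil(0.95 * n) - 1
    fps = 1000.0 / mean_ms if mean_ms > 0 else float("inf")
    return Stats(mean_ms, ordered[p95_idx], fps)


def benchmark(run_once, iterations=100, warmup=5):
    """Time `run_once` repeatedly, discarding the warm-up runs."""
    for _ in range(warmup):
        run_once()
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return summarise(latencies)
```

Throughput here is the reciprocal of mean latency, which assumes one inference per call and no pipelining.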
Extension Points¶
edgecompiler is designed to be extensible. The following extension points are
supported:
Adding a New Frontend¶
A frontend converts a model format to the unified IR.
Step 1: Implement the frontend class
# src/edgecompiler/frontend/my_frontend.py
from edgecompiler.ir import IRGraph, IROperation, IRTensor, IRDType
from edgecompiler.frontend.base import Frontend


class MyFrontend(Frontend):
    """Frontend for MyModel format."""

    supported_extensions = (".mymodel",)

    def convert(self, model_path: str, **kwargs) -> IRGraph:
        """Convert a MyModel file to IRGraph."""
        graph = IRGraph(name="imported_model")

        # 1. Parse the model file
        raw_model = self._parse(model_path)

        # 2. Map operations to IR
        for op in raw_model.operations:
            ir_op = self._convert_op(op)
            graph.add_op(ir_op)

        # 3. Set inputs/outputs
        graph.inputs = [self._convert_tensor(t) for t in raw_model.inputs]
        graph.outputs = [self._convert_tensor(t) for t in raw_model.outputs]
        return graph

    def _parse(self, path: str):
        """Parse the model file format."""
        ...

    def _convert_op(self, op) -> IROperation:
        """Map a model op to an IR operation."""
        ...

    def _convert_tensor(self, tensor) -> IRTensor:
        """Map a model tensor to an IR tensor."""
        ...
Step 2: Register the frontend
# src/edgecompiler/frontend/registry.py
from edgecompiler.frontend.base import Frontend
from edgecompiler.frontend.my_frontend import MyFrontend


def get_frontend(model_path: str) -> Frontend:
    if model_path.endswith(".mymodel"):
        return MyFrontend()
    ...
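Since each frontend already declares `supported_extensions`, the registry could also be driven by that attribute instead of a chain of `if` statements. The sketch below shows this alternative design. It is not how the registry is currently written, and `register_frontend`, `_FRONTENDS`, and the stand-in classes are hypothetical. It also does not handle directory-based formats such as SavedModel, which have no file extension.

```python
import os


class Frontend:
    """Stand-in for edgecompiler.frontend.base.Frontend."""
    supported_extensions: tuple[str, ...] = ()


class MyFrontend(Frontend):
    supported_extensions = (".mymodel",)


_FRONTENDS: dict[str, type[Frontend]] = {}


def register_frontend(cls):
    """Index a frontend class under each extension it declares."""
    for ext in cls.supported_extensions:
        _FRONTENDS[ext.lower()] = cls
    return cls


def get_frontend(model_path: str) -> Frontend:
    """Instantiate the frontend registered for the file's extension."""
    ext = os.path.splitext(model_path)[1].lower()
    try:
        return _FRONTENDS[ext]()
    except KeyError:
        raise ValueError(f"no frontend registered for '{ext}'") from None


register_frontend(MyFrontend)
print(type(get_frontend("models/net.mymodel")).__name__)
```

Because `register_frontend` returns the class, it can also be applied as a decorator on the frontend definition.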
Step 3: Add tests
# tests/test_my_frontend.py
from edgecompiler.frontend.my_frontend import MyFrontend


def test_convert_basic_model():
    frontend = MyFrontend()
    graph = frontend.convert("tests/fixtures/basic.mymodel")
    assert len(graph.ops) > 0
    assert len(graph.inputs) > 0
    assert len(graph.outputs) > 0
Adding a New Backend¶
A backend compiles the unified IR to a target binary format.
Step 1: Implement the backend class
# src/edgecompiler/backend/my_backend.py
from edgecompiler.ir import IRGraph, IROperation
from edgecompiler.backend.base import Backend, CompileResult, CompileConfig


class MyBackend(Backend):
    """Backend for MyHardware accelerator."""

    @property
    def name(self) -> str:
        return "my_hardware"

    def supports_op(self, op: IROperation) -> bool:
        SUPPORTED_OPS = {"Conv2D", "MatMul", "Add", "ReLU", ...}
        return op.op_type in SUPPORTED_OPS

    def legalise_ops(self, graph: IRGraph) -> IRGraph:
        """Transform ops that aren't natively supported."""
        # E.g., replace ReLU6 with clip(0, 6)
        ...
        return graph

    def compile(self, graph: IRGraph, config: CompileConfig) -> CompileResult:
        """Generate hardware binary from IR."""
        # 1. Legalise the graph
        graph = self.legalise_ops(graph)

        # 2. Partition (supported vs unsupported ops)
        supported, fallback = self._partition(graph)

        # 3. Generate target binary
        binary = self._generate_binary(supported)

        # 4. Write output file
        output_path = config.output_path or "model.mybin"
        with open(output_path, "wb") as f:
            f.write(binary)

        return CompileResult(
            output_path=output_path,
            ops_on_target=len(supported),
            ops_fallback=len(fallback),
            backend=self.name,
        )

    def _partition(self, graph: IRGraph):
        """Split graph into supported and unsupported ops."""
        ...

    def _generate_binary(self, graph: IRGraph) -> bytes:
        """Generate the target binary format."""
        ...
Step 2: Register the backend
# src/edgecompiler/backend/registry.py
from edgecompiler.backend.base import Backend
from edgecompiler.backend.my_backend import MyBackend


def get_backend(target: str) -> Backend:
    if target == "my_hardware":
        return MyBackend()
    ...
Step 3: Add CLI support
The CLI automatically supports any registered backend.
Adding a New Optimisation Pass¶
Optimisation passes transform the IR graph before quantisation.
# src/edgecompiler/passes/my_pass.py
from edgecompiler.ir import IRGraph, IROperation
from edgecompiler.passes.base import Pass


class MyOptimisationPass(Pass):
    """Example optimisation pass."""

    name = "my_optimisation"

    def run(self, graph: IRGraph) -> IRGraph:
        """Apply the optimisation to the graph."""
        modified = False
        # Iterate over a copy so that _transform may add or remove ops safely.
        for op in list(graph.ops):
            if self._matches_pattern(op):
                self._transform(op, graph)
                modified = True
        if modified:
            issues = graph.validate()  # Returns a list of issues (empty if valid)
            if issues:
                raise ValueError(f"{self.name} produced an invalid graph: {issues}")
        return graph

    def _matches_pattern(self, op: IROperation) -> bool:
        ...

    def _transform(self, op: IROperation, graph: IRGraph) -> None:
        ...
Register in the pass pipeline:
# src/edgecompiler/passes/pipeline.py
from edgecompiler.passes.my_pass import MyOptimisationPass

DEFAULT_PASSES = [
    ConstantFoldingPass(),
    OpFusionPass(),
    MyOptimisationPass(),  # Add here
    DeadCodeElimPass(),
    LayoutTransformPass(),
]
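As a concrete instance of the pass interface, the sketch below implements a simple dead-code elimination pass and a runner that applies passes in list order. It is an illustration only, not the real DeadCodeElimPass. The `Op` and `Graph` dataclasses are simplified stand-ins (graph outputs are plain tensor names here), and the ops are assumed to be topologically sorted.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    op_type: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class Graph:
    ops: list[Op] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)  # names of graph outputs


class DeadCodeElim:
    """Remove ops whose results never reach a graph output."""

    name = "dce"

    def run(self, graph: Graph) -> Graph:
        live = set(graph.outputs)
        kept = []
        # Walk backwards: an op is live if any of its outputs is still needed.
        for op in reversed(graph.ops):
            if live.intersection(op.outputs):
                live.update(op.inputs)
                kept.append(op)
        graph.ops = kept[::-1]
        return graph


def run_passes(graph, passes):
    """Apply each pass in order, feeding the result to the next one."""
    for p in passes:
        graph = p.run(graph)
    return graph
```

Order matters in such a pipeline: running dead-code elimination after fusion, as in DEFAULT_PASSES, lets it clean up ops that fusion has made redundant.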
Adding a New Quantisation Strategy¶
# src/edgecompiler/quantize/my_strategy.py
from edgecompiler.ir import IRGraph
from edgecompiler.quantize.base import QuantizationStrategy


class MyQuantStrategy(QuantizationStrategy):
    """Custom quantisation strategy."""

    name = "my_strategy"

    def quantize(self, graph: IRGraph, **kwargs) -> IRGraph:
        """Apply quantisation to the graph."""
        # 1. Identify quantisable ops
        # 2. Compute quantisation parameters
        # 3. Update tensors and ops with quant metadata
        # 4. Insert requantize ops where needed
        return graph
Register:
# src/edgecompiler/quantize/registry.py
STRATEGIES = {
    "ptq": PTQStrategy(),
    "qat": QATStrategy(),
    "dynamic": DynamicRangeStrategy(),
    "my_strategy": MyQuantStrategy(),
}
Error Handling¶
The compiler produces structured errors to help users diagnose issues:
class EdgeCompilerError(Exception):
    """Base exception for edgecompiler."""


class UnsupportedOpError(EdgeCompilerError):
    """Raised when an operation is not supported by the target backend."""

    op_type: str
    backend: str
    suggestion: str | None


class QuantizationError(EdgeCompilerError):
    """Raised when quantisation fails or produces invalid results."""

    tensor_name: str
    reason: str


class FrontendError(EdgeCompilerError):
    """Raised when a model cannot be parsed by any frontend."""

    model_path: str
    format_hint: str | None


class BackendError(EdgeCompilerError):
    """Raised when backend code generation fails."""

    target: str
    reason: str
All errors include actionable messages with suggestions for resolution. When running
with --verbose, full stack traces and intermediate IR dumps are provided.