
Architecture

This document describes the internal architecture of edgecompiler, including its data flow, intermediate representation, compilation passes, and extension points.


Table of Contents

  1. Compiler Design and Data Flow
  2. Unified Intermediate Representation (IR)
  3. Front-End Converter Pipeline
  4. Quantisation Pipeline
  5. Back-End Code Generation
  6. Runtime Architecture
  7. Extension Points

Compiler Design and Data Flow

edgecompiler follows a classic three-phase compiler architecture: front-end → middle-end (IR + passes) → back-end. This separation allows any supported model format to be compiled for any supported target hardware.

┌────────────────────────────────────────────────────────────────────┐
│                        edgecompiler Pipeline                       │
│                                                                    │
│  ┌──────────┐    ┌──────────┐    ┌───────────┐    ┌──────────────┐ │
│  │ Frontend │───▶│  Unified │───▶│ Optimise  │───▶│  Quantise    │ │
│  │ Convert  │    │    IR    │    │  Passes   │    │  Pipeline    │ │
│  └──────────┘    └──────────┘    └───────────┘    └─────┬────────┘ │
│                                                         │          │
│                                          ┌──────────────┼────────┐ │
│                                          │              │        │ │
│                                    ┌─────▼─────┐  ┌─────▼─────┐  │ │
│                                    │   Coral   │  │   Metal   │  │ │
│                                    │  Backend  │  │  Backend  │  │ │
│                                    └─────┬─────┘  └─────┬─────┘  │ │
│                                          │              │        │ │
│                                    ┌─────▼─────┐  ┌─────▼─────┐  │ │
│                                    │  .tflite  │  │ .mlpackage│  │ │
│                                    │ (Edge TPU)│  │ (Core ML) │  │ │
│                                    └───────────┘  └───────────┘  │ │
│                                          │              │        │ │
│                                    ┌─────▼─────┐  ┌─────▼─────┐  │ │
│                                    │  Coral    │  │   Metal   │  │ │
│                                    │ Runtime   │  │  Runtime  │  │ │
│                                    └───────────┘  └───────────┘  │ │
└────────────────────────────────────────────────────────────────────┘

Compilation Phases

| Phase | Description | Input | Output |
|---|---|---|---|
| 1. Front-end conversion | Parse model format → IR | .pt, .tflite, .onnx, SavedModel | IRGraph |
| 2. Optimisation passes | Constant folding, fusion, DCE | IRGraph | IRGraph (optimised) |
| 3. Quantisation | INT8 PTQ, QAT, or dynamic range | IRGraph + optional calibration | IRGraph (quantised) |
| 4. Backend code generation | Lower IR → target binary | IRGraph + target config | .tflite / .mlpackage |
| 5. Runtime loading | Load and execute compiled model | Target binary | Inference results |

Detailed Data Flow

  Model File               Frontend Converter            Unified IR
 ┌──────────┐            ┌──────────────────┐        ┌──────────────┐
 │ model.pt │───────────▶│ PyTorchFrontend  │───────▶│              │
 └──────────┘            │  .trace()        │        │              │
 ┌──────────┐            │  .convert()      │        │   IRGraph    │
 │model.tflite│─────────▶│ TFLiteFrontend   │───────▶│  .ops[]      │
 └──────────┘            │  .parse()        │        │  .tensors[]  │
 ┌──────────┐            │  .convert()      │        │  .inputs[]   │
 │model.onnx │──────────▶│ ONNXFrontend     │───────▶│  .outputs[]  │
 └──────────┘            │  .parse()        │        │              │
 ┌──────────┐            │  .convert()      │        │  IRTensor    │
 │saved_model│──────────▶│ TFSavedModelFE   │───────▶│  .name       │
 └──────────┘            │  .parse()        │        │  .dtype      │
                         │  .convert()      │        │  .shape      │
                         └──────────────────┘        │  .quant      │
                                                     └──────┬───────┘
                                               ┌────────────▼────────────┐
                                               │    Optimisation Passes  │
                                               │                         │
                                               │  1. ConstantFoldingPass │
                                               │  2. OpFusionPass        │
                                               │     (Conv+BN+ReLU)      │
                                               │  3. DeadCodeElimPass    │
                                               │  4. LayoutTransformPass │
                                               │     (NCHW → NHWC)       │
                                               │  5. CommonSubexprPass   │
                                               └────────────┬────────────┘
                                               ┌────────────▼────────────┐
                                               │   Quantisation Pipeline │
                                               │                         │
                                               │  1. Analyse graph dtype │
                                               │  2. Insert quant/dequant│
                                               │  3. Calibrate (PTQ)     │
                                               │     or apply QAT params │
                                               │  4. Propagate scales    │
                                               │  5. Verify integrity    │
                                               └────────────┬────────────┘
                                        ┌───────────────────┼─────────────────────┐
                                        │                                         │
                              ┌─────────▼───────────┐                ┌────────────▼──────────┐
                              │   Coral Backend     │                │   Metal Backend       │
                              │                     │                │                       │
                              │  1. Legalise ops    │                │  1. Legalise ops      │
                              │  2. Partition graph │                │  2. Assign compute    │
                              │     (TPU vs CPU)    │                │     units (ANE/GPU)   │
                              │  3. Build TFLite FB │                │  3. Build Core ML     │
                              │  4. Embed custom ops│                │     model spec        │
                              │  5. Serialise params│                │  4. Configure MPS     │
                              │  6. Write .tflite   │                │  5. Write .mlpackage  │
                              └─────────┬───────────┘                └────────────┬──────────┘
                                        │                                         │
                              ┌─────────▼──────────┐                 ┌────────────▼──────────┐
                              │ model_coral.tflite │                 │ model_ml.mlpackage    │
                              └────────────────────┘                 └───────────────────────┘

Unified Intermediate Representation (IR)

The unified IR is the central data structure of edgecompiler. Every frontend converts to this IR, and every backend consumes it. This decoupling is what allows any model format to target any hardware.

IRGraph

class IRGraph:
    """Top-level container for a compiled model graph."""

    name: str                        # Model name
    version: str                     # IR version (e.g., "1.0")
    ops: list[IROperation]           # Ordered list of operations
    tensors: dict[str, IRTensor]     # Tensor name → tensor definition
    inputs: list[IRTensor]           # Model input tensors
    outputs: list[IRTensor]          # Model output tensors
    metadata: dict[str, Any]         # Arbitrary metadata (source format, etc.)

    def add_op(self, op: IROperation) -> None: ...
    def get_tensor(self, name: str) -> IRTensor: ...
    def topological_sort(self) -> None: ...
    def validate(self) -> list[str]: ...  # Returns list of issues
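
One plausible way to implement topological_sort() is Kahn's algorithm keyed on tensor names, since operations reference their inputs and outputs by name. The sketch below is illustrative only: Op is a hypothetical stand-in for IROperation, and topological_order is not part of edgecompiler's API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for IROperation (only the fields used here)."""
    op_type: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def topological_order(ops):
    """Return ops ordered so every tensor is produced before it is consumed."""
    # Map each tensor name to the index of the op that produces it.
    producer = {name: i for i, op in enumerate(ops) for name in op.outputs}
    # For each op, the set of producer ops it depends on. Graph inputs and
    # constants have no producer and therefore create no dependency.
    deps = [{producer[n] for n in op.inputs if n in producer} for op in ops]
    consumers = {i: [] for i in range(len(ops))}
    for i, dep_set in enumerate(deps):
        for p in dep_set:
            consumers[p].append(i)
    pending = [len(dep_set) for dep_set in deps]
    ready = deque(i for i, n in enumerate(pending) if n == 0)
    ordered = []
    while ready:
        i = ready.popleft()
        ordered.append(ops[i])
        for c in consumers[i]:
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)
    if len(ordered) != len(ops):
        raise ValueError("graph contains a cycle")
    return ordered
```

Raising on a cycle is a design choice for the sketch; a real implementation might instead report the problem through validate().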

IROperation

class IROperation:
    """A single operation in the graph."""

    op_type: str                          # E.g., "Conv2D", "MatMul", "Add"
    inputs: list[str]                     # Input tensor names
    outputs: list[str]                    # Output tensor names
    attributes: dict[str, Any]            # Op-specific attributes
    quantization: OpQuantization | None   # Per-op quantisation info

    @property
    def is_quantized(self) -> bool: ...

IRTensor

class IRTensor:
    """A tensor in the graph (activations, weights, or constants)."""

    name: str                             # Unique tensor name
    dtype: IRDType                        # Data type
    shape: list[int | None]               # Shape (None = dynamic dim)
    data: numpy.ndarray | None            # Constant data (None for activations)
    quantization: TensorQuantization | None  # Scale + zero_point

class IRDType(enum.Enum):
    FLOAT32 = "float32"
    FLOAT16 = "float16"
    INT32   = "int32"
    INT8    = "int8"
    UINT8   = "uint8"
    BOOL    = "bool"

Quantisation Metadata

class TensorQuantization:
    """Quantisation parameters for a tensor."""
    scale: float                # Quantisation scale factor
    zero_point: int             # Zero point offset
    quantized_dtype: IRDType    # Target quantised dtype (INT8/UINT8)
    axis: int | None            # Per-channel axis (None = per-tensor)
    scales: list[float] | None  # Per-channel scales
    zero_points: list[int] | None  # Per-channel zero points

class OpQuantization:
    """Quantisation configuration for an operation."""
    input_scales: list[float]
    input_zero_points: list[int]
    output_scale: float
    output_zero_point: int

Design Principles

  1. Format-agnostic: The IR does not encode any format-specific concepts (e.g., TFLite operator codes or Core ML layer types). These are injected during backend code generation.

  2. Explicit quantisation: Quantisation is represented as metadata attached to tensors and operations, not as separate quantise/dequantise ops (though these can be materialised when needed).

  3. SSA-like naming: Every tensor has a unique name. Operations produce new tensors rather than mutating existing ones. This simplifies analysis and transformation.

  4. Mutable during compilation, immutable after: The IR is freely modified during the optimisation and quantisation passes. Once it reaches the backend, it should be treated as frozen.

  5. Metadata pass-through: Front-ends can attach arbitrary metadata (e.g., original op names, training configuration) that backends may optionally consume.


Front-End Converter Pipeline

Each front-end follows a two-phase process: parse the source format into a format-specific representation, then convert it to the unified IR.

PyTorch Frontend

model.pt / model.pth
  ┌─────────────────┐
  │ torch.jit.trace │─── Trace model with example input
  │ or .script()    │─── Fall back to script if trace fails
  └────────┬────────┘
  ┌──────────────────┐
  │ TorchScript Graph│─── Extract nodes and tensors
  └────────┬─────────┘
  ┌──────────────────┐
  │ Map PyTorch ops  │─── aten::conv2d → Conv2D, etc.
  │ to IR op types   │─── Handle prim:: ops (Tuple, List)
  └────────┬─────────┘─── Decompose complex ops
  ┌──────────────────┐
  │ Convert tensors  │─── Map torch.dtype → IRDType
  │ and attributes   │─── Convert parameter shapes
  └────────┬─────────┘─── Handle padding, stride, dilation
      IRGraph

Key considerations:

  • Models are traced with torch.jit.trace() using a dummy input matching the declared input shape. If tracing fails (for example, because the model has data-dependent control flow), torch.jit.script() is used instead.
  • aten:: operations are mapped to IR operations via a lookup table. Unmapped ops raise an UnsupportedOpError with suggestions for workarounds.
  • PyTorch uses NCHW layout by default. A LayoutTransformPass in the optimisation pipeline converts to NHWC for TFLite/Coral compatibility.
  • Quantisation-aware training (QAT) parameters embedded in torch.quantization observers are extracted and propagated to the IR.
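
The lookup-table mapping described above might look roughly like this. Only the aten::conv2d → Conv2D entry comes from this document; the other table entries, the WORKAROUNDS hints, and the simplified UnsupportedOpError stand-in are assumptions made for illustration.

```python
class UnsupportedOpError(Exception):
    """Simplified stand-in for edgecompiler's error of the same name."""

    def __init__(self, op_type, suggestion=None):
        super().__init__(f"Unsupported op: {op_type}")
        self.op_type = op_type
        self.suggestion = suggestion


# Hypothetical excerpt of the aten:: -> IR op table.
ATEN_TO_IR = {
    "aten::conv2d": "Conv2D",
    "aten::matmul": "MatMul",
    "aten::add": "Add",
    "aten::relu": "ReLU",
}

# Hypothetical workaround hints attached to known-unsupported ops.
WORKAROUNDS = {
    "aten::einsum": "rewrite the expression with matmul and transpose",
}


def map_aten_op(kind):
    """Translate a TorchScript node kind into an IR op type."""
    try:
        return ATEN_TO_IR[kind]
    except KeyError:
        raise UnsupportedOpError(kind, WORKAROUNDS.get(kind)) from None
```

Keeping the table as plain data makes it easy to audit coverage and to add entries without touching the conversion logic.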

TFLite Frontend

model.tflite
┌──────────────────┐
│ FlatBuffer parse │─── Parse .tflite binary
└────────┬─────────┘
┌──────────────────┐
│ Map TFLite ops   │─── tflite::BuiltinOperator → IR op type
└────────┬─────────┘─── Handle custom ops (delegates)
┌──────────────────┐
│ Extract tensors  │─── Read tensor shapes, types, data
└────────┬─────────┘─── Handle quantisation parameters
┌──────────────────┐
│ Reconstruct graph│─── Build op dependency ordering
└────────┬─────────┘─── Handle subgraphs (if present)
    IRGraph

Key considerations:

  • TFLite models may already be quantised. In that case the front-end preserves the quantisation parameters and marks the IR as pre-quantised, so the quantisation pipeline is skipped.
  • Custom ops (including Edge TPU delegate ops) are mapped to their corresponding IR operations when possible, or flagged as unknown.
  • The TFLite format uses NHWC layout natively, so no layout transformation is needed.

ONNX Frontend

model.onnx
┌──────────────────┐
│ onnx.load()      │─── Load ONNX protobuf
└────────┬─────────┘
┌──────────────────┐
│ onnxsim.simplify │─── Simplify graph (optional)
└────────┬─────────┘
┌──────────────────┐
│ Map ONNX ops     │─── onnx::Conv → Conv2D, etc.
└────────┬─────────┘─── Handle opset version differences
┌──────────────────┐
│ Convert tensors  │─── Map onnx.TensorProto → IRTensor
└────────┬─────────┘─── Handle external data (large models)
    IRGraph

Key considerations:

  • ONNX opset versions 11–20 are supported. Older opsets are auto-upgraded when possible using onnx.version_converter.
  • onnxsim is used to simplify the graph before conversion (an optional step): it removes identity ops, folds constants, and eliminates other redundancies.
  • External data files (for models > 2 GB) are supported via ONNX's external data convention.

TensorFlow SavedModel / Keras Frontend

saved_model/ or model.h5 / model.keras
┌─────────────────┐
│ TF loader       │─── Load SavedModel or Keras model
└────────┬────────┘─── Extract signature and concrete functions
┌─────────────────────────────┐
│ Convert to ConcreteFunction │
└────────┬────────────────────┘─── Get GraphDef
┌─────────────────┐
│ Map TF ops      │─── tf.nn.conv2d → Conv2D, etc.
└────────┬────────┘─── Handle tf.function tracing
┌──────────────────┐
│ Extract variables│─── Read weights, biases, batch norm params
└────────┬─────────┘─── Convert tf.Variable → constant tensors
    IRGraph

Key considerations:

  • The SavedModel is loaded using tf.saved_model.load(), and its concrete functions are extracted. The first serving signature is used by default.
  • Keras models (.h5 / .keras) are loaded via tf.keras.models.load_model() and converted to a concrete function before processing.
  • TensorFlow resource variables (e.g., in LSTM/GRU layers) are materialised as constant tensors during conversion.

Quantisation Pipeline

The quantisation pipeline converts a floating-point IR graph to INT8 (or UINT8) for efficient inference on edge hardware.

Pipeline Stages

  Float IRGraph
┌───────────────────┐
│ 1. Analyse graph  │─── Identify float tensors and ops
└────────┬──────────┘─── Determine quantisable vs non-quantisable
┌────────────────────┐
│ 2. Choose strategy │
│                    │─── PTQ:  Use calibration data
│                    │─── QAT:  Use embedded observer params
│                    │─── Dynamic: Compute ranges at runtime
└────────┬───────────┘
┌──────────────────┐
│ 3. Insert q/dq   │─── Add Quantize/Dequantize at boundaries
│    nodes         │─── Between quantisable and non-quantisable ops
└────────┬─────────┘─── Handle mixed-precision boundaries
┌────────────────────┐
│ 4. Calibrate       │─── PTQ:  Run calibration data through graph
│                    │─── Collect min/max per tensor
│                    │─── Compute scale = (max - min) / 255
│                    │─── Compute zero_point = round(-min / scale)
│                    │─── QAT:  Extract scales from observers
│                    │─── Dynamic: Mark as dynamic (no calibration)
└────────┬───────────┘
┌───────────────────┐
│ 5. Propagate      │─── Ensure output scale of op A matches
│    scales         │─── input scale of op B (when both quantised)
│                   │─── Insert Requantize ops where needed
│                   │─── Fuse quantised BatchNorm into Conv
└────────┬──────────┘
┌──────────────────┐
│ 6. Verify        │─── Check all quantised ops have valid params
│                  │─── Validate scale > 0, zero_point in range
│                  │─── Warn about large dynamic range tensors
│                  │─── Report quantisation error estimates
└────────┬─────────┘
  Quantised IRGraph
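
Stage 5 inserts Requantize ops wherever a producer and consumer disagree on scale or zero point. Conceptually, requantisation maps an integer back to its real value and re-encodes it with the consumer's parameters. The sketch below does this in floating point for clarity; production kernels usually use a fixed-point multiplier instead, and the function name is hypothetical.

```python
def requantize(q, in_scale, in_zp, out_scale, out_zp, qmin=-128, qmax=127):
    """Re-encode one quantised value with a different scale and zero point."""
    real = in_scale * (q - in_zp)                  # decode with producer params
    requantized = round(real / out_scale) + out_zp  # encode with consumer params
    return max(qmin, min(qmax, requantized))        # saturate to the target dtype


print(requantize(20, in_scale=0.5, in_zp=0, out_scale=0.25, out_zp=10))
```

When the producer and consumer parameters already match, the mapping is the identity and no Requantize op is needed.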

Post-Training Quantisation (PTQ)

PTQ requires no retraining, only a small calibration dataset, and works well for most models:

from edgecompiler import compile

result = compile(
    "mobilenet_v2.pt",
    target="coral",
    quantize="ptq",
    calibration_data="calib.npy",  # Shape: (N, C, H, W), float32
    num_calibration_samples=100,   # Use first 100 samples
)

How it works:

  1. The calibration dataset is passed through the model in float32.
  2. Min/max statistics are collected for each activation tensor.
  3. Scales and zero points are computed from these statistics.
  4. Weight tensors are quantised using their own min/max (per-channel for depthwise convolutions, per-tensor for everything else).
  5. The quantised model is verified for numerical consistency.
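
Steps 2 and 3 can be sketched as follows. The pipeline diagram gives scale = (max - min) / 255 and zero_point = round(-min / scale), which corresponds to an unsigned range starting at 0; for a signed INT8 range the zero point is offset by qmin. The helper names, and the choice to widen the range so that 0.0 is exactly representable, are assumptions of this sketch rather than documented edgecompiler behaviour.

```python
def observe(batches):
    """Collect the overall min/max across calibration batches (flat float lists)."""
    lo = min(min(batch) for batch in batches)
    hi = max(max(batch) for batch in batches)
    return lo, hi


def affine_params(t_min, t_max, qmin=-128, qmax=127):
    """Derive (scale, zero_point) so that [t_min, t_max] maps onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero (padding, ReLU output)
    # quantises without error. This is an assumption of the sketch.
    t_min, t_max = min(t_min, 0.0), max(t_max, 0.0)
    if t_max == t_min:
        return 1.0, 0  # degenerate all-zero tensor
    scale = (t_max - t_min) / (qmax - qmin)
    zero_point = round(qmin - t_min / scale)
    return scale, max(qmin, min(qmax, zero_point))


lo, hi = observe([[-1.0, 0.5], [0.25, 3.0]])
print(affine_params(lo, hi))                   # signed INT8 range
print(affine_params(lo, hi, qmin=0, qmax=255))  # UINT8 range, as in the diagram
```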

Best practices:

  • Use 100–500 representative calibration samples.
  • Ensure calibration data matches the expected input distribution.
  • Avoid outlier samples that skew the min/max ranges.
  • Consider using --calibration-percentile 99.9 to reduce outlier impact.
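
Percentile-based calibration replaces the raw min/max with a range that ignores the most extreme samples. The sketch below assumes a two-sided interpretation (half of the excluded mass is trimmed from each end); how --calibration-percentile is actually defined is not specified here, so treat this as one possible reading.

```python
def percentile_range(samples, percentile=99.9):
    """Return (low, high) covering the central `percentile` percent of samples."""
    ordered = sorted(samples)
    tail = (100.0 - percentile) / 200.0   # fraction trimmed from each end
    k = round(len(ordered) * tail)        # number of samples dropped per side
    return ordered[k], ordered[len(ordered) - 1 - k]


# 198 well-behaved activations plus one outlier at each end.
activations = [-50.0] + [i / 100 for i in range(198)] + [1000.0]
print(min(activations), max(activations))        # raw range, dominated by outliers
print(percentile_range(activations, percentile=99.0))
```

A tighter range gives a smaller scale and therefore finer resolution for typical values, at the cost of saturating the rare outliers.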

Quantisation-Aware Training (QAT)

QAT typically produces the most accurate quantised models, because quantisation is simulated during training:

from edgecompiler import compile

result = compile(
    "mobilenet_v2_qat.pt",  # Model with embedded QAT observers
    target="coral",
    quantize="qat",
)

How it works:

  1. The front-end detects torch.quantization observers in the model.
  2. Observer statistics (scale, zero_point) are extracted from the model state dict.
  3. These parameters are directly applied to the IR without additional calibration.
  4. Fake-quantise ops are removed, and real quantise/dequantise boundaries are established.

Dynamic Range Quantisation

Dynamic range quantisation computes scales at runtime based on each input:

from edgecompiler import compile

result = compile(
    "mobilenet_v2.pt",
    target="metal",
    quantize="dynamic",
)

How it works:

  1. Weight tensors are quantised statically (per-channel or per-tensor).
  2. Activation scales are computed at runtime from the input min/max.
  3. This avoids the need for calibration data but may be slower due to runtime overhead.
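
The split between static weights and runtime activation scales can be illustrated with a single dot product. This is a simplified sketch using symmetric quantisation (zero point 0); the class and helper names are hypothetical and the real kernels are backend-specific.

```python
def symmetric_scale(values, qmax=127):
    """Scale that maps the largest magnitude onto qmax (1.0 for all zeros)."""
    peak = max(abs(v) for v in values)
    return peak / qmax if peak else 1.0


def quantize_symmetric(values, scale, qmax=127):
    """Round to the nearest integer step and saturate to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


class DynamicDense:
    """Dot product with static int8 weights and per-call activation scaling."""

    def __init__(self, weights):
        # Weight parameters are fixed once, at compile time.
        self.w_scale = symmetric_scale(weights)
        self.w_q = quantize_symmetric(weights, self.w_scale)

    def __call__(self, activations):
        # Activation parameters are recomputed for every input.
        a_scale = symmetric_scale(activations)
        a_q = quantize_symmetric(activations, a_scale)
        acc = sum(w * a for w, a in zip(self.w_q, a_q))  # integer accumulation
        return acc * self.w_scale * a_scale               # rescale to float


layer = DynamicDense([0.5, -1.0, 0.25])
print(layer([2.0, 1.0, -4.0]))  # close to the float result of -1.0
```

The extra pass over the activations in __call__ is the runtime overhead mentioned above.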

When to use:

  • When you don't have representative calibration data.
  • For models with highly variable input distributions.
  • When inference speed is less critical than ease of deployment.

Back-End Code Generation

Each backend takes a quantised (or float) IR graph and produces a target-specific binary.

Common Backend Interface

from typing import Protocol

from edgecompiler.ir import IRGraph, IROperation

class Backend(Protocol):
    """Interface that all backends must implement."""

    @property
    def name(self) -> str: ...

    def supports_op(self, op: IROperation) -> bool: ...

    def legalise_ops(self, graph: IRGraph) -> IRGraph:
        """Transform IR ops into backend-supported forms."""
        ...

    def compile(self, graph: IRGraph, config: CompileConfig) -> CompileResult:
        """Generate target binary from IR graph."""
        ...

    def estimate_performance(self, graph: IRGraph) -> PerfEstimate:
        """Estimate latency/throughput without hardware."""
        ...
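
The supports_op hook is what lets a backend split a graph between the accelerator and a fallback path. As a rough illustration (not the actual partitioning logic of either backend), consecutive ops can be grouped into runs according to whether the target supports them:

```python
from itertools import groupby


def partition_segments(ops, supports_op):
    """Group consecutive ops into (on_target, [ops]) runs, preserving order."""
    return [
        (on_target, list(run))
        for on_target, run in groupby(ops, key=supports_op)
    ]


# Op types stand in for IROperation objects in this sketch.
supported = {"Conv2D", "ReLU", "Add"}
segments = partition_segments(
    ["Conv2D", "ReLU", "TopK", "Add"],
    supports_op=lambda op_type: op_type in supported,
)
for on_target, run in segments:
    print("target  " if on_target else "fallback", run)
```

Fewer, longer runs generally mean fewer transfers between the accelerator and the host, which is why legalisation happens before partitioning.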

Backend Selection

CompileConfig.target
        ├── "coral" ─────▶ CoralBackend
        │                    ├─ Legalise ops to TFLite set
        │                    ├─ Partition for Edge TPU
        │                    └─ Generate .tflite FlatBuffer
        ├── "metal" ─────▶ MetalBackend
        │                    ├─ Legalise ops to Core ML set
        │                    ├─ Assign compute units
        │                    └─ Generate .mlpackage
        └── "auto" ──────▶ Select based on available hardware
                             ├─ If Coral USB detected → coral
                             └─ If Apple Silicon detected → metal

See Coral Backend and Metal Backend for detailed documentation on each backend.
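
The "auto" branch of the selection tree could be resolved along these lines. The function names, the priority of Coral over Metal when both are present, and the error raised when nothing is detected are all assumptions; Coral detection is passed in as a flag because the probing mechanism is not described here.

```python
import platform
import sys


def is_apple_silicon():
    """True on macOS running natively on an arm64 (Apple Silicon) CPU."""
    return sys.platform == "darwin" and platform.machine() == "arm64"


def resolve_target(target, coral_detected, apple_silicon):
    """Turn a CompileConfig.target value into a concrete backend name."""
    if target != "auto":
        return target
    if coral_detected:       # checked first, following the order in the tree
        return "coral"
    if apple_silicon:
        return "metal"
    raise RuntimeError(
        "target='auto' found no supported hardware; pass target explicitly"
    )


print(resolve_target("auto", coral_detected=False, apple_silicon=True))
```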


Runtime Architecture

edgecompiler includes lightweight runtime wrappers for both backends, providing a uniform inference API.

┌──────────────────────────────────────────────┐
│              edgecompiler.runtime            │
│                                              │
│  ┌─────────────────────────────────────────┐ │
│  │         InferenceSession                │ │
│  │                                         │ │
│  │  .load(path, target="coral"|"metal")    │ │
│  │  .run(inputs: dict) → dict              │ │
│  │  .benchmark(iterations) → Stats         │ │
│  │  .close()                               │ │
│  └──────────────┬──────────────────────────┘ │
│                 │                            │
│     ┌───────────┼──────────┐                 │
│     │                      │                 │
│  ┌──▼──────────────┐  ┌────▼───────────────┐ │
│  │ CoralRuntime    │  │ MetalRuntime       │ │
│  │                 │  │                    │ │
│  │ Uses:           │  │ Uses:              │ │
│  │  libedgetpu     │  │  coremltools       │ │
│  │  tflite_runtime │  │  Core ML framework │ │
│  │  pycoral        │  │                    │ │
│  └─────────────────┘  └────────────────────┘ │
└──────────────────────────────────────────────┘

InferenceSession API

from edgecompiler.runtime import InferenceSession

# Load a compiled model
session = InferenceSession("model_coral.tflite", target="coral")

# Run inference
result = session.run({"input": input_array})

# Benchmark
stats = session.benchmark(iterations=100)
print(f"Mean latency: {stats.mean_ms:.2f} ms")
print(f"P95 latency:  {stats.p95_ms:.2f} ms")
print(f"Throughput:    {stats.throughput_fps:.1f} FPS")

# Clean up
session.close()
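
For readers wondering how the reported statistics relate to raw timings, the sketch below shows one way they could be derived. The Stats fields mirror the attributes used above, but the warm-up count, the nearest-rank percentile, and the throughput formula (sequential, batch size 1) are assumptions rather than the documented behaviour of benchmark().

```python
import statistics
import time
from dataclasses import dataclass


@dataclass
class Stats:
    mean_ms: float
    p95_ms: float
    throughput_fps: float


def summarise(latencies_ms):
    """Reduce per-iteration latencies (milliseconds) to summary statistics."""
    ordered = sorted(latencies_ms)
    # Nearest-rank 95th percentile, computed with integer ceiling division.
    rank = max(1, -(-95 * len(ordered) // 100))
    mean = statistics.fmean(ordered)
    return Stats(mean_ms=mean, p95_ms=ordered[rank - 1], throughput_fps=1000.0 / mean)


def benchmark(run_once, iterations=100, warmup=5):
    """Time a zero-argument callable; warm-up runs are excluded from the stats."""
    for _ in range(warmup):
        run_once()
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return summarise(latencies)


stats = benchmark(lambda: sum(range(10_000)), iterations=20)
print(f"Mean latency: {stats.mean_ms:.3f} ms")
```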

Extension Points

edgecompiler is designed to be extensible. The following extension points are supported:

Adding a New Frontend

A frontend converts a model format to the unified IR.

Step 1: Implement the frontend class

# src/edgecompiler/frontend/my_frontend.py

from edgecompiler.ir import IRGraph, IROperation, IRTensor, IRDType
from edgecompiler.frontend.base import Frontend

class MyFrontend(Frontend):
    """Frontend for MyModel format."""

    supported_extensions = (".mymodel",)

    def convert(self, model_path: str, **kwargs) -> IRGraph:
        """Convert a MyModel file to IRGraph."""
        graph = IRGraph(name="imported_model")

        # 1. Parse the model file
        raw_model = self._parse(model_path)

        # 2. Map operations to IR
        for op in raw_model.operations:
            ir_op = self._convert_op(op)
            graph.add_op(ir_op)

        # 3. Set inputs/outputs
        graph.inputs = [self._convert_tensor(t) for t in raw_model.inputs]
        graph.outputs = [self._convert_tensor(t) for t in raw_model.outputs]

        return graph

    def _parse(self, path: str):
        """Parse the model file format."""
        ...

    def _convert_op(self, op) -> IROperation:
        """Map a model op to an IR operation."""
        ...

    def _convert_tensor(self, tensor) -> IRTensor:
        """Map a model tensor to an IR tensor."""
        ...

Step 2: Register the frontend

# src/edgecompiler/frontend/registry.py

from edgecompiler.frontend.base import Frontend
from edgecompiler.frontend.my_frontend import MyFrontend

def get_frontend(model_path: str) -> Frontend:
    if model_path.endswith(".mymodel"):
        return MyFrontend()
    ...
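
As the number of formats grows, a chain of endswith() checks can be replaced by a lookup driven by each class's supported_extensions. The following is a self-contained sketch of that idea; register_frontend and the stand-in Frontend base class are hypothetical and not part of the existing registry module.

```python
import os


class Frontend:
    """Stand-in for edgecompiler.frontend.base.Frontend."""
    supported_extensions = ()


_FRONTENDS = []


def register_frontend(cls):
    """Class decorator (hypothetical) that adds a frontend to the registry."""
    _FRONTENDS.append(cls)
    return cls


@register_frontend
class MyFrontend(Frontend):
    supported_extensions = (".mymodel",)


def get_frontend(model_path):
    """Return an instance of the first frontend whose extension matches."""
    ext = os.path.splitext(model_path)[1].lower()
    for cls in _FRONTENDS:
        if ext in cls.supported_extensions:
            return cls()
    raise ValueError(f"No frontend registered for {model_path!r}")


print(type(get_frontend("models/net.mymodel")).__name__)
```

Directory-based formats such as SavedModel have no file extension and would still need a separate check.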

Step 3: Add tests

# tests/test_my_frontend.py

from edgecompiler.frontend.my_frontend import MyFrontend

def test_convert_basic_model():
    frontend = MyFrontend()
    graph = frontend.convert("tests/fixtures/basic.mymodel")
    assert len(graph.ops) > 0
    assert len(graph.inputs) > 0
    assert len(graph.outputs) > 0

Adding a New Backend

A backend compiles the unified IR to a target binary format.

Step 1: Implement the backend class

# src/edgecompiler/backend/my_backend.py

from edgecompiler.ir import IRGraph, IROperation
from edgecompiler.backend.base import Backend, CompileResult, CompileConfig

class MyBackend(Backend):
    """Backend for MyHardware accelerator."""

    @property
    def name(self) -> str:
        return "my_hardware"

    def supports_op(self, op: IROperation) -> bool:
        SUPPORTED_OPS = {"Conv2D", "MatMul", "Add", "ReLU", ...}
        return op.op_type in SUPPORTED_OPS

    def legalise_ops(self, graph: IRGraph) -> IRGraph:
        """Transform ops that aren't natively supported."""
        # E.g., replace ReLU6 with clip(0, 6)
        ...
        return graph

    def compile(self, graph: IRGraph, config: CompileConfig) -> CompileResult:
        """Generate hardware binary from IR."""
        # 1. Legalise the graph
        graph = self.legalise_ops(graph)

        # 2. Partition (supported vs unsupported ops)
        supported, fallback = self._partition(graph)

        # 3. Generate target binary
        binary = self._generate_binary(supported)

        # 4. Write output file
        output_path = config.output_path or "model.mybin"
        with open(output_path, "wb") as f:
            f.write(binary)

        return CompileResult(
            output_path=output_path,
            ops_on_target=len(supported),
            ops_fallback=len(fallback),
            backend=self.name,
        )

    def _partition(self, graph: IRGraph):
        """Split graph into supported and unsupported ops."""
        ...

    def _generate_binary(self, graph: IRGraph) -> bytes:
        """Generate the target binary format."""
        ...

Step 2: Register the backend

# src/edgecompiler/backend/registry.py

from edgecompiler.backend.base import Backend
from edgecompiler.backend.my_backend import MyBackend

def get_backend(target: str) -> Backend:
    if target == "my_hardware":
        return MyBackend()
    ...

Step 3: Add CLI support

No extra code is needed. The CLI automatically supports any registered backend:

edgecompile model.pt --target my_hardware --output model.mybin

Adding a New Optimisation Pass

Optimisation passes transform the IR graph before quantisation.

# src/edgecompiler/passes/my_pass.py

from edgecompiler.ir import IRGraph, IROperation
from edgecompiler.passes.base import Pass

class MyOptimisationPass(Pass):
    """Example optimisation pass."""

    name = "my_optimisation"

    def run(self, graph: IRGraph) -> IRGraph:
        """Apply the optimisation to the graph."""
        modified = False

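        # Iterate over a copy so _transform can add or remove ops safely.
        for op in list(graph.ops):
            if self._matches_pattern(op):
                self._transform(op, graph)
                modified = True

        if modified:
            # validate() returns a list of issues; fail loudly if any are found.
            issues = graph.validate()
            if issues:
                raise ValueError(f"{self.name} produced an invalid graph: {issues}")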

        return graph

    def _matches_pattern(self, op: IROperation) -> bool:
        ...

    def _transform(self, op: IROperation, graph: IRGraph) -> None:
        ...

Register in the pass pipeline:

# src/edgecompiler/passes/pipeline.py

from edgecompiler.passes.my_pass import MyOptimisationPass

DEFAULT_PASSES = [
    ConstantFoldingPass(),
    OpFusionPass(),
    MyOptimisationPass(),      # Add here
    DeadCodeElimPass(),
    LayoutTransformPass(),
]
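
Pass order matters: fusion and custom passes can leave behind ops whose results are no longer used, which is why dead-code elimination runs after them. As a concrete example of what such a pass does, here is a minimal dead-code elimination sketch over a stand-in Op type. It assumes the ops are already topologically sorted and is not the actual DeadCodeElimPass implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for IROperation (only the fields used here)."""
    op_type: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def eliminate_dead_ops(ops, graph_outputs):
    """Keep only ops that contribute to at least one graph output."""
    live = set(graph_outputs)
    kept = []
    # Walk backwards so consumers are seen before their producers.
    for op in reversed(ops):
        if any(name in live for name in op.outputs):
            live.update(op.inputs)   # the op's inputs are now needed too
            kept.append(op)
    kept.reverse()
    return kept


ops = [
    Op("Conv2D", ["x", "w"], ["a"]),
    Op("ReLU", ["a"], ["b"]),
    Op("Sigmoid", ["a"], ["c"]),   # "c" is never consumed
]
print([op.op_type for op in eliminate_dead_ops(ops, graph_outputs=["b"])])
```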

Adding a New Quantisation Strategy

# src/edgecompiler/quantize/my_strategy.py

from edgecompiler.ir import IRGraph
from edgecompiler.quantize.base import QuantizationStrategy

class MyQuantStrategy(QuantizationStrategy):
    """Custom quantisation strategy."""

    name = "my_strategy"

    def quantize(self, graph: IRGraph, **kwargs) -> IRGraph:
        """Apply quantisation to the graph."""
        # 1. Identify quantisable ops
        # 2. Compute quantisation parameters
        # 3. Update tensors and ops with quant metadata
        # 4. Insert requantize ops where needed
        return graph

Register:

# src/edgecompiler/quantize/registry.py

STRATEGIES = {
    "ptq": PTQStrategy(),
    "qat": QATStrategy(),
    "dynamic": DynamicRangeStrategy(),
    "my_strategy": MyQuantStrategy(),
}

Error Handling

The compiler produces structured errors to help users diagnose issues:

class EdgeCompilerError(Exception):
    """Base exception for edgecompiler."""

class UnsupportedOpError(EdgeCompilerError):
    """Raised when an operation is not supported by the target backend."""
    op_type: str
    backend: str
    suggestion: str | None

class QuantizationError(EdgeCompilerError):
    """Raised when quantisation fails or produces invalid results."""
    tensor_name: str
    reason: str

class FrontendError(EdgeCompilerError):
    """Raised when a model cannot be parsed by any frontend."""
    model_path: str
    format_hint: str | None

class BackendError(EdgeCompilerError):
    """Raised when backend code generation fails."""
    target: str
    reason: str

All errors include actionable messages with suggestions for resolution. When running with --verbose, full stack traces and intermediate IR dumps are provided.
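
The class definitions above only declare the attributes each error carries. One way those attributes could be turned into an actionable message is sketched below; the constructor signature, the message wording, and the report helper are assumptions made for illustration.

```python
from typing import Optional


class EdgeCompilerError(Exception):
    """Base exception (mirrors the hierarchy above)."""


class UnsupportedOpError(EdgeCompilerError):
    def __init__(self, op_type: str, backend: str, suggestion: Optional[str] = None):
        self.op_type = op_type
        self.backend = backend
        self.suggestion = suggestion
        message = f"Operation {op_type!r} is not supported by the {backend!r} backend."
        if suggestion:
            message += f" Suggestion: {suggestion}"
        super().__init__(message)


def report(error: EdgeCompilerError) -> str:
    """Format any compiler error for display; subclasses carry the details."""
    return f"{type(error).__name__}: {error}"


try:
    raise UnsupportedOpError("TopK", "coral", "run TopK on the CPU")
except EdgeCompilerError as exc:   # catching the base class covers all subclasses
    print(report(exc))
```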