Live Weight Streaming

rex-serve exposes a WebSocket streaming endpoint at /ws/stream. Use the streaming backend explicitly when you want the runtime and prefetcher to share one live socket instance.

Why it matters

  • The runtime can keep live fetches on the same backend as the executor path.
  • Prefetching can reuse the same streaming connection instead of opening a new request per chunk.
  • Local serving can support chunk delivery even when a manifest is not present.

Start the server

rex-serve --dir ./rex_output/weights --port 8080

The streaming endpoint is available at:

ws://localhost:8080/ws/stream

Runtime behavior

When config.storage.backend_type is STREAMING_WS, Rex connects the streaming backend during model load and shares it with the prefetcher. That keeps live fetches and queued prefetches on the same transport. If the manifest itself advertises storage_backend: streaming_ws, Rex can select this backend automatically during model load.
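
The selection rule described above can be sketched roughly as follows. This is an illustration only, not Rex's actual implementation: the BackendType enum, the select_backend helper, the "http_range" string value, the precedence of explicit configuration over the manifest hint, and the fallback to HTTP_RANGE are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class BackendType(Enum):
    # Hypothetical stand-in for the two backend names used in this document.
    HTTP_RANGE = "http_range"      # string value is an assumption
    STREAMING_WS = "streaming_ws"  # matches the manifest value shown above


def select_backend(configured: Optional[BackendType], manifest: dict) -> BackendType:
    """Pick a backend for model load (assumed precedence, not confirmed by Rex).

    1. An explicitly configured backend_type wins.
    2. Otherwise, a manifest advertising storage_backend: streaming_ws
       selects the streaming backend.
    3. Otherwise, fall back to HTTP range requests (assumed default).
    """
    if configured is not None:
        return configured
    if manifest.get("storage_backend") == BackendType.STREAMING_WS.value:
        return BackendType.STREAMING_WS
    return BackendType.HTTP_RANGE
```

With this sketch, select_backend(None, {"storage_backend": "streaming_ws"}) yields the streaming backend, while an empty manifest and no explicit setting yields HTTP_RANGE.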

Protocol

Client request payload:

{
  "manifest_id": "my-model",
  "chunks": [
    {
      "chunk_id": "layer_0.weight.0",
      "resource_id": "weights/layer_0.weight.0.bin"
    }
  ]
}
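
A request of this shape could be assembled as in the sketch below. The build_stream_request helper is hypothetical and not part of Rex; it only shows how the manifest_id and chunks fields fit together, and it assumes the payload is serialized as JSON text before being sent over the socket.

```python
import json
from typing import Iterable, Tuple


def build_stream_request(manifest_id: str, chunks: Iterable[Tuple[str, str]]) -> str:
    """Serialize a chunk request.

    `chunks` is an iterable of (chunk_id, resource_id) pairs.
    Hypothetical helper; only the field names come from the payload above.
    """
    payload = {
        "manifest_id": manifest_id,
        "chunks": [
            {"chunk_id": chunk_id, "resource_id": resource_id}
            for chunk_id, resource_id in chunks
        ],
    }
    return json.dumps(payload)


request_text = build_stream_request(
    "my-model",
    [("layer_0.weight.0", "weights/layer_0.weight.0.bin")],
)
```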

The manifest should also set provenance.base_url to the chunk host, so that the runtime can resolve the WebSocket endpoint without extra per-run wiring.
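
One plausible way to derive the endpoint from a base URL is sketched below. The stream_url_from_base helper is hypothetical, and the mapping it applies (http to ws, https to wss, with /ws/stream served at the root of the chunk host) is an assumption rather than documented Rex behavior.

```python
from urllib.parse import urlsplit, urlunsplit


def stream_url_from_base(base_url: str, path: str = "/ws/stream") -> str:
    """Map an HTTP(S) chunk host URL to a WebSocket URL (assumed convention).

    Any path, query, or fragment on base_url is discarded, on the assumption
    that the streaming endpoint lives at the root of the host.
    """
    parts = urlsplit(base_url)
    # Swap the scheme for its WebSocket counterpart; leave ws/wss untouched.
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme, parts.netloc, path, "", ""))


url = stream_url_from_base("http://localhost:8080")
```

For the local server started earlier, this produces ws://localhost:8080/ws/stream.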

Binary frames use this layout:

[1B version] [2B id_len] [8B data_len] [id_len B chunk_id] [data_len B data]

The server may send keep-alive frames, which have id_len=0 and data_len=0 and therefore carry no chunk ID or data.
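
A client-side parser for this layout might look like the sketch below. The layout above does not state a byte order or a text encoding, so big-endian integers and a UTF-8 chunk_id are assumptions here, as is the idea that each WebSocket binary message holds exactly one frame. The parse_frame name is hypothetical.

```python
import struct
from typing import Optional, Tuple

# 1B version, 2B id_len, 8B data_len. Big-endian (">") is an ASSUMPTION;
# the frame layout does not specify a byte order.
HEADER = struct.Struct(">BHQ")


def parse_frame(frame: bytes) -> Optional[Tuple[int, str, bytes]]:
    """Decode one binary frame into (version, chunk_id, data).

    Returns None for a keep-alive frame (id_len == 0 and data_len == 0).
    Raises ValueError if the frame length does not match its header.
    """
    if len(frame) < HEADER.size:
        raise ValueError("frame is shorter than the 11-byte header")
    version, id_len, data_len = HEADER.unpack_from(frame, 0)
    if id_len == 0 and data_len == 0:
        return None  # keep-alive: nothing to deliver
    expected = HEADER.size + id_len + data_len
    if len(frame) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(frame)}")
    id_end = HEADER.size + id_len
    # UTF-8 is an assumption about how chunk_id is encoded.
    chunk_id = frame[HEADER.size:id_end].decode("utf-8")
    return version, chunk_id, frame[id_end:expected]
```

Checking the total length against the header before slicing means a truncated frame is rejected instead of silently yielding a short payload.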

Useful defaults

  • rex-serve serves both HTTP range requests and the WebSocket streaming endpoint.
  • RexConfig.storage.base_url should point at the chunk host.
  • RexConfig.storage.backend_type selects HTTP_RANGE or STREAMING_WS.
  • REX_ENABLE_STREAMING remains a compatibility flag, but backend selection now comes from backend_type.