Live Weight Streaming
rex-serve exposes a WebSocket streaming endpoint at /ws/stream.
Use the streaming backend explicitly when you want the runtime and prefetcher
to share one live socket instance.
Why it matters
- The runtime can keep live fetches on the same backend as the executor path.
- Prefetching can reuse the same streaming connection instead of opening a new request per chunk.
- Local serving can support chunk delivery even when a manifest is not present.
Start the server
rex-serve --dir ./rex_output/weights --port 8080
The streaming endpoint is available at:
ws://localhost:8080/ws/stream
Runtime behavior
When config.storage.backend_type is STREAMING_WS, Rex connects the
streaming backend during model load and shares it with the prefetcher. That
keeps live fetches and queued prefetches on the same transport.
If the manifest itself advertises storage_backend: streaming_ws, Rex can
select this backend automatically during model load.
Protocol
Client request payload:
{
  "manifest_id": "my-model",
  "chunks": [
    {
      "chunk_id": "layer_0.weight.0",
      "resource_id": "weights/layer_0.weight.0.bin"
    }
  ]
}
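The request payload above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; `build_stream_request` is a hypothetical helper name, not part of Rex, and the field names are taken directly from the example payload.

```python
import json


def build_stream_request(manifest_id, chunks):
    """Serialize a chunk request in the shape of the example payload.

    `chunks` is an iterable of (chunk_id, resource_id) pairs.
    This helper is illustrative only and is not a Rex API.
    """
    payload = {
        "manifest_id": manifest_id,
        "chunks": [
            {"chunk_id": chunk_id, "resource_id": resource_id}
            for chunk_id, resource_id in chunks
        ],
    }
    return json.dumps(payload)


request_text = build_stream_request(
    "my-model",
    [("layer_0.weight.0", "weights/layer_0.weight.0.bin")],
)
```

Several chunks can be requested in one message by passing more pairs, which is what lets a prefetcher batch work over a single connection.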
The manifest should also expose the chunk host in provenance.base_url so the
runtime can resolve the WebSocket endpoint without extra per-run wiring.
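One plausible way to derive the WebSocket endpoint from a base URL is sketched below. How Rex performs this resolution internally is not stated here, so treat the function name `stream_url_from_base`, the scheme mapping, and the choice to discard any path already present in the base URL as assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

STREAM_PATH = "/ws/stream"  # endpoint path exposed by rex-serve


def stream_url_from_base(base_url):
    """Map an HTTP(S) chunk host URL to its WebSocket streaming URL.

    Assumption: http maps to ws, https maps to wss, and any path on
    the base URL is replaced by the streaming endpoint path.
    """
    parts = urlsplit(base_url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    if scheme not in ("ws", "wss"):
        raise ValueError(f"unsupported scheme in base URL: {base_url!r}")
    return urlunsplit((scheme, parts.netloc, STREAM_PATH, "", ""))


local_url = stream_url_from_base("http://localhost:8080")
```

With the server started as shown earlier, `local_url` matches the documented endpoint `ws://localhost:8080/ws/stream`.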
Binary frames use this layout:
[1B version] [2B id_len] [8B data_len] [id_len B chunk_id] [data_len B data]
The server may send keep-alive frames with id_len=0 and data_len=0.
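The frame layout can be packed and parsed with Python's `struct` module. This is a sketch under stated assumptions, not the Rex implementation: the layout above does not specify byte order or text encoding, so big-endian integers, a UTF-8 `chunk_id`, and a version value of 1 are all assumed here. The helper names are hypothetical.

```python
import struct

# Assumption: big-endian (network order) fields.
# B = 1-byte version, H = 2-byte id_len, Q = 8-byte data_len (11 bytes total).
HEADER = struct.Struct(">BHQ")


def encode_frame(chunk_id, data, version=1):
    """Build one binary frame: header, then chunk_id bytes, then data."""
    id_bytes = chunk_id.encode("utf-8")  # assumption: UTF-8 ids
    return HEADER.pack(version, len(id_bytes), len(data)) + id_bytes + data


def decode_frame(frame):
    """Return (version, chunk_id, data); chunk_id is None for keep-alives."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than the 11-byte header")
    version, id_len, data_len = HEADER.unpack_from(frame)
    if len(frame) != HEADER.size + id_len + data_len:
        raise ValueError("frame length does not match header fields")
    if id_len == 0 and data_len == 0:
        return version, None, b""  # keep-alive frame carries no chunk
    id_end = HEADER.size + id_len
    return version, frame[HEADER.size:id_end].decode("utf-8"), frame[id_end:]


frame = encode_frame("layer_0.weight.0", b"\x00\x01\x02\x03")
version, chunk_id, data = decode_frame(frame)
```

Checking the total length against `id_len` and `data_len` before slicing means a truncated or padded frame is rejected rather than silently yielding partial weight data.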
Useful defaults
- rex-serve publishes HTTP range responses and streaming together.
- RexConfig.storage.base_url should point at the chunk host.
- RexConfig.storage.backend_type selects HTTP_RANGE or STREAMING_WS.
- REX_ENABLE_STREAMING remains a compatibility flag, but backend selection now comes from backend_type.