Rex Framework Package Guide
This guide covers using Rex as an installed package, for example one installed with pip from PyPI or a compatible index.
Package intent:
- Primary: end-user inference workflows with remotely stored model weights.
- Secondary: validation-oriented use in CI or application test environments.
1. Installation
Minimal package:
pip install rex-framework
Recommended for real inference workloads:
pip install "rex-framework[pytorch]"
Install notes:
- numpy is installed automatically with the package.
- The pytorch extra supports Python 3.10 to 3.13.
- At the time of writing, PyTorch wheels were not available on PyPI for Python 3.14.
- On some platforms (such as macOS x86_64), Python 3.13 may not have a compatible torch wheel yet. Use Python 3.11 if that happens.
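The version range in the notes above can be checked before installing the extra. This is a minimal sketch: the function name pytorch_extra_supported is hypothetical and not part of Rex, and it only checks the interpreter version, not whether a torch wheel exists for your platform.

```python
import sys

# Range taken from the install notes: the pytorch extra targets 3.10 to 3.13.
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 13)


def pytorch_extra_supported(version=None):
    """Return True if (major, minor) falls inside the documented range."""
    major, minor = (version or sys.version_info)[:2]
    return SUPPORTED_MIN <= (major, minor) <= SUPPORTED_MAX


if __name__ == "__main__":
    ok = pytorch_extra_supported()
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor}: "
        f"{'within' if ok else 'outside'} the documented range"
    )
```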
Quick verification:
python -c "import torch, rex; print(torch.__version__); print(rex.__version__)"
Optional extras:
pip install "rex-framework[google-drive]"
pip install "rex-framework[onedrive]"
pip install "rex-framework[bench]"
pip install "rex-framework[all]"
2. End-to-End CLI Flow
2.1 Convert a checkpoint to Rex format
rex-convert model.pt -o ./rex_output --framework pytorch --model-id my-model
2.2 Serve chunk files with HTTP range support
rex-serve --dir ./rex_output/weights --port 8080
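If you serve chunks from a host other than rex-serve, it can help to confirm that the host really answers HTTP range requests. The sketch below uses only the Python standard library. Both function names are hypothetical helpers, not Rex APIs, and the URL you pass must point at a real chunk file on your server.

```python
import re
import urllib.request

_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(header):
    """Parse 'bytes START-END/TOTAL' into (start, end, total); total is None for '*'."""
    match = _CONTENT_RANGE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised Content-Range: {header!r}")
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)


def server_honours_ranges(url, timeout=10.0):
    """Request the first 16 bytes and report whether the server answered 206."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-15"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        header = response.headers.get("Content-Range")
        if response.status != 206 or header is None:
            # A 200 here means the server ignored the Range header.
            return False
        start, _end, _total = parse_content_range(header)
        return start == 0
```

A server that ignores the Range header typically replies 200 with the full body, which this check reports as False.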
2.3 Validate and inspect manifest
rex-validate ./rex_output/manifest.json
rex-inspect ./rex_output/manifest.json --verbose
2.4 Run benchmark
rex-benchmark \
--manifest ./rex_output/manifest.json \
--base-url http://localhost:8080 \
--inferences 10 \
--warmup 2 \
--output benchmark.json
3. Python API Usage
3.1 Basic load + inference
import numpy as np
from rex.api.load import load_model
from rex.api.generate import run_inference_sync
manifest_path = "./rex_output/manifest.json"
runtime = load_model(manifest_path)
input_data = np.random.randn(1, 768).astype(np.float32)
output, metrics = run_inference_sync(manifest_path, input_data)
print(metrics.total_time_ms)
3.2 Explicit configuration
from rex.api.config import RexConfig
from rex.api.load import load_model
config = RexConfig()
config.storage.base_url = "http://localhost:8080"
config.cache.max_memory_cache_bytes = 512 * 1024 * 1024
config.scheduler.prefetch_window = 4
runtime = load_model("./rex_output/manifest.json", config=config)
4. Feature Configuration (Enable/Disable)
The package exposes feature controls through RexConfig. This section maps directly to currently available fields in src/rex/api/config.py.
4.1 Cache and invariant controls
from rex.api.config import RexConfig
config = RexConfig()
# Keep the core Rex invariant on in production
config.cache.enforce_invariant = True
config.cache.max_local_fraction_of_model = 0.4
# Cache policy: lru | lfu | weighted_utility
config.cache.policy = "weighted_utility"
config.cache.weighted_alpha = 0.5
config.cache.weighted_beta = 0.3
config.cache.weighted_gamma = 0.2
# Disk cache toggle
config.cache.max_disk_cache_bytes = 0 # 0 disables disk cache
config.cache.disk_cache_path = ".rex_cache"
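To make max_local_fraction_of_model concrete, the arithmetic below assumes the setting caps locally held weight bytes as a fraction of total model size. That interpretation, the helper name local_budget_bytes, and the 10 GiB model size are all assumptions for illustration. Check src/rex/api/config.py for the actual semantics.

```python
GIB = 1024 ** 3


def local_budget_bytes(model_bytes, max_local_fraction):
    """Upper bound on locally held bytes under the assumed interpretation."""
    if not 0.0 < max_local_fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(model_bytes * max_local_fraction)


model_bytes = 10 * GIB  # hypothetical model size
budget = local_budget_bytes(model_bytes, 0.4)
print(f"{budget / GIB:.1f} GiB of a {model_bytes / GIB:.0f} GiB model may be held locally")
```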
4.2 Scheduler and prefetch toggles
from rex.api.config import RexConfig
config = RexConfig()
# Prefetch toggle
config.scheduler.enable_prefetch = True
# Window behavior
config.scheduler.prefetch_window = 4
config.scheduler.adaptive_window = True
# Planner mode
config.scheduler.scheduler_mode = "graph" # or "sequential"
config.scheduler.build_execution_dag_metadata = True
# Graph partitioning toggle
config.scheduler.enable_graph_partitioning = True
config.scheduler.graph_partition_budget_bytes = 0
# Advanced prefetch tuning
config.scheduler.prefetch_horizon = 4
config.scheduler.max_prefetch_queue_depth = 0
4.3 Runtime priors and eviction strategy
from rex.api.config import RexConfig
config = RexConfig()
config.scheduler.enable_attention_priors = True
config.scheduler.enable_moe_priors = True
# aggressive | retain_n_layers | smart | none
config.scheduler.eviction_strategy = "retain_n_layers"
config.scheduler.retain_window = 2
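This guide does not describe how retain_n_layers is implemented. As a rough illustration of the general idea only, the toy tracker below keeps the most recently used N layers resident and evicts older ones. The class and method names are hypothetical, and Rex's real policy may differ.

```python
from collections import OrderedDict


class RetainNLayers:
    """Toy residency tracker: keep the N most recently used layers."""

    def __init__(self, retain_window):
        if retain_window < 1:
            raise ValueError("retain_window must be >= 1")
        self.retain_window = retain_window
        self._resident = OrderedDict()  # layer name -> weights placeholder

    def touch(self, layer, weights=None):
        """Mark a layer as just used; return the layers evicted as a result."""
        self._resident[layer] = weights
        self._resident.move_to_end(layer)
        evicted = []
        while len(self._resident) > self.retain_window:
            oldest, _ = self._resident.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def resident(self):
        """Layer names currently held, oldest first."""
        return list(self._resident)
```

With retain_window = 2, touching layers in execution order evicts each layer once two newer layers have been used.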
4.4 Storage concurrency and rate behavior
from rex.api.config import RexConfig
config = RexConfig()
config.storage.max_concurrent_fetches = 4
config.storage.adaptive_concurrency = True
config.storage.respect_rate_limits = True
config.storage.request_timeout_seconds = 30.0
config.storage.retry_max_attempts = 3
4.5 Observability toggles
from rex.api.config import RexConfig
config = RexConfig()
config.observability.log_level = "INFO"
config.observability.log_format = "console" # console | json | quiet
config.observability.enable_tracing = True
config.observability.trace_export_path = None
5. Preset Patterns
5.1 Minimal/debug profile (most advanced features off)
from rex.api.config import RexConfig
config = RexConfig()
config.cache.policy = "lru"
config.cache.max_disk_cache_bytes = 0
config.scheduler.enable_prefetch = False
config.scheduler.adaptive_window = False
config.scheduler.scheduler_mode = "sequential"
config.scheduler.build_execution_dag_metadata = False
config.scheduler.enable_graph_partitioning = False
config.scheduler.enable_attention_priors = False
config.scheduler.enable_moe_priors = False
config.storage.adaptive_concurrency = False
5.2 Throughput-oriented profile (features on)
from rex.api.config import RexConfig
config = RexConfig()
config.cache.policy = "weighted_utility"
config.cache.max_disk_cache_bytes = 2 * 1024 * 1024 * 1024
config.scheduler.enable_prefetch = True
config.scheduler.scheduler_mode = "graph"
config.scheduler.enable_graph_partitioning = True
config.scheduler.prefetch_horizon = 8
config.scheduler.max_prefetch_queue_depth = 32
config.scheduler.enable_attention_priors = True
config.scheduler.enable_moe_priors = True
config.storage.adaptive_concurrency = True
config.storage.max_concurrent_fetches = 8
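Profiles like these can also be kept as plain dictionaries and applied in one call. The helper below is a hypothetical sketch, not a Rex API. It assumes the config object exposes nested attributes as shown throughout this guide, and it is demonstrated on a stand-in object so it runs without Rex installed.

```python
from types import SimpleNamespace


def apply_overrides(config, overrides):
    """Set nested attributes from a {'section.field': value} mapping.

    Raises AttributeError if a path does not already exist, so typos fail loudly.
    """
    for dotted_path, value in overrides.items():
        *parents, field = dotted_path.split(".")
        target = config
        for name in parents:
            target = getattr(target, name)
        if not hasattr(target, field):
            raise AttributeError(f"unknown config field: {dotted_path}")
        setattr(target, field, value)
    return config


# A subset of the minimal/debug profile values from section 5.1.
DEBUG_PROFILE = {
    "cache.policy": "lru",
    "cache.max_disk_cache_bytes": 0,
    "scheduler.enable_prefetch": False,
    "scheduler.scheduler_mode": "sequential",
}

# Stand-in for a config object; replace with RexConfig() in real use.
demo = SimpleNamespace(
    cache=SimpleNamespace(policy="weighted_utility", max_disk_cache_bytes=1),
    scheduler=SimpleNamespace(enable_prefetch=True, scheduler_mode="graph"),
)
apply_overrides(demo, DEBUG_PROFILE)
```

Rejecting unknown paths is a deliberate choice: a misspelled field would otherwise be set silently and have no effect.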
6. Environment Variable Overrides
RexConfig supports these environment variables:
- REX_STORAGE_BACKEND
- REX_STORAGE_URL
- REX_AUTH_TOKEN
- REX_CACHE_SIZE_MB
- REX_CACHE_POLICY
- REX_PREFETCH_WINDOW
- REX_LOG_LEVEL
- REX_MAX_LOCAL_FRAC
- REX_DEBUG
Example:
REX_STORAGE_URL=http://localhost:8080 \
REX_CACHE_SIZE_MB=512 \
REX_CACHE_POLICY=lru \
REX_PREFETCH_WINDOW=4 \
REX_LOG_LEVEL=DEBUG \
python your_script.py
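When debugging a CI job or notebook, it can be useful to see which of these variables are actually set. The sketch below only reads the process environment. The helper name active_rex_overrides is hypothetical, and it makes no claim about when or how RexConfig reads the values.

```python
import os

# Names copied from the list in this section.
REX_ENV_VARS = (
    "REX_STORAGE_BACKEND",
    "REX_STORAGE_URL",
    "REX_AUTH_TOKEN",
    "REX_CACHE_SIZE_MB",
    "REX_CACHE_POLICY",
    "REX_PREFETCH_WINDOW",
    "REX_LOG_LEVEL",
    "REX_MAX_LOCAL_FRAC",
    "REX_DEBUG",
)


def active_rex_overrides(environ=None):
    """Return the documented REX_* variables that are currently set."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in REX_ENV_VARS if name in environ}


if __name__ == "__main__":
    for name, value in active_rex_overrides().items():
        # Avoid echoing the token into logs.
        shown = "<set>" if name == "REX_AUTH_TOKEN" else value
        print(f"{name}={shown}")
```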
7. Kaggle and Colab Usage
Rex works in notebook environments as long as:
- You install the package extras needed by your workload.
- Manifest and chunk files are reachable over HTTP range requests.
Notebook pattern:
!pip install "rex-framework[pytorch]"
from rex.api.load import load_model
runtime = load_model("https://your-host/path/to/manifest.json")
This keeps full model weights remote while fetching only required chunks at runtime.
8. Test Support (Secondary)
The package supports validation workflows, but it does not ship the repository's full test suite.
Typical uses in CI and application test environments:
rex-validate ./rex_output/manifest.json
rex-benchmark --manifest ./rex_output/manifest.json --base-url http://localhost:8080
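In a Python test suite, these commands can be wrapped with subprocess. The sketch below assumes rex-validate signals failure with a non-zero exit code, which this guide does not state explicitly. Both function names are hypothetical.

```python
import subprocess


def run_check(command):
    """Run a command and return (exit_code, combined stdout and stderr)."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return completed.returncode, completed.stdout


def validate_manifest(manifest_path):
    """Raise if rex-validate exits non-zero (assumed failure signal)."""
    code, output = run_check(["rex-validate", manifest_path])
    if code != 0:
        raise RuntimeError(f"rex-validate failed ({code}):\n{output}")
    return output
```

Calling validate_manifest("./rex_output/manifest.json") from a test then fails the test run when validation fails, with the command output included in the error.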
9. Included CLI Commands
- rex-convert
- rex-serve
- rex-validate
- rex-inspect
- rex-benchmark
- rex-run-demo
10. Notes
- For most users, install rex-framework[pytorch].
- Keep cache.enforce_invariant=True in production.
- If you disable prefetch and graph planning, expect lower throughput but easier debugging.
- For deeper internals and architecture rationale, use repository documentation.
This page is generated from rex_framework_package_guide.md on the main branch.