Ollama Guide

This guide covers running Hypercontext with Ollama. Ollama is a good fit when you want local inference, private data handling, and a model server that works with the full Hypercontext stack.

What Ollama Gives You

With Ollama you can use Hypercontext for:

  • direct provider-backed completions
  • LLMClient chat/completion flows
  • TaskAgent task workflows
  • MetaAgent repository and tool workflows
  • the run, evaluate, archive, benchmark, mcp, serve, ui, and tui commands
  • local or offline experimentation without sending prompts to a cloud API

Hypercontext talks to Ollama through its REST API, so any chat-capable model that works with Ollama can be used with the provider layer. In practice, that means you can keep the same Hypercontext code and switch the model name from llama3 to another installed Ollama model without changing your app logic.
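Under the hood, requests to Ollama's chat endpoint look roughly like the following. This is a minimal sketch of the payload shape Ollama's REST API expects (the `/api/chat` endpoint and its fields come from Ollama's own API, not Hypercontext internals), and it shows why swapping the model is just a string change:

```python
import json

OLLAMA_BASE_URL = "http://localhost:11434"

def build_chat_request(model, messages, stream=False):
    """Build the URL and JSON body for Ollama's /api/chat endpoint."""
    url = f"{OLLAMA_BASE_URL}/api/chat"
    body = json.dumps({"model": model, "messages": messages, "stream": stream})
    return url, body

# Switching from llama3 to another installed model only changes this argument.
url, body = build_chat_request(
    "llama3",
    [{"role": "user", "content": "Say hello."}],
)
print(url)  # http://localhost:11434/api/chat
```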

Prerequisites

  • Hypercontext installed as a Python package
  • Ollama installed and running locally
  • At least one model pulled into Ollama
  • requests available in the Python environment

Start Ollama if it is not already running:

ollama serve

Pull a model:

ollama pull llama3

List installed models:

ollama list

Environment Variables

The minimal Ollama configuration is:

HYPERCONTEXT_PROVIDER=ollama
HYPERCONTEXT_MODEL=llama3
OLLAMA_BASE_URL=http://localhost:11434

Optional settings:

  • HYPERCONTEXT_TEMPERATURE for creativity vs. determinism
  • HYPERCONTEXT_MAX_TOKENS for longer or shorter responses
  • HYPERCONTEXT_OUTPUT_DIR for run artifacts
  • HYPERCONTEXT_MAX_GENERATIONS for evolution runs
  • HYPERCONTEXT_PARENT_SELECTION for archive-driven parent choice

You can keep these in .env and let the provider examples auto-load them, or export them in your shell.
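If you are wiring these settings up yourself, reading them with defaults is straightforward. The fallback values below are illustrative assumptions for this sketch; Hypercontext's actual defaults may differ:

```python
import os

# Assumed defaults for illustration only.
provider = os.environ.get("HYPERCONTEXT_PROVIDER", "ollama")
model = os.environ.get("HYPERCONTEXT_MODEL", "llama3")
base_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
temperature = float(os.environ.get("HYPERCONTEXT_TEMPERATURE", "0.0"))
max_tokens = int(os.environ.get("HYPERCONTEXT_MAX_TOKENS", "512"))

print(provider, model, base_url)
```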

Use The Provider Directly

The provider registry includes Ollama as a first-class backend.

from hypercontext import LLMClient
from hypercontext.providers import ProviderRegistry

registry = ProviderRegistry.instance()
provider = registry.create(
    "ollama",
    model="llama3",
    base_url="http://localhost:11434",
)

client = LLMClient(provider=provider)
text, history, metadata = client.complete(
    "Explain how Hypercontext uses provider backends."
)
print(text)

If you prefer chat-style calls:

response = client.chat([
    {"role": "user", "content": "List three Hypercontext commands that help with local workflows."}
])
print(response.content)

Use Ollama With Agents

Hypercontext agents can use Ollama through the same client/provider layer.

TaskAgent

Use TaskAgent when you want a structured single-task wrapper:

import json

from hypercontext import LLMClient, TaskAgent
from hypercontext.agents.base import BaseAgentConfig
from hypercontext.providers import ProviderRegistry

registry = ProviderRegistry.instance()
provider = registry.create("ollama", model="llama3", base_url="http://localhost:11434")
client = LLMClient(provider=provider)

def llm_fn(*, model, messages, temperature=0.0, max_tokens=512):
    response = client.chat(
        messages,
        model=model,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return json.dumps({"prediction": response.content})

agent = TaskAgent(
    model="llama3",
    config=BaseAgentConfig(
        model="llama3",
        temperature=0.0,
        max_tokens=512,
        system_prompt="Return a concise answer.",
    ),
)
agent._llm_fn = llm_fn
prediction, history = agent.forward(
    {"domain": "docs", "task": "Explain why Ollama is useful for Hypercontext."}
)
print(prediction)

HyperContext

Use the top-level orchestrator when you want the evolution loop:

from hypercontext import HyperContext

hc = HyperContext(output_dir="./hypercontext_output")
print(hc.run(max_generations=3))

The same code works with Ollama as long as your environment points at the Ollama server and model you want to use.

Use Ollama With The CLI

Once the environment variables are set, every Hypercontext command uses the same provider:

python -m hypercontext providers
python -m hypercontext run --generations 3 --output-dir ./runs/ollama-demo --workdir .
python -m hypercontext evaluate path/to/code.py --domains memory --workdir .
python -m hypercontext archive list
python -m hypercontext benchmark --domains search --generations 2 --workdir .
python -m hypercontext tui --workdir .
python -m hypercontext mcp --workdir .
python -m hypercontext serve --port 8080 --workdir .
python -m hypercontext ui --port 3000 --workdir .

Suggested Workflow

  1. Start Ollama with ollama serve.
  2. Pull the model you want.
  3. Export HYPERCONTEXT_PROVIDER=ollama and HYPERCONTEXT_MODEL=<model>.
  4. Run python -m hypercontext providers to confirm Ollama is available.
  5. Use the CLI, TUI, browser dashboard, or MCP daemon with the same settings.
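The configuration part of the steps above can be scripted. Steps 1 and 2 happen outside Python (`ollama serve`, `ollama pull`); this sketch covers step 3, after which the CLI commands in step 4 and 5 pick up the settings:

```python
import os

# Step 3: point Hypercontext at the local Ollama server.
os.environ["HYPERCONTEXT_PROVIDER"] = "ollama"
os.environ["HYPERCONTEXT_MODEL"] = "llama3"
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"

# Step 4 would then be: python -m hypercontext providers
print(os.environ["HYPERCONTEXT_MODEL"])  # llama3
```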

Model Compatibility

Ollama supports many local models. Hypercontext works best with models that can answer clearly in chat format and follow instructions well.

Good candidates include:

  • llama3
  • mistral
  • qwen2
  • gemma
  • other Ollama-installed chat-capable models

If a model is weak at instruction following, Hypercontext still runs, but provider-backed workflows, benchmarks, and agent tasks may produce noisier results. For the smoothest experience, start with a chat-oriented model.

Named Provider Presets

You can also keep Ollama in a named preset alongside other providers:

provider_name: local-ollama
provider_presets:
  local-ollama:
    provider: ollama
    model: llama3
    base_url: http://localhost:11434
  remote-claude:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: ${ANTHROPIC_API_KEY}
    base_url: https://api.anthropic.com

This is useful when you want to switch between local and cloud providers without changing your agent code.
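Preset lookup amounts to picking a settings dictionary by name. The resolver below is a hypothetical illustration of that idea (it is not Hypercontext's actual loader), using an in-memory version of the preset file above:

```python
# Hypothetical in-memory equivalent of the preset file above.
provider_presets = {
    "local-ollama": {
        "provider": "ollama",
        "model": "llama3",
        "base_url": "http://localhost:11434",
    },
    "remote-claude": {
        "provider": "anthropic",
        "model": "claude-sonnet-4-20250514",
        "base_url": "https://api.anthropic.com",
    },
}

def resolve_preset(name):
    """Return the settings for a named preset, failing loudly on typos."""
    try:
        return provider_presets[name]
    except KeyError:
        raise ValueError(f"unknown provider preset: {name}")

preset = resolve_preset("local-ollama")
print(preset["provider"])  # ollama
```

Switching from local to cloud inference is then a one-line change of the preset name, with agent code untouched.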

Troubleshooting

If the Ollama integration is not working, check the following:

  • Confirm ollama serve is running
  • Confirm the model name matches ollama list
  • Confirm OLLAMA_BASE_URL points at the correct host and port
  • Confirm the Python environment has requests installed
  • Run python -m hypercontext providers to confirm Ollama is detected
  • Start with a small prompt before using a larger agent workflow
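A quick reachability probe covers the first and third checks at once. This is a stdlib-only sketch (it uses urllib rather than requests so it works even in a minimal environment, and the helper name is ours, not a Hypercontext API):

```python
import urllib.request
import urllib.error

def ollama_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

if not ollama_reachable():
    print("Ollama is not reachable; is `ollama serve` running?")
```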