Ollama Guide

This guide covers running Hypercontext with Ollama. Ollama is a good fit when you want local inference, private data handling, and a model server that works with the full Hypercontext stack.

What Ollama Gives You

With Ollama you can use Hypercontext for:

  • direct provider-backed completions
  • LLMClient chat/completion flows
  • TaskAgent task workflows
  • MetaAgent repository and tool workflows
  • the run, evaluate, archive, benchmark, mcp, serve, ui, and tui commands
  • local or offline experimentation without sending prompts to a cloud API

Hypercontext talks to Ollama through its REST API, so any chat-capable model that works with Ollama can be used with the provider layer. In practice, that means you can keep the same Hypercontext code and switch the model name from llama3 to another installed Ollama model without changing your app logic.
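Under the hood, requests to Ollama's chat endpoint look roughly like the following. This is a minimal sketch of the payload shape Ollama's REST API expects (the `/api/chat` endpoint and its fields come from Ollama's own API, not Hypercontext internals), and it shows why swapping the model is just a string change:

```python
import json

OLLAMA_BASE_URL = "http://localhost:11434"

def build_chat_request(model, messages, stream=False):
    """Build the URL and JSON body for Ollama's /api/chat endpoint."""
    url = f"{OLLAMA_BASE_URL}/api/chat"
    body = json.dumps({"model": model, "messages": messages, "stream": stream})
    return url, body

# Switching from llama3 to another installed model only changes this argument.
url, body = build_chat_request(
    "llama3",
    [{"role": "user", "content": "Say hello."}],
)
print(url)  # http://localhost:11434/api/chat
```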

Prerequisites

  • Hypercontext installed as a Python package
  • Ollama installed and running locally
  • At least one model pulled into Ollama
  • requests available in the Python environment

Start Ollama if it is not already running:

ollama serve

Pull a model:

ollama pull llama3

List installed models:

ollama list

Environment Variables

The minimal Ollama configuration is:

HYPERCONTEXT_PROVIDER=ollama
HYPERCONTEXT_MODEL=llama3
OLLAMA_BASE_URL=http://localhost:11434

Optional settings:

  • HYPERCONTEXT_TEMPERATURE for creativity vs. determinism
  • HYPERCONTEXT_MAX_TOKENS for longer or shorter responses
  • HYPERCONTEXT_OUTPUT_DIR for run artifacts
  • HYPERCONTEXT_MAX_GENERATIONS for evolution runs
  • HYPERCONTEXT_PARENT_SELECTION for archive-driven parent choice

You can keep these in .env and let the provider examples auto-load them, or export them in your shell.
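If you are wiring these settings up yourself, reading them with defaults is straightforward. The fallback values below are illustrative assumptions for this sketch; Hypercontext's actual defaults may differ:

```python
import os

# Assumed defaults for illustration only.
provider = os.environ.get("HYPERCONTEXT_PROVIDER", "ollama")
model = os.environ.get("HYPERCONTEXT_MODEL", "llama3")
base_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
temperature = float(os.environ.get("HYPERCONTEXT_TEMPERATURE", "0.0"))
max_tokens = int(os.environ.get("HYPERCONTEXT_MAX_TOKENS", "512"))

print(provider, model, base_url)
```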

Use The Provider Directly

The provider registry includes Ollama as a first-class backend.

from hypercontext import LLMClient
from hypercontext.providers import ProviderRegistry

registry = ProviderRegistry.instance()
provider = registry.create(
    "ollama",
    model="llama3",
    base_url="http://localhost:11434",
)

client = LLMClient(provider=provider)
text, history, metadata = client.complete(
    "Explain how Hypercontext uses provider backends."
)
print(text)

If you prefer chat-style calls:

response = client.chat([
    {"role": "user", "content": "List three Hypercontext commands that help with local workflows."}
])
print(response.content)

Use Ollama With Agents

Hypercontext agents can use Ollama through the same client/provider layer.

TaskAgent

Use TaskAgent when you want a structured single-task wrapper:

import json

from hypercontext import LLMClient, TaskAgent
from hypercontext.agents.base import BaseAgentConfig
from hypercontext.providers import ProviderRegistry

registry = ProviderRegistry.instance()
provider = registry.create("ollama", model="llama3", base_url="http://localhost:11434")
client = LLMClient(provider=provider)

def llm_fn(*, model, messages, temperature=0.0, max_tokens=512):
    response = client.chat(
        messages,
        model=model,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return json.dumps({"prediction": response.content})

agent = TaskAgent(
    model="llama3",
    config=BaseAgentConfig(
        model="llama3",
        temperature=0.0,
        max_tokens=512,
        system_prompt="Return a concise answer.",
    ),
)
agent._llm_fn = llm_fn
prediction, history = agent.forward(
    {"domain": "docs", "task": "Explain why Ollama is useful for Hypercontext."}
)
print(prediction)

HyperContext

Use the top-level orchestrator when you want the evolution loop:

from hypercontext import HyperContext

hc = HyperContext(output_dir="./hypercontext_output")
print(hc.run(max_generations=3))

The same code works with Ollama as long as your environment points at the Ollama server and model you want to use.

Use Ollama With The CLI

Once the environment variables are set, every Hypercontext command uses the same provider:

python -m hypercontext providers
python -m hypercontext run --generations 3 --output-dir ./runs/ollama-demo --workdir .
python -m hypercontext evaluate path/to/code.py --domains memory --workdir .
python -m hypercontext archive list
python -m hypercontext benchmark --domains search --generations 2 --workdir .
python -m hypercontext tui --workdir .
python -m hypercontext mcp --workdir .
python -m hypercontext serve --port 8080 --workdir .
python -m hypercontext ui --port 3000 --workdir .

Suggested Workflow

  1. Start Ollama with ollama serve.
  2. Pull the model you want.
  3. Export HYPERCONTEXT_PROVIDER=ollama and HYPERCONTEXT_MODEL=<model>.
  4. Run python -m hypercontext providers to confirm Ollama is available.
  5. Use the CLI, TUI, browser dashboard, or MCP daemon with the same settings.
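The configuration part of the steps above can be scripted. Steps 1 and 2 happen outside Python (`ollama serve`, `ollama pull`); this sketch covers step 3, after which the CLI commands in step 4 and 5 pick up the settings:

```python
import os

# Step 3: point Hypercontext at the local Ollama server.
os.environ["HYPERCONTEXT_PROVIDER"] = "ollama"
os.environ["HYPERCONTEXT_MODEL"] = "llama3"
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"

# Step 4 would then be: python -m hypercontext providers
print(os.environ["HYPERCONTEXT_MODEL"])  # llama3
```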

Model Compatibility

Ollama supports many local models. Hypercontext works best with models that can answer clearly in chat format and follow instructions well.

Good candidates include:

  • llama3
  • mistral
  • qwen2
  • gemma
  • other Ollama-installed chat-capable models

If a model is weak at instruction following, Hypercontext still runs, but provider-backed workflows, benchmarks, and agent tasks may produce noisier results. For the smoothest experience, start with a chat-oriented model.

Named Provider Presets

You can also keep Ollama in a named preset alongside other providers:

provider_name: local-ollama
provider_presets:
  local-ollama:
    provider: ollama
    model: llama3
    base_url: http://localhost:11434
  remote-claude:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: ${ANTHROPIC_API_KEY}
    base_url: https://api.anthropic.com

This is useful when you want to switch between local and cloud providers without changing your agent code.
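Preset lookup amounts to picking a settings dictionary by name. The resolver below is a hypothetical illustration of that idea (it is not Hypercontext's actual loader), using an in-memory version of the preset file above:

```python
# Hypothetical in-memory equivalent of the preset file above.
provider_presets = {
    "local-ollama": {
        "provider": "ollama",
        "model": "llama3",
        "base_url": "http://localhost:11434",
    },
    "remote-claude": {
        "provider": "anthropic",
        "model": "claude-sonnet-4-20250514",
        "base_url": "https://api.anthropic.com",
    },
}

def resolve_preset(name):
    """Return the settings for a named preset, failing loudly on typos."""
    try:
        return provider_presets[name]
    except KeyError:
        raise ValueError(f"unknown provider preset: {name}")

preset = resolve_preset("local-ollama")
print(preset["provider"])  # ollama
```

Switching from local to cloud inference is then a one-line change of the preset name, with agent code untouched.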

Troubleshooting

If the Ollama integration is not working, check the following:

  • Confirm ollama serve is running
  • Confirm the model name matches ollama list
  • Confirm OLLAMA_BASE_URL points at the correct host and port
  • Confirm the Python environment has requests installed
  • Run python -m hypercontext providers to confirm Ollama is detected
  • Start with a small prompt before using a larger agent workflow
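A quick reachability probe covers the first and third checks at once. This is a stdlib-only sketch (it uses urllib rather than requests so it works even in a minimal environment, and the helper name is ours, not a Hypercontext API):

```python
import urllib.request
import urllib.error

def ollama_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

if not ollama_reachable():
    print("Ollama is not reachable; is `ollama serve` running?")
```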