Integration Guide: Running LLMs with CIA Consciousness Indicators
Version: 0.4.0
Last Updated: 2026-05-16
Companion Stage-5 Summary: docs/27_INTEGRATION_Advanced_LLM_TL.md
Table of Contents
- Overview
- Architecture: How CIA Wraps Around an LLM
- Option A — Local Model Weights (Offline Inference)
  - 3.1 Understanding the Adapter Interface
  - 3.2 Building a LocalModelAdapter (HuggingFace Transformers)
  - 3.3 Building a LocalModelAdapter (llama.cpp / GGUF)
  - 3.4 Building a LocalModelAdapter (vLLM)
  - 3.5 Wiring the Adapter into CombinedConsciousnessIndicatorSystem
  - 3.6 Running Multi-Turn Conversations with Consciousness Tracking
- Option B — Remote API Integration (OpenAI, Claude, etc.)
  - 4.1 Extending LLMAdapter for OpenAI
  - 4.2 Extending LLMAdapter for Anthropic Claude
  - 4.3 Extending LLMAdapter for Google Gemini
  - 4.4 Extending LLMAdapter for OpenAI-Compatible Endpoints (Ollama, LM Studio)
  - 4.5 Configuration File Approach
  - 4.6 Environment Variable Reference
- Integration Pattern: CIA-Aware Chat Loop
- Integration with Neuroadaptive EEG/BCI Extension
- Integration with Subject-Specific Cognitive Emulation
- CLI Usage with External LLMs
- Performance Considerations
- Scientific Boundary Reminder
1. Overview
The Consciousness-Indicator Architecture (CIA) is designed to be model-agnostic. It can evaluate text produced by any AI system, whether a local model running on your own hardware or a remote API-based model, by wrapping the model's input/output in its own cognitive processing pipeline. The CIA does not modify the underlying model's weights or architecture. Instead, it observes the model's text outputs and runs them through its own independent modules:
- Perception Layer — extracts entities, concepts, and salience scores
- Recurrent Binding — stabilises percepts through iterative refinement
- Predictive World Model — tracks hypotheses and prediction error
- Attention Controller — ranks content by salience and novelty
- Global Workspace — broadcasts high-salience content to all subscribers
- Memory Systems — working, episodic, semantic, and self-memory
- Higher-Order Self-Model — maintains beliefs about its own state
- Consciousness Specialist — evaluates 11 theory-derived indicators (0-22 scale)
- Welfare Monitor — tracks risk and recommends safeguards
The key point is that CombinedConsciousnessIndicatorSystem.run_cycle(input_text) accepts any text string and returns a SimulationReport with indicator scores. You can therefore feed it the output of any LLM, and it evaluates the cognitive-architecture indicators present in the interaction: the structural patterns of information processing, not the model's internal weights.
There are two primary integration paths:
| Approach | Model Location | Latency | Privacy | Use Case |
|---|---|---|---|---|
| A: Local Weights | On-premise GPU/CPU | No network round-trip; bound by local hardware | Full control | Research, air-gapped environments |
| B: Remote API | Cloud provider | Higher (network) | Shared with provider | Production, rapid prototyping |
2. Architecture: How CIA Wraps Around an LLM
The integration follows a wrapper pattern. The CIA does not sit inside the model; it sits around it. Conceptually:
```text
User Input
    |
    v
+------------------------------------------------------+
|              CIA Consciousness Pipeline              |
|                                                      |
|  1. User input --> Perception --> Percepts           |
|  2. Percepts --> Recurrent Binding --> BoundPercept  |
|  3. BoundPercept --> Predictive Update               |
|  4. --> Attention Ranking --> Workspace Broadcast    |
|  5. --> Memory Update --> Self-Model Update          |
|  6. --> Consciousness Specialist (11 indicators)     |
|  7. --> Welfare Monitor                              |
|                                                      |
|  (Optional) LLM Adapter:                             |
|  8. User input --> LLM.generate() --> AIResponse     |
|  9. LLM output --> CIA.run_cycle(llm_output_text)    |
| 10. Aggregate scores                                 |
+------------------------------------------------------+
    |
    v
SimulationReport (indicator scores, welfare state, caveats)
```
There are two ways to use the CIA with an LLM:
Shallow integration — Run CIA.run_cycle(user_prompt) directly on user input. This evaluates the CIA's own heuristic-based processing pipeline without involving an LLM at all. The CIA has its own deterministic perception, attention, and scoring mechanisms.
Deep integration — Use the LLM adapter to generate a response, then feed that response through CIA.run_cycle(). This allows the CIA to evaluate the LLM's actual output for cognitive patterns. The BaseAIAdapter interface provides a uniform way to swap between backends.
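The practical difference between the two modes is which text reaches run_cycle(). The sketch below makes that concrete with hypothetical stand-ins (`EchoLLM`, `RecordingCIA`, `Response`) in place of a real adapter and the real CombinedConsciousnessIndicatorSystem, so it runs without a model; only the `generate(prompt).text` and `run_cycle(text)` call shapes mirror this guide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Response:
    """Hypothetical stand-in for AIResponse (only .text is used here)."""
    text: str


class EchoLLM:
    """Hypothetical LLM stand-in that 'generates' by echoing the prompt."""

    def generate(self, prompt: str) -> Response:
        return Response(text=f"You asked: {prompt}")


@dataclass
class RecordingCIA:
    """Hypothetical CIA stand-in that records every text given to run_cycle()."""
    seen: List[str] = field(default_factory=list)

    def run_cycle(self, text: str) -> str:
        self.seen.append(text)
        return f"report<{len(text)} chars>"


def shallow(cia, prompt: str):
    # Shallow integration: the CIA processes the user's prompt; no LLM involved.
    return cia.run_cycle(prompt)


def deep(cia, llm, prompt: str):
    # Deep integration: the LLM answers first, then the CIA processes that answer.
    return cia.run_cycle(llm.generate(prompt).text)


cia = RecordingCIA()
shallow(cia, "What is attention?")
deep(cia, EchoLLM(), "What is attention?")
print(cia.seen)
```

In the deep path the CIA never sees the user's prompt, only the model's reply; if both matter to your analysis, you would have to run a cycle on each.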
3. Option A — Local Model Weights (Offline Inference)
3.1 Understanding the Adapter Interface
All CIA adapters implement BaseAIAdapter (defined in src/cia/adapters/base.py):
```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Optional

# AIResponse is defined alongside BaseAIAdapter in src/cia/adapters/base.py.


class BaseAIAdapter(ABC):
    @abstractmethod
    def generate(self, prompt: str, context: Optional[dict] = None) -> AIResponse:
        """Generate a response to the given prompt."""
        ...

    @abstractmethod
    def embed(self, text: str) -> list[float]:
        """Generate an embedding vector."""
        ...

    @abstractmethod
    def describe_capabilities(self) -> dict[str, Any]:
        """Describe the adapter's capabilities."""
        ...
```
AIResponse carries the following fields:
- text (str) — the generated text
- confidence (float, 0-1) — response confidence
- model_name (str) — identifier
- metadata (dict) — additional info
- uncertainty (float, 0-1) — response uncertainty
To integrate a local model, you create a new adapter class that implements these three methods, calling the model's native inference pipeline.
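Before wiring in a real model, it can help to have an adapter that needs no weights or network at all, for example to test the CIA pipeline end to end. The sketch below is hypothetical: because cia.adapters.base is not reproduced in full here, it defines local stand-ins for AIResponse (with the five fields listed above) and BaseAIAdapter (with the three abstract methods above). In the real project you would import both from cia.adapters.base instead.

```python
from __future__ import annotations

import hashlib
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AIResponse:
    """Local stand-in mirroring the AIResponse fields listed above."""
    text: str
    confidence: float
    model_name: str
    metadata: dict[str, Any] = field(default_factory=dict)
    uncertainty: float = 0.0


class BaseAIAdapter(ABC):
    """Local stand-in mirroring the abstract methods of BaseAIAdapter."""

    @abstractmethod
    def generate(self, prompt: str, context: Optional[dict] = None) -> AIResponse: ...

    @abstractmethod
    def embed(self, text: str) -> list[float]: ...

    @abstractmethod
    def describe_capabilities(self) -> dict[str, Any]: ...


class EchoAdapter(BaseAIAdapter):
    """Hypothetical offline adapter: echoes prompts, hashes text into vectors."""

    def __init__(self, dim: int = 8) -> None:
        if not 1 <= dim <= 32:  # a SHA-256 digest has 32 bytes
            raise ValueError("dim must be between 1 and 32")
        self._dim = dim

    def generate(self, prompt: str, context: Optional[dict] = None) -> AIResponse:
        return AIResponse(
            text=f"Echo: {prompt}",
            confidence=1.0,
            model_name="echo",
            metadata={"adapter_type": "echo"},
            uncertainty=0.0,
        )

    def embed(self, text: str) -> list[float]:
        # Deterministic pseudo-embedding: the first `dim` digest bytes scaled
        # to [0, 1]. Not semantically meaningful; useful only for tests.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [b / 255.0 for b in digest[: self._dim]]

    def describe_capabilities(self) -> dict[str, Any]:
        return {
            "model_name": "echo",
            "adapter_type": "echo",
            "modalities": ["text"],
            "requires_network": False,
        }


adapter = EchoAdapter()
print(adapter.generate("hello").text)
```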
3.2 Building a LocalModelAdapter (HuggingFace Transformers)
For models loaded via transformers (e.g., LLaMA, Mistral, Qwen, Phi):
```python
# File: src/cia/adapters/huggingface_adapter.py
"""
HuggingFace Transformers adapter for local model inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
They are inputs to indicator evaluation only.
"""
from __future__ import annotations

import logging
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class HuggingFaceAdapter(BaseAIAdapter):
    """Adapter for HuggingFace Transformers models.

    Parameters
    ----------
    model_name_or_path : str
        HuggingFace model identifier or local path (e.g.,
        "meta-llama/Llama-2-7b-chat-hf", "mistralai/Mistral-7B-Instruct-v0.2").
    device : str
        Device map for inference ("auto", "cuda", "cpu", "mps").
    max_new_tokens : int
        Maximum tokens to generate.
    temperature : float
        Sampling temperature (0.0 = greedy, deterministic decoding).
    load_in_8bit : bool
        Whether to load the model in 8-bit quantized mode (needs bitsandbytes).
    load_in_4bit : bool
        Whether to load the model in 4-bit quantized mode (needs bitsandbytes).
    """

    def __init__(
        self,
        model_name_or_path: str = "mistralai/Mistral-7B-Instruct-v0.2",
        device: str = "auto",
        max_new_tokens: int = 512,
        temperature: float = 0.7,
        load_in_8bit: bool = False,
        load_in_4bit: bool = False,
    ) -> None:
        if load_in_8bit and load_in_4bit:
            raise ValueError("Choose at most one of load_in_8bit / load_in_4bit.")
        self._model_name = model_name_or_path
        self._device = device
        self._max_new_tokens = max_new_tokens
        self._temperature = temperature
        self._load_in_8bit = load_in_8bit
        self._load_in_4bit = load_in_4bit
        self._model = None
        self._tokenizer = None
        self._loaded = False

    def _ensure_loaded(self) -> None:
        """Lazy-load the model and tokenizer on first use."""
        if self._loaded:
            return
        quantized = self._load_in_4bit or self._load_in_8bit
        try:
            import torch
            from transformers import (
                AutoModelForCausalLM,
                AutoTokenizer,
                BitsAndBytesConfig,
            )
        except ImportError as e:
            # device_map requires accelerate; quantization also needs bitsandbytes
            extras = "accelerate bitsandbytes" if quantized else "accelerate"
            raise ImportError(
                f"Required packages not installed: {e}. "
                f"Install with: pip install torch transformers {extras}"
            ) from e

        logger.info("Loading HuggingFace model: %s", self._model_name)
        # Passing load_in_8bit/load_in_4bit directly to from_pretrained is
        # deprecated; quantization is configured through BitsAndBytesConfig.
        quantization_config = None
        if self._load_in_4bit:
            quantization_config = BitsAndBytesConfig(
                load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
            )
        elif self._load_in_8bit:
            quantization_config = BitsAndBytesConfig(load_in_8bit=True)
        # float16 halves memory on accelerators; many float16 ops are slow or
        # unsupported on CPU, so use float32 there.
        dtype = torch.float32 if self._device == "cpu" else torch.float16

        # trust_remote_code executes code from the model repository:
        # only enable it for repositories you trust.
        self._tokenizer = AutoTokenizer.from_pretrained(
            self._model_name,
            trust_remote_code=True,
        )
        self._model = AutoModelForCausalLM.from_pretrained(
            self._model_name,
            torch_dtype=dtype,
            device_map=self._device,
            quantization_config=quantization_config,
            trust_remote_code=True,
        )
        self._model.eval()
        self._loaded = True
        logger.info("Model loaded successfully on %s", self._model.device)

    def generate(
        self,
        prompt: str,
        context: Optional[dict[str, Any]] = None,
    ) -> AIResponse:
        """Generate a response using the local HuggingFace model.

        Parameters
        ----------
        prompt : str
            The input prompt.
        context : dict | None
            Optional context. If it contains ``system_prompt``, it
            will be prepended as a system message.

        Returns
        -------
        AIResponse
            Generated response with confidence and metadata.
        """
        self._ensure_loaded()
        import torch

        system_prompt = (context or {}).get("system_prompt", "")

        # Candidate chat-message layouts, tried in order. Some chat templates
        # reject a "system" role, so fall back to folding it into the user turn.
        candidates = []
        if system_prompt:
            candidates.append([
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt},
            ])
            candidates.append([
                {"role": "user", "content": f"{system_prompt}\n\n{prompt}"},
            ])
        else:
            candidates.append([{"role": "user", "content": prompt}])

        # Fallback for models without a chat template: the raw prompt
        text_inputs, add_special_tokens = prompt, True
        for messages in candidates:
            try:
                text_inputs = self._tokenizer.apply_chat_template(
                    messages, tokenize=False, add_generation_prompt=True
                )
                # The rendered template already contains BOS/special tokens
                add_special_tokens = False
                break
            except Exception:
                continue

        inputs = self._tokenizer(
            text_inputs, return_tensors="pt", add_special_tokens=add_special_tokens
        ).to(self._model.device)

        gen_kwargs: dict[str, Any] = {
            "max_new_tokens": self._max_new_tokens,
            "pad_token_id": self._tokenizer.eos_token_id,
        }
        if self._temperature > 0:
            gen_kwargs.update(do_sample=True, temperature=self._temperature)
        else:
            gen_kwargs["do_sample"] = False  # greedy; temperature is not passed

        with torch.no_grad():
            outputs = self._model.generate(**inputs, **gen_kwargs)

        # Drop the prompt tokens and keep only the newly generated continuation
        generated_ids = outputs[0][inputs["input_ids"].shape[-1]:]
        text = self._tokenizer.decode(generated_ids, skip_special_tokens=True)

        # Length-only heuristic; NOT a calibrated probability
        confidence = min(0.5 + len(text) / 2000.0, 0.95)

        return AIResponse(
            text=text,
            confidence=round(confidence, 4),
            model_name=self._model_name,
            metadata={
                "adapter_type": "huggingface",
                "device": str(self._model.device),
                "max_new_tokens": self._max_new_tokens,
                "temperature": self._temperature,
                "quantized": self._load_in_8bit or self._load_in_4bit,
            },
            uncertainty=round(1.0 - confidence, 4),
        )

    def embed(self, text: str) -> list[float]:
        """Generate an embedding from the causal LM's hidden states.

        Uses the L2-normalised last-layer hidden state of the final token,
        which is only a rough proxy for a dedicated embedding model.
        Returns a 64-dim zero vector if embedding fails.
        """
        self._ensure_loaded()
        import torch

        try:
            inputs = self._tokenizer(
                text, return_tensors="pt", truncation=True, max_length=512
            ).to(self._model.device)
            with torch.no_grad():
                outputs = self._model(**inputs, output_hidden_states=True)
            # Last layer, last token: shape (1, hidden_size)
            last_hidden = outputs.hidden_states[-1][:, -1, :]
            embedding = last_hidden / last_hidden.norm(dim=-1, keepdim=True)
            return embedding.squeeze(0).float().tolist()
        except Exception as e:
            logger.warning("Embedding generation failed: %s", e)
            return [0.0] * 64

    def describe_capabilities(self) -> dict[str, Any]:
        """Describe this adapter's capabilities."""
        return {
            "model_name": self._model_name,
            "adapter_type": "huggingface",
            "modalities": ["text"],
            "requires_network": False,  # after the weights are downloaded
            "requires_gpu": self._device != "cpu",
            "loaded": self._loaded,
            "device": self._device,
            "quantization": (
                "4bit" if self._load_in_4bit
                else "8bit" if self._load_in_8bit
                else "none"
            ),
            "note": (
                "Local HuggingFace model. LLM outputs are NOT evidence of "
                "consciousness: they are inputs to CIA indicator evaluation."
            ),
        }
```
Usage:
```python
from cia.adapters.huggingface_adapter import HuggingFaceAdapter
from cia.simulation import CombinedConsciousnessIndicatorSystem

# Create an adapter for a local model (weights load lazily on the first call)
adapter = HuggingFaceAdapter(
    model_name_or_path="mistralai/Mistral-7B-Instruct-v0.2",
    device="auto",
    max_new_tokens=256,
    temperature=0.7,
)

# Get the LLM response
response = adapter.generate("What is consciousness?")
print(f"LLM says: {response.text}")

# Run the CIA consciousness-indicator pipeline on the LLM output
system = CombinedConsciousnessIndicatorSystem()
report = system.run_cycle(response.text)
scores = report.indicator_scores  # may be empty/None (see the guard in 3.5)
if scores:
    print(f"Indicator Score: {scores.total_score}/{scores.max_possible}")
```
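The `confidence` each adapter in this guide reports is a pure function of output length: `min(0.5 + len(text) / 2000, 0.95)` for the local adapters and `min(0.5 + len(text) / 1000, 0.95)` for the remote ones in section 4. The short worked example below reproduces that formula (the function name `length_confidence` is ours, for illustration) to show where it saturates.

```python
def length_confidence(text: str, scale: float = 2000.0, cap: float = 0.95) -> float:
    """Reproduce the adapters' heuristic: min(0.5 + len(text) / scale, cap)."""
    return round(min(0.5 + len(text) / scale, cap), 4)


# Local adapters (HuggingFace, llama.cpp, vLLM) use scale=2000;
# the remote adapters (OpenAI, Claude, Gemini) use scale=1000.
for n in (0, 200, 500, 900, 5000):
    local = length_confidence("x" * n)
    remote = length_confidence("x" * n, scale=1000.0)
    print(f"{n:>5} chars  local={local}  remote={remote}")
```

Any answer of 900 characters or more (450 for remote adapters) gets the maximum 0.95 regardless of content, and the two scales mean the same text receives different confidences from local and remote backends. Treat `confidence`/`uncertainty` as length metadata rather than a measure of answer quality.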
3.3 Building a LocalModelAdapter (llama.cpp / GGUF)
For GGUF-format models served via llama.cpp:
```python
# File: src/cia/adapters/llamacpp_adapter.py
"""
llama.cpp adapter for local GGUF model inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
"""
from __future__ import annotations

import json
import logging
import urllib.request
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class LlamaCppAdapter(BaseAIAdapter):
    """Adapter for the llama.cpp HTTP server (REST API on localhost).

    This adapter connects to a running llama.cpp HTTP server. Start the
    server first:

        ./llama-server -m model.gguf -c 2048 --port 8080

    The /completion endpoint takes the prompt verbatim and does not apply
    the model's chat template; ``context`` is ignored by this adapter.

    Parameters
    ----------
    base_url : str
        Base URL for the llama.cpp server (default "http://localhost:8080").
    model_name : str
        Human-readable model identifier.
    n_predict : int
        Maximum tokens to generate.
    temperature : float
        Sampling temperature.
    """

    def __init__(
        self,
        base_url: str = "http://localhost:8080",
        model_name: str = "llamacpp-local",
        n_predict: int = 512,
        temperature: float = 0.7,
    ) -> None:
        self._base_url = base_url.rstrip("/")
        self._model_name = model_name
        self._n_predict = n_predict
        self._temperature = temperature

    def _post(self, path: str, payload: dict[str, Any], timeout: float) -> Any:
        """POST a JSON payload to the server and decode the JSON reply."""
        req = urllib.request.Request(
            f"{self._base_url}{path}",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def generate(
        self, prompt: str, context: Optional[dict[str, Any]] = None
    ) -> AIResponse:
        """Generate via the llama.cpp /completion endpoint."""
        payload = {
            "prompt": prompt,
            "n_predict": self._n_predict,
            "temperature": self._temperature,
        }
        try:
            result = self._post("/completion", payload, timeout=120)
        except Exception as e:
            # Fail loudly: returning the error message as "text" would feed it
            # into the CIA pipeline as if it were model output.
            logger.error("llama.cpp request failed: %s", e)
            raise RuntimeError(f"llama.cpp request failed: {e}") from e

        text = result.get("content", "")
        confidence = min(0.5 + len(text) / 2000.0, 0.95)  # length-only heuristic

        return AIResponse(
            text=text,
            confidence=round(confidence, 4),
            model_name=self._model_name,
            metadata={"adapter_type": "llamacpp", "base_url": self._base_url},
            uncertainty=round(1.0 - confidence, 4),
        )

    def embed(self, text: str) -> list[float]:
        """Generate an embedding via the llama.cpp /embedding endpoint.

        The server may need to be started with embeddings enabled (see
        ``llama-server --help``), and the response shape may differ between
        llama.cpp versions. Anything other than a top-level
        {"embedding": [...]} falls back to a 64-dim zero vector.
        """
        try:
            result = self._post("/embedding", {"content": text}, timeout=30)
        except Exception as e:
            logger.warning("llama.cpp embedding failed: %s", e)
            return [0.0] * 64
        if isinstance(result, dict) and isinstance(result.get("embedding"), list):
            return result["embedding"]
        return [0.0] * 64

    def describe_capabilities(self) -> dict[str, Any]:
        return {
            "model_name": self._model_name,
            "adapter_type": "llamacpp",
            "modalities": ["text"],
            "requires_network": True,  # localhost network
            "base_url": self._base_url,
        }
```
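The adapter above performs request building, HTTP I/O and response parsing in one call. Splitting them, as in the sketch below, lets you unit-test the `/completion` contract this guide relies on (`prompt`, `n_predict`, `temperature` in; `content` out) without a running server. The function names are hypothetical, and the sketch raises `RuntimeError` on any failure so that an error message is never scored by the CIA as if it were model output.

```python
import json
import urllib.error
import urllib.request


def build_completion_request(
    base_url: str, prompt: str, n_predict: int = 512, temperature: float = 0.7
) -> urllib.request.Request:
    """Build the POST request for a llama.cpp server's /completion endpoint."""
    payload = {"prompt": prompt, "n_predict": n_predict, "temperature": temperature}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_completion(raw: bytes) -> str:
    """Extract the generated text from a /completion response body."""
    try:
        body = json.loads(raw.decode("utf-8"))
    except json.JSONDecodeError as e:
        raise RuntimeError(f"llama.cpp returned invalid JSON: {e}") from e
    if not isinstance(body, dict) or "content" not in body:
        raise RuntimeError("unexpected llama.cpp /completion response")
    return body["content"]


def llamacpp_complete(
    base_url: str,
    prompt: str,
    n_predict: int = 512,
    temperature: float = 0.7,
    timeout: float = 120.0,
) -> str:
    """Send one completion request; raise RuntimeError on any failure."""
    req = build_completion_request(base_url, prompt, n_predict, temperature)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            raw = resp.read()
    except (urllib.error.URLError, TimeoutError) as e:  # HTTPError is a URLError
        raise RuntimeError(f"llama.cpp request failed: {e}") from e
    return parse_completion(raw)
```

With a server running as described above, `llamacpp_complete("http://localhost:8080", "Hello")` returns the generated text.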
3.4 Building a LocalModelAdapter (vLLM)
For high-throughput inference with vLLM:
```python
# File: src/cia/adapters/vllm_adapter.py
"""
vLLM adapter for high-throughput local inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
"""
from __future__ import annotations

import logging
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class VLLMAdapter(BaseAIAdapter):
    """Adapter for vLLM's offline batched inference.

    Parameters
    ----------
    model_name : str
        Model identifier or path for vLLM.
    tensor_parallel_size : int
        Number of GPUs for tensor parallelism.
    max_tokens : int
        Maximum tokens to generate.
    temperature : float
        Sampling temperature.
    gpu_memory_utilization : float
        Fraction of GPU memory to allocate.
    """

    def __init__(
        self,
        model_name: str = "meta-llama/Meta-Llama-3-8B-Instruct",
        tensor_parallel_size: int = 1,
        max_tokens: int = 512,
        temperature: float = 0.7,
        gpu_memory_utilization: float = 0.9,
    ) -> None:
        self._model_name = model_name
        self._tensor_parallel_size = tensor_parallel_size
        self._max_tokens = max_tokens
        self._temperature = temperature
        self._gpu_memory_utilization = gpu_memory_utilization
        self._llm = None
        self._sampling_params = None
        self._loaded = False

    def _ensure_loaded(self) -> None:
        """Lazy-load the vLLM engine on first use."""
        if self._loaded:
            return
        try:
            from vllm import LLM, SamplingParams
        except ImportError as e:
            raise ImportError(
                "vLLM not installed. Install with: pip install vllm"
            ) from e

        logger.info("Loading model via vLLM: %s", self._model_name)
        self._llm = LLM(
            model=self._model_name,
            tensor_parallel_size=self._tensor_parallel_size,
            gpu_memory_utilization=self._gpu_memory_utilization,
        )
        self._sampling_params = SamplingParams(
            temperature=self._temperature,
            max_tokens=self._max_tokens,
        )
        self._loaded = True

    def generate(
        self, prompt: str, context: Optional[dict[str, Any]] = None
    ) -> AIResponse:
        """Generate a completion for a single prompt.

        LLM.generate() uses the prompt verbatim and does not apply the
        model's chat template; instruct models may respond better if the
        prompt is pre-formatted with that template.
        """
        self._ensure_loaded()
        # generate() takes a list of prompts; here it is a batch of one
        outputs = self._llm.generate([prompt], self._sampling_params)
        text = outputs[0].outputs[0].text
        confidence = min(0.5 + len(text) / 2000.0, 0.95)  # length-only heuristic

        return AIResponse(
            text=text,
            confidence=round(confidence, 4),
            model_name=self._model_name,
            metadata={"adapter_type": "vllm", "tensor_parallel": self._tensor_parallel_size},
            uncertainty=round(1.0 - confidence, 4),
        )

    def embed(self, text: str) -> list[float]:
        # Embeddings with vLLM require a separate embedding model; not
        # implemented here, so return a 64-dim zero vector.
        return [0.0] * 64

    def describe_capabilities(self) -> dict[str, Any]:
        return {
            "model_name": self._model_name,
            "adapter_type": "vllm",
            "modalities": ["text"],
            "requires_network": False,
            "tensor_parallel_size": self._tensor_parallel_size,
        }
```
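The adapter above sends one prompt per call, which forgoes vLLM's main advantage: `LLM.generate()` accepts a whole list of prompts. For a set of independent prompts (an evaluation set, not a conversation where each turn depends on the last answer), one option is to batch the generation step and still feed the outputs to the CIA one at a time, in order, since the CIA instance may carry state between cycles. The sketch below is hypothetical and framework-free: `generate_batch` and `run_cycle` are injected callables.

```python
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def batched_then_sequential(
    prompts: Sequence[str],
    generate_batch: Callable[[List[str]], List[str]],
    run_cycle: Callable[[str], R],
    batch_size: int = 32,
) -> List[R]:
    """Generate outputs in batches, then score them one by one in input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    outputs: List[str] = []
    for start in range(0, len(prompts), batch_size):
        chunk = list(prompts[start:start + batch_size])
        texts = generate_batch(chunk)
        if len(texts) != len(chunk):
            raise RuntimeError("generate_batch returned the wrong number of outputs")
        outputs.extend(texts)
    # Sequential on purpose: each run_cycle call may update the CIA's state
    return [run_cycle(text) for text in outputs]


# Usage with stand-ins (no GPU needed):
batch_sizes: List[int] = []


def fake_generate_batch(batch: List[str]) -> List[str]:
    batch_sizes.append(len(batch))
    return [p.upper() for p in batch]


scored = batched_then_sequential(
    ["a", "b", "c", "d", "e"],
    fake_generate_batch,
    run_cycle=lambda text: f"scored:{text}",
    batch_size=2,
)
```

With vLLM, `generate_batch` could be something like `lambda ps: [o.outputs[0].text for o in llm.generate(ps, sampling_params)]` and `run_cycle` the CIA's `run_cycle`; the sketch relies on the batch function returning outputs in prompt order.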
3.5 Wiring the Adapter into CombinedConsciousnessIndicatorSystem
The CIA system accepts text input through run_cycle(). To build a CIA-aware LLM system, you create a thin orchestration layer:
```python
from typing import Any

from cia.adapters.base import BaseAIAdapter
from cia.runtime import ContinuousCognitionRuntime
from cia.scorecard import ConsciousnessIndicatorScorecard
from cia.simulation import CombinedConsciousnessIndicatorSystem


class CIAAwareLLM:
    """Orchestrates an LLM with the CIA consciousness-indicator pipeline.

    Every LLM response is passed through the CIA pipeline to evaluate
    theory-derived consciousness indicators in real time.

    SCIENTIFIC BOUNDARY:
    This system evaluates theory-derived consciousness indicators.
    It does NOT prove, establish, or demonstrate subjective experience
    in the underlying LLM. High indicator scores indicate that the
    CIA processing pipeline detected structural patterns consistent
    with theories of consciousness, not that the LLM is conscious.
    """

    def __init__(self, adapter: BaseAIAdapter) -> None:
        self.adapter = adapter
        self.cia = CombinedConsciousnessIndicatorSystem()
        # Available for continuous runs and scorecards; chat() does not use them
        self.runtime = ContinuousCognitionRuntime(self.cia)
        self.scorecard_gen = ConsciousnessIndicatorScorecard()
        self._conversation_history: list[dict[str, str]] = []

    def chat(self, user_message: str) -> dict[str, Any]:
        """Process a user message through the LLM and the CIA pipeline.

        Returns a dict with:
        - 'llm_response': the LLM's text output
        - 'llm_confidence': the adapter's (heuristic) confidence
        - 'cia_report': the CIA SimulationReport
        - 'indicator_score' / 'indicator_max': total and maximum score
        - 'welfare_risk': welfare risk level
        - 'caveat': scientific boundary disclaimer
        """
        # Step 1: Get the LLM response. The adapters in this guide accept a
        # single prompt, so earlier turns are NOT sent to the model.
        llm_response = self.adapter.generate(user_message)
        llm_text = llm_response.text

        # Step 2: Run the CIA on the LLM's output. The same CIA instance is
        # reused across turns, so its memory modules can carry state.
        cia_report = self.cia.run_cycle(llm_text)

        # Step 3: Record the exchange (kept locally; not fed back to the LLM)
        self._conversation_history.append({
            "role": "user",
            "content": user_message,
        })
        self._conversation_history.append({
            "role": "assistant",
            "content": llm_text,
        })

        return {
            "llm_response": llm_text,
            "llm_confidence": llm_response.confidence,
            "cia_report": cia_report,
            "indicator_score": (
                cia_report.indicator_scores.total_score
                if cia_report.indicator_scores else 0
            ),
            "indicator_max": (
                cia_report.indicator_scores.max_possible
                if cia_report.indicator_scores else 0
            ),
            "welfare_risk": (
                cia_report.welfare_state.risk_level
                if cia_report.welfare_state else "low"
            ),
            "caveat": "These indicator scores do NOT prove consciousness. "
                      "They are theory-derived proxies only.",
        }
```
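chat() records the conversation history but passes only the current message to the adapter, so the model itself answers every turn without context. If you want the LLM to see earlier turns, one option is to assemble an OpenAI-style message list from that history, as in the hypothetical helper below. Note that none of the adapters in this guide accept such a list yet; sending it would require extending them (for example, reading a hypothetical `context["messages"]` key).

```python
from typing import Dict, List, Optional


def build_chat_messages(
    history: List[Dict[str, str]],
    user_message: str,
    system_prompt: Optional[str] = None,
    max_turns: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Turn CIAAwareLLM-style history into a chat message list.

    `history` alternates {"role": "user"} / {"role": "assistant"} entries, as
    chat() appends them. `max_turns` keeps only the most recent
    user/assistant pairs to bound prompt length.
    """
    past = list(history)
    if max_turns is not None:
        # Guard 0 explicitly: past[-0:] would return the whole list
        past = past[-2 * max_turns:] if max_turns > 0 else []
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend({"role": m["role"], "content": m["content"]} for m in past)
    messages.append({"role": "user", "content": user_message})
    return messages
```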
3.6 Running Multi-Turn Conversations with Consciousness Tracking
```python
# Example: multi-turn conversation with consciousness indicators.
# Assumes the CIAAwareLLM class from section 3.5 is importable or in scope.
from cia.adapters.huggingface_adapter import HuggingFaceAdapter

adapter = HuggingFaceAdapter(
    model_name_or_path="mistralai/Mistral-7B-Instruct-v0.2",
    device="auto",
)
system = CIAAwareLLM(adapter)

messages = [
    "Hello, I'm curious about how you process information.",
    "Can you tell me about your own internal states?",
    "What happens when you notice an error in your reasoning?",
    "How would you describe your sense of continuity?",
    "Do you think you have subjective experience?",
]

print("=== Multi-Turn CIA-Aware Conversation ===\n")
for msg in messages:
    result = system.chat(msg)
    score = result["indicator_score"]
    max_score = result["indicator_max"]
    pct = (score / max_score * 100) if max_score > 0 else 0
    print(f"User: {msg}")
    print(f"LLM: {result['llm_response'][:200]}...")
    print(f"CIA Score: {score}/{max_score} ({pct:.1f}%)")
    print(f"Welfare: {result['welfare_risk']}")
    print()
```
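The loop above prints each turn's score but keeps nothing for later comparison. A minimal sketch of a summary over the collected `chat()` results follows; `summarize_scores` and `TurnSummary` are hypothetical names, and unlike the loop above it skips turns whose `indicator_max` is 0 instead of counting them as 0%.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Dict, List


@dataclass(frozen=True)
class TurnSummary:
    mean_pct: float
    min_pct: float
    max_pct: float
    turns: int  # number of turns that produced scores


def summarize_scores(results: List[Dict[str, Any]]) -> TurnSummary:
    """Aggregate per-turn indicator percentages from CIAAwareLLM.chat() results."""
    pcts = [
        100.0 * r["indicator_score"] / r["indicator_max"]
        for r in results
        if r.get("indicator_max", 0) > 0  # skip turns without scores
    ]
    if not pcts:
        return TurnSummary(0.0, 0.0, 0.0, 0)
    return TurnSummary(
        mean_pct=round(mean(pcts), 1),
        min_pct=round(min(pcts), 1),
        max_pct=round(max(pcts), 1),
        turns=len(pcts),
    )
```

Usage: `results = [system.chat(m) for m in messages]` followed by `print(summarize_scores(results))`.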
4. Option B — Remote API Integration (OpenAI, Claude, etc.)
4.1 Extending LLMAdapter for OpenAI
The existing LLMAdapter in src/cia/adapters/llm_adapter.py is a placeholder: it logs what an API call would look like but returns a stub response. To call a real provider, implement a concrete BaseAIAdapter subclass for it. For OpenAI:
```python
# File: src/cia/adapters/openai_adapter.py
"""
OpenAI API adapter for remote model inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
SECURITY: Supply API keys via environment variables; do not hard-code them.
"""
from __future__ import annotations

import logging
import os
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class OpenAIAdapter(BaseAIAdapter):
    """Adapter for the OpenAI Chat Completions API.

    Parameters
    ----------
    api_key : str | None
        OpenAI API key. Falls back to the OPENAI_API_KEY env var.
    model : str
        Model identifier (e.g., "gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo").
    base_url : str | None
        Custom base URL (for Azure OpenAI or proxies). Falls back to the
        OPENAI_BASE_URL env var, then the public OpenAI endpoint.
    max_tokens : int
        Maximum tokens in the response.
    temperature : float
        Sampling temperature.
    system_prompt : str | None
        Default system prompt for the CIA context.
    """

    def __init__(
        self,
        api_key: Optional[str] = None,
        model: str = "gpt-4o",
        base_url: Optional[str] = None,
        max_tokens: int = 1024,
        temperature: float = 0.7,
        system_prompt: Optional[str] = None,
    ) -> None:
        self._api_key = api_key or os.environ.get("OPENAI_API_KEY")
        self._model = model
        self._base_url = base_url or os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
        self._max_tokens = max_tokens
        self._temperature = temperature
        self._system_prompt = system_prompt
        self._client = None

        if not self._api_key:
            logger.warning(
                "OpenAI API key not set. Set OPENAI_API_KEY environment variable."
            )

    @property
    def is_configured(self) -> bool:
        return bool(self._api_key)

    def _get_client(self):
        """Create the OpenAI client once and reuse it across calls."""
        if self._client is None:
            try:
                from openai import OpenAI
            except ImportError as e:
                raise ImportError("openai package not installed: pip install openai") from e
            self._client = OpenAI(api_key=self._api_key, base_url=self._base_url)
        return self._client

    def generate(
        self, prompt: str, context: Optional[dict[str, Any]] = None
    ) -> AIResponse:
        """Generate via the OpenAI Chat Completions API."""
        if not self._api_key:
            raise RuntimeError(
                "OpenAI API key not set. Set OPENAI_API_KEY or pass api_key."
            )
        client = self._get_client()

        system = self._system_prompt or (
            "You are a helpful assistant. CIA scientific boundary: "
            "You do NOT have subjective experience or consciousness."
        )
        if context and "system_prompt" in context:
            system = context["system_prompt"]
        messages = [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ]

        response = client.chat.completions.create(
            model=self._model,
            messages=messages,
            max_tokens=self._max_tokens,
            temperature=self._temperature,
        )
        text = response.choices[0].message.content or ""
        # Some OpenAI-compatible servers omit usage statistics
        usage = response.usage
        confidence = min(0.5 + len(text) / 1000.0, 0.95)  # length-only heuristic

        return AIResponse(
            text=text,
            confidence=round(confidence, 4),
            model_name=self._model,
            metadata={
                "adapter_type": "openai",
                "prompt_tokens": usage.prompt_tokens if usage else None,
                "completion_tokens": usage.completion_tokens if usage else None,
                "total_tokens": usage.total_tokens if usage else None,
                "finish_reason": response.choices[0].finish_reason,
                "model": self._model,
            },
            uncertainty=round(1.0 - confidence, 4),
        )

    def embed(self, text: str) -> list[float]:
        """Generate an embedding via the OpenAI embeddings API."""
        if not self._api_key:
            raise RuntimeError("OpenAI API key not set.")
        try:
            client = self._get_client()
            response = client.embeddings.create(
                input=text, model="text-embedding-3-small"
            )
            return response.data[0].embedding
        except Exception as e:
            logger.warning("OpenAI embedding failed: %s", e)
            return [0.0] * 1536  # text-embedding-3-small dimension

    def describe_capabilities(self) -> dict[str, Any]:
        return {
            "model_name": self._model,
            "adapter_type": "openai",
            "configured": self.is_configured,
            "modalities": ["text"],
            "requires_network": True,
            "system_prompt": self._system_prompt,
        }
```
4.2 Extending LLMAdapter for Anthropic Claude
```python
# File: src/cia/adapters/claude_adapter.py
"""
Anthropic Claude API adapter for remote model inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
"""
from __future__ import annotations

import logging
import os
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class ClaudeAdapter(BaseAIAdapter):
    """Adapter for the Anthropic Claude API (Messages API).

    Parameters
    ----------
    api_key : str | None
        Anthropic API key. Falls back to the ANTHROPIC_API_KEY env var.
    model : str
        Model identifier ("claude-sonnet-4-20250514", "claude-3-5-sonnet-20241022", etc.).
    max_tokens : int
        Maximum tokens in the response.
    temperature : float
        Sampling temperature.
    system_prompt : str | None
        Default system prompt.
    """

    def __init__(
        self,
        api_key: Optional[str] = None,
        model: str = "claude-sonnet-4-20250514",
        max_tokens: int = 1024,
        temperature: float = 0.7,
        system_prompt: Optional[str] = None,
    ) -> None:
        self._api_key = api_key or os.environ.get("ANTHROPIC_API_KEY")
        self._model = model
        self._max_tokens = max_tokens
        self._temperature = temperature
        self._system_prompt = system_prompt
        self._client = None

        if not self._api_key:
            logger.warning(
                "Anthropic API key not set. Set ANTHROPIC_API_KEY environment variable."
            )

    @property
    def is_configured(self) -> bool:
        return bool(self._api_key)

    def _get_client(self):
        """Create the Anthropic client once and reuse it across calls."""
        if self._client is None:
            try:
                import anthropic
            except ImportError as e:
                raise ImportError("anthropic package not installed: pip install anthropic") from e
            self._client = anthropic.Anthropic(api_key=self._api_key)
        return self._client

    def generate(
        self, prompt: str, context: Optional[dict[str, Any]] = None
    ) -> AIResponse:
        """Generate via the Anthropic Messages API."""
        if not self._api_key:
            raise RuntimeError(
                "Anthropic API key not set. Set ANTHROPIC_API_KEY or pass api_key."
            )
        client = self._get_client()

        system = self._system_prompt or (
            "You are a helpful assistant. CIA scientific boundary: "
            "You do NOT have subjective experience."
        )
        if context and "system_prompt" in context:
            system = context["system_prompt"]

        response = client.messages.create(
            model=self._model,
            max_tokens=self._max_tokens,
            temperature=self._temperature,
            system=system,  # the system prompt is a top-level argument, not a message
            messages=[{"role": "user", "content": prompt}],
        )
        # The reply is a list of content blocks; concatenate the text blocks
        text = "".join(
            block.text for block in response.content if block.type == "text"
        )
        usage = response.usage
        confidence = min(0.5 + len(text) / 1000.0, 0.95)  # length-only heuristic

        return AIResponse(
            text=text,
            confidence=round(confidence, 4),
            model_name=self._model,
            metadata={
                "adapter_type": "claude",
                "input_tokens": usage.input_tokens,
                "output_tokens": usage.output_tokens,
            },
            uncertainty=round(1.0 - confidence, 4),
        )

    def embed(self, text: str) -> list[float]:
        # Anthropic does not offer a native embeddings endpoint; zero-vector placeholder
        return [0.0] * 64

    def describe_capabilities(self) -> dict[str, Any]:
        return {
            "model_name": self._model,
            "adapter_type": "claude",
            "configured": self.is_configured,
            "modalities": ["text"],
            "requires_network": True,
        }
```
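The Messages API takes the system prompt as a separate `system` argument, and its `messages` list holds only `"user"` and `"assistant"` roles. If you keep conversation state in the OpenAI-style format used elsewhere in this guide, a small converter like the hypothetical one below splits the system text out before calling Claude.

```python
from typing import Dict, List, Optional, Tuple


def to_anthropic_messages(
    messages: List[Dict[str, str]],
) -> Tuple[Optional[str], List[Dict[str, str]]]:
    """Split an OpenAI-style message list into (system, messages) for Claude.

    Multiple system entries are joined with blank lines; None means "no system".
    """
    unknown = {m["role"] for m in messages} - {"system", "user", "assistant"}
    if unknown:
        raise ValueError(f"unsupported roles: {sorted(unknown)}")
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    convo = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if m["role"] in ("user", "assistant")
    ]
    return ("\n\n".join(system_parts) or None), convo
```

When `system` comes back as None, leave the `system=` argument out of `client.messages.create(...)` rather than passing None.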
4.3 Extending LLMAdapter for Google Gemini
```python
# File: src/cia/adapters/gemini_adapter.py
"""
Google Gemini API adapter for remote model inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
"""
from __future__ import annotations

import logging
import os
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class GeminiAdapter(BaseAIAdapter):
    """Adapter for the Google Gemini API (google-generativeai SDK).

    Parameters
    ----------
    api_key : str | None
        Google API key. Falls back to the GOOGLE_API_KEY env var.
    model : str
        Model identifier ("gemini-1.5-pro", "gemini-2.0-flash", etc.).
    max_tokens : int
        Maximum output tokens.
    temperature : float
        Sampling temperature.
    """

    def __init__(
        self,
        api_key: Optional[str] = None,
        model: str = "gemini-2.0-flash",
        max_tokens: int = 1024,
        temperature: float = 0.7,
    ) -> None:
        self._api_key = api_key or os.environ.get("GOOGLE_API_KEY")
        self._model = model
        self._max_tokens = max_tokens
        self._temperature = temperature

    @property
    def is_configured(self) -> bool:
        return bool(self._api_key)

    def generate(
        self, prompt: str, context: Optional[dict[str, Any]] = None
    ) -> AIResponse:
        """Generate via the Gemini API (``context`` is currently ignored)."""
        if not self._api_key:
            raise RuntimeError(
                "Google API key not set. Set GOOGLE_API_KEY or pass api_key."
            )
        try:
            import google.generativeai as genai
        except ImportError as e:
            raise ImportError(
                "google-generativeai not installed: pip install google-generativeai"
            ) from e

        genai.configure(api_key=self._api_key)  # process-wide setting
        gen_model = genai.GenerativeModel(self._model)
        response = gen_model.generate_content(
            prompt,
            generation_config=genai.types.GenerationConfig(
                max_output_tokens=self._max_tokens,
                temperature=self._temperature,
            ),
        )
        try:
            text = response.text
        except ValueError as e:
            # .text raises when the reply has no text parts, e.g. when the
            # candidate was blocked by safety filters
            raise RuntimeError(f"Gemini returned no text: {e}") from e
        confidence = min(0.5 + len(text) / 1000.0, 0.95)  # length-only heuristic

        return AIResponse(
            text=text,
            confidence=round(confidence, 4),
            model_name=self._model,
            metadata={"adapter_type": "gemini"},
            uncertainty=round(1.0 - confidence, 4),
        )

    def embed(self, text: str) -> list[float]:
        # Embeddings not implemented for this adapter; zero-vector placeholder
        return [0.0] * 64

    def describe_capabilities(self) -> dict[str, Any]:
        return {
            "model_name": self._model,
            "adapter_type": "gemini",
            "configured": self.is_configured,
            # Gemini models accept images, but this adapter only sends text
            "modalities": ["text"],
            "requires_network": True,
        }
```
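The Gemini adapter above sends a single prompt string and has no system-prompt support. For multi-turn use, the Gemini content format names the assistant role `"model"` and wraps text in a `"parts"` list; recent google-generativeai versions also accept a `system_instruction` argument on `GenerativeModel`. The hypothetical converter below maps OpenAI-style messages to that shape and returns the system text separately; verify the exact format against the SDK version you use.

```python
from typing import Dict, List, Optional, Tuple

# OpenAI-style role -> Gemini role
_GEMINI_ROLES = {"user": "user", "assistant": "model"}


def to_gemini_contents(
    messages: List[Dict[str, str]],
) -> Tuple[Optional[str], List[Dict[str, object]]]:
    """Convert OpenAI-style messages to (system_text, Gemini-style contents)."""
    system_parts: List[str] = []
    contents: List[Dict[str, object]] = []
    for m in messages:
        role = m["role"]
        if role == "system":
            system_parts.append(m["content"])
        elif role in _GEMINI_ROLES:
            contents.append({"role": _GEMINI_ROLES[role], "parts": [m["content"]]})
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return ("\n\n".join(system_parts) or None), contents
```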
4.4 Extending LLMAdapter for OpenAI-Compatible Endpoints (Ollama, LM Studio)
Any server that implements the OpenAI-compatible /v1/chat/completions endpoint can be used by reusing the OpenAIAdapter with a custom base_url:
```python
from cia.adapters.openai_adapter import OpenAIAdapter

# Ollama (local, OpenAI-compatible); pull the model first: ollama pull llama3
ollama_adapter = OpenAIAdapter(
    api_key="ollama",  # Ollama doesn't require a real key, but one must be set
    model="llama3",
    base_url="http://localhost:11434/v1",
    temperature=0.7,
)

# LM Studio (local, OpenAI-compatible)
lmstudio_adapter = OpenAIAdapter(
    api_key="lm-studio",  # placeholder value
    model="local-model",  # placeholder: use the identifier of the model loaded in LM Studio
    base_url="http://localhost:1234/v1",
    temperature=0.7,
)
```
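Before pointing the adapter at a local server, it can help to confirm the server is reachable and serving the model name you configured. OpenAI-compatible servers typically expose `GET /models` under the same base URL, returning `{"data": [{"id": ...}, ...]}`. The sketch below (hypothetical helper names, standard library only) lists those ids; check that your server version implements the endpoint. Ollama, for instance, may report tagged ids such as `llama3:latest`.

```python
import json
import urllib.error
import urllib.request
from typing import List


def parse_model_ids(raw: bytes) -> List[str]:
    """Extract model ids from an OpenAI-style GET /models response body."""
    body = json.loads(raw.decode("utf-8"))
    return [item["id"] for item in body.get("data", []) if "id" in item]


def list_models(base_url: str, api_key: str = "local", timeout: float = 5.0) -> List[str]:
    """Return the model ids served at base_url (e.g. http://localhost:11434/v1)."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_model_ids(resp.read())
    except (urllib.error.URLError, TimeoutError) as e:
        raise RuntimeError(f"cannot reach {base_url}: {e}") from e
```

For example, `list_models("http://localhost:11434/v1")` before constructing `ollama_adapter` fails fast with a clear error if Ollama is not running.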
4.5 Configuration File Approach
For production deployments, use a YAML configuration file:
```yaml
# File: cia_config.yaml
cia:
  recurrent_cycles: 3
  workspace_capacity: 3

llm:
  provider: "openai"  # openai | claude | gemini | ollama | huggingface | vllm
  model: "gpt-4o"
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"  # Name of the environment variable holding the key
  max_tokens: 1024
  temperature: 0.7
  system_prompt: >
    You are a helpful AI assistant. This system runs inside the
    Consciousness-Indicator Architecture (CIA) framework. The CIA
    evaluates theory-derived consciousness indicators. You do NOT
    have subjective experience or consciousness. All indicator
    scores are architectural proxies only.

# Alternative configurations. Mapping keys must be unique in YAML, so
# replace the llm block above with one of these rather than adding it:
# llm:
#   provider: "claude"
#   model: "claude-sonnet-4-20250514"
#   api_key_env: "ANTHROPIC_API_KEY"
#
# llm:
#   provider: "huggingface"
#   model: "mistralai/Mistral-7B-Instruct-v0.2"
#   device: "auto"
#   load_in_4bit: true
#
# llm:
#   provider: "ollama"
#   model: "llama3"
#   base_url: "http://localhost:11434/v1"
```
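A misconfigured provider or a missing API key otherwise surfaces only at the first `generate()` call. A pre-flight check on the parsed config (the dict returned by `yaml.safe_load`) can report such problems up front. The sketch below is hypothetical; its provider list and default environment-variable names mirror `load_adapter_from_config` in this guide.

```python
from typing import Any, Dict, List, Mapping

KNOWN_PROVIDERS = {"openai", "claude", "gemini", "ollama", "huggingface", "vllm"}

# Providers that need a key, and the env var each reads when api_key_env is absent
DEFAULT_KEY_ENV: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def check_llm_config(config: Mapping[str, Any], env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems with the `llm` block of a parsed config."""
    problems: List[str] = []
    llm = config.get("llm") or {}
    provider = llm.get("provider", "openai")
    if provider not in KNOWN_PROVIDERS:
        problems.append(f"unknown provider: {provider!r}")
    key_env = llm.get("api_key_env", DEFAULT_KEY_ENV.get(provider))
    if key_env and not env.get(key_env):
        problems.append(f"environment variable {key_env} is not set")
    temperature = llm.get("temperature", 0.7)
    if not isinstance(temperature, (int, float)) or temperature < 0:
        problems.append(f"temperature must be a non-negative number, got {temperature!r}")
    return problems
```

Typical use: `problems = check_llm_config(yaml.safe_load(open("cia_config.yaml")) or {}, os.environ)`, then refuse to start if the list is non-empty.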
Configuration loader:
```python
# File: src/cia/config.py
# Requires PyYAML: pip install pyyaml
import os
from pathlib import Path

import yaml

from cia.adapters.base import BaseAIAdapter


def load_adapter_from_config(config_path: str = "cia_config.yaml") -> BaseAIAdapter:
    """Load and instantiate an adapter from a YAML configuration file.

    Parameters
    ----------
    config_path : str
        Path to the YAML configuration file.

    Returns
    -------
    BaseAIAdapter
        Configured adapter instance.
    """
    path = Path(config_path)
    if not path.exists():
        raise FileNotFoundError(f"Config file not found: {config_path}")
    with open(path, encoding="utf-8") as f:
        config = yaml.safe_load(f) or {}  # an empty file parses to None

    llm_config = config.get("llm") or {}
    provider = llm_config.get("provider", "openai")

    # Adapter modules are imported lazily, so only the selected provider's
    # dependencies need to be installed.
    if provider == "openai":
        from cia.adapters.openai_adapter import OpenAIAdapter
        return OpenAIAdapter(
            api_key=os.environ.get(llm_config.get("api_key_env", "OPENAI_API_KEY")),
            model=llm_config.get("model", "gpt-4o"),
            base_url=llm_config.get("base_url"),
            max_tokens=llm_config.get("max_tokens", 1024),
            temperature=llm_config.get("temperature", 0.7),
            system_prompt=llm_config.get("system_prompt"),
        )
    elif provider == "claude":
        from cia.adapters.claude_adapter import ClaudeAdapter
        return ClaudeAdapter(
            api_key=os.environ.get(llm_config.get("api_key_env", "ANTHROPIC_API_KEY")),
            model=llm_config.get("model", "claude-sonnet-4-20250514"),
            max_tokens=llm_config.get("max_tokens", 1024),
            temperature=llm_config.get("temperature", 0.7),
            system_prompt=llm_config.get("system_prompt"),
        )
    elif provider == "gemini":
        from cia.adapters.gemini_adapter import GeminiAdapter
        return GeminiAdapter(
            api_key=os.environ.get(llm_config.get("api_key_env", "GOOGLE_API_KEY")),
            model=llm_config.get("model", "gemini-2.0-flash"),
            max_tokens=llm_config.get("max_tokens", 1024),
            temperature=llm_config.get("temperature", 0.7),
        )
    elif provider == "huggingface":
        from cia.adapters.huggingface_adapter import HuggingFaceAdapter
        return HuggingFaceAdapter(
            model_name_or_path=llm_config.get("model", "mistralai/Mistral-7B-Instruct-v0.2"),
            device=llm_config.get("device", "auto"),
            max_new_tokens=llm_config.get("max_tokens", 512),
            temperature=llm_config.get("temperature", 0.7),
            load_in_8bit=llm_config.get("load_in_8bit", False),
            load_in_4bit=llm_config.get("load_in_4bit", False),
        )
    elif provider == "vllm":
        from cia.adapters.vllm_adapter import VLLMAdapter
        return VLLMAdapter(
            model_name=llm_config.get("model", "meta-llama/Meta-Llama-3-8B-Instruct"),
            tensor_parallel_size=llm_config.get("tensor_parallel_size", 1),
            max_tokens=llm_config.get("max_tokens", 512),
            temperature=llm_config.get("temperature", 0.7),
            gpu_memory_utilization=llm_config.get("gpu_memory_utilization", 0.9),
        )
    elif provider == "ollama":
        from cia.adapters.openai_adapter import OpenAIAdapter
        return OpenAIAdapter(
            api_key=llm_config.get("api_key", "ollama"),
            model=llm_config.get("model", "llama3"),
            base_url=llm_config.get("base_url", "http://localhost:11434/v1"),
            max_tokens=llm_config.get("max_tokens", 1024),
            temperature=llm_config.get("temperature", 0.7),
            system_prompt=llm_config.get("system_prompt"),
        )
    else:
        raise ValueError(f"Unknown LLM provider: {provider}")
```
4.6 Environment Variable Reference
| Variable | Used By | Description |
|---|---|---|
| CIA_LLM_API_KEY | LLMAdapter (generic) | Generic API key fallback |
| CIA_LLM_BASE_URL | LLMAdapter (generic) | Generic base URL fallback |
| OPENAI_API_KEY | OpenAIAdapter | OpenAI API authentication |
| OPENAI_BASE_URL | OpenAIAdapter | Custom OpenAI endpoint |
| ANTHROPIC_API_KEY | ClaudeAdapter | Anthropic API authentication |
| GOOGLE_API_KEY | GeminiAdapter | Google API authentication |
| CIA_LLM_PROVIDER | load_adapter_from_config | Provider selection (optional) |
| CIA_CONFIG_PATH | CLI or application | Path to YAML config file (optional) |
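The table lists provider-specific keys alongside a generic `CIA_LLM_API_KEY` fallback. One plausible precedence is to prefer the provider-specific variable and fall back to the generic one. The CIA adapters may resolve keys differently, so treat `resolve_api_key` below as a hypothetical helper. The environment is passed in as a mapping so the function is easy to test; in practice you would pass `os.environ`.

```python
from typing import Mapping, Optional

# Provider-specific key variables, as listed in the table above.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def resolve_api_key(provider: str, env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: provider-specific key first, then CIA_LLM_API_KEY."""
    specific = PROVIDER_KEY_VARS.get(provider)
    # Treat an empty string the same as an unset variable.
    if specific and env.get(specific):
        return env[specific]
    # Generic fallback documented for the generic LLMAdapter.
    return env.get("CIA_LLM_API_KEY") or None
```

Usage: `resolve_api_key("claude", os.environ)` returns `ANTHROPIC_API_KEY` if it is set and non-empty. Otherwise it returns `CIA_LLM_API_KEY`, or `None` if neither variable is set.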
5. Integration Pattern: CIA-Aware Chat Loop
The recommended pattern for a full CIA-aware chat system combines the LLM adapter with the CIA system, runtime, and reporting:
```python
#!/usr/bin/env python3
"""
Example: CIA-Aware Chat System with configurable LLM backend.

Usage:
    # With a config file:
    CIA_CONFIG_PATH=cia_config.yaml python cia_chat.py

    # Inline, with the API key supplied via the environment:
    OPENAI_API_KEY=sk-... python -c "
    from cia.config import load_adapter_from_config
    from cia.chat_system import CIAAwareLLM
    adapter = load_adapter_from_config('cia_config.yaml')
    system = CIAAwareLLM(adapter)
    print(system.chat('Hello').keys())
    "
"""
from __future__ import annotations

import logging
from typing import Any

from cia.adapters.base import BaseAIAdapter
from cia.runtime import ContinuousCognitionRuntime
from cia.scorecard import ConsciousnessIndicatorScorecard
from cia.simulation import CombinedConsciousnessIndicatorSystem

logger = logging.getLogger(__name__)

CAVEAT = (
    "CIA indicator scores measure theory-derived architectural proxies. "
    "They do NOT prove consciousness, subjective experience, or "
    "sentience in the underlying LLM."
)


class CIAAwareLLM:
    """Production-ready CIA-aware LLM wrapper.

    Wraps any BaseAIAdapter with full CIA consciousness indicator
    evaluation, scorecard generation, welfare monitoring, and
    conversation history management.

    SCIENTIFIC BOUNDARY:
    CIA indicator scores measure theory-derived architectural proxies.
    They do NOT prove subjective experience, phenomenal consciousness,
    or sentience in the underlying LLM. The CIA framework evaluates
    patterns of information processing, not the quality of the
    LLM's outputs.
    """

    def __init__(self, adapter: BaseAIAdapter) -> None:
        self.adapter = adapter
        self.cia = CombinedConsciousnessIndicatorSystem()
        self.runtime = ContinuousCognitionRuntime(self.cia)
        self.scorecard_gen = ConsciousnessIndicatorScorecard()
        self.history: list[dict[str, str]] = []
        self._cycle_reports: list[dict[str, Any]] = []
        self._last_report = None  # most recent CIA report on an LLM output

    def chat(self, user_message: str) -> dict[str, Any]:
        """Process one user message.

        Returns a dictionary with the LLM response, CIA report, scores,
        and scientific boundary caveats.
        """
        # 1. Generate the LLM response. generate() receives only the current
        #    message; self.history is recorded for export, not sent to the model.
        try:
            llm_response = self.adapter.generate(user_message)
            llm_text = llm_response.text
            llm_confidence = llm_response.confidence
            llm_model = llm_response.model_name
        except Exception as e:
            logger.error("LLM generation failed: %s", e)
            llm_text = f"[LLM error: {e}]"
            llm_confidence = 0.0
            llm_model = "unavailable"  # llm_response is unbound on this path

        # 2. Run CIA indicators on the LLM output
        cia_report = self.cia.run_cycle(llm_text)
        self._last_report = cia_report

        # 3. Also run CIA on the user's input (perception of the prompt)
        prompt_report = self.cia.run_cycle(user_message)

        # 4. Track conversation history
        self.history.append({"role": "user", "content": user_message})
        self.history.append({"role": "assistant", "content": llm_text})

        # 5. Build the structured result
        scores = cia_report.indicator_scores
        score = scores.total_score if scores else 0
        max_score = scores.max_possible if scores else 22
        # Report "unknown" rather than "low" when no welfare state was produced.
        welfare = cia_report.welfare_state.risk_level if cia_report.welfare_state else "unknown"

        self._cycle_reports.append({
            "turn": len(self._cycle_reports) + 1,
            "llm_model": llm_model,
            "input_score": prompt_report.indicator_scores.total_score if prompt_report.indicator_scores else 0,
            "output_score": score,
            "welfare": welfare,
        })
        return {
            "llm_response": llm_text,
            "llm_confidence": llm_confidence,
            "llm_model": llm_model,
            "cia_report": cia_report,
            "indicator_score": score,
            "indicator_max": max_score,
            "indicator_normalized": round(score / max_score, 4) if max_score > 0 else 0.0,
            "welfare_risk": welfare,
            "conversation_turn": len(self._cycle_reports),
            "caveat": CAVEAT,
        }

    def get_scorecard(self) -> dict[str, Any]:
        """Generate a V1 scorecard from the latest CIA report on an LLM output."""
        # Reuse the stored report instead of running an extra cycle, which
        # would both mutate the CIA state and score unrelated text.
        if self._last_report is None or self._last_report.indicator_scores is None:
            return {"error": "No scored cycles run yet."}
        scorecard = self.scorecard_gen.generate(self._last_report.indicator_scores)
        return {"scorecard": scorecard, "format": self.scorecard_gen.format_report(scorecard)}

    def export_trace(self) -> list[dict[str, Any]]:
        """Export the full conversation trace with scores (a shallow copy)."""
        return list(self._cycle_reports)
```
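A stub adapter can stand in for a real backend when you want to exercise `CIAAwareLLM` without a network or GPU. `chat()` only reads `.text`, `.confidence`, and `.model_name` from whatever `generate()` returns, so a duck-typed stub like the hypothetical `EchoAdapter` below should be enough. It deliberately does not subclass `BaseAIAdapter` so that it stays self-contained. If that base class is abstract or type-checked in your setup, subclass it instead.

```python
from dataclasses import dataclass


@dataclass
class StubResponse:
    """Mimics the three attributes CIAAwareLLM.chat() reads from a response."""
    text: str
    confidence: float
    model_name: str


class EchoAdapter:
    """Hypothetical offline adapter: returns a canned reply that echoes the prompt."""

    def __init__(self, model_name: str = "echo-stub") -> None:
        self.model_name = model_name
        self.calls: list[str] = []  # prompts received, handy for test assertions

    def generate(self, prompt: str) -> StubResponse:
        self.calls.append(prompt)
        return StubResponse(
            text=f"You said: {prompt}",
            confidence=1.0,
            model_name=self.model_name,
        )
```

With the CIA package installed, `CIAAwareLLM(EchoAdapter()).chat("Hello")` would run the CIA pipeline on the deterministic text `You said: Hello`. This makes indicator scores reproducible across test runs.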
6. Integration with Neuroadaptive EEG/BCI Extension
The CIA's Stage 3 neuroadaptive extension can be combined with the LLM adapter to create a neuroadaptively conditioned LLM. The neural state (from EEG) biases the CIA's attention and workspace modules, which in turn influence how the LLM's output is processed. The example below runs without EEG input; neural-state conditioning is an optional layer on top:
```python
from cia.adapters.huggingface_adapter import HuggingFaceAdapter
from cia.simulation import CombinedConsciousnessIndicatorSystem
from cia.subject_emulation.schemas import SubjectBehaviouralTrace  # assumed location, alongside PreferenceRecord
from cia.subject_emulation.subject_twin_runtime import SubjectTwinRuntime

# Create the LLM adapter
adapter = HuggingFaceAdapter(model_name_or_path="mistralai/Mistral-7B-Instruct-v0.2")

# Create the subject twin runtime (neural-state conditioning is optional)
subject_runtime = SubjectTwinRuntime(subject_id="research_subject_01", consent_scope="research")

# Generate a base response
prompt = "What would you like for dinner?"
llm_response = adapter.generate(prompt)

# Run CIA on the LLM output
cia = CombinedConsciousnessIndicatorSystem()
cia_report = cia.run_cycle(llm_response.text)

# Record the LLM's output as a behavioural trace for subject-emulation conditioning
subject_runtime.decision_model.add_trace(
    SubjectBehaviouralTrace(
        subject_id="research_subject_01",
        stimulus=prompt,
        activity_label="food_choice",
        choice_made=llm_response.text[:100],
        response_text=llm_response.text,
    )
)

# Generate a subject-conditioned response
emulation_output = subject_runtime.generate_personalized_response("What should I have for dinner?")

scores = cia_report.indicator_scores
print(f"Base LLM: {llm_response.text[:200]}")
if scores:
    print(f"CIA Score: {scores.total_score}/{scores.max_possible}")
print(f"Emulation: {emulation_output.generated_response[:200]}")
```
Note: Neural-state conditioning via EEG requires a connected BCI device and is entirely optional. The subject-emulation system works in standalone mode without any EEG hardware.
7. Integration with Subject-Specific Cognitive Emulation
The Subject-Specific Cognitive Emulation extension (Stage 4) layers on top of the CIA system. When combined with an LLM, it provides:
- Preference-aware responses — The LLM's output is re-styled or re-ranked based on the subject's preference model.
- Decision-pattern prediction — The LLM's choices are compared against the subject's historical decision patterns.
- Style adaptation — The LLM's linguistic output can be adapted to match the subject's writing style.
- Safety filtering — All outputs are checked for prohibited identity-claim language.
```python
from cia.subject_emulation.schemas import AutobiographicalMemoryItem, PreferenceRecord
from cia.subject_emulation.subject_twin_runtime import SubjectTwinRuntime

# Set up subject emulation
runtime = SubjectTwinRuntime(subject_id="subject_01", consent_scope="research")

# Add subject data: one preference and one autobiographical memory
runtime.preference_model.add_preference(PreferenceRecord(
    subject_id="subject_01",
    domain="cuisine",
    item="Japanese",
    preference_score=0.85,
    evidence_source="observed_behaviour",
    confidence=0.7,
))
runtime.memory_store.add_memory(AutobiographicalMemoryItem(
    subject_id="subject_01",
    memory_id="mem_001",
    title="Favorite restaurant",
    summary="Loves the sushi bar near work.",
    importance_score=0.8,
    source="user_provided",
    consent_scope="research",
))

# The generated response is conditioned on the subject data added above
output = runtime.generate_personalized_response("Suggest a dinner plan.")
print(output.generated_response[:500])
print(f"Uncertainty: {output.uncertainty}")
print(f"Claims removed: {output.unsupported_claims_removed}")
```
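The safety-filtering step listed above can be pictured as a sentence-level filter. The sketch below is hypothetical. Its regular expressions are illustrative examples of identity-claim language, not the package's actual list of prohibited phrases. The returned count only mirrors the role of `unsupported_claims_removed` in the output above.

```python
import re

# Illustrative patterns only; a real policy would be broader and reviewed.
IDENTITY_CLAIM_PATTERNS = [
    re.compile(r"\bI am (?:really |actually )?(?:you|the subject)\b", re.IGNORECASE),
    re.compile(r"\bI am (?:conscious|sentient)\b", re.IGNORECASE),
    re.compile(r"\bI have subjective experiences?\b", re.IGNORECASE),
]


def filter_identity_claims(text: str) -> tuple[str, int]:
    """Drop sentences matching any identity-claim pattern.

    Returns the filtered text and the number of sentences removed.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if not any(p.search(s) for p in IDENTITY_CLAIM_PATTERNS)]
    return " ".join(kept), len(sentences) - len(kept)
```

The word boundaries matter here. "I am conscious." is removed, while "I am consciously choosing sushi." is kept, because `\b` does not match between "conscious" and "ly".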
8. CLI Usage with External LLMs
Set environment variables before using the CIA CLI:
```bash
# For OpenAI:
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.openai.com/v1"  # optional: custom endpoint

# For Claude:
export ANTHROPIC_API_KEY="sk-ant-..."

# For Gemini:
export GOOGLE_API_KEY="AIza..."

# Run with a local config:
CIA_CONFIG_PATH=cia_config.yaml python -m cia.cli run "Tell me about consciousness"
```
The CIA CLI's run command processes text through the CIA pipeline. To route through an external LLM first, use the config-based approach or the programmatic API shown above.
9. Performance Considerations
Latency Impact
| Component | Typical Latency | Notes |
|---|---|---|
| CIA pipeline (no LLM) | <10ms | Pure Python, no network, no GPU needed |
| Local model (7B, CPU) | 1-5s | Depends on hardware and quantization |
| Local model (7B, GPU) | 50-200ms | Depends on GPU memory and throughput |
| OpenAI API | 500ms-5s | Depends on model, network, and queue time |
| Claude API | 500ms-10s | Depends on model and load |
| Gemini API | 300ms-3s | Depends on model and load |
| Subject emulation | +1-5ms | Negligible overhead on top of CIA pipeline |
| Neuroadaptive (with EEG) | +5-50ms | Depends on preprocessing pipeline |
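The rows in the latency table add up per turn. The `CIAAwareLLM.chat()` loop shown earlier runs two CIA cycles per turn (one on the response, one on the prompt), so the pipeline cost counts twice. The back-of-the-envelope helper below is a sketch that uses the table's upper bounds as illustrative inputs, not as measurements.

```python
def turn_latency_ms(llm_ms: float, cia_cycles: int = 2, cia_ms: float = 10.0,
                    extras_ms: float = 0.0) -> float:
    """Rough per-turn latency: one LLM call plus `cia_cycles` CIA pipeline runs.

    Defaults assume the chat loop's two CIA cycles per turn and the
    table's <10 ms upper bound for the CIA pipeline.
    """
    return llm_ms + cia_cycles * cia_ms + extras_ms


# Local 7B model on CPU (5 s upper bound), plus subject emulation (+5 ms)
# and neuroadaptive preprocessing (+50 ms), all at their table upper bounds.
worst_cpu = turn_latency_ms(5000, extras_ms=5 + 50)
```

Here `worst_cpu` is 5075 ms. The 75 ms of non-LLM overhead is about 1.5% of the turn, so model inference dominates.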
Memory Usage
| Component | RAM | GPU VRAM |
|---|---|---|
| CIA pipeline alone | ~50 MB | 0 MB |
| Local 7B model (4-bit) | ~6 GB | ~5 GB |
| Local 7B model (FP16) | ~14 GB | ~14 GB |
| Local 70B model (4-bit) | ~40 GB | ~40 GB |
| EEG preprocessing (scipy) | ~100 MB | 0 MB |
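The model rows can be sanity-checked with a weights-only estimate: parameters × bits per weight ÷ 8 gives bytes. The FP16 row matches that estimate exactly. The 4-bit rows sit somewhat above it, which is expected: the estimate ignores the KV cache, activations, and quantisation overhead such as per-group scales and layers often kept at higher precision.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and quantisation overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


fp16_7b = weight_memory_gb(7e9, 16)   # 14.0, matching the ~14 GB FP16 row
int4_7b = weight_memory_gb(7e9, 4)    # 3.5, versus ~5 GB VRAM in the table
int4_70b = weight_memory_gb(70e9, 4)  # 35.0, versus ~40 GB in the table
```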
Recommendations
- For fast experimentation: Use the CIA pipeline alone (no LLM) — it runs in milliseconds and tests the full cognitive architecture.
- For research with real LLM outputs: Use the OpenAI, Claude, or Gemini APIs; no local GPU is required.
- For air-gapped research: Use a quantized local model via HuggingFace or llama.cpp.
- For high-throughput evaluation: Use vLLM with tensor parallelism across multiple GPUs.
- For subject emulation with EEG: The CIA pipeline runs alongside the neuroadaptive modules. The LLM adapter is independent and can be swapped without affecting the EEG pipeline.
10. Scientific Boundary Reminder
This is critically important and must be communicated in every output:
The CIA framework evaluates theory-derived consciousness indicators. These indicators measure structural patterns of information processing (recurrent binding, global workspace broadcast, self-model updates, attention schema consistency, memory continuity, predictive modeling, causal integration, affective valuation, welfare safeguards). They do NOT measure:
- Subjective experience (qualia)
- Phenomenal consciousness
- Sentience
- "Inner life"
- "What it is like to be" the system
Connecting an LLM to this framework — whether local or remote — does NOT make the LLM conscious. A high indicator score means the CIA pipeline detected patterns consistent with theories of consciousness in the system's architecture, not that the underlying model has subjective experience.
Every output must include this caveat:
"These indicator scores evaluate theory-derived consciousness indicators and do NOT prove, establish, or demonstrate subjective experience, phenomenal consciousness, or sentience. They are theory-derived proxies subject to significant theoretical and measurement limitations."
For the Subject-Specific Cognitive Emulation extension, the additional caveat applies:
"This system emulates selected behavioural, cognitive-state, preference, and style patterns. It does not transfer consciousness, subjective experience, identity, or personhood."
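Every output must carry the indicator caveat, and subject-emulation outputs need the additional one as well. The simplest approach is to attach them in one place rather than at each call site. The sketch below assumes results are plain dicts like the one `CIAAwareLLM.chat()` returns; the `caveats` key and the `with_caveats` helper are hypothetical.

```python
from typing import Any

INDICATOR_CAVEAT = (
    "These indicator scores evaluate theory-derived consciousness indicators and do NOT "
    "prove, establish, or demonstrate subjective experience, phenomenal consciousness, "
    "or sentience. They are theory-derived proxies subject to significant theoretical "
    "and measurement limitations."
)
EMULATION_CAVEAT = (
    "This system emulates selected behavioural, cognitive-state, preference, and style "
    "patterns. It does not transfer consciousness, subjective experience, identity, or "
    "personhood."
)


def with_caveats(result: dict[str, Any], subject_emulation: bool = False) -> dict[str, Any]:
    """Return a copy of `result` with the required caveat text(s) under 'caveats'."""
    caveats = [INDICATOR_CAVEAT]
    if subject_emulation:
        caveats.append(EMULATION_CAVEAT)
    # Copy rather than mutate, so callers keep their original result dict.
    return {**result, "caveats": caveats}
```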
Appendix: Quick-Start Reference
```python
# Minimal working example: OpenAI + CIA

# 1. Set the API key before constructing the adapter
#    (exporting it in your shell is preferable to hard-coding it here)
import os
os.environ["OPENAI_API_KEY"] = "sk-..."

# 2. Import and configure
from cia.adapters.openai_adapter import OpenAIAdapter
from cia.simulation import CombinedConsciousnessIndicatorSystem

adapter = OpenAIAdapter(model="gpt-4o")
cia = CombinedConsciousnessIndicatorSystem()

# 3. Get an LLM response
response = adapter.generate("What is the hard problem of consciousness?")

# 4. Run CIA on the response
report = cia.run_cycle(response.text)
scores = report.indicator_scores

# 5. Inspect results
print(f"LLM: {response.text[:200]}...")
print(f"CIA Score: {scores.total_score}/{scores.max_possible}")
welfare = report.welfare_state.risk_level if report.welfare_state else "unknown"
print(f"Welfare: {welfare}")

# 6. View per-category scores
labels = {0: "Absent", 1: "Present", 2: "Strong"}
for item in scores.scores:
    print(f"  {item.category.value}: {labels[item.score]} - {item.evidence[:100]}")
```
```python
# Minimal working example: Local model + CIA
# (no network needed once the model weights are downloaded and cached)
from cia.adapters.huggingface_adapter import HuggingFaceAdapter
from cia.simulation import CombinedConsciousnessIndicatorSystem

adapter = HuggingFaceAdapter(
    model_name_or_path="mistralai/Mistral-7B-Instruct-v0.2",
    device="auto",
    load_in_4bit=True,  # 4-bit weights fit on an 8 GB GPU (~5 GB VRAM)
)
cia = CombinedConsciousnessIndicatorSystem()

response = adapter.generate("Explain the binding problem.")
report = cia.run_cycle(response.text)
scores = report.indicator_scores
print(f"Score: {scores.total_score}/{scores.max_possible}")
```