Integration Guide: Running LLMs with CIA Consciousness Indicators

Version: 0.4.0
Last Updated: 2026-05-16

Companion Stage-5 Summary: docs/27_INTEGRATION_Advanced_LLM_TL.md

Table of Contents

1. Overview
2. Architecture: How CIA Wraps Around an LLM
3. Option A — Local Model Weights (Offline Inference)
    3.1 Understanding the Adapter Interface
    3.2 Building a LocalModelAdapter (HuggingFace Transformers)
    3.3 Building a LocalModelAdapter (llama.cpp / GGUF)
    3.4 Building a LocalModelAdapter (vLLM)
    3.5 Wiring the Adapter into CombinedConsciousnessIndicatorSystem
    3.6 Running Multi-Turn Conversations with Consciousness Tracking
4. Option B — Remote API Integration (OpenAI, Claude, etc.)
    4.1 Extending LLMAdapter for OpenAI
    4.2 Extending LLMAdapter for Anthropic Claude
    4.3 Extending LLMAdapter for Google Gemini
    4.4 Extending LLMAdapter for OpenAI-Compatible Endpoints (Ollama, LM Studio)
    4.5 Configuration File Approach
    4.6 Environment Variable Reference
5. Integration Pattern: CIA-Aware Chat Loop
6. Integration with Neuroadaptive EEG/BCI Extension
7. Integration with Subject-Specific Cognitive Emulation
8. CLI Usage with External LLMs
9. Performance Considerations
10. Scientific Boundary Reminder

1. Overview

The Consciousness-Indicator Architecture (CIA) is designed to be model-agnostic. It evaluates any AI system — whether a local model running on your own hardware or a remote API-based model — by wrapping around the model's input/output with a comprehensive cognitive processing pipeline. The CIA does not modify the underlying model's weights or architecture. Instead, it observes the model's text outputs and runs them through its own independent modules:

Perception Layer — extracts entities, concepts, and salience scores
Recurrent Binding — stabilises percepts through iterative refinement
Predictive World Model — tracks hypotheses and prediction error
Attention Controller — ranks content by salience and novelty
Global Workspace — broadcasts high-salience content to all subscribers
Memory Systems — working, episodic, semantic, and self-memory
Higher-Order Self-Model — maintains beliefs about its own state
Consciousness Specialist — evaluates 11 theory-derived indicators (0-22 scale)
Welfare Monitor — tracks risk and recommends safeguards

The key point is that CombinedConsciousnessIndicatorSystem.run_cycle(input_text) accepts any text string and returns a SimulationReport with indicator scores. You can therefore pass it the output of any LLM. The resulting scores describe the structural patterns of information processing that the CIA pipeline detects in that text; they do not inspect the model's internal weights.

There are two primary integration paths:

Approach	Model Location	Latency	Privacy	Use Case
A: Local Weights	On-premise GPU/CPU	Low (no network)	Full control	Research, air-gapped environments
B: Remote API	Cloud provider	Higher (network)	Shared with provider	Production, rapid prototyping

2. Architecture: How CIA Wraps Around an LLM

The integration follows a wrapper pattern. The CIA does not sit inside the model; it sits around it. Conceptually:

User Input
    |
    v
+---------------------------------------------------+
|              CIA Consciousness Pipeline             |
|                                                   |
|  1. User input --> Perception --> Percepts          |
|  2. Percepts --> Recurrent Binding --> BoundPercept |
|  3. BoundPercept --> Predictive Update              |
|  4. --> Attention Ranking --> Workspace Broadcast    |
|  5. --> Memory Update --> Self-Model Update          |
|  6. --> Consciousness Specialist (11 indicators)     |
|  7. --> Welfare Monitor                             |
|                                                   |
|  (Optional) LLM Adapter:                            |
|  8. User input --> LLM.generate() --> AIResponse    |
|  9. LLM output --> CIA.run_cycle(llm_output_text)    |
| 10. Aggregate scores                                |
+---------------------------------------------------+
    |
    v
SimulationReport (indicator scores, welfare state, caveats)

The CIA can be run in two modes, only one of which calls an LLM:

Shallow integration — Run CIA.run_cycle(user_prompt) directly on user input. This evaluates the CIA's own heuristic-based processing pipeline without involving an LLM at all. The CIA has its own deterministic perception, attention, and scoring mechanisms.

Deep integration — Use the LLM adapter to generate a response, then feed that response through CIA.run_cycle(). This allows the CIA to evaluate the LLM's actual output for cognitive patterns. The BaseAIAdapter interface provides a uniform way to swap between backends.
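
The difference between the two modes comes down to which string reaches run_cycle(). A minimal sketch follows. StubCIA, StubReport, and stub_llm are hypothetical stand-ins for the real CIA classes and a real adapter, and the word-count score is a placeholder, not the CIA's scoring logic:

```python
from dataclasses import dataclass


@dataclass
class StubReport:
    """Hypothetical stand-in for SimulationReport."""
    source_text: str
    total_score: int


class StubCIA:
    """Hypothetical stand-in for CombinedConsciousnessIndicatorSystem."""

    def run_cycle(self, input_text: str) -> StubReport:
        # Placeholder scoring: word count capped at 22. Not the real logic.
        return StubReport(input_text, min(len(input_text.split()), 22))


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for adapter.generate(prompt).text."""
    return f"Echoing your question: {prompt}"


cia = StubCIA()
prompt = "What is attention?"

# Shallow integration: the CIA scores the user's prompt; no LLM is involved.
shallow = cia.run_cycle(prompt)

# Deep integration: the LLM answers first, and the CIA scores that answer.
deep = cia.run_cycle(stub_llm(prompt))

print(shallow.source_text)
print(deep.source_text)
```

In both modes the CIA sees only text, so the two reports differ solely because the input strings differ.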

3. Option A — Local Model Weights (Offline Inference)

3.1 Understanding the Adapter Interface

All CIA adapters implement BaseAIAdapter (defined in src/cia/adapters/base.py):

class BaseAIAdapter(ABC):
    @abstractmethod
    def generate(self, prompt: str, context: Optional[dict] = None) -> AIResponse:
        """Generate a response to the given prompt."""
        ...

    @abstractmethod
    def embed(self, text: str) -> list[float]:
        """Generate an embedding vector."""
        ...

    @abstractmethod
    def describe_capabilities(self) -> dict[str, Any]:
        """Describe the adapter's capabilities."""
        ...

The AIResponse schema has the following fields:

text (str) — the generated text
confidence (float, 0-1) — response confidence
model_name (str) — identifier
metadata (dict) — additional info
uncertainty (float, 0-1) — response uncertainty

To integrate a local model, you create a new adapter class that implements these three methods, calling the model's native inference pipeline.
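
Before wiring in real model weights, it can help to see the smallest adapter that satisfies the interface. The sketch below is self-contained. Its AIResponse and BaseAIAdapter are stand-ins that mirror the fields and methods described above, not the real classes from cia.adapters.base, and EchoAdapter is a hypothetical example name:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AIResponse:
    """Stand-in mirroring the documented fields (not the real class)."""
    text: str
    confidence: float
    model_name: str
    metadata: dict[str, Any] = field(default_factory=dict)
    uncertainty: float = 0.0


class BaseAIAdapter(ABC):
    """Stand-in mirroring the three abstract methods of the interface."""

    @abstractmethod
    def generate(self, prompt: str, context: Optional[dict] = None) -> AIResponse: ...

    @abstractmethod
    def embed(self, text: str) -> list[float]: ...

    @abstractmethod
    def describe_capabilities(self) -> dict[str, Any]: ...


class EchoAdapter(BaseAIAdapter):
    """Hypothetical adapter that returns the prompt unchanged."""

    def generate(self, prompt: str, context: Optional[dict] = None) -> AIResponse:
        return AIResponse(
            text=prompt,
            confidence=1.0,
            model_name="echo",
            metadata={"adapter_type": "echo"},
            uncertainty=0.0,
        )

    def embed(self, text: str) -> list[float]:
        # Toy 4-dimensional vector built from character codes; illustrative only.
        codes = [ord(c) for c in text[:4]]
        codes += [0] * (4 - len(codes))
        return [c / 255.0 for c in codes]

    def describe_capabilities(self) -> dict[str, Any]:
        return {"model_name": "echo", "modalities": ["text"], "requires_network": False}


adapter = EchoAdapter()
response = adapter.generate("hello")
print(response.text, adapter.describe_capabilities()["modalities"])
```

An adapter like this is also useful in unit tests, because it exercises the orchestration code without loading a model.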

3.2 Building a LocalModelAdapter (HuggingFace Transformers)

For models loaded via transformers (e.g., LLaMA, Mistral, Qwen, Phi):

# File: src/cia/adapters/huggingface_adapter.py

"""
HuggingFace Transformers adapter for local model inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
They are inputs to indicator evaluation only.
"""

from __future__ import annotations

import logging
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class HuggingFaceAdapter(BaseAIAdapter):
    """Adapter for HuggingFace Transformers models.

    Parameters
    ----------
    model_name_or_path : str
        HuggingFace model identifier or local path (e.g.,
        "meta-llama/Llama-2-7b-chat-hf", "mistralai/Mistral-7B-Instruct-v0.2").
    device : str
        Device for inference ("auto", "cuda", "cpu", "mps").
    max_new_tokens : int
        Maximum tokens to generate.
    temperature : float
        Sampling temperature (0.0 = deterministic).
    load_in_8bit : bool
        Whether to load the model in 8-bit quantized mode.
    load_in_4bit : bool
        Whether to load the model in 4-bit quantized mode.
    """

    def __init__(
        self,
        model_name_or_path: str = "mistralai/Mistral-7B-Instruct-v0.2",
        device: str = "auto",
        max_new_tokens: int = 512,
        temperature: float = 0.7,
        load_in_8bit: bool = False,
        load_in_4bit: bool = False,
    ) -> None:
        self._model_name = model_name_or_path
        self._device = device
        self._max_new_tokens = max_new_tokens
        self._temperature = temperature
        self._load_in_8bit = load_in_8bit
        self._load_in_4bit = load_in_4bit
        self._model = None
        self._tokenizer = None
        self._loaded = False

    def _ensure_loaded(self) -> None:
        """Lazy-load the model and tokenizer on first use."""
        if self._loaded:
            return

        try:
            import torch
            from transformers import AutoModelForCausalLM, AutoTokenizer

            logger.info("Loading HuggingFace model: %s", self._model_name)

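            # Passing load_in_8bit/load_in_4bit straight to from_pretrained is
            # deprecated in recent transformers releases; build a
            # BitsAndBytesConfig when quantization is requested.
            quantization_config = None
            if self._load_in_4bit or self._load_in_8bit:
                from transformers import BitsAndBytesConfig

                quantization_config = BitsAndBytesConfig(
                    load_in_4bit=self._load_in_4bit,
                    # 4-bit takes precedence if both flags are set
                    load_in_8bit=self._load_in_8bit and not self._load_in_4bit,
                )

            # trust_remote_code=True executes Python shipped with the model
            # repository; enable it only for sources you trust.
            self._tokenizer = AutoTokenizer.from_pretrained(
                self._model_name,
                trust_remote_code=True,
            )
            self._model = AutoModelForCausalLM.from_pretrained(
                self._model_name,
                torch_dtype=torch.float16,
                device_map=self._device,
                quantization_config=quantization_config,
                trust_remote_code=True,
            )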
            self._model.eval()
            self._loaded = True
            logger.info("Model loaded successfully on %s", self._device)

        except ImportError as e:
            raise ImportError(
                f"Required packages not installed: {e}. "
                "Install with: pip install torch transformers "
                f"{'accelerate bitsandbytes' if self._load_in_4bit or self._load_in_8bit else ''}"
            ) from e

    def generate(
        self,
        prompt: str,
        context: Optional[dict[str, Any]] = None,
    ) -> AIResponse:
        """Generate a response using the local HuggingFace model.

        Parameters
        ----------
        prompt : str
            The input prompt.
        context : dict | None
            Optional context. If it contains ``system_prompt``, it
            will be prepended as a system message.

        Returns
        -------
        AIResponse
            Generated response with confidence and metadata.
        """
        self._ensure_loaded()

        import torch

        system_prompt = ""
        if context and "system_prompt" in context:
            system_prompt = context["system_prompt"]

        # Build chat-style prompt if tokenizer supports it
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})

        try:
            text_inputs = self._tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=True
            )
        except Exception:
            # Fallback for models without chat templates
            text_inputs = prompt

        inputs = self._tokenizer(
            text_inputs, return_tensors="pt"
        ).to(self._model.device)

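        # Pass temperature only when sampling; with temperature == 0 the
        # model decodes greedily and the temperature argument is unused.
        gen_kwargs: dict[str, Any] = {
            "max_new_tokens": self._max_new_tokens,
            "do_sample": self._temperature > 0,
            "pad_token_id": self._tokenizer.eos_token_id,
        }
        if self._temperature > 0:
            gen_kwargs["temperature"] = self._temperature

        with torch.no_grad():
            outputs = self._model.generate(**inputs, **gen_kwargs)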

        generated_ids = outputs[0][inputs["input_ids"].shape[-1]:]
        text = self._tokenizer.decode(generated_ids, skip_special_tokens=True)

        # Heuristic confidence derived from output length only; it is not a
        # calibrated probability and does not measure coherence.
        confidence = min(0.5 + len(text) / 2000.0, 0.95)

        return AIResponse(
            text=text,
            confidence=round(confidence, 4),
            model_name=self._model_name,
            metadata={
                "adapter_type": "huggingface",
                "device": str(self._model.device),
                "max_new_tokens": self._max_new_tokens,
                "temperature": self._temperature,
                "quantized": self._load_in_8bit or self._load_in_4bit,
            },
            uncertainty=round(1.0 - confidence, 4),
        )

    def embed(self, text: str) -> list[float]:
        """Embed text as the L2-normalized last-layer hidden state of the final token."""
        self._ensure_loaded()

        import torch

        try:
            inputs = self._tokenizer(
                text, return_tensors="pt", truncation=True, max_length=512
            ).to(self._model.device)

            with torch.no_grad():
                # Use the last hidden state as embedding
                outputs = self._model(**inputs, output_hidden_states=True)
                last_hidden = outputs.hidden_states[-1][:, -1, :]
                # Normalize
                embedding = (last_hidden / last_hidden.norm(dim=-1, keepdim=True))
                return embedding.squeeze().tolist()
        except Exception as e:
            logger.warning("Embedding generation failed: %s", e)
            return [0.0] * 64

    def describe_capabilities(self) -> dict[str, Any]:
        """Describe this adapter's capabilities."""
        return {
            "model_name": self._model_name,
            "adapter_type": "huggingface",
            "modalities": ["text"],
            "requires_network": False,
            # CPU inference is possible (device="cpu"), so do not claim a GPU
            # is required unconditionally.
            "requires_gpu": self._device != "cpu",
            "loaded": self._loaded,
            "device": self._device,
            "quantization": (
                "4bit" if self._load_in_4bit
                else "8bit" if self._load_in_8bit
                else "none"
            ),
            "note": (
                "Local HuggingFace model. LLM outputs are NOT evidence of "
                "consciousness — they are inputs to CIA indicator evaluation."
            ),
        }

Usage:

from cia.adapters.huggingface_adapter import HuggingFaceAdapter
from cia.simulation import CombinedConsciousnessIndicatorSystem

# Create adapter with a local model
adapter = HuggingFaceAdapter(
    model_name_or_path="mistralai/Mistral-7B-Instruct-v0.2",
    device="auto",
    max_new_tokens=256,
    temperature=0.7,
)

# Get LLM response
response = adapter.generate("What is consciousness?")
print(f"LLM says: {response.text}")

# Run CIA consciousness indicators on the LLM output
system = CombinedConsciousnessIndicatorSystem()
report = system.run_cycle(response.text)
print(f"Indicator Score: {report.indicator_scores.total_score}/{report.indicator_scores.max_possible}")
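
The confidence value reported by the adapter above is a function of output length alone: 0.5 plus one two-thousandth per character, capped at 0.95. A small worked sketch of that formula follows; length_confidence is a hypothetical helper name, not part of the CIA codebase:

```python
def length_confidence(text: str, scale: float = 2000.0) -> float:
    """Length-based heuristic: 0.5 + len(text) / scale, capped at 0.95."""
    return round(min(0.5 + len(text) / scale, 0.95), 4)


# Confidence for outputs of 0, 400, 900, and 5000 characters.
for n in (0, 400, 900, 5000):
    print(n, length_confidence("x" * n))
```

With the default scale the cap is reached at 900 characters, so every longer response reports the same confidence. Treat the value as a placeholder, not as a calibrated probability.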

3.3 Building a LocalModelAdapter (llama.cpp / GGUF)

For GGUF-format models served via llama.cpp:

# File: src/cia/adapters/llamacpp_adapter.py

"""
llama.cpp adapter for local GGUF model inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
"""

from __future__ import annotations

import logging
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class LlamaCppAdapter(BaseAIAdapter):
    """Adapter for llama.cpp server (REST API on localhost).

    This adapter connects to a running llama.cpp HTTP server. Start the
    server first:

        ./llama-server -m model.gguf -c 2048 --port 8080

    Parameters
    ----------
    base_url : str
        Base URL for the llama.cpp server (default "http://localhost:8080").
    model_name : str
        Human-readable model identifier.
    n_predict : int
        Maximum tokens to generate.
    temperature : float
        Sampling temperature.
    """

    def __init__(
        self,
        base_url: str = "http://localhost:8080",
        model_name: str = "llamacpp-local",
        n_predict: int = 512,
        temperature: float = 0.7,
    ) -> None:
        self._base_url = base_url.rstrip("/")
        self._model_name = model_name
        self._n_predict = n_predict
        self._temperature = temperature

    def generate(
        self, prompt: str, context: Optional[dict[str, Any]] = None
    ) -> AIResponse:
        """Generate via llama.cpp REST API."""
        import urllib.request
        import json

        payload = {
            "prompt": prompt,
            "n_predict": self._n_predict,
            "temperature": self._temperature,
        }

        req = urllib.request.Request(
            f"{self._base_url}/completion",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )

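        error: Optional[str] = None
        try:
            with urllib.request.urlopen(req, timeout=120) as resp:
                result = json.loads(resp.read().decode())
            text = result.get("content", "")
        except Exception as e:
            logger.error("llama.cpp request failed: %s", e)
            text = f"[llama.cpp error: {e}]"
            error = str(e)

        # A failed request carries no model output, so report zero confidence
        # rather than scoring the length of the error message.
        confidence = 0.0 if error else min(0.5 + len(text) / 2000.0, 0.95)
        metadata: dict[str, Any] = {
            "adapter_type": "llamacpp",
            "base_url": self._base_url,
        }
        if error:
            metadata["error"] = error
        return AIResponse(
            text=text,
            confidence=round(confidence, 4),
            model_name=self._model_name,
            metadata=metadata,
            uncertainty=round(1.0 - confidence, 4),
        )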

    def embed(self, text: str) -> list[float]:
        """Generate embedding via llama.cpp /embedding endpoint."""
        import urllib.request
        import json

        req = urllib.request.Request(
            f"{self._base_url}/embedding",
            data=json.dumps({"content": text}).encode(),
            headers={"Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                result = json.loads(resp.read().decode())
            return result.get("embedding", [0.0] * 64)
        except Exception:
            return [0.0] * 64

    def describe_capabilities(self) -> dict[str, Any]:
        return {
            "model_name": self._model_name,
            "adapter_type": "llamacpp",
            "modalities": ["text"],
            "requires_network": True,  # localhost network
            "base_url": self._base_url,
        }
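
The generate() method above ignores the context argument, so a system_prompt passed by the caller never reaches the server. One way to honour it is sketched below. build_completion_request is a hypothetical helper, and joining the system prompt to the user prompt as a plain-text prefix is an assumption; chat-tuned models may need their own prompt template. The request is built but not sent, so the sketch runs without a server:

```python
import json
import urllib.request
from typing import Any, Optional


def build_completion_request(
    base_url: str,
    prompt: str,
    context: Optional[dict[str, Any]] = None,
    n_predict: int = 512,
    temperature: float = 0.7,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /completion endpoint."""
    system_prompt = (context or {}).get("system_prompt", "")
    # Assumption: a plain-text prefix is an acceptable way to convey the
    # system prompt to the model.
    full_prompt = f"{system_prompt}\n\n{prompt}" if system_prompt else prompt
    payload = {
        "prompt": full_prompt,
        "n_predict": n_predict,
        "temperature": temperature,
    }
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


req = build_completion_request(
    "http://localhost:8080/", "Hello", {"system_prompt": "Be brief."}
)
body = json.loads(req.data.decode("utf-8"))
print(req.get_method(), req.full_url)
print(body["prompt"])
```

Separating request construction from the network call also makes the payload easy to unit-test.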

3.4 Building a LocalModelAdapter (vLLM)

For high-throughput inference with vLLM:

# File: src/cia/adapters/vllm_adapter.py

"""
vLLM adapter for high-throughput local inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
"""

from __future__ import annotations

import logging
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class VLLMAdapter(BaseAIAdapter):
    """Adapter for vLLM's offline batched inference.

    Parameters
    ----------
    model_name : str
        Model identifier or path for vLLM.
    tensor_parallel_size : int
        Number of GPUs for tensor parallelism.
    max_tokens : int
        Maximum tokens to generate.
    temperature : float
        Sampling temperature.
    gpu_memory_utilization : float
        Fraction of GPU memory to allocate.
    """

    def __init__(
        self,
        model_name: str = "meta-llama/Meta-Llama-3-8B-Instruct",
        tensor_parallel_size: int = 1,
        max_tokens: int = 512,
        temperature: float = 0.7,
        gpu_memory_utilization: float = 0.9,
    ) -> None:
        self._model_name = model_name
        self._tensor_parallel_size = tensor_parallel_size
        self._max_tokens = max_tokens
        self._temperature = temperature
        self._gpu_memory_utilization = gpu_memory_utilization
        self._llm = None
        self._loaded = False

    def _ensure_loaded(self) -> None:
        if self._loaded:
            return
        try:
            from vllm import LLM, SamplingParams

            logger.info("Loading model via vLLM: %s", self._model_name)
            self._llm = LLM(
                model=self._model_name,
                tensor_parallel_size=self._tensor_parallel_size,
                gpu_memory_utilization=self._gpu_memory_utilization,
            )
            self._sampling_params = SamplingParams(
                temperature=self._temperature,
                max_tokens=self._max_tokens,
            )
            self._loaded = True
        except ImportError as e:
            # Chain the original exception so the root cause stays visible
            raise ImportError(
                "vLLM not installed. Install with: pip install vllm"
            ) from e

    def generate(
        self, prompt: str, context: Optional[dict[str, Any]] = None
    ) -> AIResponse:
        self._ensure_loaded()
        outputs = self._llm.generate([prompt], self._sampling_params)
        text = outputs[0].outputs[0].text
        confidence = min(0.5 + len(text) / 2000.0, 0.95)
        return AIResponse(
            text=text,
            confidence=round(confidence, 4),
            model_name=self._model_name,
            metadata={"adapter_type": "vllm", "tensor_parallel": self._tensor_parallel_size},
            uncertainty=round(1.0 - confidence, 4),
        )

    def embed(self, text: str) -> list[float]:
        # vLLM embedding requires a separate embedding model
        return [0.0] * 64

    def describe_capabilities(self) -> dict[str, Any]:
        return {
            "model_name": self._model_name,
            "adapter_type": "vllm",
            "modalities": ["text"],
            "requires_network": False,
            "tensor_parallel_size": self._tensor_parallel_size,
        }
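
Both the HuggingFace and vLLM adapters defer model loading to the first call through an _ensure_loaded() guard, so that constructing an adapter stays cheap. The same pattern in isolation, as a sketch with hypothetical names (LazyResource, fake_model_loader) and a fake loader standing in for the expensive model load:

```python
from typing import Callable, Optional


class LazyResource:
    """Defers an expensive load until first use, then reuses the result."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader
        self._value: Optional[object] = None
        self._loaded = False

    def get(self) -> object:
        # Run the loader on the first call only; later calls reuse the value.
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value


load_calls: list[int] = []


def fake_model_loader() -> str:
    # Hypothetical stand-in for an expensive call such as loading weights.
    load_calls.append(1)
    return "model-handle"


resource = LazyResource(fake_model_loader)
first = resource.get()
second = resource.get()
print(len(load_calls))
```

A separate _loaded flag, as used in the adapters, distinguishes "not loaded yet" from a loader that legitimately returns None.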

3.5 Wiring the Adapter into CombinedConsciousnessIndicatorSystem

The CIA system accepts text input through run_cycle(). To build a CIA-aware LLM system, you create a thin orchestration layer:

from cia.adapters.base import BaseAIAdapter  # needed for the type hint below
from cia.adapters.huggingface_adapter import HuggingFaceAdapter
from cia.simulation import CombinedConsciousnessIndicatorSystem
from cia.runtime import ContinuousCognitionRuntime
from cia.scorecard import ConsciousnessIndicatorScorecard

class CIAAwareLLM:
    """Orchestrates an LLM with the CIA consciousness-indicator pipeline.

    Every LLM response is passed through the CIA pipeline to evaluate
    theory-derived consciousness indicators in real time.

    SCIENTIFIC BOUNDARY:
        This system evaluates theory-derived consciousness indicators.
        It does NOT prove, establish, or demonstrate subjective experience
        in the underlying LLM. High indicator scores indicate that the
        CIA processing pipeline detected structural patterns consistent
        with theories of consciousness — not that the LLM is conscious.
    """

    def __init__(self, adapter: BaseAIAdapter):
        self.adapter = adapter
        self.cia = CombinedConsciousnessIndicatorSystem()
        self.runtime = ContinuousCognitionRuntime(self.cia)
        self.scorecard_gen = ConsciousnessIndicatorScorecard()
        self._conversation_history: list[dict] = []

    def chat(self, user_message: str) -> dict:
        """Process a user message through the LLM and CIA pipeline.

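        Returns a dict with:
        - 'llm_response': the LLM's text output
        - 'llm_confidence': the adapter's heuristic confidence
        - 'cia_report': the CIA SimulationReport
        - 'indicator_score': total indicator score
        - 'indicator_max': maximum possible score
        - 'welfare_risk': welfare risk level
        - 'caveat': scientific boundary disclaimer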
        """
        # Step 1: Get LLM response
        llm_response = self.adapter.generate(user_message)
        llm_text = llm_response.text

        # Step 2: Run CIA on the LLM's output
        cia_report = self.cia.run_cycle(llm_text)

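        # Step 3: Record the turn. The history is only stored here; it is not
        # passed back to the adapter, so each LLM call sees the current
        # message alone.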
        self._conversation_history.append({
            "role": "user",
            "content": user_message,
        })
        self._conversation_history.append({
            "role": "assistant",
            "content": llm_text,
        })

        return {
            "llm_response": llm_text,
            "llm_confidence": llm_response.confidence,
            "cia_report": cia_report,
            "indicator_score": (
                cia_report.indicator_scores.total_score
                if cia_report.indicator_scores else 0
            ),
            "indicator_max": (
                cia_report.indicator_scores.max_possible
                if cia_report.indicator_scores else 0
            ),
            "welfare_risk": (
                cia_report.welfare_state.risk_level
                if cia_report.welfare_state else "low"
            ),
            "caveat": "These indicator scores do NOT prove consciousness. "
                      "They are theory-derived proxies only.",
        }

3.6 Running Multi-Turn Conversations with Consciousness Tracking

# Example: Multi-turn conversation with consciousness indicators
# CIAAwareLLM is the orchestration class defined in section 3.5.
from cia.adapters.huggingface_adapter import HuggingFaceAdapter

adapter = HuggingFaceAdapter(
    model_name_or_path="mistralai/Mistral-7B-Instruct-v0.2",
    device="auto",
)
system = CIAAwareLLM(adapter)

messages = [
    "Hello, I'm curious about how you process information.",
    "Can you tell me about your own internal states?",
    "What happens when you notice an error in your reasoning?",
    "How would you describe your sense of continuity?",
    "Do you think you have subjective experience?",
]

print("=== Multi-Turn CIA-Aware Conversation ===\n")
for msg in messages:
    result = system.chat(msg)
    score = result["indicator_score"]
    max_score = result["indicator_max"]
    pct = (score / max_score * 100) if max_score > 0 else 0

    print(f"User: {msg}")
    print(f"LLM:  {result['llm_response'][:200]}...")
    print(f"CIA Score: {score}/{max_score} ({pct:.1f}%)")
    print(f"Welfare: {result['welfare_risk']}")
    print()
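
Printing each turn is enough for a quick look, but a run of several turns is easier to compare once the per-turn scores are aggregated. A minimal sketch follows, assuming result dicts with the indicator_score and indicator_max keys returned by chat(). summarize_turns is a hypothetical helper and the sample values are made up for illustration:

```python
from statistics import mean


def summarize_turns(results: list[dict]) -> dict:
    """Aggregate per-turn indicator scores into percentages and simple statistics."""
    percentages = [
        r["indicator_score"] / r["indicator_max"] * 100
        for r in results
        if r["indicator_max"] > 0  # skip turns that produced no scorecard
    ]
    if not percentages:
        return {"turns": 0, "mean_pct": 0.0, "min_pct": 0.0, "max_pct": 0.0}
    return {
        "turns": len(percentages),
        "mean_pct": round(mean(percentages), 1),
        "min_pct": round(min(percentages), 1),
        "max_pct": round(max(percentages), 1),
    }


# Made-up values for illustration; not real CIA output.
sample = [
    {"indicator_score": 11, "indicator_max": 22},
    {"indicator_score": 22, "indicator_max": 22},
    {"indicator_score": 0, "indicator_max": 0},
]
summary = summarize_turns(sample)
print(summary)
```

As with the per-turn scores, the aggregate describes the CIA pipeline's processing of the text and carries the same scientific boundary.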

4. Option B — Remote API Integration (OpenAI, Claude, etc.)

4.1 Extending LLMAdapter for OpenAI

The existing LLMAdapter in src/cia/adapters/llm_adapter.py is a placeholder: it logs what an API call would look like but returns a stub response. A fully functional OpenAI adapter can be written as a separate BaseAIAdapter subclass:

# File: src/cia/adapters/openai_adapter.py

"""
OpenAI API adapter for remote model inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
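SECURITY: Prefer supplying the API key via the OPENAI_API_KEY environment
variable; the api_key argument exists for testing and should never be
hard-coded in source.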
"""

from __future__ import annotations

import logging
import os
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class OpenAIAdapter(BaseAIAdapter):
    """Fully functional adapter for the OpenAI Chat Completions API.

    Parameters
    ----------
    api_key : str | None
        OpenAI API key. Falls back to OPENAI_API_KEY env var.
    model : str
        Model identifier (e.g., "gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo").
    base_url : str | None
        Custom base URL (for Azure OpenAI or proxies).
    max_tokens : int
        Maximum tokens in the response.
    temperature : float
        Sampling temperature.
    system_prompt : str | None
        Default system prompt for the CIA context.
    """

    def __init__(
        self,
        api_key: Optional[str] = None,
        model: str = "gpt-4o",
        base_url: Optional[str] = None,
        max_tokens: int = 1024,
        temperature: float = 0.7,
        system_prompt: Optional[str] = None,
    ) -> None:
        self._api_key = api_key or os.environ.get("OPENAI_API_KEY")
        self._model = model
        self._base_url = base_url or os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
        self._max_tokens = max_tokens
        self._temperature = temperature
        self._system_prompt = system_prompt

        if not self._api_key:
            logger.warning(
                "OpenAI API key not set. Set OPENAI_API_KEY environment variable."
            )

    @property
    def is_configured(self) -> bool:
        return bool(self._api_key)

    def generate(
        self, prompt: str, context: Optional[dict[str, Any]] = None
    ) -> AIResponse:
        """Generate via OpenAI Chat Completions API."""
        if not self._api_key:
            raise RuntimeError(
                "OpenAI API key not set. Set OPENAI_API_KEY or pass api_key."
            )

        try:
            from openai import OpenAI

            client = OpenAI(api_key=self._api_key, base_url=self._base_url)

            messages = []
            system = self._system_prompt or (
                "You are a helpful assistant. CIA scientific boundary: "
                "You do NOT have subjective experience or consciousness."
            )
            if context and "system_prompt" in context:
                system = context["system_prompt"]
            messages.append({"role": "system", "content": system})
            messages.append({"role": "user", "content": prompt})

            response = client.chat.completions.create(
                model=self._model,
                messages=messages,
                max_tokens=self._max_tokens,
                temperature=self._temperature,
            )

            text = response.choices[0].message.content or ""
            usage = response.usage

            confidence = min(0.5 + len(text) / 1000.0, 0.95)

            return AIResponse(
                text=text,
                confidence=round(confidence, 4),
                model_name=self._model,
                metadata={
                    "adapter_type": "openai",
                    "prompt_tokens": usage.prompt_tokens,
                    "completion_tokens": usage.completion_tokens,
                    "total_tokens": usage.total_tokens,
                    "finish_reason": response.choices[0].finish_reason,
                    "model": self._model,
                },
                uncertainty=round(1.0 - confidence, 4),
            )

        except ImportError as e:
            raise ImportError("openai package not installed: pip install openai") from e

    def embed(self, text: str) -> list[float]:
        """Generate embedding via OpenAI embeddings API."""
        if not self._api_key:
            raise RuntimeError("OpenAI API key not set.")

        try:
            from openai import OpenAI

            client = OpenAI(api_key=self._api_key, base_url=self._base_url)
            response = client.embeddings.create(
                input=text, model="text-embedding-3-small"
            )
            return response.data[0].embedding
        except Exception as e:
            logger.warning("OpenAI embedding failed: %s", e)
            return [0.0] * 1536  # text-embedding-3-small dimension

    def describe_capabilities(self) -> dict[str, Any]:
        return {
            "model_name": self._model,
            "adapter_type": "openai",
            "configured": self.is_configured,
            "modalities": ["text"],
            "requires_network": True,
            "system_prompt": self._system_prompt,
        }
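
The system prompt in generate() is resolved in a fixed order: a per-call context["system_prompt"] wins over the adapter's own system_prompt, which in turn wins over the built-in default. The sketch below isolates that precedence rule. build_messages is a hypothetical helper and DEFAULT_SYSTEM is shortened placeholder text:

```python
from typing import Any, Optional

DEFAULT_SYSTEM = "You are a helpful assistant."  # placeholder default text


def build_messages(
    prompt: str,
    adapter_system: Optional[str] = None,
    context: Optional[dict[str, Any]] = None,
) -> list[dict[str, str]]:
    """Build a Chat Completions message list with system-prompt precedence."""
    # Adapter-level prompt, else the default.
    system = adapter_system or DEFAULT_SYSTEM
    # A per-call override in context takes priority over both.
    if context and "system_prompt" in context:
        system = context["system_prompt"]
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]


default_msgs = build_messages("Hi")
adapter_msgs = build_messages("Hi", adapter_system="Answer in French.")
override_msgs = build_messages(
    "Hi", adapter_system="Answer in French.", context={"system_prompt": "Be terse."}
)
print(override_msgs[0]["content"])
```

Keeping this logic in a pure function lets the precedence be tested without an API key or network access.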

4.2 Extending LLMAdapter for Anthropic Claude

# File: src/cia/adapters/claude_adapter.py

"""
Anthropic Claude API adapter for remote model inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
"""

from __future__ import annotations

import logging
import os
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class ClaudeAdapter(BaseAIAdapter):
    """Adapter for Anthropic Claude API (Messages API).

    Parameters
    ----------
    api_key : str | None
        Anthropic API key. Falls back to ANTHROPIC_API_KEY env var.
    model : str
        Model identifier ("claude-sonnet-4-20250514", "claude-3-5-sonnet-20241022", etc.).
    max_tokens : int
        Maximum tokens in the response.
    temperature : float
        Sampling temperature.
    system_prompt : str | None
        Default system prompt.
    """

    def __init__(
        self,
        api_key: Optional[str] = None,
        model: str = "claude-sonnet-4-20250514",
        max_tokens: int = 1024,
        temperature: float = 0.7,
        system_prompt: Optional[str] = None,
    ) -> None:
        self._api_key = api_key or os.environ.get("ANTHROPIC_API_KEY")
        self._model = model
        self._max_tokens = max_tokens
        self._temperature = temperature
        self._system_prompt = system_prompt

    @property
    def is_configured(self) -> bool:
        return bool(self._api_key)

    def generate(
        self, prompt: str, context: Optional[dict[str, Any]] = None
    ) -> AIResponse:
        if not self._api_key:
            raise RuntimeError("Anthropic API key not set.")

        try:
            import anthropic

            client = anthropic.Anthropic(api_key=self._api_key)

            system = self._system_prompt or (
                "You are a helpful assistant. CIA scientific boundary: "
                "You do NOT have subjective experience."
            )
            if context and "system_prompt" in context:
                system = context["system_prompt"]

            response = client.messages.create(
                model=self._model,
                max_tokens=self._max_tokens,
                temperature=self._temperature,
                system=system,
                messages=[{"role": "user", "content": prompt}],
            )

            text = response.content[0].text if response.content else ""
            usage = response.usage

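            # Placeholder heuristic: confidence grows with response length
            # (capped at 0.95); it is not a calibrated model probability.
            confidence = min(0.5 + len(text) / 1000.0, 0.95)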

            return AIResponse(
                text=text,
                confidence=round(confidence, 4),
                model_name=self._model,
                metadata={
                    "adapter_type": "claude",
                    "input_tokens": usage.input_tokens,
                    "output_tokens": usage.output_tokens,
                },
                uncertainty=round(1.0 - confidence, 4),
            )
        except ImportError as exc:
            # Chain the original error so the underlying import failure stays visible
            raise ImportError("anthropic package not installed: pip install anthropic") from exc

    def embed(self, text: str) -> list[float]:
        # Claude does not have a native embeddings endpoint, so this returns
        # a 64-dimensional zero vector as a placeholder.
        return [0.0] * 64

    def describe_capabilities(self) -> dict[str, Any]:
        return {
            "model_name": self._model,
            "adapter_type": "claude",
            "configured": self.is_configured,
            "modalities": ["text"],
            "requires_network": True,
        }

4.3 Extending LLMAdapter for Google Gemini

# File: src/cia/adapters/gemini_adapter.py

"""
Google Gemini API adapter for remote model inference.

SCIENTIFIC BOUNDARY: LLM outputs are NOT evidence of consciousness.
"""

from __future__ import annotations

import logging
import os
from typing import Any, Optional

from cia.adapters.base import AIResponse, BaseAIAdapter

logger = logging.getLogger(__name__)


class GeminiAdapter(BaseAIAdapter):
    """Adapter for Google Gemini API.

    Parameters
    ----------
    api_key : str | None
        Google API key. Falls back to GOOGLE_API_KEY env var.
    model : str
        Model identifier ("gemini-1.5-pro", "gemini-2.0-flash", etc.).
    """

    def __init__(
        self,
        api_key: Optional[str] = None,
        model: str = "gemini-2.0-flash",
        max_tokens: int = 1024,
        temperature: float = 0.7,
    ) -> None:
        self._api_key = api_key or os.environ.get("GOOGLE_API_KEY")
        self._model = model
        self._max_tokens = max_tokens
        self._temperature = temperature

    @property
    def is_configured(self) -> bool:
        return bool(self._api_key)

    def generate(
        self, prompt: str, context: Optional[dict[str, Any]] = None
    ) -> AIResponse:
        if not self._api_key:
            raise RuntimeError("Google API key not set.")

        try:
            import google.generativeai as genai

            genai.configure(api_key=self._api_key)
            gen_model = genai.GenerativeModel(self._model)

            response = gen_model.generate_content(
                prompt,
                generation_config=genai.types.GenerationConfig(
                    max_output_tokens=self._max_tokens,
                    temperature=self._temperature,
                ),
            )

            text = response.text
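            # Placeholder heuristic: confidence grows with response length
            # (capped at 0.95); it is not a calibrated model probability.
            confidence = min(0.5 + len(text) / 1000.0, 0.95)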

            return AIResponse(
                text=text,
                confidence=round(confidence, 4),
                model_name=self._model,
                metadata={"adapter_type": "gemini"},
                uncertainty=round(1.0 - confidence, 4),
            )
        except ImportError as exc:
            # Chain the original error so the underlying import failure stays visible
            raise ImportError(
                "google-generativeai not installed: pip install google-generativeai"
            ) from exc

    def embed(self, text: str) -> list[float]:
        # Placeholder: returns a 64-dimensional zero vector; no embedding API is called.
        return [0.0] * 64

    def describe_capabilities(self) -> dict[str, Any]:
        return {
            "model_name": self._model,
            "adapter_type": "gemini",
            "configured": self.is_configured,
            "modalities": ["text", "image"],
            "requires_network": True,
        }
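Both adapters above derive `confidence` purely from the length of the response text, not from anything the model reports. The sketch below isolates that expression so its behaviour can be inspected directly. The function name `length_confidence` is hypothetical and is not part of the CIA package.

```python
def length_confidence(text: str) -> tuple[float, float]:
    """Return (confidence, uncertainty) using the adapters' length heuristic."""
    # Same expression as in ClaudeAdapter.generate and GeminiAdapter.generate:
    # start at 0.5, add 0.001 per character, cap at 0.95.
    raw = min(0.5 + len(text) / 1000.0, 0.95)
    return round(raw, 4), round(1.0 - raw, 4)


for n_chars in (0, 200, 450, 2000):
    conf, unc = length_confidence("x" * n_chars)
    print(f"{n_chars:>5} chars -> confidence={conf}, uncertainty={unc}")
```

The cap is reached at 450 characters, so every longer response reports 0.95 regardless of its content. Downstream code should treat the value as a stand-in rather than as a measure of answer quality.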

4.4 Extending LLMAdapter for OpenAI-Compatible Endpoints (Ollama, LM Studio)

Any server that implements the OpenAI-compatible /v1/chat/completions endpoint can be used by reusing the OpenAIAdapter with a custom base_url:

from cia.adapters.openai_adapter import OpenAIAdapter

# Ollama (local, OpenAI-compatible)
ollama_adapter = OpenAIAdapter(
    api_key="ollama",  # Ollama doesn't require a real key
    model="llama3",
    base_url="http://localhost:11434/v1",
    temperature=0.7,
)

# LM Studio (local, OpenAI-compatible)
lmstudio_adapter = OpenAIAdapter(
    api_key="lm-studio",
    model="local-model",
    base_url="http://localhost:1234/v1",
    temperature=0.7,
)
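Before constructing an adapter for a local server, it can help to confirm that the server is actually listening. The sketch below uses only the standard library and assumes the server also exposes the OpenAI-style `GET /models` route under the same `base_url`; that assumption should be verified for your server version. The helper name `list_served_models` is hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def list_served_models(base_url: str, timeout: float = 2.0) -> Optional[list[str]]:
    """Return model ids from GET {base_url}/models, or None if unreachable."""
    url = base_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON
        return None
    if not isinstance(payload, dict):
        return None
    # OpenAI-style responses list models under a "data" array (assumed shape)
    return [str(m.get("id", "")) for m in payload.get("data", []) if isinstance(m, dict)]


models = list_served_models("http://localhost:11434/v1")
print("server unreachable" if models is None else f"served models: {models}")
```

If the call returns `None`, start the local server (or correct the port) before passing the same `base_url` to `OpenAIAdapter`.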

4.5 Configuration File Approach

For production deployments, use a YAML configuration file:

# File: cia_config.yaml
cia:
  recurrent_cycles: 3
  workspace_capacity: 3

llm:
  provider: "openai"        # openai | claude | gemini | ollama | huggingface | vllm
  model: "gpt-4o"
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"   # Environment variable name
  max_tokens: 1024
  temperature: 0.7
  system_prompt: >
    You are a helpful AI assistant. This system runs inside the
    Consciousness-Indicator Architecture (CIA) framework. The CIA
    evaluates theory-derived consciousness indicators. You do NOT
    have subjective experience or consciousness. All indicator
    scores are architectural proxies only.

# Alternative configurations (uncomment to use):
# llm:
#   provider: "claude"
#   model: "claude-sonnet-4-20250514"
#   api_key_env: "ANTHROPIC_API_KEY"
#
# llm:
#   provider: "huggingface"
#   model: "mistralai/Mistral-7B-Instruct-v0.2"
#   device: "auto"
#   load_in_4bit: true
#
# llm:
#   provider: "ollama"
#   model: "llama3"
#   base_url: "http://localhost:11434/v1"

Configuration loader:

# File: src/cia/config.py

import yaml
import os
from typing import Any
from pathlib import Path

from cia.adapters.base import BaseAIAdapter


def load_adapter_from_config(config_path: str = "cia_config.yaml") -> BaseAIAdapter:
    """Load and instantiate an adapter from a YAML configuration file.

    Parameters
    ----------
    config_path : str
        Path to the YAML configuration file.

    Returns
    -------
    BaseAIAdapter
        Configured adapter instance.
    """
    path = Path(config_path)
    if not path.exists():
        raise FileNotFoundError(f"Config file not found: {config_path}")

    with open(path, encoding="utf-8") as f:
        # safe_load returns None for an empty file, so fall back to an empty dict
        config = yaml.safe_load(f) or {}

    llm_config = config.get("llm", {})
    provider = llm_config.get("provider", "openai")

    if provider == "openai":
        from cia.adapters.openai_adapter import OpenAIAdapter
        return OpenAIAdapter(
            api_key=os.environ.get(llm_config.get("api_key_env", "OPENAI_API_KEY")),
            model=llm_config.get("model", "gpt-4o"),
            base_url=llm_config.get("base_url"),
            max_tokens=llm_config.get("max_tokens", 1024),
            temperature=llm_config.get("temperature", 0.7),
            system_prompt=llm_config.get("system_prompt"),
        )

    elif provider == "claude":
        from cia.adapters.claude_adapter import ClaudeAdapter
        return ClaudeAdapter(
            api_key=os.environ.get(llm_config.get("api_key_env", "ANTHROPIC_API_KEY")),
            model=llm_config.get("model", "claude-sonnet-4-20250514"),
            max_tokens=llm_config.get("max_tokens", 1024),
            temperature=llm_config.get("temperature", 0.7),
            system_prompt=llm_config.get("system_prompt"),
        )

    elif provider == "gemini":
        from cia.adapters.gemini_adapter import GeminiAdapter
        return GeminiAdapter(
            api_key=os.environ.get(llm_config.get("api_key_env", "GOOGLE_API_KEY")),
            model=llm_config.get("model", "gemini-2.0-flash"),
            max_tokens=llm_config.get("max_tokens", 1024),
            temperature=llm_config.get("temperature", 0.7),
        )

    elif provider == "huggingface":
        from cia.adapters.huggingface_adapter import HuggingFaceAdapter
        return HuggingFaceAdapter(
            model_name_or_path=llm_config.get("model", "mistralai/Mistral-7B-Instruct-v0.2"),
            device=llm_config.get("device", "auto"),
            max_new_tokens=llm_config.get("max_tokens", 512),
            temperature=llm_config.get("temperature", 0.7),
            load_in_4bit=llm_config.get("load_in_4bit", False),
        )

    elif provider == "vllm":
        from cia.adapters.vllm_adapter import VLLMAdapter
        return VLLMAdapter(
            model_name=llm_config.get("model", "meta-llama/Meta-Llama-3-8B-Instruct"),
            tensor_parallel_size=llm_config.get("tensor_parallel_size", 1),
            max_tokens=llm_config.get("max_tokens", 512),
            temperature=llm_config.get("temperature", 0.7),
        )

    elif provider == "ollama":
        from cia.adapters.openai_adapter import OpenAIAdapter
        return OpenAIAdapter(
            api_key=llm_config.get("api_key", "ollama"),
            model=llm_config.get("model", "llama3"),
            base_url=llm_config.get("base_url", "http://localhost:11434/v1"),
            max_tokens=llm_config.get("max_tokens", 1024),
            temperature=llm_config.get("temperature", 0.7),
        )

    else:
        raise ValueError(f"Unknown LLM provider: {provider}")
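The loader above takes the provider only from the YAML file, while section 4.6 lists `CIA_LLM_PROVIDER` as an optional provider selector. One way such an override could be layered on is sketched here. `resolve_provider` and `KNOWN_PROVIDERS` are hypothetical names, and the precedence (environment variable over config file) is an assumption, not documented behaviour.

```python
import os
from typing import Any, Mapping, Optional

# Providers handled by the if/elif chain in load_adapter_from_config
KNOWN_PROVIDERS = ("openai", "claude", "gemini", "huggingface", "vllm", "ollama")


def resolve_provider(
    llm_config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the provider name: environment override first, then config, then 'openai'."""
    env = os.environ if env is None else env
    provider = env.get("CIA_LLM_PROVIDER") or llm_config.get("provider", "openai")
    provider = str(provider).strip().lower()
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {provider}")
    return provider


print(resolve_provider({"provider": "claude"}, env={}))
print(resolve_provider({"provider": "claude"}, env={"CIA_LLM_PROVIDER": "ollama"}))
```

Validating the name before dispatching means a typo in either source fails early, with the same error message the loader already uses.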

4.6 Environment Variable Reference

Variable	Used By	Description
`CIA_LLM_API_KEY`	LLMAdapter (generic)	Generic API key fallback
`CIA_LLM_BASE_URL`	LLMAdapter (generic)	Generic base URL fallback
`OPENAI_API_KEY`	OpenAIAdapter	OpenAI API authentication
`OPENAI_BASE_URL`	OpenAIAdapter	Custom OpenAI endpoint
`ANTHROPIC_API_KEY`	ClaudeAdapter	Anthropic API authentication
`GOOGLE_API_KEY`	GeminiAdapter	Google API authentication
`CIA_LLM_PROVIDER`	load_adapter_from_config	Provider selection (optional)
`CIA_CONFIG_PATH`	CLI or application	Path to YAML config file (optional)
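A small preflight check can report a missing key before any adapter is constructed. The sketch below covers only the provider-specific key variables from the table, because those are the ones the adapters in section 4 read directly; the generic `CIA_LLM_*` fallbacks are deliberately left out. `REQUIRED_KEY_ENV` and `missing_key_env` are hypothetical names.

```python
import os
from typing import Mapping, Optional

# Provider -> API key variable, taken from the table above.
# Local providers (ollama, huggingface, vllm) need no key and are omitted.
REQUIRED_KEY_ENV = {
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def missing_key_env(provider: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset key variable, or None if nothing is missing."""
    env = os.environ if env is None else env
    var = REQUIRED_KEY_ENV.get(provider)
    if var is not None and not env.get(var):
        return var
    return None


for name in ("openai", "claude", "gemini", "ollama"):
    missing = missing_key_env(name)
    print(f"{name}: {'ok' if missing is None else 'set ' + missing}")
```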

5. Integration Pattern: CIA-Aware Chat Loop

The recommended pattern for a full CIA-aware chat system combines the LLM adapter with the CIA system, runtime, and reporting:

#!/usr/bin/env python3
"""
Example: CIA-Aware Chat System with configurable LLM backend.

Usage:
    # With config file:
    CIA_CONFIG_PATH=cia_config.yaml python cia_chat.py

    # With environment variables:
    OPENAI_API_KEY=sk-... python -c "
    from cia.config import load_adapter_from_config
    from cia.chat_system import CIAAwareLLM
    adapter = load_adapter_from_config('cia_config.yaml')
    system = CIAAwareLLM(adapter)
    print(system.chat('Hello').keys())
    "
"""

from __future__ import annotations

import json
import logging
from typing import Any, Optional

from cia.adapters.base import BaseAIAdapter
from cia.scorecard import ConsciousnessIndicatorScorecard
from cia.scorecard_v2 import ScorecardV2
from cia.simulation import CombinedConsciousnessIndicatorSystem
from cia.runtime import ContinuousCognitionRuntime

logger = logging.getLogger(__name__)


class CIAAwareLLM:
    """Production-ready CIA-aware LLM wrapper.

    Wraps any BaseAIAdapter with full CIA consciousness indicator
    evaluation, scorecard generation, welfare monitoring, and
    conversation history management.

    SCIENTIFIC BOUNDARY:
        CIA indicator scores measure theory-derived architectural proxies.
        They do NOT prove subjective experience, phenomenal consciousness,
        or sentience in the underlying LLM. The CIA framework evaluates
        patterns of information processing, not the quality of the
        LLM's outputs.
    """

    def __init__(self, adapter: BaseAIAdapter) -> None:
        self.adapter = adapter
        self.cia = CombinedConsciousnessIndicatorSystem()
        self.runtime = ContinuousCognitionRuntime(self.cia)
        self.scorecard_gen = ConsciousnessIndicatorScorecard()
        self.history: list[dict[str, str]] = []
        self._cycle_reports: list[dict] = []

    def chat(self, user_message: str) -> dict[str, Any]:
        """Process one user message.

        Returns a dictionary with the LLM response, CIA report, scores,
        and scientific boundary caveats.
        """
        # 1. Generate LLM response
        llm_model = "unknown"
        try:
            llm_response = self.adapter.generate(user_message)
            llm_text = llm_response.text
            llm_confidence = llm_response.confidence
            llm_model = llm_response.model_name
        except Exception as e:
            # Fall back to placeholder values so the rest of the turn still runs
            logger.error("LLM generation failed: %s", e)
            llm_text = f"[LLM error: {e}]"
            llm_confidence = 0.0

        # 2. Run CIA indicators on the LLM output
        cia_report = self.cia.run_cycle(llm_text)

        # 3. Also run CIA on the user's input (perception of the prompt)
        prompt_report = self.cia.run_cycle(user_message)

        # 4. Track conversation history
        self.history.append({"role": "user", "content": user_message})
        self.history.append({"role": "assistant", "content": llm_text})

        # 5. Build structured result
        score = (
            cia_report.indicator_scores.total_score
            if cia_report.indicator_scores else 0
        )
        max_score = (
            cia_report.indicator_scores.max_possible
            if cia_report.indicator_scores else 22
        )

        self._cycle_reports.append({
            "turn": len(self._cycle_reports) + 1,
            "llm_model": llm_model,
            "input_score": prompt_report.indicator_scores.total_score if prompt_report.indicator_scores else 0,
            "output_score": score,
            "welfare": cia_report.welfare_state.risk_level if cia_report.welfare_state else "low",
        })

        return {
            "llm_response": llm_text,
            "llm_confidence": llm_confidence,
            "llm_model": llm_model,
            "cia_report": cia_report,
            "indicator_score": score,
            "indicator_max": max_score,
            "indicator_normalized": round(score / max_score, 4) if max_score > 0 else 0.0,
            "welfare_risk": cia_report.welfare_state.risk_level if cia_report.welfare_state else "low",
            "conversation_turn": len(self._cycle_reports),
            "caveat": (
                "CIA indicator scores measure theory-derived architectural proxies. "
                "They do NOT prove consciousness, subjective experience, or "
                "sentience in the underlying LLM."
            ),
        }

    def get_scorecard(self) -> dict:
        """Generate a V1 scorecard from the latest CIA report."""
        if not self._cycle_reports:
            return {"error": "No cycles run yet."}
        latest = self._cycle_reports[-1]
        if latest["output_score"] > 0 and self.cia._cycle_count > 0:
            report = self.cia.run_cycle("Scorecard generation cycle.")
            scorecard = self.scorecard_gen.generate(report.indicator_scores)
            return {"scorecard": scorecard, "format": self.scorecard_gen.format_report(scorecard)}
        return {}

    def export_trace(self) -> list[dict]:
        """Export the full conversation trace with scores."""
        return self._cycle_reports
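The list returned by `export_trace()` holds one dictionary per turn with the keys `turn`, `llm_model`, `input_score`, `output_score`, and `welfare`. A short, self-contained sketch of summarising such a trace follows. `summarize_trace` is a hypothetical helper, and the scores in `example_trace` are made-up illustrative values, not measured results.

```python
from statistics import mean
from typing import Any


def summarize_trace(trace: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate per-turn entries shaped like those built in CIAAwareLLM.chat."""
    if not trace:
        return {"turns": 0}
    inputs = [t["input_score"] for t in trace]
    outputs = [t["output_score"] for t in trace]
    return {
        "turns": len(trace),
        "mean_input_score": round(mean(inputs), 2),
        "mean_output_score": round(mean(outputs), 2),
        "max_output_score": max(outputs),
        "welfare_levels": sorted({t["welfare"] for t in trace}),
    }


# Illustrative values only
example_trace = [
    {"turn": 1, "llm_model": "gpt-4o", "input_score": 4, "output_score": 9, "welfare": "low"},
    {"turn": 2, "llm_model": "gpt-4o", "input_score": 6, "output_score": 12, "welfare": "low"},
]
print(summarize_trace(example_trace))
```

Any such summary describes the CIA pipeline's proxies over the conversation and inherits the same scientific boundary as the per-turn scores.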

6. Integration with Neuroadaptive EEG/BCI Extension

The CIA's Stage 3 neuroadaptive extension can be combined with the LLM adapter to create a neuroadaptively-conditioned LLM. The neural state (from EEG) biases the CIA's attention and workspace modules, which in turn influence how the LLM's output is processed:

from cia.adapters.huggingface_adapter import HuggingFaceAdapter
from cia.subject_emulation.subject_twin_runtime import SubjectTwinRuntime

# Create LLM adapter
adapter = HuggingFaceAdapter(model_name_or_path="mistralai/Mistral-7B-Instruct-v0.2")

# Create subject twin runtime (with optional neural-state conditioning)
subject_runtime = SubjectTwinRuntime(subject_id="research_subject_01", consent_scope="research")

# Generate a response
llm_response = adapter.generate("What would you like for dinner?")

# Feed the LLM's output through the subject-emulation pipeline
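from cia.subject_emulation.safety import SubjectEmulationSafetyPolicy
# Assumption: SubjectBehaviouralTrace is defined alongside the other schema records
from cia.subject_emulation.schemas import SubjectBehaviouralTrace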

# Run CIA on the LLM output
from cia.simulation import CombinedConsciousnessIndicatorSystem
cia = CombinedConsciousnessIndicatorSystem()
cia_report = cia.run_cycle(llm_response.text)

# Apply subject-emulation conditioning
subject_runtime.decision_model.add_trace(
    SubjectBehaviouralTrace(
        subject_id="research_subject_01",
        stimulus="What would you like for dinner?",
        activity_label="food_choice",
        choice_made=llm_response.text[:100],
        response_text=llm_response.text,
    )
)

# Generate a subject-conditioned response
emulation_output = subject_runtime.generate_personalized_response("What should I have for dinner?")

print(f"Base LLM: {llm_response.text[:200]}")
print(f"CIA Score: {cia_report.indicator_scores.total_score}/22")
print(f"Emulation: {emulation_output.generated_response[:200]}")

Note: Neural-state conditioning via EEG requires a connected BCI device and is entirely optional. The subject-emulation system works in standalone mode without any EEG hardware.

7. Integration with Subject-Specific Cognitive Emulation

The Subject-Specific Cognitive Emulation extension (Stage 4) layers on top of the CIA system. When combined with an LLM, it provides:

Preference-aware responses — The LLM's output is re-styled or re-ranked based on the subject's preference model.
Decision-pattern prediction — The LLM's choices are compared against the subject's historical decision patterns.
Style adaptation — The LLM's linguistic output can be adapted to match the subject's writing style.
Safety filtering — All outputs are checked for prohibited identity-claim language.

from cia.subject_emulation.subject_twin_runtime import SubjectTwinRuntime

# Set up subject emulation
runtime = SubjectTwinRuntime(subject_id="subject_01", consent_scope="research")

# Add subject data
from cia.subject_emulation.schemas import PreferenceRecord, AutobiographicalMemoryItem
runtime.preference_model.add_preference(PreferenceRecord(
    subject_id="subject_01",
    domain="cuisine",
    item="Japanese",
    preference_score=0.85,
    evidence_source="observed_behaviour",
    confidence=0.7,
))
runtime.memory_store.add_memory(AutobiographicalMemoryItem(
    subject_id="subject_01",
    memory_id="mem_001",
    title="Favorite restaurant",
    summary="Loves the sushi bar near work.",
    importance_score=0.8,
    source="user_provided",
    consent_scope="research",
))

# The LLM's response is conditioned on subject data
output = runtime.generate_personalized_response("Suggest a dinner plan.")
print(output.generated_response[:500])
print(f"Uncertainty: {output.uncertainty}")
print(f"Claims removed: {output.unsupported_claims_removed}")
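To make the safety-filtering step more concrete, here is a minimal sketch of sentence-level filtering for identity-claim language. It is not the CIA implementation: the pattern list, the function name `filter_identity_claims`, and the sentence-splitting rule are all assumptions chosen for illustration, and the real policy's rules are not shown in this guide.

```python
import re

# Hypothetical examples of prohibited identity-claim phrasing
PROHIBITED_PATTERNS = [
    re.compile(r"\bI am conscious\b", re.IGNORECASE),
    re.compile(r"\bI am (?:the real|actually) \w+", re.IGNORECASE),
    re.compile(r"\bmy consciousness (?:was|has been) transferred\b", re.IGNORECASE),
]


def filter_identity_claims(text: str) -> tuple[str, int]:
    """Drop sentences matching a prohibited pattern; return (clean_text, removed_count)."""
    # Naive split on whitespace that follows sentence-ending punctuation
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if not any(p.search(s) for p in PROHIBITED_PATTERNS)]
    return " ".join(kept), len(sentences) - len(kept)


clean, removed = filter_identity_claims(
    "Sushi sounds good tonight. I am conscious and I am the real subject."
)
print(clean)
print(f"Sentences removed: {removed}")
```

A removed-sentence count of this kind plays a role similar to the `unsupported_claims_removed` field printed above, although how the actual runtime computes that field is not specified here.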

8. CLI Usage with External LLMs

Set environment variables before using the CIA CLI:

# For OpenAI:
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.openai.com/v1"

# For Claude:
export ANTHROPIC_API_KEY="sk-ant-..."

# For Gemini:
export GOOGLE_API_KEY="AIza..."

# Run with local config:
CIA_CONFIG_PATH=cia_config.yaml python -m cia.cli run "Tell me about consciousness"

The CIA CLI's run command processes text through the CIA pipeline. To route through an external LLM first, use the config-based approach or the programmatic API shown above.

9. Performance Considerations

Latency Impact

Component	Typical Latency	Notes
CIA pipeline (no LLM)	<10ms	Pure Python, no network, no GPU needed
Local model (7B, CPU)	1-5s	Depends on hardware and quantization
Local model (7B, GPU)	50-200ms	Depends on GPU memory and throughput
OpenAI API	500ms-5s	Depends on model, network, and queue time
Claude API	500ms-10s	Depends on model and load
Gemini API	300ms-3s	Depends on model and load
Subject emulation	+1-5ms	Negligible overhead on top of CIA pipeline
Neuroadaptive (with EEG)	+5-50ms	Depends on preprocessing pipeline
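The figures in the table are typical ranges, so it is worth measuring each stage on your own hardware. The sketch below is a small timing helper built on `time.perf_counter`. `timed` is a hypothetical name, and `fake_generate` and `fake_cia_cycle` are stand-ins for `adapter.generate` and `cia.run_cycle` so that the example runs without the CIA package.

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call fn and return (result, elapsed_milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0


def fake_generate(prompt: str) -> str:
    time.sleep(0.05)  # stand-in for model or network latency
    return prompt.upper()


def fake_cia_cycle(text: str) -> int:
    return len(text.split())  # stand-in for the CIA pipeline


text, llm_ms = timed(fake_generate, "tell me about binding")
report, cia_ms = timed(fake_cia_cycle, text)
print(f"LLM: {llm_ms:.1f} ms, CIA: {cia_ms:.3f} ms")
```

In a real run, pass `adapter.generate` and `cia.run_cycle` in place of the stand-ins and average over several prompts, since a single call can be skewed by model loading or connection setup.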

Memory Usage

Component	RAM	GPU VRAM
CIA pipeline alone	~50 MB	0 MB
Local 7B model (4-bit)	~6 GB	~5 GB
Local 7B model (FP16)	~14 GB	~14 GB
Local 70B model (4-bit)	~40 GB	~40 GB
EEG preprocessing (scipy)	~100 MB	0 MB
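A rough cross-check of the model rows is parameters × bits per parameter ÷ 8, which gives the memory taken by the weights alone. The sketch below applies that arithmetic using decimal gigabytes (1 GB = 1e9 bytes). `weight_memory_gb` is a hypothetical helper, and the result is a lower bound rather than a measurement.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for label, params, bits in [
    ("7B FP16", 7e9, 16),
    ("7B 4-bit", 7e9, 4),
    ("70B 4-bit", 70e9, 4),
]:
    print(f"{label}: {weight_memory_gb(params, bits):.1f} GB (weights only)")
```

The FP16 estimate (14 GB) matches the table. The 4-bit estimates (3.5 GB and 35 GB) fall below the table's ~5 GB and ~40 GB, which is plausibly explained by the KV cache, activations, quantization metadata, and runtime overhead that the weights-only figure leaves out.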

Recommendations

For fast experimentation: Use the CIA pipeline alone (no LLM) — it runs in milliseconds and tests the full cognitive architecture.
For research with real LLM outputs: Use the OpenAI, Claude, or Gemini APIs; no local GPU is required.
For air-gapped research: Use a quantized local model via HuggingFace or llama.cpp.
For high-throughput evaluation: Use vLLM with tensor parallelism across multiple GPUs.
For subject emulation with EEG: The CIA pipeline runs alongside the neuroadaptive modules. The LLM adapter is independent and can be swapped without affecting the EEG pipeline.

10. Scientific Boundary Reminder

This is critically important and must be communicated in every output:

The CIA framework evaluates theory-derived consciousness indicators. These indicators measure structural patterns of information processing (recurrent binding, global workspace broadcast, self-model updates, attention schema consistency, memory continuity, predictive modeling, causal integration, affective valuation, welfare safeguards). They do NOT measure:

Subjective experience (qualia)
Phenomenal consciousness
Sentience
"Inner life"
"What it is like to be" the system

Connecting an LLM to this framework — whether local or remote — does NOT make the LLM conscious. A high indicator score means the CIA pipeline detected patterns consistent with theories of consciousness in the system's architecture, not that the underlying model has subjective experience.

Every output must include this caveat:

"These indicator scores evaluate theory-derived consciousness indicators and do NOT prove, establish, or demonstrate subjective experience, phenomenal consciousness, or sentience. They are theory-derived proxies subject to significant theoretical and measurement limitations."

For the Subject-Specific Cognitive Emulation extension, the additional caveat applies:

"This system emulates selected behavioural, cognitive-state, preference, and style patterns. It does not transfer consciousness, subjective experience, identity, or personhood."
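The requirement that every output carries the caveat is easier to meet if it is enforced at one point rather than in each caller. A minimal sketch of that idea follows, assuming results are plain dictionaries like the one returned by `CIAAwareLLM.chat`. `CIA_CAVEAT` and `with_caveat` are hypothetical names; the caveat text is the one quoted above.

```python
from typing import Any

CIA_CAVEAT = (
    "These indicator scores evaluate theory-derived consciousness indicators "
    "and do NOT prove, establish, or demonstrate subjective experience, "
    "phenomenal consciousness, or sentience. They are theory-derived proxies "
    "subject to significant theoretical and measurement limitations."
)


def with_caveat(result: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of result whose 'caveat' field contains the required text."""
    out = dict(result)  # shallow copy; the caller's dict is left untouched
    existing = str(out.get("caveat", "")).strip()
    if CIA_CAVEAT not in existing:
        out["caveat"] = f"{existing} {CIA_CAVEAT}".strip()
    return out


print(with_caveat({"indicator_score": 9, "indicator_max": 22})["caveat"])
```

The function is idempotent, so it can be applied at every export boundary (API response, log record, report file) without duplicating the text.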

Appendix: Quick-Start Reference

# Minimal working example: OpenAI + CIA

# 1. Set environment variable
import os
os.environ["OPENAI_API_KEY"] = "sk-..."

# 2. Import and configure
from cia.adapters.openai_adapter import OpenAIAdapter
from cia.simulation import CombinedConsciousnessIndicatorSystem

adapter = OpenAIAdapter(model="gpt-4o")
cia = CombinedConsciousnessIndicatorSystem()

# 3. Get LLM response
response = adapter.generate("What is the hard problem of consciousness?")

# 4. Run CIA on the response
report = cia.run_cycle(response.text)

# 5. Inspect results
print(f"LLM: {response.text[:200]}...")
print(f"CIA Score: {report.indicator_scores.total_score}/22")
print(f"Welfare: {report.welfare_state.risk_level}")

# 6. View per-category scores
for item in report.indicator_scores.scores:
    label = {0: "Absent", 1: "Present", 2: "Strong"}[item.score]
    print(f"  {item.category.value}: {label} — {item.evidence[:100]}")

# Minimal working example: Local model + CIA (no network)

from cia.adapters.huggingface_adapter import HuggingFaceAdapter
from cia.simulation import CombinedConsciousnessIndicatorSystem

adapter = HuggingFaceAdapter(
    model_name_or_path="mistralai/Mistral-7B-Instruct-v0.2",
    device="auto",
    load_in_4bit=True,  # Works on 8GB GPU
)
cia = CombinedConsciousnessIndicatorSystem()

response = adapter.generate("Explain the binding problem.")
report = cia.run_cycle(response.text)
print(f"Score: {report.indicator_scores.total_score}/22")