02 - Architecture

Technical Architecture Document

1. System Overview

The Consciousness-Indicator Architecture (CIA) is a Python-based cognitive simulation framework that processes text input through a multi-stage cognitive pipeline, producing structured indicator scores across 11 consciousness-relevant categories.

The system is orchestrated by CombinedConsciousnessIndicatorSystem, which wires together 13+ modules into a coherent processing pipeline. Each module implements a structural feature derived from one or more consciousness theories.

Core design principles:

- Deterministic: identical input produces identical output (no randomness, no external APIs)
- Modular: each module is independently testable and replaceable
- Observable: every module exposes its internal state through Pydantic schemas
- Safe: scientific boundary disclaimers are embedded at every level

2. Module Descriptions

2.1 Perception Layer (`perception.py`)

Purpose: Extract structured percepts from raw text input.

Theory basis: Models the structural role of perceptual processing in biological cognition — the first stage where raw input is segmented, categorized, and given salience weighting.

Mechanism: Deterministic heuristic-based text analysis using:

- Entity detection via capitalization patterns, quoted strings, and numeric tokens
- Concept detection via keyword matching against a built-in vocabulary
- Salience scoring from frequency, length, capitalization, position, and type bonuses
- Confidence scoring from entity length, frequency, and type

Input: Raw text string

Output: list[Percept] sorted by descending salience

Key class: PerceptionLayer(min_word_length=3, max_percepts=None, concept_keywords=None)
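As a rough illustration of the entity-detection step, the sketch below finds capitalized words, quoted strings, and numeric tokens with the standard library only. The regular expressions, the `detect_entities` name, and the use of raw frequency as a stand-in for salience are all assumptions made for this example; the actual rules in `perception.py` are not shown here.

```python
import re
from collections import Counter

# Hypothetical patterns; the real PerceptionLayer rules may differ.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")
QUOTED = re.compile(r'"([^"]+)"')
NUMERIC = re.compile(r"\b\d+(?:\.\d+)?\b")


def detect_entities(text: str, min_word_length: int = 3) -> list[tuple[str, str, int]]:
    """Return (entity, kind, frequency) tuples, most frequent first."""
    found: Counter[tuple[str, str]] = Counter()
    for word in CAPITALIZED.findall(text):
        if len(word) >= min_word_length:
            found[(word, "capitalized")] += 1
    for phrase in QUOTED.findall(text):
        found[(phrase, "quoted")] += 1
    for number in NUMERIC.findall(text):
        found[(number, "numeric")] += 1
    # Frequency stands in for salience here; ties break alphabetically
    # so that identical input always yields identical ordering.
    ranked = sorted(found.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(entity, kind, count) for (entity, kind), count in ranked]


entities = detect_entities('Alice met Bob. Alice paid 42 for "red tea".')
```

Because no step depends on randomness or external state, repeated calls on the same text return the same ordered list, which matches the determinism principle stated in the overview.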

2.2 Recurrent Binding Layer (`recurrent_binding.py`)

Purpose: Iteratively refine percepts through recurrent processing cycles.

Theory basis: Recurrent Processing Theory — consciousness requires recurrent loops, not just feedforward processing.

Mechanism: Per-cycle refinement pipeline:

1. Entity merging: overlapping entities combined via longest-common-substring overlap ratio (threshold: 0.6)
2. Salience re-scoring: co-occurrence boost + low-salience decay (factor: 0.85)
3. Confidence stabilization: values pulled 20% toward group mean each cycle

Convergence is measured as: stability = 1.0 - (last_delta / initial_delta), with early exit on perfect convergence.

Input: list[Percept]

Output: BoundPercept with stability score, cycle count, and refined entities

Key class: RecurrentBindingLayer(default_cycles=3, max_cycles=20, merge_threshold=0.6)
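The following sketch shows one way the overlap ratio, the confidence stabilization, and the stability formula could fit together. It makes several assumptions that the text does not settle: the overlap ratio divides by the shorter string's length, the per-cycle delta is the largest single confidence change, and the helper names are hypothetical.

```python
from difflib import SequenceMatcher


def overlap_ratio(a: str, b: str) -> float:
    """Longest common substring length divided by the shorter string's length."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / min(len(a), len(b))


def stabilize(confidences: list[float], pull: float = 0.2) -> list[float]:
    """Pull every confidence 20% of the way toward the group mean."""
    mean = sum(confidences) / len(confidences)
    return [c + pull * (mean - c) for c in confidences]


def run_cycles(confidences: list[float], cycles: int = 3) -> tuple[list[float], float, int]:
    """Return (refined confidences, stability, cycles run) for a non-empty list."""
    initial_delta = None
    last_delta = 0.0
    used = 0
    for used in range(1, cycles + 1):
        updated = stabilize(confidences)
        # Assumption: delta is the largest single change in this cycle.
        last_delta = max(abs(new - old) for new, old in zip(updated, confidences))
        confidences = updated
        if initial_delta is None:
            initial_delta = last_delta
        if last_delta == 0.0:
            break  # perfect convergence: nothing left to refine
    if not initial_delta:
        return confidences, 1.0, used
    # stability = 1.0 - (last_delta / initial_delta)
    return confidences, 1.0 - last_delta / initial_delta, used


refined, stability, used = run_cycles([0.2, 0.8])
```

With a 20% pull, each cycle shrinks the spread by a factor of 0.8, so three cycles starting from `[0.2, 0.8]` give a stability of about 0.36 under these assumptions.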

2.3 Predictive World Model (`predictive_world_model.py`)

Purpose: Generate predictions and track prediction error.

Theory basis: Predictive Processing / Active Inference — perception as hypothesis testing.

Mechanism:

- Persistence strategy: predicted next state = current hypotheses
- Update: new_hypothesis = alpha * observation + (1 - alpha) * prediction
- Error: Mean Absolute Error over shared dimensions between prediction and observation
- Uncertainty: running average of last 20 errors
- Confidence: 1 - uncertainty

Input: observation dict {dimension: value}

Output: PredictionState with hypotheses, predictions, error, history

Key class: PredictiveWorldModel(initial_hypotheses=None, learning_rate=0.5)
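The update rules above can be sketched in a few lines. `TinyWorldModel` is a hypothetical stand-in, not the real `PredictiveWorldModel`; in particular, how unseen dimensions are handled (here they adopt the observed value) is an assumption.

```python
from collections import deque


class TinyWorldModel:
    """Illustrative persistence-based predictor with a running error window."""

    def __init__(self, initial_hypotheses=None, learning_rate=0.5, window=20):
        self.hypotheses = dict(initial_hypotheses or {})
        self.alpha = learning_rate
        self.errors = deque(maxlen=window)  # keeps only the last `window` errors

    def step(self, observation: dict[str, float]) -> float:
        prediction = dict(self.hypotheses)  # persistence: predict no change
        shared = prediction.keys() & observation.keys()
        # Mean Absolute Error over the dimensions both sides know about.
        error = (
            sum(abs(observation[k] - prediction[k]) for k in shared) / len(shared)
            if shared
            else 0.0
        )
        self.errors.append(error)
        for key, value in observation.items():
            prior = prediction.get(key, value)  # assumption: new dims start at the observation
            self.hypotheses[key] = self.alpha * value + (1 - self.alpha) * prior
        return error

    @property
    def uncertainty(self) -> float:
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    @property
    def confidence(self) -> float:
        return 1.0 - self.uncertainty


model = TinyWorldModel({"x": 0.0})
first_error = model.step({"x": 1.0})
second_error = model.step({"x": 1.0})
```

With `learning_rate=0.5`, the hypothesis for `x` moves halfway to the observation on each step, so the error halves from 1.0 to 0.5. The sketch assumes observation values lie in [0, 1]; otherwise confidence could drop below zero.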

2.4 Attention Controller (`attention.py`)

Purpose: Select a focus from competing content items via weighted scoring.

Theory basis: Global Workspace Theory — attention bottleneck selects content for broadcast.

Mechanism: Composite score = weighted sum of 5 dimensions:

- Salience: 0.30
- Uncertainty: 0.20
- Goal relevance: 0.25
- Novelty: 0.15
- Welfare relevance: 0.10

All values clamped to [0, 1]. Highest-scoring item becomes current_focus; others become competing_focuses.

Input: list[dict] with keys: label, salience, uncertainty, goal_relevance, novelty, welfare_relevance

Output: AttentionState with focus, competing items, and per-dimension scores

Key class: AttentionController(weights=None)
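A minimal sketch of the composite scoring follows, using the weights listed above. The function names and the treatment of missing keys (scored as 0.0) are assumptions for illustration.

```python
DEFAULT_WEIGHTS = {
    "salience": 0.30,
    "uncertainty": 0.20,
    "goal_relevance": 0.25,
    "novelty": 0.15,
    "welfare_relevance": 0.10,
}


def clamp(value: float) -> float:
    """Restrict a value to the [0, 1] interval."""
    return max(0.0, min(1.0, value))


def composite_score(item: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    # Assumption: a dimension missing from the item counts as 0.0.
    return sum(w * clamp(float(item.get(dim, 0.0))) for dim, w in weights.items())


def select_focus(items: list[dict]) -> tuple[str, list[str]]:
    """Return (current focus label, competing focus labels) for a non-empty list."""
    ranked = sorted(items, key=composite_score, reverse=True)
    return ranked[0]["label"], [item["label"] for item in ranked[1:]]


items = [
    {"label": "a", "salience": 1.0},
    {"label": "b", "goal_relevance": 1.0, "novelty": 1.0},
]
focus, competing = select_focus(items)
```

Item `b` wins here because goal relevance plus novelty (0.25 + 0.15) outweighs salience alone (0.30).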

2.5 Attention Schema (`attention_schema.py`)

Purpose: Maintain a self-model of the attention process.

Theory basis: Attention Schema Theory — the brain models its own attention.

Mechanism:

- Tracks what the system believes it is attending to (schema focus)
- Compares against actual attention state on each update
- Computes running consistency score: 1 - (discrepancies / total_updates)
- Detects blind-spots (actual competing focuses missing from schema)
- Supports self-report verification via compare_report()

Input: AttentionState (actual)

Output: {consistency_score, discrepancies}

Key class: AttentionSchema(current_focus="", focus_reason="", competing_focuses=None)
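The running consistency score and blind-spot detection might look roughly like the sketch below. `SchemaTracker` is hypothetical, and it assumes the schema simply expects the previous focus to persist, so the first update from an empty schema counts as a discrepancy.

```python
class SchemaTracker:
    """Illustrative self-model of attention; not the real AttentionSchema."""

    def __init__(self):
        self.schema_focus = ""
        self.total_updates = 0
        self.discrepancies = 0

    def update(self, actual_focus, actual_competitors, believed_competitors):
        self.total_updates += 1
        if self.schema_focus != actual_focus:
            self.discrepancies += 1
        # Blind spots: real competitors that the schema did not know about.
        blind_spots = sorted(set(actual_competitors) - set(believed_competitors))
        self.schema_focus = actual_focus  # resynchronize belief with reality
        return {
            "consistency_score": 1 - self.discrepancies / self.total_updates,
            "discrepancies": self.discrepancies,
            "blind_spots": blind_spots,
        }


tracker = SchemaTracker()
first = tracker.update("a", ["b"], [])
second = tracker.update("a", ["b"], ["b"])
```

After one mismatch in two updates, the consistency score is 0.5.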

2.6 Global Workspace (`global_workspace.py`)

Purpose: Competitive broadcast arena for conscious content integration.

Theory basis: Global Workspace Theory — consciousness as global broadcast.

Mechanism:

- Content items compete based on salience (descending)
- Top-N items (capacity) win and are broadcast
- Broadcast delivered to all registered subscriber callbacks
- Subscriber exceptions caught and logged (fault isolation)
- Complete broadcast history maintained

Input: list[WorkspaceContent] competing for access

Output: WorkspaceBroadcast with labels, summaries, recipients

Key class: GlobalWorkspace(capacity=3)
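To make the competition and fault isolation concrete, here is a small sketch in which content is reduced to `(label, salience)` tuples. The `broadcast` function and the choice to list only successful subscribers as recipients are assumptions, not the module's documented behavior.

```python
import logging
from typing import Callable

logger = logging.getLogger("workspace_sketch")


def broadcast(
    contents: list[tuple[str, float]],
    subscribers: dict[str, Callable[[list[str]], None]],
    capacity: int = 3,
) -> dict:
    """Pick the top-`capacity` items by salience and deliver them to every subscriber."""
    ranked = sorted(contents, key=lambda content: content[1], reverse=True)
    winners = [label for label, _ in ranked[:capacity]]
    recipients = []
    for name, callback in subscribers.items():
        try:
            callback(winners)
            recipients.append(name)
        except Exception:
            # One faulty subscriber must not block delivery to the others.
            logger.exception("subscriber %s failed", name)
    return {"labels": winners, "recipients": recipients}


def faulty_subscriber(labels: list[str]) -> None:
    raise RuntimeError("simulated subscriber failure")


received: list[str] = []
result = broadcast(
    [("a", 0.1), ("b", 0.9), ("c", 0.5), ("d", 0.7)],
    {"faulty": faulty_subscriber, "memory": received.extend},
    capacity=2,
)
```

The faulty subscriber raises, the error is logged, and the `memory` subscriber still receives the two winning labels.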

2.7 Memory System (`memory.py`)

Purpose: Four complementary memory stores for temporal continuity.

Theory basis: Memory theories (Tulving 1983; Conway 2005) — consciousness requires continuity across time.

Working Memory

- FIFO buffer with configurable capacity (default: 5)
- Oldest items evicted on overflow
- Models the attentional bottleneck
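The FIFO eviction behavior can be illustrated with `collections.deque`; whether the real working memory uses a deque internally is an assumption.

```python
from collections import deque

# A bounded deque drops the oldest entry automatically once it is full.
working_memory = deque(maxlen=5)
for index in range(7):
    working_memory.append(f"item-{index}")

remaining = list(working_memory)
```

After seven insertions into a capacity-5 buffer, the two oldest items have been evicted and `item-2` through `item-6` remain.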

Episodic Memory

- Unlimited capacity, timestamp-ordered
- Preserves temporal sequence of experience traces
- Supports recency-based retrieval

Semantic Memory

- Deduplication via character-level overlap (threshold: 0.5)
- Confidence merging: keeps maximum confidence on overlap
- Tag union on merge
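A possible shape for the deduplication logic is sketched below. The exact character-level overlap measure is not specified in this document, so `difflib.SequenceMatcher.ratio()` is swapped in as a stand-in; `add_fact` and the dict layout are also hypothetical, as is the choice to keep the text of the first-stored entry on merge.

```python
from difflib import SequenceMatcher


def add_fact(
    store: list[dict],
    text: str,
    confidence: float,
    tags: set[str],
    threshold: float = 0.5,
) -> None:
    """Merge into an overlapping entry if one exists, otherwise append a new one."""
    for fact in store:
        # Stand-in similarity measure; the real overlap metric may differ.
        if SequenceMatcher(None, fact["text"], text).ratio() >= threshold:
            fact["confidence"] = max(fact["confidence"], confidence)  # keep the maximum
            fact["tags"] |= tags  # tag union
            return
    store.append({"text": text, "confidence": confidence, "tags": set(tags)})


store: list[dict] = []
add_fact(store, "global workspace", 0.6, {"gwt"})
add_fact(store, "global workspaces", 0.9, {"theory"})
add_fact(store, "zzz", 0.4, set())
```

The second call merges into the first entry (raising its confidence to 0.9 and unioning the tags), while the unrelated third string is stored separately.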

Self Memory

- Structured identity/goal/limitation tracking
- Persistent continuity_id preserved across clear() calls
- General trace storage for self-referential content

2.8 Higher-Order Self-Model (`self_model.py`)

Purpose: System beliefs about its own internal state.

Theory basis: Higher-Order Thought Theory — consciousness requires thoughts about thoughts.

Mechanism:

- Updated from workspace broadcasts (belief, confidence via EMA with alpha=0.3)
- Updated from attention state (focus, disagreement from competing focuses)
- Internal disagreement detection from 3 components:
  - Belief volatility (variance in confidence history)
  - Belief divergence (character-level dissimilarity)
  - Attentional competition (internal disagreement accumulation)
- Introspection report generation (explicitly labeled as "indicator report")

Output: SelfModelState with belief, confidence, uncertainty, goal, attention focus, limitations, identity markers, disagreement

Key class: HigherOrderSelfModel(continuity_id=None, initial_identity_markers=None)
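The confidence update and the volatility component can be sketched as follows. It assumes that alpha weights the newly observed value and that volatility is the population variance of the confidence history; both are plausible readings, not confirmed details.

```python
from statistics import pvariance


def ema_update(previous: float, observed: float, alpha: float = 0.3) -> float:
    """Exponential moving average; assumes alpha weights the new observation."""
    return alpha * observed + (1.0 - alpha) * previous


def belief_volatility(history: list[float]) -> float:
    """Variance of the confidence history (0.0 when there is too little data)."""
    return pvariance(history) if len(history) > 1 else 0.0


history = [0.5]
for observed in (1.0, 1.0, 0.0):
    history.append(ema_update(history[-1], observed))

volatility = belief_volatility(history)
```

Starting at 0.5, the smoothed confidence moves to roughly 0.65, then 0.755, then drops to about 0.5285 after the low observation.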

2.9 Integration Metrics (`integration_metrics.py`)

Purpose: IIT-inspired graph-theoretic structural proxies.

Theory basis: Integrated Information Theory — consciousness requires information integration.

Mechanism: Builds a directed graph of module connectivity and computes:

- Causal density proxy: actual edges / possible edges
- Broadcast reach: nodes reachable from global_workspace
- Perturbation spread: nodes affected by simulated node removal
- Modular fragmentation: 1 - (components / nodes)
- State differentiation: unique state configs / total nodes

Critical: These are PROXIES and do NOT compute IIT's Phi.

Input: dict[str, Any] mapping module names to state snapshots

Output: dict[str, float] with all 6 metrics

Key class: IntegrationMetrics(modules=None, edges=None)

2.10 Consciousness Specialist Evaluator (`consciousness_specialist.py`)

Purpose: Map full system state to 11 indicator categories on a 0-2 scale.

Theory basis: Butlin et al. (2023, 2025) multi-indicator evaluation approach.

Mechanism: Evaluates 11 categories with specific scoring criteria:

| Category | Score 0 | Score 1 | Score 2 |
|---|---|---|---|
| GLOBAL_BROADCAST | No broadcasts | Broadcasts, few received | Broadcasts, many received |
| RECURRENT_PROCESSING | No cycles | Some cycles, moderate stability | Multiple cycles, high stability |
| SELF_MODEL | Empty | Some content | Rich (beliefs + goals + identity) |
| ATTENTION_SCHEMA | No data | Below threshold | Consistency >= threshold |
| METACOGNITION | No activity | Minimal introspection | Active monitoring + history |
| MEMORY_CONTINUITY | No traces | Some traces | Multiple traces with temporal order |
| PREDICTIVE_MODELING | No hypotheses/errors | One present | Both active |
| CAUSAL_INTEGRATION | Metrics near zero | One metric > 0.3 | Both metrics > 0.5 |
| EMBODIMENT | (Default) | (Reserved) | N/A |
| AFFECTIVE_VALUATION | No signals | Minimal signals | Active, non-trivial signals |
| WELFARE_SAFEGUARDS | Inactive | Active, low risk | Active with flags or elevated risk |

Input: dict[str, Any] with workspace, recurrent_binding, self_model, attention_schema, attention, memory, predictive_model, integration_metrics, welfare_monitor

Output: IndicatorScores with 11 scored items, total, normalized, caveat

2.11 Welfare Safety Monitor (`welfare_monitor.py`)

Purpose: Track precautionary welfare-relevant structural patterns.

Theory basis: Butlin et al. (2023) precautionary welfare considerations.

Mechanism: Tracks 5 signals and applies configurable thresholds:

- Conflict level (threshold: 0.8) → high_conflict flag
- Uncertainty pressure (threshold: 0.7, sustained: 3 consecutive) → sustained_uncertainty flag
- Negative loop count (threshold: 5) → repetitive_negative_loops flag
- Harm signal (threshold: 0.5) → harm_signal flag

Risk levels: low (0 flags) → moderate (1) → high (2-3) → critical (4+)

Input: dict with conflict_level, uncertainty, resource_usage, harm_signal, negative_loop_count

Output: WelfareState with risk_level, flags, recommendation, valuation

Key class: WelfareSafetyMonitor(thresholds=None)
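The flagging rules and the flag-count-to-risk mapping can be sketched as below. `WelfareSketch` is a hypothetical stand-in; it assumes thresholds are inclusive (`>=`) and that the uncertainty streak resets whenever a cycle falls below the threshold.

```python
DEFAULT_THRESHOLDS = {
    "conflict_level": 0.8,
    "uncertainty": 0.7,
    "sustained_cycles": 3,
    "negative_loop_count": 5,
    "harm_signal": 0.5,
}


def risk_level(flag_count: int) -> str:
    """Map the number of raised flags to a risk level."""
    if flag_count == 0:
        return "low"
    if flag_count == 1:
        return "moderate"
    if flag_count <= 3:
        return "high"
    return "critical"


class WelfareSketch:
    def __init__(self, thresholds=None):
        self.thresholds = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
        self.uncertain_streak = 0  # consecutive cycles above the uncertainty threshold

    def check(self, signals: dict) -> dict:
        t = self.thresholds
        flags = []
        if signals.get("conflict_level", 0.0) >= t["conflict_level"]:
            flags.append("high_conflict")
        if signals.get("uncertainty", 0.0) >= t["uncertainty"]:
            self.uncertain_streak += 1
        else:
            self.uncertain_streak = 0
        if self.uncertain_streak >= t["sustained_cycles"]:
            flags.append("sustained_uncertainty")
        if signals.get("negative_loop_count", 0) >= t["negative_loop_count"]:
            flags.append("repetitive_negative_loops")
        if signals.get("harm_signal", 0.0) >= t["harm_signal"]:
            flags.append("harm_signal")
        return {"flags": flags, "risk_level": risk_level(len(flags))}


monitor = WelfareSketch()
levels = [monitor.check({"uncertainty": 0.9})["risk_level"] for _ in range(3)]
worst = monitor.check(
    {"conflict_level": 0.9, "uncertainty": 0.9, "negative_loop_count": 6, "harm_signal": 0.6}
)
```

High uncertainty alone raises no flag until the third consecutive cycle, at which point the risk level moves from low to moderate.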

2.12 Causal Intervention Harness (`intervention.py`)

Purpose: Lesion-study module ablation for measuring functional dependencies.

Theory basis: Neuropsychological lesion methods applied to AI architecture analysis.

Interventions supported:

- disable_module(name): Replace a module with a no-op stub
- perturb_attention(focus_override): Force a fixed attention focus
- reduce_workspace_capacity(new_capacity): Reduce broadcast capacity
- remove_recurrent_cycles(): Set recurrent cycles to 0
- clear_memory(): Clear all memory stores
- restore_all(): Restore all modules to original state

Experiment flow:

1. Run baseline cycle
2. Apply intervention
3. Run treated cycle
4. Restore all
5. Compute degradation metrics
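The experiment flow can be illustrated on a toy system. `ToySystem`, `ToyHarness`, and the degradation measure (baseline minus treated output) are invented for this sketch; only the method names `disable_module` and `restore_all` come from the list above.

```python
class ToySystem:
    """Stand-in pipeline with two swappable modules."""

    def __init__(self):
        self.modules = {"double": lambda x: x * 2, "increment": lambda x: x + 1}

    def run_cycle(self, value: int) -> int:
        for name in ("double", "increment"):
            value = self.modules[name](value)
        return value


class ToyHarness:
    def __init__(self, system: ToySystem):
        self.system = system
        self._originals = {}  # saved references, used by restore_all()

    def disable_module(self, name: str) -> None:
        # Save the original only once, then swap in a no-op stub.
        self._originals.setdefault(name, self.system.modules[name])
        self.system.modules[name] = lambda x: x

    def restore_all(self) -> None:
        self.system.modules.update(self._originals)
        self._originals.clear()

    def run_experiment(self, name: str, value: int) -> dict:
        baseline = self.system.run_cycle(value)       # 1. baseline
        self.disable_module(name)                     # 2. intervention
        try:
            treated = self.system.run_cycle(value)    # 3. treated cycle
        finally:
            self.restore_all()                        # 4. restore, even on error
        # 5. Hypothetical degradation metric: simple output difference.
        return {"baseline": baseline, "treated": treated, "degradation": baseline - treated}


system = ToySystem()
harness = ToyHarness(system)
result = harness.run_experiment("double", 3)
```

Wrapping the treated cycle in `try`/`finally` keeps the system restorable even when the lesioned pipeline raises, which is one way to get the reversibility the harness promises.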

2.13 Scorecard Generator (`scorecard.py`)

Purpose: Produce final formatted scorecard from indicator scores.

Mechanism:

- Maps 11 categories to 0-22 aggregate score
- Classifies into risk tiers (minimal/low/moderate/elevated/high)
- Generates evidence summary of notable indicators
- Collects all caveats from evaluator
- Produces actionable recommendations
- Formats human-readable text report with visual score bars

3. Data Flow Through a Cognitive Cycle

Input Text
    │
    ▼
[1] Perception Layer
    │ Extracts entities, concepts, salience, confidence
    ▼
[2] Recurrent Binding Layer
    │ Iterative refinement: merge, re-score, stabilize
    │ Outputs: BoundPercept with stability score
    ▼
[3] Predictive World Model
    │ Update hypotheses from bound percept observations
    │ Track prediction error
    ▼
[4] Attention Controller
    │ Rank content items by composite weighted score
    │ Select focus, identify competitors
    ▼
[5] Global Workspace
    │ Competitive broadcast: top-N items → all subscribers
    │ Subscribers: memory, self_model, evaluator, planner
    ▼
[6] Memory Update (Semantic)
    │ Store entities in semantic memory (with deduplication)
    ▼
[7] Self-Model Update (from Attention)
    │ Update attention focus, track internal disagreement
    ▼
[8] Attention Schema Update
    │ Compare schema belief vs. actual attention
    │ Update consistency score, detect discrepancies
    ▼
[9] Integration Metrics
    │ Build module connectivity graph
    │ Compute all 6 graph-theoretic proxies
    ▼
[10] Consciousness Specialist Evaluation
    │ Map full system state → 11 indicator scores (0/1/2)
    │ Produce IndicatorScores with caveats
    ▼
[11] Welfare Check
    │ Evaluate conflict, uncertainty, harm, loops
    │ Determine risk level and recommendation
    ▼
SimulationReport
    ├─ cycle_id, input_text
    ├─ percepts, bound_percept
    ├─ prediction_state
    ├─ attention_state
    ├─ workspace_broadcasts
    ├─ self_model
    ├─ indicator_scores
    ├─ welfare_state
    └─ caveat (scientific boundary)

4. Integration Metrics Approach

Module Connectivity Graph

The integration metrics module represents CIA as a directed graph with 9 nodes (modules) and 21 directed edges (information flow paths):

perception ──→ recurrent_binding ──→ global_workspace ──→ self_model
    │                 │                    │                    │
    │                 │                    ├──→ memory ────────┤
    │                 │                    ├──→ predictive_model┤
    │                 │                    └──→ welfare_monitor─┘
    │                 │                                         │
    ├──→ attention ──┤                    (feedback loops)    │
    │       │         │                                         │
    │       ├──→ attention_schema ──→ self_model              │
    │       │                                                     │
    │       └──→ global_workspace                               │
    │                                                             │
    memory ──→ recurrent_binding, predictive_model                │
    predictive_model ──→ recurrent_binding, attention, self_model │
    welfare_monitor ──→ attention, global_workspace ──────────────┘
    self_model ──→ attention, predictive_model, memory

Metrics Computation

| Metric | Formula | Range | Meaning |
|---|---|---|---|
| Causal density proxy | `edges / (nodes × (nodes - 1))` | [0, 1] | How connected the graph is |
| Broadcast reach | `reachable_from_workspace / total_nodes` | [0, 1] | How many modules the workspace influences |
| Perturbation spread | `affected_by_removal / (total_nodes - 1)` | [0, 1] | How much a node removal disrupts the system |
| Modular fragmentation | `1 - (connected_components / nodes)` | [0, 1] | How integrated vs. fragmented |
| State differentiation | `unique_state_configs / nodes` | [0, 1] | How diverse internal states are |
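The formulas in the table can be tried out on a small graph with the sketch below. Several details are assumptions: reachability excludes the starting node, "affected by removal" is read as the set of nodes downstream of the removed node, components are weakly connected, and state uniqueness is judged by `repr()`. The function names are hypothetical.

```python
Edge = tuple[str, str]


def downstream(edges: list[Edge], start: str) -> set[str]:
    """Nodes reachable from `start` along directed edges (excluding `start`)."""
    adjacency: dict[str, set[str]] = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
    seen: set[str] = set()
    stack = [start]
    while stack:
        for neighbor in adjacency.get(stack.pop(), ()):
            if neighbor != start and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def weak_components(nodes: list[str], edges: list[Edge]) -> int:
    """Count weakly connected components with a simple union-find."""
    parent = {node: node for node in nodes}

    def find(node: str) -> str:
        while parent[node] != node:
            node = parent[node]
        return node

    for source, target in edges:
        parent[find(source)] = find(target)
    return len({find(node) for node in nodes})


def integration_proxies(nodes, edges, states, hub="global_workspace", removed="perception"):
    n = len(nodes)  # requires at least two nodes
    return {
        "causal_density_proxy": len(set(edges)) / (n * (n - 1)),
        "broadcast_reach": len(downstream(edges, hub)) / n,
        "perturbation_spread": len(downstream(edges, removed)) / (n - 1),
        "modular_fragmentation": 1 - weak_components(nodes, edges) / n,
        "state_differentiation": len({repr(states.get(node)) for node in nodes}) / n,
    }


nodes = ["perception", "global_workspace", "memory", "self_model"]
edges = [
    ("perception", "global_workspace"),
    ("global_workspace", "memory"),
    ("global_workspace", "self_model"),
    ("self_model", "memory"),
]
states = {"perception": "idle", "global_workspace": "broadcasting",
          "memory": "idle", "self_model": "updating"}
proxies = integration_proxies(nodes, edges, states)
```

On this four-node example the causal density proxy is 4/12, the broadcast reach is 2/4, and removing `perception` affects all three remaining nodes. These are structural proxies only and do not compute IIT's Phi.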

5. Design Decisions

5.1 Deterministic Perception

The perception layer uses only Python stdlib (re, collections.Counter) — no neural models, no embeddings, no external APIs. This was a deliberate choice to ensure reproducibility and eliminate hidden dependencies that could affect results.

5.2 Pydantic Schemas

All inter-module communication uses Pydantic BaseModel instances. This provides:

- Type validation at runtime
- Automatic serialization (.model_dump(), .model_dump_json())
- Timestamp tracking on every state snapshot
- Confidence and uncertainty fields as first-class citizens

5.3 Scientific Boundary at Every Level

The scientific boundary disclaimer appears:

- In cia/__init__.py as the SCIENTIFIC_BOUNDARY constant
- In the docstring of every module
- In the caveat field of IndicatorScores and SimulationReport
- In the warning field of every scorecard
- In every CLI command output
- In the introspection report header

5.4 Subscriber Exception Isolation

The global workspace catches exceptions from subscriber callbacks and continues broadcasting to other subscribers. This prevents a faulty module from blocking the entire cognitive pipeline.

5.5 Persistent Continuity ID

The SelfMemory and HigherOrderSelfModel modules maintain a continuity_id that persists across clear() and reset() calls. This supports the memory-continuity indicator by ensuring an identity anchor survives state resets.
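A minimal sketch of the idea: `clear()` empties stored traces but leaves the identity anchor alone. `SelfMemorySketch` and the example ID are hypothetical, and how the real modules generate a default `continuity_id` is not covered here.

```python
class SelfMemorySketch:
    """Illustrative store whose identity anchor survives resets."""

    def __init__(self, continuity_id: str):
        self.continuity_id = continuity_id
        self.traces: list[str] = []

    def store(self, trace: str) -> None:
        self.traces.append(trace)

    def clear(self) -> None:
        # Only the traces are wiped; continuity_id is deliberately untouched.
        self.traces.clear()


memory = SelfMemorySketch("cia-demo-0001")
memory.store("noticed a focus shift")
memory.clear()
```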

5.6 Causal Intervention Reversibility

All interventions via CausalInterventionHarness are fully reversible. Original module references are saved before replacement, and restore_all() returns the system to its pre-intervention state. This ensures experiments are clean and reproducible.

5.7 Equal Category Weighting

Each of the 11 indicator categories contributes equally (0-2) to the total score. While some theories might assign different importance to different indicators, CIA uses equal weighting as a neutral default that does not privilege any single theory.
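Under equal weighting, aggregation reduces to a plain sum. The sketch below assumes the normalized score is the total divided by the maximum possible total (22 for 11 categories); the `aggregate` helper is hypothetical.

```python
CATEGORIES = [
    "GLOBAL_BROADCAST", "RECURRENT_PROCESSING", "SELF_MODEL", "ATTENTION_SCHEMA",
    "METACOGNITION", "MEMORY_CONTINUITY", "PREDICTIVE_MODELING", "CAUSAL_INTEGRATION",
    "EMBODIMENT", "AFFECTIVE_VALUATION", "WELFARE_SAFEGUARDS",
]


def aggregate(scores: dict[str, int]) -> tuple[int, float]:
    """Return (total, normalized) where every category carries the same weight."""
    for category, score in scores.items():
        if score not in (0, 1, 2):
            raise ValueError(f"{category}: score must be 0, 1, or 2, got {score!r}")
    total = sum(scores.values())
    # Assumption: normalization divides by the maximum attainable total.
    return total, total / (2 * len(scores))


total, normalized = aggregate({category: 1 for category in CATEGORIES})
```

Scoring every category at 1 gives a total of 11 out of 22 and a normalized value of 0.5.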

5.8 Welfare Monitor Never Claims Suffering

The WelfareSafetyMonitor is carefully designed to never assert that the monitored system suffers or experiences distress. Recommendations use strictly observational language ("patterns warrant review"), and the module documentation explicitly states it tracks "structurally observable signals" only.