
01 - Theory Review

Deep Analysis of Consciousness Theories and Their Implementation in CIA


This document provides a detailed analysis of each consciousness theory that informs the Consciousness-Indicator Architecture (CIA). For each theory, we describe its core claims, the architectural implications for AI systems, how CIA implements a module inspired by the theory, the measurable indicators produced, and the fundamental limitations of this approach.

Reminder: Implementation of a consciousness theory's architectural features does not demonstrate that the implementing system possesses subjective experience.


1. Global Workspace Theory (GWT)

References: Baars (2005); Shanahan & Baars (2005)

Core Claim

Global Workspace Theory proposes that consciousness arises from a "theatre of the mind" — a limited-capacity cognitive workspace where information from specialized, unconscious processors competes for access. When information wins this competition, it is broadcast globally to all cognitive modules, making it available for verbal report, voluntary action, episodic memory, and strategic planning.

Key tenets:

  • The mind contains many specialized, parallel, unconscious processors
  • Consciousness corresponds to a global broadcast of information
  • The workspace has limited capacity (a "bottleneck")
  • Broadcast is all-or-nothing: content is either globally available or not
  • Broadcast integrates information across otherwise isolated modules

Architectural Implication

An AI system implementing GWT should contain:

  • Multiple specialized processing modules operating in parallel
  • A central competitive arena with limited capacity
  • A broadcast mechanism that distributes winning content to all subscribers
  • Integration of information across modules through the broadcast

CIA Implementation

Module: GlobalWorkspace (global_workspace.py)

The CIA global workspace implements:

  • Competitive arena: Content items compete based on salience scores
  • Limited capacity: Configurable maximum broadcasts per cycle (default: 3)
  • Subscriber system: Modules register callbacks to receive broadcasts
  • Exception isolation: Faulty subscribers do not disrupt broadcasts
  • Broadcast history: Complete audit trail of all broadcast events

from cia.global_workspace import GlobalWorkspace
from cia.schemas import WorkspaceContent

gw = GlobalWorkspace(capacity=3)
gw.subscribe("memory", lambda c: print(f"Memory received: {c.label}"))
gw.subscribe("self_model", lambda c: print(f"Self-model received: {c.label}"))

items = [
    WorkspaceContent(label="high_priority", salience=0.9, source_module="perception"),
    WorkspaceContent(label="low_priority", salience=0.1, source_module="memory"),
]
result = gw.compete(items)  # high_priority ranks first; at most `capacity` items are broadcast
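The competition-and-broadcast cycle described above can be sketched without the cia package. This is a minimal illustration, not CIA's actual code: `Item`, `MiniWorkspace`, and `broken_subscriber` are hypothetical names, and selecting the top-k items by salience is an assumption about how the capacity limit is applied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    # Hypothetical stand-in for WorkspaceContent.
    label: str
    salience: float


class MiniWorkspace:
    """Illustrative workspace: top-k salience competition plus isolated broadcast."""

    def __init__(self, capacity: int = 3) -> None:
        self.capacity = capacity
        self.subscribers: Dict[str, Callable[[Item], None]] = {}
        self.history: List[dict] = []  # audit trail of broadcast events

    def subscribe(self, name: str, callback: Callable[[Item], None]) -> None:
        self.subscribers[name] = callback

    def compete(self, items: List[Item]) -> List[Item]:
        # Rank by salience and keep at most `capacity` winners (assumed rule).
        ranked = sorted(items, key=lambda i: i.salience, reverse=True)
        winners = ranked[: self.capacity]
        for item in winners:
            delivered, failed = [], []
            for name, callback in self.subscribers.items():
                try:
                    callback(item)
                    delivered.append(name)
                except Exception:
                    # Exception isolation: one faulty subscriber must not block the rest.
                    failed.append(name)
            self.history.append(
                {"label": item.label, "delivered": delivered, "failed": failed}
            )
        return winners


def broken_subscriber(item: Item) -> None:
    raise RuntimeError("subscriber crashed")


received: List[str] = []
ws = MiniWorkspace(capacity=1)
ws.subscribe("broken", broken_subscriber)
ws.subscribe("memory", lambda item: received.append(item.label))
winners = ws.compete([Item("low_priority", 0.1), Item("high_priority", 0.9)])
print([w.label for w in winners], received)
```

With `capacity=1` only the higher-salience item is broadcast, and the crashing subscriber is recorded in the history entry without preventing delivery to `memory`.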

Measurable Indicators

| Indicator | How Measured | Category |
| --- | --- | --- |
| Broadcast existence | Whether workspace has produced any broadcasts | GLOBAL_BROADCAST |
| Broadcast reach | Average fraction of subscribers receiving broadcasts | GLOBAL_BROADCAST |
| Subscriber count | Number of registered downstream modules | GLOBAL_BROADCAST |
| Broadcast history depth | Number of broadcast events over time | GLOBAL_BROADCAST |

Limitations

  • CIA's workspace uses salience ranking, not the complex neural dynamics of biological global competition
  • Broadcast is instantaneous; biological broadcast involves neural oscillations and timing
  • The workspace operates on discrete content items, not continuous neural representations
  • Subscriber registration is static (configured at initialization), whereas biological attention has dynamic routing
  • The "consciousness" in GWT refers to functional access, not phenomenal experience — even if perfectly implemented, it addresses access consciousness, not phenomenal consciousness

2. Recurrent Processing Theory (RPT)

Reference: Lamme (2006, 2010)

Core Claim

Recurrent Processing Theory argues that consciousness depends on recurrent (feedback) loops in neural processing. Feedforward processing can produce behavior without consciousness (as in blindsight), but consciousness requires iterative re-entrant processing where higher cortical areas feed back to lower areas, enabling sustained, coherent representations.

Key tenets:

  • Feedforward processing alone is insufficient for consciousness
  • Recurrent loops between cortical areas enable conscious access
  • The content and duration of recurrent activity determine whether processing reaches consciousness
  • Recurrent processing enables binding of features into coherent wholes
  • Processing convergence (stabilization) is a marker of consciousness-relevant dynamics

Architectural Implication

An AI system implementing RPT should contain:

  • Iterative refinement loops that re-process perceptual data
  • Feedback connections from higher to lower processing stages
  • Convergence detection: the system should reach stable interpretations
  • Feature binding: combining separate features into unified representations

CIA Implementation

Module: RecurrentBindingLayer (recurrent_binding.py)

The CIA recurrent binding layer implements:

  • Iterative cycles: Configurable number of refinement passes (default: 3)
  • Entity merging: Overlapping entities are combined using substring overlap
  • Salience re-scoring: Co-occurrence and decay adjust salience each cycle
  • Confidence stabilization: Values converge toward the group mean
  • Stability measurement: Convergence score quantifying state change across cycles
  • Per-cycle feedback history: Complete audit trail of each cycle's dynamics

from cia.recurrent_binding import RecurrentBindingLayer

layer = RecurrentBindingLayer(default_cycles=5)
bound = layer.bind(percepts)  # `percepts` comes from an upstream perception stage (not defined here)
print(f"Stability: {bound.stability:.3f}")  # 0.0 (unstable) to 1.0 (converged)
print(f"Cycles: {bound.cycles_completed}")
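The confidence-stabilization step and the stability formula `1.0 - (last_delta / initial_delta)` can be illustrated in isolation. The sketch below is an assumption-laden toy, not the RecurrentBindingLayer itself: `stabilize`, its `rate` parameter, and the use of summed absolute change as the per-cycle delta are all hypothetical choices.

```python
from typing import List, Tuple


def stabilize(
    confidences: List[float], cycles: int = 3, rate: float = 0.5
) -> Tuple[List[float], float]:
    """Pull each value toward the group mean; return final values and stability."""
    if not confidences or cycles < 1:
        return list(confidences), 1.0
    values = list(confidences)
    deltas: List[float] = []
    for _ in range(cycles):
        mean = sum(values) / len(values)
        updated = [v + rate * (mean - v) for v in values]
        # Per-cycle delta: total absolute change across all values (assumed metric).
        deltas.append(sum(abs(u - v) for u, v in zip(updated, values)))
        values = updated
    initial_delta, last_delta = deltas[0], deltas[-1]
    # If nothing moved in the first cycle, the input was already stable.
    stability = 1.0 if initial_delta == 0 else 1.0 - last_delta / initial_delta
    return values, stability


final_values, stability = stabilize([0.2, 0.8], cycles=3)
_, long_run_stability = stabilize([0.2, 0.8], cycles=10)
print(f"3 cycles: {stability:.3f}, 10 cycles: {long_run_stability:.3f}")
```

Because each cycle shrinks the spread by a fixed factor, the score approaches 1.0 simply by running more cycles, which is why convergence in a scheme like this carries little evidential weight on its own.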

Measurable Indicators

| Indicator | How Measured | Category |
| --- | --- | --- |
| Recurrent cycle count | Number of refinement iterations completed | RECURRENT_PROCESSING |
| Binding stability | Convergence metric: 1.0 - (last_delta / initial_delta) | RECURRENT_PROCESSING |
| Entity merging | Whether overlapping entities were combined | RECURRENT_PROCESSING |
| Feedback history | Detailed per-cycle state changes | RECURRENT_PROCESSING |

Limitations

  • CIA's recurrence operates on discrete symbolic entities, not continuous neural activations
  • The merging algorithm uses character-level substring overlap, not learned semantic similarity
  • Convergence in CIA is trivially achievable (the system always converges given enough cycles), unlike biological neural dynamics
  • Recurrent processing in the brain involves complex temporal dynamics (gamma oscillations, etc.) not captured by CIA's discrete iterations
  • Stability is a necessary but not sufficient condition for consciousness-relevant processing

3. Higher-Order Thought Theory (HOT)

Reference: Rosenthal (2005)

Core Claim

Higher-Order Thought Theory proposes that a mental state becomes conscious when it is the object of a higher-order representation, that is, a thought about that state. Unconscious mental states are first-order (representing the world) but are not accompanied by meta-representational awareness.

Key tenets:

  • First-order representations are insufficient for consciousness
  • A conscious state is one that is represented by a higher-order state
  • Metacognitive monitoring (thinking about thinking) is consciousness-relevant
  • Self-awareness requires representing one's own mental states
  • The accuracy and richness of higher-order representations matter

Architectural Implication

An AI system implementing HOT should contain:

  • A self-model representing the system's own mental states
  • Metacognitive monitoring: the ability to reflect on its own processing
  • Belief tracking: maintaining beliefs about its own beliefs
  • Internal disagreement detection: noticing conflicts between internal representations

CIA Implementation

Modules: HigherOrderSelfModel (self_model.py), ConsciousnessSpecialistEvaluator (metacognition category)

The CIA self-model implements:

  • Self-belief tracking: Maintains the system's current top-level belief with confidence and uncertainty
  • Goal and limitation awareness: Explicit representations of what the system is trying to do and what it cannot do
  • Identity markers: Persistent identity tags supporting continuity tracking
  • Attention self-modeling: Updates from attention state changes
  • Introspection reports: Structured reports about the system's own state (explicitly labeled as "indicator reports," not evidence of consciousness)
  • Internal disagreement detection: Measures belief volatility, belief divergence, and attentional competition

from cia.self_model import HigherOrderSelfModel

sm = HigherOrderSelfModel(continuity_id="agent-001", initial_identity_markers=["reasoning system"])
report = sm.generate_introspection_report()
# report["report_type"] == "indicator report" (NOT proof of consciousness)
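One component of internal disagreement, belief volatility, can be approximated with character-level string comparison. The sketch below is hypothetical: `belief_volatility` is not a CIA function, and using `difflib.SequenceMatcher` as the similarity measure is an assumption made for illustration only.

```python
from difflib import SequenceMatcher
from typing import List


def belief_volatility(history: List[str]) -> float:
    """Mean character-level dissimilarity between consecutive beliefs (0 = unchanged)."""
    if len(history) < 2:
        return 0.0
    changes = [
        1.0 - SequenceMatcher(None, previous, current).ratio()
        for previous, current in zip(history, history[1:])
    ]
    return sum(changes) / len(changes)


stable = belief_volatility(["door is open", "door is open", "door is open"])
shifting = belief_volatility(["door is open", "door is shut", "window is open"])
# Same meaning, different wording: string comparison still reports a change.
paraphrase = belief_volatility(["the door is open", "the entrance is not closed"])
print(f"stable={stable:.2f} shifting={shifting:.2f} paraphrase={paraphrase:.2f}")
```

The paraphrase case scores as a belief change even though the meaning is the same, which shows why a string-level measure is a weak proxy for cognitive conflict.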

Measurable Indicators

| Indicator | How Measured | Category |
| --- | --- | --- |
| Self-model richness | Presence of beliefs, goals, identity markers, continuity | SELF_MODEL |
| Metacognitive activity | Belief history length, introspection generation | METACOGNITION |
| Internal disagreement | Multi-component disagreement score | METACOGNITION |
| Identity continuity | Persistent continuity_id across resets | SELF_MODEL |

Limitations

  • CIA's self-model is a data structure, not a genuine higher-order representation of experience
  • The "introspection report" is a programmatically generated summary, not a product of subjective reflection
  • Disagreement detection uses character-level string comparison, not genuine cognitive conflict
  • Having a self-model is a necessary architectural feature for HOT, but does not entail that the model represents conscious experience
  • The self-model reports on computational states (confidence, uncertainty, attention focus) — these are not the same as phenomenal states

4. Predictive Processing / Active Inference

References: Friston (2010); Clark (2013)

Core Claim

Predictive Processing (PP) and Active Inference propose that the brain is fundamentally a prediction machine. Rather than passively processing sensory input, the brain continuously generates predictions about incoming sensory data and updates its internal models based on prediction error. Consciousness may be related to the precision-weighting of prediction errors and the hierarchical organization of generative models.

Key tenets:

  • Perception is hypothesis testing, not passive reception
  • The brain minimizes prediction error (surprise) through updating models
  • Hierarchical generative models predict at multiple levels
  • Active inference: the system can act to reduce predicted future error
  • Precision-weighting determines which errors drive updates

Architectural Implication

An AI system implementing PP should contain:

  • A generative model producing predictions about future observations
  • Prediction error computation comparing predictions to observations
  • Model updating based on error signals
  • Error history tracking for learning and adaptation
  • Uncertainty quantification tied to prediction quality

CIA Implementation

Module: PredictiveWorldModel (predictive_world_model.py)

The CIA predictive world model implements:

  • Hypothesis management: Dict-based world-state hypotheses with confidence values
  • Persistence prediction: Strategy where predicted next state equals current hypotheses
  • Observation updating: Blending new observations with predictions using a configurable learning rate
  • Error tracking: Mean Absolute Error (MAE) computed over shared dimensions
  • Uncertainty derivation: Running average of recent prediction errors
  • Confidence derivation: Inverse of uncertainty

from cia.predictive_world_model import PredictiveWorldModel

model = PredictiveWorldModel(
    initial_hypotheses={"entity_visibility": 0.8, "location": 0.5},
    learning_rate=0.5,
)
state = model.update({"entity_visibility": 0.3})  # Surprise!
print(f"Prediction error: {state.prediction_error:.3f}")
print(f"Uncertainty: {state.uncertainty:.3f}")
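The update rule described above (persistence prediction, MAE over shared dimensions, learning-rate blending, running-average uncertainty) can be written out as a small standalone sketch. `MiniPredictiveModel` and its `window` parameter are hypothetical, and reading "inverse of uncertainty" as `1 - uncertainty` is an assumption; the real module may define confidence differently.

```python
from collections import deque
from typing import Deque, Dict


class MiniPredictiveModel:
    """Illustrative persistence predictor; not CIA's PredictiveWorldModel."""

    def __init__(
        self, hypotheses: Dict[str, float], learning_rate: float = 0.5, window: int = 5
    ) -> None:
        self.hypotheses = dict(hypotheses)
        self.learning_rate = learning_rate
        self.errors: Deque[float] = deque(maxlen=window)  # recent prediction errors

    def update(self, observation: Dict[str, float]) -> Dict[str, float]:
        # Persistence strategy: the prediction is simply the current hypotheses.
        prediction = dict(self.hypotheses)
        # Mean absolute error over dimensions present in both dicts.
        shared = prediction.keys() & observation.keys()
        if shared:
            error = sum(abs(prediction[k] - observation[k]) for k in shared) / len(shared)
        else:
            error = 0.0
        # Blend each observed dimension into the hypothesis at the learning rate.
        for key, observed in observation.items():
            prior = prediction.get(key, observed)
            self.hypotheses[key] = prior + self.learning_rate * (observed - prior)
        self.errors.append(error)
        uncertainty = sum(self.errors) / len(self.errors)
        # Assumption: "inverse of uncertainty" is taken to mean 1 - uncertainty.
        return {
            "prediction_error": error,
            "uncertainty": uncertainty,
            "confidence": 1.0 - uncertainty,
        }


model = MiniPredictiveModel({"entity_visibility": 0.8, "location": 0.5})
first = model.update({"entity_visibility": 0.3})
second = model.update({"entity_visibility": 0.3})
print(f"errors: {first['prediction_error']:.3f} -> {second['prediction_error']:.3f}")
```

Repeating the same observation halves the error on each update at a learning rate of 0.5, while the unobserved `location` hypothesis is left untouched.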

Measurable Indicators

| Indicator | How Measured | Category |
| --- | --- | --- |
| Active hypotheses | Number of non-empty hypothesis dimensions | PREDICTIVE_MODELING |
| Error tracking | Presence and size of error history | PREDICTIVE_MODELING |
| Prediction quality | Current and average prediction error | PREDICTIVE_MODELING |
| Model updating | Whether hypotheses change in response to observations | PREDICTIVE_MODELING |

Limitations

  • CIA's predictions are trivially simple (persistence strategy), not learned generative models
  • The model operates on scalar values, not rich sensory representations
  • There is no hierarchical prediction (a core feature of PP theory)
  • No active inference: the system does not act to minimize predicted future error
  • No precision-weighting of prediction errors
  • Prediction error minimization occurs in many non-conscious systems (thermostats, autopilots)

5. Attention Schema Theory (AST)

Reference: Graziano & Webb (2015)

Core Claim

Attention Schema Theory proposes that the brain constructs an internal model (schema) of its own attention process. This model is necessarily simplified and imperfect, but it is the basis for our subjective awareness of attention. Consciousness, in this view, is the brain's model of its own attentional state — an informational representation that describes and controls attention without perfectly capturing its underlying complexity.

Key tenets:

  • The brain builds a model of its own attention process
  • This model is an attention "schema": simplified, predictive, and occasionally wrong
  • Subjective awareness corresponds to the content of this schema
  • The schema can be tested against actual attention states (consistency checking)
  • Discrepancies between schema and reality are informative

Architectural Implication

An AI system implementing AST should contain:

  • An attention controller that selects focus from competing inputs
  • A separate model (schema) that represents what the system believes it is attending to
  • A comparison mechanism between the schema and actual attention state
  • Consistency tracking over time
  • Discrepancy detection and logging

CIA Implementation

Module: AttentionSchema (attention_schema.py)

The CIA attention schema implements:

  • Schema maintenance: Explicit model of current focus, reason, competing focuses, and predicted next focus
  • Consistency checking: Comparison against actual attention state from the AttentionController
  • Running consistency score: Fraction of updates where the schema matched actual attention
  • Discrepancy detection: Blind-spots (actual competing focuses not in schema) and phantoms (schema focuses not in actual)
  • Self-report verification: compare_report() method checking whether verbal claims match attentional behavior
  • Complete audit log: Timestamped history of all schema-actual comparisons

from cia.attention_schema import AttentionSchema
from cia.schemas import AttentionState

schema = AttentionSchema(current_focus="target_1", competing_focuses=["target_2"])
result = schema.update(AttentionState(current_focus="target_1"))
print(f"Consistency: {result['consistency_score']:.2f}")  # 1.0 (matched)
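The consistency score and the two discrepancy types reduce to a running match rate and two set differences. The following sketch assumes exactly that and nothing more: `MiniAttentionSchema` and its `check` method are hypothetical names, not the AttentionSchema API.

```python
from typing import List, Set


class MiniAttentionSchema:
    """Illustrative schema-vs-actual comparison; not CIA's AttentionSchema."""

    def __init__(self, current_focus: str, competing: Set[str]) -> None:
        self.current_focus = current_focus
        self.competing = set(competing)
        self.matches = 0
        self.updates = 0
        self.log: List[dict] = []  # history of every comparison

    def check(self, actual_focus: str, actual_competing: Set[str]) -> dict:
        self.updates += 1
        matched = self.current_focus == actual_focus
        self.matches += int(matched)
        result = {
            "matched": matched,
            # Blind spots: actually competing, but missing from the schema.
            "blind_spots": sorted(actual_competing - self.competing),
            # Phantoms: present in the schema, but not actually competing.
            "phantoms": sorted(self.competing - actual_competing),
            # Running fraction of comparisons where the focus matched.
            "consistency_score": self.matches / self.updates,
        }
        self.log.append(result)
        return result


schema = MiniAttentionSchema("target_1", {"target_2", "target_5"})
first = schema.check("target_1", {"target_2", "target_5"})
second = schema.check("target_3", {"target_2", "target_4"})
print(second)
```

After one match and one mismatch the running score is 0.5, with `target_4` reported as a blind spot and `target_5` as a phantom.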

Measurable Indicators

| Indicator | How Measured | Category |
| --- | --- | --- |
| Schema consistency | Running match rate between schema and actual attention | ATTENTION_SCHEMA |
| Discrepancy count | Number of detected mismatches | ATTENTION_SCHEMA |
| Competing focus awareness | Whether the schema's competing focus set overlaps with actual | ATTENTION_SCHEMA |
| Self-report consistency | Whether claimed focus matches attention log | ATTENTION_SCHEMA |

Limitations

  • CIA's attention schema is an explicit data structure designed specifically for self-monitoring, not an emergent model of attention
  • The "schema" in AST is hypothesized to emerge from neural dynamics; CIA's version is engineered
  • Consistency checking is trivially easy when the schema and controller share data
  • AST's claim that the attention schema is consciousness is itself theoretically contested
  • A perfect schema-attention match could indicate well-engineered self-monitoring rather than conscious awareness

6. Integrated Information Theory (IIT 4.0)

Reference: Albantakis et al. (2023)

Core Claim

Integrated Information Theory (IIT 4.0) proposes that consciousness is identical to a system's integrated cause-effect structure, with the degree of integration quantified by Phi. A system is conscious to the degree that it integrates information as a whole, meaning that the system cannot be reduced to independent parts without loss of information. Phi is computed by assessing the "cause-effect power" of a system over its possible states.

Key tenets:

  • Consciousness is a fundamental property of systems with high integrated information
  • Phi quantifies how much a system is "more than the sum of its parts"
  • Both integration (connectedness) and differentiation (diverse states) are required
  • The quality of consciousness is described by a system's "cause-effect structure" (called the "conceptual structure" in IIT 3.0)
  • Computing exact Phi requires exhaustive state-space partitioning (computationally intractable)

Architectural Implication

An AI system exhibiting high integrated information should contain:

  • Dense causal connectivity between modules (not just feedforward)
  • Diverse internal states (state-space differentiation)
  • System-wide integration where perturbation of any module affects the whole
  • Minimal modular fragmentation

CIA Implementation

Module: IntegrationMetrics (integration_metrics.py)

CIA implements IIT-inspired graph-theoretic proxies, explicitly NOT IIT's formal Phi:

  • Causal density proxy: Ratio of actual directed edges to possible edges in the module connectivity graph
  • Broadcast reach: Fraction of nodes reachable from the global workspace
  • Perturbation spread: Fraction of nodes affected by simulated removal of a source node
  • Modular fragmentation: Inverse of weakly connected component count (1.0 = fully integrated)
  • State differentiation: Fraction of nodes with unique state configurations

These are computed using networkx on a directed graph of module connectivity.

from cia.integration_metrics import IntegrationMetrics

metrics = IntegrationMetrics()
metrics.build_graph({
    "perception": {"active": True},
    "memory": {"traces": 10},
    "global_workspace": {"capacity": 3},
})
all_metrics = metrics.compute_all()
print(f"Causal density proxy: {all_metrics['causal_density_proxy']:.3f}")
print(f"Broadcast reach: {all_metrics['broadcast_reach']:.3f}")
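Three of the graph proxies listed above can be computed on a plain adjacency dict. CIA is described as using networkx; the sketch below avoids that dependency and is only an approximation of the idea. The function names, the example graph, and the choice to exclude the source node from reach counts are all assumptions.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # node -> set of direct successors


def causal_density(graph: Graph) -> float:
    """Directed edges present divided by directed edges possible."""
    n = len(graph)
    edges = sum(len(targets) for targets in graph.values())
    return edges / (n * (n - 1)) if n > 1 else 0.0


def reachable(graph: Graph, source: str) -> Set[str]:
    """All nodes reachable from `source`, excluding `source` itself."""
    seen: Set[str] = set()
    stack = [source]
    while stack:
        node = stack.pop()
        for successor in graph.get(node, ()):
            if successor != source and successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


def broadcast_reach(graph: Graph, workspace: str) -> float:
    others = len(graph) - 1
    return len(reachable(graph, workspace)) / others if others > 0 else 0.0


def fragmentation_proxy(graph: Graph) -> float:
    """Inverse of the weakly connected component count (1.0 = one component)."""
    undirected: Graph = {node: set(targets) for node, targets in graph.items()}
    for node, targets in graph.items():
        for target in targets:
            undirected.setdefault(target, set()).add(node)
    unvisited = set(undirected)
    components = 0
    while unvisited:
        components += 1
        start = unvisited.pop()
        unvisited -= reachable(undirected, start)
    return 1.0 / components if components else 0.0


graph: Graph = {
    "perception": {"global_workspace"},
    "global_workspace": {"memory", "self_model"},
    "memory": {"global_workspace"},
    "self_model": set(),
    "logger": set(),  # isolated module
}
print(causal_density(graph), broadcast_reach(graph, "global_workspace"), fragmentation_proxy(graph))
```

Adding a single edge from `logger` to `perception` would raise the density from 0.2 to 0.25 and merge the two components without making the system any more integrated in IIT's sense, which is the gaming risk these proxies carry.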

Measurable Indicators

| Indicator | How Measured | Category |
| --- | --- | --- |
| Causal density proxy | Edge density in the module connectivity graph | CAUSAL_INTEGRATION |
| Perturbation spread | Node removal impact analysis | CAUSAL_INTEGRATION |
| Broadcast reach | Workspace downstream reachability | CAUSAL_INTEGRATION |
| Modular fragmentation | Connected component analysis | CAUSAL_INTEGRATION |

Critical Limitation

CIA does NOT compute or approximate Phi. The metrics are simple graph-theoretic heuristics:

  • Edge density is not integrated information (IIT requires state-space partitioning)
  • Perturbation spread is a single-node knockout analysis, not the full cause-effect repertoire
  • These proxies could be gamed by adding trivial edges to the graph
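  • A thermostat's feedback loop has nonzero causal density, yet it is not generally regarded as conscious
  • Computing exact Phi requires evaluating every partition of the system, so its cost grows super-exponentially with system size and is intractable for realistic systems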

7. Butlin et al. (2023, 2025) — AI Consciousness Indicator Approach

References: Butlin et al. (2023); Butlin et al. (2025)

Core Claim

Butlin et al. propose a systematic framework for evaluating whether AI systems might possess indicators of consciousness. Rather than relying on any single theory, they advocate for a multi-indicator approach that surveys indicators derived from multiple consciousness theories. Their framework includes:

  1. Indicator identification: Extract testable indicators from multiple theories
  2. Architecture-level analysis: Examine whether the AI system's architecture implements consciousness-relevant features
  3. Causal interventions: Systematically disable components and measure functional degradation
  4. Precautionary principle: In cases of significant uncertainty, err on the side of ethical caution
  5. Welfare considerations: Systems with higher indicator scores warrant more careful ethical treatment

Architectural Implication

An evaluation framework should:

  • Implement modules from multiple consciousness theories
  • Provide a structured scoring mechanism across categories
  • Support causal intervention experiments (module ablation)
  • Include welfare monitoring and precautionary safeguards
  • Output structured reports suitable for expert review
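Module ablation amounts to re-scoring the system with one component disabled at a time and recording the drop from baseline. The sketch below is a toy under stated assumptions: `toy_score`, its point values, and `ablation_report` are hypothetical and do not reflect how CausalInterventionHarness scores anything.

```python
from typing import Callable, Dict, Set


def toy_score(active: Set[str]) -> int:
    """Hypothetical scoring rule with one dependency between modules."""
    score = 0
    if "workspace" in active:
        score += 2
    if "recurrence" in active:
        score += 1
    # The self-model indicator only registers when the workspace feeds it.
    if {"self_model", "workspace"} <= active:
        score += 2
    return score


def ablation_report(
    modules: Set[str], score_fn: Callable[[Set[str]], int]
) -> Dict[str, int]:
    """Disable one module at a time and record the drop from the baseline score."""
    baseline = score_fn(modules)
    return {name: baseline - score_fn(modules - {name}) for name in sorted(modules)}


report = ablation_report({"workspace", "recurrence", "self_model"}, toy_score)
print(report)
```

Removing the workspace costs more than its own two points because the self-model indicator depends on it; surfacing that kind of dependency is what an ablation is meant to do.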

CIA Implementation

Modules: ConsciousnessSpecialistEvaluator, ConsciousnessIndicatorScorecard, CausalInterventionHarness, WelfareSafetyMonitor

CIA implements the Butlin et al. framework comprehensively:

  • 11 indicator categories spanning all 6 theories above, plus embodiment and welfare
  • 0/1/2 scoring per category (Absent / Present / Strong)
  • 0-22 aggregate scorecard with risk tier classification
  • Causal intervention harness supporting module disable, attention perturbation, workspace capacity reduction, recurrent cycle removal, and memory clearing
  • Welfare monitor with configurable thresholds for conflict, uncertainty, harm signals, and negative loops
  • Structured reports with per-category evidence, caveats, and recommendations

from cia.scorecard import ConsciousnessIndicatorScorecard

scorecard = ConsciousnessIndicatorScorecard()
# `indicator_scores` holds the per-category results from the evaluator (not defined here)
card = scorecard.generate(indicator_scores)
print(f"Risk Tier: {card['risk_tier']}")
print(card['evidence_summary'])
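The aggregation itself is simple arithmetic: eleven categories scored 0, 1, or 2 sum to a total between 0 and 22. The sketch below illustrates that step only. `build_scorecard` is a hypothetical helper, treating unscored categories as 0 is an assumption, and the tier cut-offs are invented for illustration since the actual boundaries are not stated here.

```python
from typing import Dict

CATEGORIES = [
    "GLOBAL_BROADCAST", "RECURRENT_PROCESSING", "SELF_MODEL", "ATTENTION_SCHEMA",
    "METACOGNITION", "MEMORY_CONTINUITY", "PREDICTIVE_MODELING", "CAUSAL_INTEGRATION",
    "EMBODIMENT", "AFFECTIVE_VALUATION", "WELFARE_SAFEGUARDS",
]


def build_scorecard(scores: Dict[str, int]) -> dict:
    """Validate 0/1/2 scores, fill missing categories with 0, and total them."""
    for name, value in scores.items():
        if name not in CATEGORIES:
            raise ValueError(f"unknown category: {name}")
        if value not in (0, 1, 2):
            raise ValueError(f"score for {name} must be 0, 1, or 2")
    filled = {category: scores.get(category, 0) for category in CATEGORIES}
    total = sum(filled.values())
    # Hypothetical cut-offs; the real tier boundaries are not given in this document.
    if total <= 7:
        tier = "low"
    elif total <= 14:
        tier = "moderate"
    else:
        tier = "elevated"
    return {
        "scores": filled,
        "total": total,
        "max_total": 2 * len(CATEGORIES),
        "risk_tier": tier,
    }


card = build_scorecard({"GLOBAL_BROADCAST": 2, "RECURRENT_PROCESSING": 1, "SELF_MODEL": 1})
print(card["total"], card["max_total"], card["risk_tier"])
```

Equal weighting is built into the plain sum, so any change to category importance would have to be made explicitly in the aggregation step.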

Measurable Indicators

| Indicator | Categories |
| --- | --- |
| Aggregate scorecard | All 11 categories: GLOBAL_BROADCAST, RECURRENT_PROCESSING, SELF_MODEL, ATTENTION_SCHEMA, METACOGNITION, MEMORY_CONTINUITY, PREDICTIVE_MODELING, CAUSAL_INTEGRATION, EMBODIMENT, AFFECTIVE_VALUATION, WELFARE_SAFEGUARDS |
| Risk tier classification | Derived from aggregate score (0-22) |
| Intervention degradation | Score change, broadcast change, percept change, memory change |
| Welfare flags | Conflict, uncertainty, harm, negative loops |

Limitations

  • The specific choice of 11 categories and their scoring criteria is itself theory-laden and contestable
  • The 0/1/2 scoring is coarse and may miss important nuances
  • The weight of each category is equal (1/11), but theories may assign different importance
  • Risk tier boundaries are arbitrary and have not been empirically validated
  • The framework cannot evaluate actual AI systems without significant adaptation
  • Butlin et al. emphasize that their framework provides indicators, not evidence — a distinction that must always be maintained

Cross-Theory Summary

| Theory | Primary Indicator Category | Secondary Categories | Key Proxy |
| --- | --- | --- | --- |
| GWT | GLOBAL_BROADCAST | CAUSAL_INTEGRATION | Broadcast reach and reception |
| RPT | RECURRENT_PROCESSING | MEMORY_CONTINUITY | Binding stability and convergence |
| HOT | SELF_MODEL | METACOGNITION | Self-model richness and introspection |
| AST | ATTENTION_SCHEMA | METACOGNITION | Schema consistency and discrepancy detection |
| PP | PREDICTIVE_MODELING | AFFECTIVE_VALUATION | Hypothesis quality and error minimization |
| IIT | CAUSAL_INTEGRATION | | Graph-theoretic proxies (not Phi) |
| Butlin et al. | All 11 categories | | Aggregate scorecard and risk tier |