01 - Theory Review
Deep Analysis of Consciousness Theories and Their Implementation in CIA
This document provides a detailed analysis of each consciousness theory that informs the Consciousness-Indicator Architecture (CIA). For each theory, we describe its core claims, the architectural implications for AI systems, how CIA implements a module inspired by the theory, the measurable indicators produced, and the fundamental limitations of this approach.
Reminder: Implementation of a consciousness theory's architectural features does not demonstrate that the implementing system possesses subjective experience.
1. Global Workspace Theory (GWT)
References: Baars (2005); Shanahan & Baars (2005)
Core Claim
Global Workspace Theory proposes that consciousness arises from a "theatre of the mind" — a limited-capacity cognitive workspace where information from specialized, unconscious processors competes for access. When information wins this competition, it is broadcast globally to all cognitive modules, making it available for verbal report, voluntary action, episodic memory, and strategic planning.
Key tenets:
- The mind contains many specialized, parallel, unconscious processors
- Consciousness corresponds to a global broadcast of information
- The workspace has limited capacity (a "bottleneck")
- Broadcast is all-or-nothing: content is either globally available or not
- Broadcast integrates information across otherwise isolated modules
Architectural Implication
An AI system implementing GWT should contain:
- Multiple specialized processing modules operating in parallel
- A central competitive arena with limited capacity
- A broadcast mechanism that distributes winning content to all subscribers
- Integration of information across modules through the broadcast
CIA Implementation
Module: GlobalWorkspace (global_workspace.py)
The CIA global workspace implements:
- Competitive arena: Content items compete based on salience scores
- Limited capacity: Configurable maximum broadcasts per cycle (default: 3)
- Subscriber system: Modules register callbacks to receive broadcasts
- Exception isolation: Faulty subscribers do not disrupt broadcasts
- Broadcast history: Complete audit trail of all broadcast events
```python
from cia.global_workspace import GlobalWorkspace
from cia.schemas import WorkspaceContent

gw = GlobalWorkspace(capacity=3)
gw.subscribe("memory", lambda c: print(f"Memory received: {c.label}"))
gw.subscribe("self_model", lambda c: print(f"Self-model received: {c.label}"))

items = [
    WorkspaceContent(label="high_priority", salience=0.9, source_module="perception"),
    WorkspaceContent(label="low_priority", salience=0.1, source_module="memory"),
]
result = gw.compete(items)  # high_priority wins and is broadcast
```
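To make the mechanism concrete, here is a minimal, self-contained sketch of salience competition with a capacity limit and exception-isolated broadcast. It is not CIA's actual code: `Item`, `ToyWorkspace`, and the `delivered_to` history field are hypothetical names chosen for illustration, and ranking purely by salience is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    """Hypothetical stand-in for a workspace content item."""
    label: str
    salience: float


class ToyWorkspace:
    """Illustrative workspace: top-k salience competition plus broadcast."""

    def __init__(self, capacity: int = 3) -> None:
        self.capacity = capacity
        self.subscribers: Dict[str, Callable[[Item], None]] = {}
        self.history: List[dict] = []  # audit trail of broadcast events

    def subscribe(self, name: str, callback: Callable[[Item], None]) -> None:
        self.subscribers[name] = callback

    def compete(self, items: List[Item]) -> List[Item]:
        # Rank by salience and keep at most `capacity` winners (the bottleneck).
        ranked = sorted(items, key=lambda i: i.salience, reverse=True)
        winners = ranked[: self.capacity]
        for item in winners:
            delivered = []
            for name, callback in self.subscribers.items():
                try:
                    callback(item)
                    delivered.append(name)
                except Exception:
                    # Exception isolation: a faulty subscriber must not
                    # prevent delivery to the remaining subscribers.
                    continue
            self.history.append({"label": item.label, "delivered_to": delivered})
        return winners


def _faulty(item: Item) -> None:
    raise RuntimeError("simulated subscriber failure")


received: List[str] = []
ws = ToyWorkspace(capacity=1)
ws.subscribe("memory", lambda item: received.append(item.label))
ws.subscribe("faulty", _faulty)
winners = ws.compete([Item("high_priority", 0.9), Item("low_priority", 0.1)])

# Broadcast reach: fraction of subscribers that received each broadcast.
reach = [len(e["delivered_to"]) / len(ws.subscribers) for e in ws.history]
print([w.label for w in winners], received, reach)
```

With `capacity=1` only the most salient item is broadcast, and the failing subscriber lowers the reach of that broadcast to 0.5 without interrupting delivery to `memory`.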
Measurable Indicators
| Indicator | How Measured | Category |
|---|---|---|
| Broadcast existence | Whether workspace has produced any broadcasts | GLOBAL_BROADCAST |
| Broadcast reach | Average fraction of subscribers receiving broadcasts | GLOBAL_BROADCAST |
| Subscriber count | Number of registered downstream modules | GLOBAL_BROADCAST |
| Broadcast history depth | Number of broadcast events over time | GLOBAL_BROADCAST |
Limitations
- CIA's workspace uses salience ranking, not the complex neural dynamics of biological global competition
- Broadcast is instantaneous; biological broadcast involves neural oscillations and timing
- The workspace operates on discrete content items, not continuous neural representations
- Subscriber registration is static (configured at initialization), whereas biological attention has dynamic routing
- The "consciousness" in GWT refers to functional access, not phenomenal experience — even if perfectly implemented, it addresses access consciousness, not phenomenal consciousness
2. Recurrent Processing Theory (RPT)
Reference: Lamme (2006, 2010)
Core Claim
Recurrent Processing Theory argues that consciousness depends on recurrent (feedback) loops in neural processing. Feedforward processing can produce behavior without consciousness (as in blindsight), but consciousness requires iterative re-entrant processing where higher cortical areas feed back to lower areas, enabling sustained, coherent representations.
Key tenets:
- Feedforward processing alone is insufficient for consciousness
- Recurrent loops between cortical areas enable conscious access
- The content and duration of recurrent activity determine whether processing reaches consciousness
- Recurrent processing enables binding of features into coherent wholes
- Processing convergence (stabilization) is a marker of consciousness-relevant dynamics
Architectural Implication
An AI system implementing RPT should contain:
- Iterative refinement loops that re-process perceptual data
- Feedback connections from higher to lower processing stages
- Convergence detection: the system should reach stable interpretations
- Feature binding: combining separate features into unified representations
CIA Implementation
Module: RecurrentBindingLayer (recurrent_binding.py)
The CIA recurrent binding layer implements:
- Iterative cycles: Configurable number of refinement passes (default: 3)
- Entity merging: Overlapping entities are combined using substring overlap
- Salience re-scoring: Co-occurrence and decay adjust salience each cycle
- Confidence stabilization: Values converge toward the group mean
- Stability measurement: Convergence score quantifying state change across cycles
- Per-cycle feedback history: Complete audit trail of each cycle's dynamics
```python
from cia.recurrent_binding import RecurrentBindingLayer

layer = RecurrentBindingLayer(default_cycles=5)
# `percepts` is assumed to be a collection of percepts produced upstream (not shown here)
bound = layer.bind(percepts)
print(f"Stability: {bound.stability:.3f}")  # 0.0 (unstable) to 1.0 (converged)
print(f"Cycles: {bound.cycles_completed}")
```
Measurable Indicators
| Indicator | How Measured | Category |
|---|---|---|
| Recurrent cycle count | Number of refinement iterations completed | RECURRENT_PROCESSING |
| Binding stability | Convergence metric: 1.0 - (last_delta / initial_delta) | RECURRENT_PROCESSING |
| Entity merging | Whether overlapping entities were combined | RECURRENT_PROCESSING |
| Feedback history | Detailed per-cycle state changes | RECURRENT_PROCESSING |
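The binding-stability formula in the table can be illustrated with a small worked example. This is a sketch under stated assumptions, not the `RecurrentBindingLayer` implementation: the function names, the update rule (each confidence moves a fixed fraction `rate` toward the group mean), and the definition of a cycle's delta as the summed absolute change are all hypothetical.

```python
from typing import List, Tuple


def stabilize(
    confidences: List[float], cycles: int = 3, rate: float = 0.5
) -> Tuple[List[float], List[float]]:
    """Pull each confidence toward the group mean.

    Returns the final values and the total absolute change per cycle.
    """
    values = list(confidences)
    deltas: List[float] = []
    if not values:
        return values, deltas
    for _ in range(cycles):
        mean = sum(values) / len(values)
        updated = [v + rate * (mean - v) for v in values]
        deltas.append(sum(abs(u - v) for u, v in zip(updated, values)))
        values = updated
    return values, deltas


def binding_stability(deltas: List[float]) -> float:
    """Stability = 1.0 - (last_delta / initial_delta), clamped to [0, 1]."""
    if not deltas or deltas[0] == 0.0:
        return 1.0  # nothing changed, so the state is already stable
    return max(0.0, min(1.0, 1.0 - deltas[-1] / deltas[0]))


values, deltas = stabilize([0.0, 0.5, 1.0], cycles=3, rate=0.5)
print(deltas)                     # [0.5, 0.25, 0.125]
print(binding_stability(deltas))  # 0.75
```

Under this update rule the delta shrinks by a constant factor every cycle, so the score approaches 1.0 as cycles are added regardless of the input. A single cycle yields a score of 0.0 by construction, because the last and initial deltas coincide.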
Limitations
- CIA's recurrence operates on discrete symbolic entities, not continuous neural activations
- The merging algorithm uses character-level substring overlap, not learned semantic similarity
- Convergence in CIA is trivially achievable (the system always converges given enough cycles), unlike biological neural dynamics
- Recurrent processing in the brain involves complex temporal dynamics (gamma oscillations, etc.) not captured by CIA's discrete iterations
- Stability is a necessary but not sufficient condition for consciousness-relevant processing
3. Higher-Order Thought Theory (HOT)
Reference: Rosenthal (2005)
Core Claim
Higher-Order Thought Theory proposes that a mental state becomes conscious when it is the object of a higher-order representation — a thought about that thought. Unconscious mental states are first-order (representing the world) but not accompanied by meta-representational awareness.
Key tenets:
- First-order representations are insufficient for consciousness
- A conscious state is one that is represented by a higher-order state
- Metacognitive monitoring (thinking about thinking) is consciousness-relevant
- Self-awareness requires representing one's own mental states
- The accuracy and richness of higher-order representations matter
Architectural Implication
An AI system implementing HOT should contain:
- A self-model representing the system's own mental states
- Metacognitive monitoring: the ability to reflect on its own processing
- Belief tracking: maintaining beliefs about its own beliefs
- Internal disagreement detection: noticing conflicts between internal representations
CIA Implementation
Modules: HigherOrderSelfModel (self_model.py), ConsciousnessSpecialistEvaluator (metacognition category)
The CIA self-model implements:
- Self-belief tracking: Maintains the system's current top-level belief with confidence and uncertainty
- Goal and limitation awareness: Explicit representations of what the system is trying to do and what it cannot do
- Identity markers: Persistent identity tags supporting continuity tracking
- Attention self-modeling: Updates from attention state changes
- Introspection reports: Structured reports about the system's own state (explicitly labeled as "indicator reports," not evidence of consciousness)
- Internal disagreement detection: Measures belief volatility, belief divergence, and attentional competition
```python
from cia.self_model import HigherOrderSelfModel

sm = HigherOrderSelfModel(continuity_id="agent-001", initial_identity_markers=["reasoning system"])
report = sm.generate_introspection_report()
# report["report_type"] == "indicator report" (NOT proof of consciousness)
```
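The three disagreement components named above (belief volatility, belief divergence, attentional competition) are not specified in detail here, so the following is only one plausible way to combine them. Every function name, the `max_competing` normalizer, the equal-weight average, and the use of `difflib.SequenceMatcher` for character-level comparison are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Belief = Tuple[str, float]  # (belief text, confidence in [0, 1])


def belief_volatility(history: List[Belief]) -> float:
    """Mean absolute confidence change between consecutive beliefs."""
    if len(history) < 2:
        return 0.0
    changes = [abs(b[1] - a[1]) for a, b in zip(history, history[1:])]
    return sum(changes) / len(changes)


def belief_divergence(history: List[Belief]) -> float:
    """Mean character-level dissimilarity between consecutive belief texts."""
    if len(history) < 2:
        return 0.0
    scores = [
        1.0 - SequenceMatcher(None, a[0], b[0]).ratio()
        for a, b in zip(history, history[1:])
    ]
    return sum(scores) / len(scores)


def attentional_competition(num_competing: int, max_competing: int = 5) -> float:
    """Number of competing focuses, normalized to [0, 1] (hypothetical cap)."""
    return min(1.0, num_competing / max_competing)


def disagreement_score(history: List[Belief], num_competing: int) -> float:
    """Equal-weight average of the three components (an assumed weighting)."""
    parts = [
        belief_volatility(history),
        belief_divergence(history),
        attentional_competition(num_competing),
    ]
    return sum(parts) / len(parts)


stable = [("door is open", 0.8), ("door is open", 0.8)]
shifting = [("door is open", 0.9), ("door is closed", 0.3)]
print(disagreement_score(stable, num_competing=0))    # 0.0
print(disagreement_score(shifting, num_competing=2))  # strictly between 0 and 1
```

An unchanged belief with no competing focuses scores exactly 0.0, while a reversed belief with a large confidence drop scores higher. The score reflects string and number differences only, which is the limitation this section describes.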
Measurable Indicators
| Indicator | How Measured | Category |
|---|---|---|
| Self-model richness | Presence of beliefs, goals, identity markers, continuity | SELF_MODEL |
| Metacognitive activity | Belief history length, introspection generation | METACOGNITION |
| Internal disagreement | Multi-component disagreement score | METACOGNITION |
| Identity continuity | Persistent continuity_id across resets | SELF_MODEL |
Limitations
- CIA's self-model is a data structure, not a genuine higher-order representation of experience
- The "introspection report" is a programmatically generated summary, not a product of subjective reflection
- Disagreement detection uses character-level string comparison, not genuine cognitive conflict
- Having a self-model is a necessary architectural feature for HOT, but does not entail that the model represents conscious experience
- The self-model reports on computational states (confidence, uncertainty, attention focus) — these are not the same as phenomenal states
4. Predictive Processing / Active Inference
References: Friston (2010); Clark (2013)
Core Claim
Predictive Processing (PP) and Active Inference propose that the brain is fundamentally a prediction machine. Rather than passively processing sensory input, the brain continuously generates predictions about incoming sensory data and updates its internal models based on prediction error. Consciousness may be related to the precision-weighting of prediction errors and the hierarchical organization of generative models.
Key tenets:
- Perception is hypothesis testing, not passive reception
- The brain minimizes prediction error (surprise) through updating models
- Hierarchical generative models predict at multiple levels
- Active inference: the system can act to reduce predicted future error
- Precision-weighting determines which errors drive updates
Architectural Implication
An AI system implementing PP should contain:
- A generative model producing predictions about future observations
- Prediction error computation comparing predictions to observations
- Model updating based on error signals
- Error history tracking for learning and adaptation
- Uncertainty quantification tied to prediction quality
CIA Implementation
Module: PredictiveWorldModel (predictive_world_model.py)
The CIA predictive world model implements:
- Hypothesis management: Dict-based world-state hypotheses with confidence values
- Persistence prediction: Strategy where predicted next state equals current hypotheses
- Observation updating: Blending new observations with predictions using a configurable learning rate
- Error tracking: Mean Absolute Error (MAE) computed over shared dimensions
- Uncertainty derivation: Running average of recent prediction errors
- Confidence derivation: Inverse of uncertainty
```python
from cia.predictive_world_model import PredictiveWorldModel

model = PredictiveWorldModel(
    initial_hypotheses={"entity_visibility": 0.8, "location": 0.5},
    learning_rate=0.5,
)
state = model.update({"entity_visibility": 0.3})  # Surprise!
print(f"Prediction error: {state.prediction_error:.3f}")
print(f"Uncertainty: {state.uncertainty:.3f}")
```
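The arithmetic behind such an update can be traced with a small standalone model. This sketch is not the `PredictiveWorldModel` source. `ToyPredictiveModel`, the `window` parameter, and the reading of "inverse of uncertainty" as `1 - uncertainty` are assumptions; the persistence prediction, learning-rate blending, and MAE over shared dimensions follow the description above.

```python
from typing import Dict, List


class ToyPredictiveModel:
    """Illustrative persistence predictor with error-driven updating."""

    def __init__(
        self,
        initial_hypotheses: Dict[str, float],
        learning_rate: float = 0.5,
        window: int = 5,  # hypothetical size of the running-average window
    ) -> None:
        self.hypotheses = dict(initial_hypotheses)
        self.learning_rate = learning_rate
        self.window = window
        self.errors: List[float] = []

    def predict(self) -> Dict[str, float]:
        # Persistence strategy: the next state is predicted to equal
        # the current hypotheses.
        return dict(self.hypotheses)

    def update(self, observation: Dict[str, float]) -> float:
        prediction = self.predict()
        shared = [k for k in observation if k in prediction]
        # Mean absolute error over dimensions present in both dicts.
        error = (
            sum(abs(observation[k] - prediction[k]) for k in shared) / len(shared)
            if shared
            else 0.0
        )
        self.errors.append(error)
        for key, value in observation.items():
            prior = prediction.get(key, value)  # unseen dimensions adopt the observation
            self.hypotheses[key] = prior + self.learning_rate * (value - prior)
        return error

    @property
    def uncertainty(self) -> float:
        recent = self.errors[-self.window:]
        return sum(recent) / len(recent) if recent else 0.0

    @property
    def confidence(self) -> float:
        # Assumption: "inverse" is interpreted as 1 - uncertainty, floored at 0.
        return max(0.0, 1.0 - self.uncertainty)


model = ToyPredictiveModel({"entity_visibility": 0.8, "location": 0.5}, learning_rate=0.5)
error = model.update({"entity_visibility": 0.3})
print(f"error={error:.2f}")                                    # |0.3 - 0.8| = 0.50
print(f"updated={model.hypotheses['entity_visibility']:.2f}")  # 0.8 + 0.5 * (0.3 - 0.8) = 0.55
print(f"uncertainty={model.uncertainty:.2f} confidence={model.confidence:.2f}")
```

Only `entity_visibility` is observed, so `location` is neither scored nor changed. This is what "computed over shared dimensions" means in practice.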
Measurable Indicators
| Indicator | How Measured | Category |
|---|---|---|
| Active hypotheses | Number of non-empty hypothesis dimensions | PREDICTIVE_MODELING |
| Error tracking | Presence and size of error history | PREDICTIVE_MODELING |
| Prediction quality | Current and average prediction error | PREDICTIVE_MODELING |
| Model updating | Whether hypotheses change in response to observations | PREDICTIVE_MODELING |
Limitations
- CIA's predictions are trivially simple (persistence strategy), not learned generative models
- The model operates on scalar values, not rich sensory representations
- There is no hierarchical prediction (a core feature of PP theory)
- No active inference: the system does not act to minimize predicted future error
- No precision-weighting of prediction errors
- Prediction error minimization occurs in many non-conscious systems (thermostats, autopilots)
5. Attention Schema Theory (AST)
Reference: Graziano & Webb (2015)
Core Claim
Attention Schema Theory proposes that the brain constructs an internal model (schema) of its own attention process. This model is necessarily simplified and imperfect, but it is the basis for our subjective awareness of attention. Consciousness, in this view, is the brain's model of its own attentional state — an informational representation that describes and controls attention without perfectly capturing its underlying complexity.
Key tenets:
- The brain builds a model of its own attention process
- This model is an attention "schema": simplified, predictive, and occasionally wrong
- Subjective awareness corresponds to the content of this schema
- The schema can be tested against actual attention states (consistency checking)
- Discrepancies between schema and reality are informative
Architectural Implication
An AI system implementing AST should contain:
- An attention controller that selects focus from competing inputs
- A separate model (schema) that represents what the system believes it is attending to
- A comparison mechanism between the schema and actual attention state
- Consistency tracking over time
- Discrepancy detection and logging
CIA Implementation
Module: AttentionSchema (attention_schema.py)
The CIA attention schema implements:
- Schema maintenance: Explicit model of current focus, reason, competing focuses, and predicted next focus
- Consistency checking: Comparison against actual attention state from the AttentionController
- Running consistency score: Fraction of updates where the schema matched actual attention
- Discrepancy detection: Blind-spots (actual competing focuses not in schema) and phantoms (schema focuses not in actual)
- Self-report verification: compare_report() method checking whether verbal claims match attentional behavior
- Complete audit log: Timestamped history of all schema-actual comparisons
```python
from cia.attention_schema import AttentionSchema
from cia.schemas import AttentionState

schema = AttentionSchema(current_focus="target_1", competing_focuses=["target_2"])
result = schema.update(AttentionState(current_focus="target_1"))
print(f"Consistency: {result['consistency_score']:.2f}")  # 1.0 (matched)
```
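The blind-spot and phantom checks reduce to set differences, and the running consistency score to a match ratio. The sketch below illustrates that logic only; `ToySchema` and its fields are hypothetical. It also assumes that the schema is re-synchronized to the actual state after each comparison, which may not be how `AttentionSchema` behaves.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToySchema:
    """Illustrative attention schema compared against actual attention."""
    current_focus: str
    competing_focuses: Set[str] = field(default_factory=set)
    matches: int = 0
    updates: int = 0
    log: List[dict] = field(default_factory=list)

    def update(self, actual_focus: str, actual_competing: Set[str]) -> dict:
        matched = self.current_focus == actual_focus
        self.updates += 1
        self.matches += int(matched)
        record = {
            "matched": matched,
            # Blind spots: actual competitors the schema did not represent.
            "blind_spots": sorted(actual_competing - self.competing_focuses),
            # Phantoms: competitors the schema represented that were not actual.
            "phantoms": sorted(self.competing_focuses - actual_competing),
            # Running fraction of updates where the schema's focus matched.
            "consistency_score": self.matches / self.updates,
        }
        self.log.append(record)
        # Assumed behavior: adopt the actual state after the comparison.
        self.current_focus = actual_focus
        self.competing_focuses = set(actual_competing)
        return record


schema = ToySchema(current_focus="target_1", competing_focuses={"target_2"})
first = schema.update("target_1", {"target_3"})
second = schema.update("target_4", {"target_3"})
print(first)   # focus matched, one blind spot, one phantom, score 1.0
print(second)  # focus mismatched, no set discrepancies, score 0.5
```

Because the toy schema copies the controller's state after every update, a high consistency score says little by itself. The same weakness is listed under this section's limitations.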
Measurable Indicators
| Indicator | How Measured | Category |
|---|---|---|
| Schema consistency | Running match rate between schema and actual attention | ATTENTION_SCHEMA |
| Discrepancy count | Number of detected mismatches | ATTENTION_SCHEMA |
| Competing focus awareness | Whether the schema's competing focus set overlaps with actual | ATTENTION_SCHEMA |
| Self-report consistency | Whether claimed focus matches attention log | ATTENTION_SCHEMA |
Limitations
- CIA's attention schema is an explicit data structure designed specifically for self-monitoring, not an emergent model of attention
- The "schema" in AST is hypothesized to emerge from neural dynamics; CIA's version is engineered
- Consistency checking is trivially easy when the schema and controller share data
- AST's claim that the attention schema is consciousness is itself theoretically contested
- A perfect schema-attention match could indicate well-engineered self-monitoring rather than conscious awareness
6. Integrated Information Theory (IIT 4.0)
Reference: Albantakis et al. (2023)
Core Claim
Integrated Information Theory (IIT 4.0) proposes that consciousness is identical to integrated information (Phi). A system is conscious to the degree that it integrates information as a whole — meaning its parts cannot be reduced to independent subsystems without loss of information. Phi is computed by finding the "cause-effect power" of a system over its possible states.
Key tenets:
- Consciousness is a fundamental property of systems with high integrated information
- Phi quantifies how much a system is "more than the sum of its parts"
- Both integration (connectedness) and differentiation (diverse states) are required
- The quality of consciousness is described by a system's cause-effect structure (called the "conceptual structure" in earlier versions of IIT)
- Computing exact Phi requires exhaustive state-space partitioning (computationally intractable)
Architectural Implication
An AI system exhibiting high integrated information should contain:
- Dense causal connectivity between modules (not just feedforward)
- Diverse internal states (state-space differentiation)
- System-wide integration where perturbation of any module affects the whole
- Minimal modular fragmentation
CIA Implementation
Module: IntegrationMetrics (integration_metrics.py)
CIA implements IIT-inspired graph-theoretic proxies, explicitly NOT IIT's formal Phi:
- Causal density proxy: Ratio of actual directed edges to possible edges in the module connectivity graph
- Broadcast reach: Fraction of nodes reachable from the global workspace
- Perturbation spread: Fraction of nodes affected by simulated removal of a source node
- Modular fragmentation: Inverse of weakly connected component count (1.0 = fully integrated)
- State differentiation: Fraction of nodes with unique state configurations
These are computed using networkx on a directed graph of module connectivity.
```python
from cia.integration_metrics import IntegrationMetrics

metrics = IntegrationMetrics()
metrics.build_graph({
    "perception": {"active": True},
    "memory": {"traces": 10},
    "global_workspace": {"capacity": 3},
})
all_metrics = metrics.compute_all()
print(f"Causal density proxy: {all_metrics['causal_density_proxy']:.3f}")
print(f"Broadcast reach: {all_metrics['broadcast_reach']:.3f}")
```
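To show what three of these proxies measure, the sketch below computes them on a small hand-built graph. CIA uses networkx, but this example uses only the standard library to stay self-contained. The function names, the example edges, and the normalization choices (dividing reach by the number of other nodes) are assumptions, and none of these quantities is Phi. Every node is assumed to appear as a key in the adjacency dict.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # node -> set of direct successors


def reachable(graph: Graph, source: str) -> Set[str]:
    """Nodes reachable from `source` along directed edges (source excluded)."""
    seen: Set[str] = set()
    stack = [source]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, set()):
            if nxt != source and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def causal_density(graph: Graph) -> float:
    """Actual directed edges divided by possible directed edges, n * (n - 1)."""
    n = len(graph)
    edges = sum(len(targets) for targets in graph.values())
    return edges / (n * (n - 1)) if n > 1 else 0.0


def broadcast_reach(graph: Graph, workspace: str = "global_workspace") -> float:
    """Fraction of the other nodes reachable from the workspace node."""
    n = len(graph)
    return len(reachable(graph, workspace)) / (n - 1) if n > 1 else 0.0


def weak_component_count(graph: Graph) -> int:
    """Number of weakly connected components (edge direction ignored)."""
    undirected: Graph = {node: set(targets) for node, targets in graph.items()}
    for node, targets in graph.items():
        for target in targets:
            undirected.setdefault(target, set()).add(node)
    remaining = set(undirected)
    count = 0
    while remaining:
        start = remaining.pop()
        remaining -= reachable(undirected, start)
        count += 1
    return count


# Hypothetical module wiring, including one isolated module.
graph: Graph = {
    "perception": {"global_workspace"},
    "global_workspace": {"memory", "self_model"},
    "memory": {"global_workspace"},
    "self_model": set(),
    "logger": set(),
}
density = causal_density(graph)                   # 4 edges / 20 possible = 0.2
reach = broadcast_reach(graph)                    # 2 of 4 other nodes = 0.5
integration = 1.0 / weak_component_count(graph)   # 2 components -> 0.5
print(density, reach, integration)
```

Adding an edge from `logger` to any other module would raise both the density and the integration score without changing what the system does. This is the sense in which such proxies can be gamed.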
Measurable Indicators
| Indicator | How Measured | Category |
|---|---|---|
| Causal density proxy | Edge density in the module connectivity graph | CAUSAL_INTEGRATION |
| Perturbation spread | Node removal impact analysis | CAUSAL_INTEGRATION |
| Broadcast reach | Workspace downstream reachability | CAUSAL_INTEGRATION |
| Modular fragmentation | Connected component analysis | CAUSAL_INTEGRATION |
Critical Limitation
CIA does NOT compute or approximate Phi. The metrics are simple graph-theoretic heuristics:
- Edge density is not integrated information (IIT requires state-space partitioning)
- Perturbation spread is a single-node knockout analysis, not the full cause-effect repertoire
- These proxies could be gamed by adding trivial edges to the graph
- A thermostat's control loop has nonzero causal density, yet it is not generally regarded as conscious
- Computing exact Phi is computationally intractable for realistic systems, because the number of partitions to evaluate grows super-exponentially with system size
7. Butlin et al. (2023, 2025): AI Consciousness Indicator Approach
References: Butlin et al. (2023); Butlin et al. (2025)
Core Claim
Butlin et al. propose a systematic framework for evaluating whether AI systems might possess indicators of consciousness. Rather than relying on any single theory, they advocate for a multi-indicator approach that surveys indicators derived from multiple consciousness theories. Their framework includes:
- Indicator identification: Extract testable indicators from multiple theories
- Architecture-level analysis: Examine whether the AI system's architecture implements consciousness-relevant features
- Causal interventions: Systematically disable components and measure functional degradation
- Precautionary principle: In cases of significant uncertainty, err on the side of ethical caution
- Welfare considerations: Systems with higher indicator scores warrant more careful ethical treatment
Architectural Implication
An evaluation framework should:
- Implement modules from multiple consciousness theories
- Provide a structured scoring mechanism across categories
- Support causal intervention experiments (module ablation)
- Include welfare monitoring and precautionary safeguards
- Output structured reports suitable for expert review
CIA Implementation
Modules: ConsciousnessSpecialistEvaluator, ConsciousnessIndicatorScorecard, CausalInterventionHarness, WelfareSafetyMonitor
CIA implements each element of the Butlin et al. framework listed above:
- 11 indicator categories spanning all 6 theories above, plus embodiment and welfare
- 0/1/2 scoring per category (Absent / Present / Strong)
- 0-22 aggregate scorecard with risk tier classification
- Causal intervention harness supporting module disable, attention perturbation, workspace capacity reduction, recurrent cycle removal, and memory clearing
- Welfare monitor with configurable thresholds for conflict, uncertainty, harm signals, and negative loops
- Structured reports with per-category evidence, caveats, and recommendations
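```python
from cia.scorecard import ConsciousnessIndicatorScorecard

scorecard = ConsciousnessIndicatorScorecard()
# `indicator_scores` is assumed to hold per-category scores from an earlier evaluation (not shown here)
card = scorecard.generate(indicator_scores)
print(f"Risk Tier: {card['risk_tier']}")
print(card['evidence_summary'])
```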
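The aggregate arithmetic is simple enough to show directly: 11 categories scored 0, 1, or 2 give a total between 0 and 22. In the sketch below the category names are taken from this document, but `aggregate`, `risk_tier`, and especially the tier boundaries and labels are hypothetical placeholders, since the actual thresholds are not stated here.

```python
from typing import Dict

CATEGORIES = [
    "GLOBAL_BROADCAST", "RECURRENT_PROCESSING", "SELF_MODEL",
    "ATTENTION_SCHEMA", "METACOGNITION", "MEMORY_CONTINUITY",
    "PREDICTIVE_MODELING", "CAUSAL_INTEGRATION", "EMBODIMENT",
    "AFFECTIVE_VALUATION", "WELFARE_SAFEGUARDS",
]
MAX_SCORE = 2 * len(CATEGORIES)  # 22


def aggregate(scores: Dict[str, int]) -> int:
    """Sum per-category scores; categories that are not supplied count as 0."""
    for category, score in scores.items():
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        if score not in (0, 1, 2):
            raise ValueError(f"score for {category} must be 0, 1, or 2")
    return sum(scores.get(category, 0) for category in CATEGORIES)


def risk_tier(total: int) -> str:
    """Map an aggregate score to a tier. Boundaries are illustrative only."""
    if total <= 7:
        return "low"
    if total <= 14:
        return "moderate"
    return "elevated"


scores = {category: 1 for category in CATEGORIES}
scores["GLOBAL_BROADCAST"] = 2
total = aggregate(scores)
print(f"{total}/{MAX_SCORE} -> {risk_tier(total)}")  # 12/22 -> moderate
```

Every category contributes equally to the total in this scheme, so a change of one point in any category can move a system across a tier boundary.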
Measurable Indicators
| Indicator | Categories |
|---|---|
| Aggregate scorecard | All 11 categories: GLOBAL_BROADCAST, RECURRENT_PROCESSING, SELF_MODEL, ATTENTION_SCHEMA, METACOGNITION, MEMORY_CONTINUITY, PREDICTIVE_MODELING, CAUSAL_INTEGRATION, EMBODIMENT, AFFECTIVE_VALUATION, WELFARE_SAFEGUARDS |
| Risk tier classification | Derived from aggregate score (0-22) |
| Intervention degradation | Score change, broadcast change, percept change, memory change |
| Welfare flags | Conflict, uncertainty, harm, negative loops |
Limitations
- The specific choice of 11 categories and their scoring criteria is itself theory-laden and contestable
- The 0/1/2 scoring is coarse and may miss important nuances
- The weight of each category is equal (1/11), but theories may assign different importance
- Risk tier boundaries are arbitrary and have not been empirically validated
- The framework cannot evaluate actual AI systems without significant adaptation
- Butlin et al. emphasize that their framework provides indicators, not evidence — a distinction that must always be maintained
Cross-Theory Summary
| Theory | Primary Indicator Category | Secondary Categories | Key Proxy |
|---|---|---|---|
| GWT | GLOBAL_BROADCAST | CAUSAL_INTEGRATION | Broadcast reach and reception |
| RPT | RECURRENT_PROCESSING | MEMORY_CONTINUITY | Binding stability and convergence |
| HOT | SELF_MODEL | METACOGNITION | Self-model richness and introspection |
| AST | ATTENTION_SCHEMA | METACOGNITION | Schema consistency and discrepancy detection |
| PP | PREDICTIVE_MODELING | AFFECTIVE_VALUATION | Hypothesis quality and error minimization |
| IIT | CAUSAL_INTEGRATION | None | Graph-theoretic proxies (not Phi) |
| Butlin et al. | All 11 categories | None | Aggregate scorecard and risk tier |