01 - Theory Review

Deep Analysis of Consciousness Theories and Their Implementation in CIA

This document provides a detailed analysis of each consciousness theory that informs the Consciousness-Indicator Architecture (CIA). For each theory, we describe its core claims, the architectural implications for AI systems, how CIA implements a module inspired by the theory, the measurable indicators produced, and the fundamental limitations of this approach.

Reminder: Implementation of a consciousness theory's architectural features does not demonstrate that the implementing system possesses subjective experience.

1. Global Workspace Theory (GWT)

References: Baars (2005); Shanahan & Baars (2005)

Core Claim

Global Workspace Theory proposes that consciousness arises from a "theatre of the mind" — a limited-capacity cognitive workspace where information from specialized, unconscious processors competes for access. When information wins this competition, it is broadcast globally to all cognitive modules, making it available for verbal report, voluntary action, episodic memory, and strategic planning.

Key tenets:
- The mind contains many specialized, parallel, unconscious processors
- Consciousness corresponds to a global broadcast of information
- The workspace has limited capacity (a "bottleneck")
- Broadcast is all-or-nothing: content is either globally available or not
- Broadcast integrates information across otherwise isolated modules

Architectural Implication

An AI system implementing GWT should contain:
- Multiple specialized processing modules operating in parallel
- A central competitive arena with limited capacity
- A broadcast mechanism that distributes winning content to all subscribers
- Integration of information across modules through the broadcast

CIA Implementation

Module: GlobalWorkspace (global_workspace.py)

The CIA global workspace implements:
- Competitive arena: Content items compete based on salience scores
- Limited capacity: Configurable maximum broadcasts per cycle (default: 3)
- Subscriber system: Modules register callbacks to receive broadcasts
- Exception isolation: Faulty subscribers do not disrupt broadcasts
- Broadcast history: Complete audit trail of all broadcast events

from cia.global_workspace import GlobalWorkspace
from cia.schemas import WorkspaceContent

gw = GlobalWorkspace(capacity=3)
gw.subscribe("memory", lambda c: print(f"Memory received: {c.label}"))
gw.subscribe("self_model", lambda c: print(f"Self-model received: {c.label}"))

items = [
    WorkspaceContent(label="high_priority", salience=0.9, source_module="perception"),
    WorkspaceContent(label="low_priority", salience=0.1, source_module="memory"),
]
result = gw.compete(items)  # high_priority wins and is broadcast
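
The competition-and-broadcast cycle listed above can be sketched without the CIA package. This is a minimal illustration, not the actual `GlobalWorkspace` code: `Item`, `ToyWorkspace`, and `broadcast_reach` are hypothetical names, and the sketch assumes that the top `capacity` items by salience win and that subscriber exceptions are caught and recorded rather than propagated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Item:
    """Hypothetical stand-in for a workspace content item."""
    label: str
    salience: float


@dataclass
class ToyWorkspace:
    """Illustrative limited-capacity workspace (not the CIA implementation)."""
    capacity: int = 3
    subscribers: Dict[str, Callable[[Item], None]] = field(default_factory=dict)
    history: List[dict] = field(default_factory=list)

    def subscribe(self, name: str, callback: Callable[[Item], None]) -> None:
        self.subscribers[name] = callback

    def compete(self, items: List[Item]) -> List[Item]:
        # Rank by salience and keep only as many winners as capacity allows.
        ranked = sorted(items, key=lambda i: i.salience, reverse=True)
        winners = ranked[: self.capacity]
        for item in winners:
            delivered, failed = [], []
            for name, callback in self.subscribers.items():
                try:
                    callback(item)
                    delivered.append(name)
                except Exception:
                    # Exception isolation: one faulty subscriber must not
                    # prevent delivery to the others.
                    failed.append(name)
            self.history.append(
                {"label": item.label, "delivered": delivered, "failed": failed}
            )
        return winners


def broadcast_reach(ws: ToyWorkspace) -> float:
    """Average fraction of subscribers that received each broadcast."""
    if not ws.history or not ws.subscribers:
        return 0.0
    total = len(ws.subscribers)
    return sum(len(e["delivered"]) / total for e in ws.history) / len(ws.history)
```

Recording both delivered and failed subscribers in the history is one way to make a "broadcast reach" indicator computable after the fact; the real module may track this differently.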

Measurable Indicators

Indicator	How Measured	Category
Broadcast existence	Whether workspace has produced any broadcasts	`GLOBAL_BROADCAST`
Broadcast reach	Average fraction of subscribers receiving broadcasts	`GLOBAL_BROADCAST`
Subscriber count	Number of registered downstream modules	`GLOBAL_BROADCAST`
Broadcast history depth	Number of broadcast events over time	`GLOBAL_BROADCAST`

Limitations

CIA's workspace uses salience ranking, not the complex neural dynamics of biological global competition
Broadcast is instantaneous; biological broadcast involves neural oscillations and timing
The workspace operates on discrete content items, not continuous neural representations
Subscriber registration is static (configured at initialization), whereas biological attention has dynamic routing
The "consciousness" in GWT refers to functional access, not phenomenal experience — even if perfectly implemented, it addresses access consciousness, not phenomenal consciousness

2. Recurrent Processing Theory (RPT)

Reference: Lamme (2006, 2010)

Core Claim

Recurrent Processing Theory argues that consciousness depends on recurrent (feedback) loops in neural processing. Feedforward processing can produce behavior without consciousness (as in blindsight), but consciousness requires iterative re-entrant processing where higher cortical areas feed back to lower areas, enabling sustained, coherent representations.

Key tenets:
- Feedforward processing alone is insufficient for consciousness
- Recurrent loops between cortical areas enable conscious access
- The content and duration of recurrent activity determine whether processing reaches consciousness
- Recurrent processing enables binding of features into coherent wholes
- Processing convergence (stabilization) is a marker of consciousness-relevant dynamics

Architectural Implication

An AI system implementing RPT should contain:
- Iterative refinement loops that re-process perceptual data
- Feedback connections from higher to lower processing stages
- Convergence detection: the system should reach stable interpretations
- Feature binding: combining separate features into unified representations

CIA Implementation

Module: RecurrentBindingLayer (recurrent_binding.py)

The CIA recurrent binding layer implements:
- Iterative cycles: Configurable number of refinement passes (default: 3)
- Entity merging: Overlapping entities are combined using substring overlap
- Salience re-scoring: Co-occurrence and decay adjust salience each cycle
- Confidence stabilization: Values converge toward the group mean
- Stability measurement: Convergence score quantifying state change across cycles
- Per-cycle feedback history: Complete audit trail of each cycle's dynamics

from cia.recurrent_binding import RecurrentBindingLayer

layer = RecurrentBindingLayer(default_cycles=5)
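# `percepts` is assumed to be a collection of percept objects produced by an upstream perception module
bound = layer.bind(percepts)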
print(f"Stability: {bound.stability:.3f}")  # 0.0 (unstable) to 1.0 (converged)
print(f"Cycles: {bound.cycles_completed}")
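
To make the stabilization and stability ideas concrete, here is a minimal sketch under stated assumptions. It is not the `RecurrentBindingLayer` code: `stabilize` and its `rate` parameter are hypothetical, each cycle is assumed to pull every confidence a fixed fraction toward the group mean, and each delta is assumed to be the total absolute change in one cycle. The score uses the formula 1.0 - (last_delta / initial_delta).

```python
from typing import List, Tuple


def stabilize(
    confidences: List[float], cycles: int = 3, rate: float = 0.5
) -> Tuple[List[float], float]:
    """Pull confidences toward their mean and report a stability score."""
    values = list(confidences)
    if not values:
        # Nothing to stabilize: report the empty state as converged.
        return values, 1.0
    deltas: List[float] = []
    for _ in range(cycles):
        mean = sum(values) / len(values)
        updated = [v + rate * (mean - v) for v in values]
        # Total absolute change made during this cycle.
        deltas.append(sum(abs(u - v) for u, v in zip(updated, values)))
        values = updated
    if not deltas or deltas[0] == 0.0:
        # No change in the first cycle: treat as already converged.
        return values, 1.0
    return values, 1.0 - deltas[-1] / deltas[0]
```

In this toy model the mean is preserved and every deviation shrinks by the factor (1 - rate) per cycle, so the score equals 1 - (1 - rate)^(cycles - 1) whenever the inputs are not all equal. The score therefore depends only on the settings, not on the data, which is one way to see why convergence of this kind is trivially achievable.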

Measurable Indicators

Indicator	How Measured	Category
Recurrent cycle count	Number of refinement iterations completed	`RECURRENT_PROCESSING`
Binding stability	Convergence metric: 1.0 - (last_delta / initial_delta)	`RECURRENT_PROCESSING`
Entity merging	Whether overlapping entities were combined	`RECURRENT_PROCESSING`
Feedback history	Detailed per-cycle state changes	`RECURRENT_PROCESSING`
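
The entity-merging indicator relies on substring overlap between entity labels. The sketch below shows one plausible reading of that rule; `merge_by_substring` is a hypothetical helper, and it assumes that a label is folded into any longer label that contains it, ignoring case. The real merging logic may differ.

```python
from typing import List


def merge_by_substring(labels: List[str]) -> List[str]:
    """Keep a label only if it is not contained in an already-kept label."""
    kept: List[str] = []
    # Visit longer labels first so that shorter ones fold into them.
    for label in sorted(labels, key=len, reverse=True):
        needle = label.lower()
        if any(needle in existing.lower() for existing in kept):
            continue  # merged into a longer label
        kept.append(label)
    return kept
```

With this rule, "cat" merges into "red cat" as intended, but it also merges into "concatenate", which shows why character-level overlap is a weak substitute for semantic similarity.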

Limitations

CIA's recurrence operates on discrete symbolic entities, not continuous neural activations
The merging algorithm uses character-level substring overlap, not learned semantic similarity
Convergence in CIA is trivially achievable (the system always converges given enough cycles), unlike biological neural dynamics
Recurrent processing in the brain involves complex temporal dynamics (gamma oscillations, etc.) not captured by CIA's discrete iterations
Stability is a necessary but not sufficient condition for consciousness-relevant processing

3. Higher-Order Thought Theory (HOT)

Reference: Rosenthal (2005)

Core Claim

Higher-Order Thought Theory proposes that a mental state becomes conscious when it is the object of a higher-order representation — a thought about that thought. Unconscious mental states are first-order (representing the world) but not accompanied by meta-representational awareness.

Key tenets:
- First-order representations are insufficient for consciousness
- A conscious state is one that is represented by a higher-order state
- Metacognitive monitoring (thinking about thinking) is consciousness-relevant
- Self-awareness requires representing one's own mental states
- The accuracy and richness of higher-order representations matter

Architectural Implication

An AI system implementing HOT should contain:
- A self-model representing the system's own mental states
- Metacognitive monitoring: the ability to reflect on its own processing
- Belief tracking: maintaining beliefs about its own beliefs
- Internal disagreement detection: noticing conflicts between internal representations

CIA Implementation

Modules: HigherOrderSelfModel (self_model.py), ConsciousnessSpecialistEvaluator (metacognition category)

The CIA self-model implements:
- Self-belief tracking: Maintains the system's current top-level belief with confidence and uncertainty
- Goal and limitation awareness: Explicit representations of what the system is trying to do and what it cannot do
- Identity markers: Persistent identity tags supporting continuity tracking
- Attention self-modeling: Updates from attention state changes
- Introspection reports: Structured reports about the system's own state (explicitly labeled as "indicator reports," not evidence of consciousness)
- Internal disagreement detection: Measures belief volatility, belief divergence, and attentional competition

from cia.self_model import HigherOrderSelfModel

sm = HigherOrderSelfModel(continuity_id="agent-001", initial_identity_markers=["reasoning system"])
report = sm.generate_introspection_report()
# report["report_type"] == "indicator report" (NOT proof of consciousness)
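
The three disagreement components named above (belief volatility, belief divergence, attentional competition) could be combined along the following lines. Every function name, the use of `difflib` for character-level comparison, the `capacity` normalizer, and the equal weighting are assumptions made for illustration; none of this is taken from the `HigherOrderSelfModel` source.

```python
from difflib import SequenceMatcher
from typing import List


def belief_volatility(history: List[str]) -> float:
    """Fraction of consecutive steps at which the top-level belief changed."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return changes / (len(history) - 1)


def belief_divergence(a: str, b: str) -> float:
    """Character-level dissimilarity of two belief strings (0.0 = identical)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def disagreement_score(
    history: List[str], competing_focuses: int, capacity: int = 3
) -> float:
    """Unweighted mean of three components, each in the range [0, 1]."""
    volatility = belief_volatility(history)
    divergence = (
        belief_divergence(history[-2], history[-1]) if len(history) >= 2 else 0.0
    )
    # ASSUMPTION: competition saturates once `capacity` focuses compete.
    competition = min(competing_focuses / capacity, 1.0) if capacity > 0 else 0.0
    return (volatility + divergence + competition) / 3.0
```

Because the divergence term compares characters rather than meanings, "door is open" and "door is closed" score as fairly similar even though they contradict each other.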

Measurable Indicators

Indicator	How Measured	Category
Self-model richness	Presence of beliefs, goals, identity markers, continuity	`SELF_MODEL`
Metacognitive activity	Belief history length, introspection generation	`METACOGNITION`
Internal disagreement	Multi-component disagreement score	`METACOGNITION`
Identity continuity	Persistent continuity_id across resets	`SELF_MODEL`

Limitations

CIA's self-model is a data structure, not a genuine higher-order representation of experience
The "introspection report" is a programmatically generated summary, not a product of subjective reflection
Disagreement detection uses character-level string comparison, not genuine cognitive conflict
Having a self-model is a necessary architectural feature for HOT, but does not entail that the model represents conscious experience
The self-model reports on computational states (confidence, uncertainty, attention focus) — these are not the same as phenomenal states

4. Predictive Processing / Active Inference

References: Friston (2010); Clark (2013)

Core Claim

Predictive Processing (PP) and Active Inference propose that the brain is fundamentally a prediction machine. Rather than passively processing sensory input, the brain continuously generates predictions about incoming sensory data and updates its internal models based on prediction error. Consciousness may be related to the precision-weighting of prediction errors and the hierarchical organization of generative models.

Key tenets:
- Perception is hypothesis testing, not passive reception
- The brain minimizes prediction error (surprise) by updating its models
- Hierarchical generative models predict at multiple levels
- Active inference: the system can act to reduce predicted future error
- Precision-weighting determines which errors drive updates

Architectural Implication

An AI system implementing PP should contain:
- A generative model producing predictions about future observations
- Prediction error computation comparing predictions to observations
- Model updating based on error signals
- Error history tracking for learning and adaptation
- Uncertainty quantification tied to prediction quality

CIA Implementation

Module: PredictiveWorldModel (predictive_world_model.py)

The CIA predictive world model implements:
- Hypothesis management: Dict-based world-state hypotheses with confidence values
- Persistence prediction: Strategy where predicted next state equals current hypotheses
- Observation updating: Blending new observations with predictions using a configurable learning rate
- Error tracking: Mean Absolute Error (MAE) computed over shared dimensions
- Uncertainty derivation: Running average of recent prediction errors
- Confidence derivation: Inverse of uncertainty

from cia.predictive_world_model import PredictiveWorldModel

model = PredictiveWorldModel(
    initial_hypotheses={"entity_visibility": 0.8, "location": 0.5},
    learning_rate=0.5,
)
state = model.update({"entity_visibility": 0.3})  # Surprise!
print(f"Prediction error: {state.prediction_error:.3f}")
print(f"Uncertainty: {state.uncertainty:.3f}")
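
The update rule described above (persistence prediction, MAE over shared dimensions, learning-rate blending) can be illustrated with a small stand-alone class. This is a sketch under assumptions, not the `PredictiveWorldModel` source: `ToyPredictiveModel`, the `window` size, and the reading of "inverse of uncertainty" as `1 - uncertainty` are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict


class ToyPredictiveModel:
    """Illustrative persistence predictor (not the CIA implementation)."""

    def __init__(
        self, hypotheses: Dict[str, float], learning_rate: float = 0.5, window: int = 5
    ) -> None:
        self.hypotheses = dict(hypotheses)
        self.learning_rate = learning_rate
        self.errors: Deque[float] = deque(maxlen=window)

    def predict(self) -> Dict[str, float]:
        # Persistence strategy: the next state is predicted to equal the current one.
        return dict(self.hypotheses)

    def update(self, observation: Dict[str, float]) -> float:
        prediction = self.predict()
        shared = prediction.keys() & observation.keys()
        # Mean absolute error over the dimensions present in both dicts.
        error = 0.0
        if shared:
            error = sum(abs(prediction[k] - observation[k]) for k in shared) / len(shared)
        # Blend each observed value into the matching hypothesis.
        for key, value in observation.items():
            old = self.hypotheses.get(key, value)
            self.hypotheses[key] = old + self.learning_rate * (value - old)
        self.errors.append(error)
        return error

    @property
    def uncertainty(self) -> float:
        # Running average over the most recent errors.
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    @property
    def confidence(self) -> float:
        # ASSUMPTION: "inverse of uncertainty" is read here as 1 - uncertainty.
        return 1.0 - self.uncertainty
```

With hypotheses `{"entity_visibility": 0.8, "location": 0.5}`, a learning rate of 0.5, and the observation `{"entity_visibility": 0.3}`, only one dimension is shared, so the error is |0.8 - 0.3| = 0.5 and the blended hypothesis moves halfway, to about 0.55. The unobserved `location` value is left unchanged.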

Measurable Indicators

Indicator	How Measured	Category
Active hypotheses	Number of non-empty hypothesis dimensions	`PREDICTIVE_MODELING`
Error tracking	Presence and size of error history	`PREDICTIVE_MODELING`
Prediction quality	Current and average prediction error	`PREDICTIVE_MODELING`
Model updating	Whether hypotheses change in response to observations	`PREDICTIVE_MODELING`

Limitations

CIA's predictions are trivially simple (persistence strategy), not learned generative models
The model operates on scalar values, not rich sensory representations
There is no hierarchical prediction (a core feature of PP theory)
No active inference: the system does not act to minimize predicted future error
No precision-weighting of prediction errors
Prediction error minimization occurs in many non-conscious systems (thermostats, autopilots)

5. Attention Schema Theory (AST)

Reference: Graziano & Webb (2015)

Core Claim

Attention Schema Theory proposes that the brain constructs an internal model (schema) of its own attention process. This model is necessarily simplified and imperfect, but it is the basis for our subjective awareness of attention. Consciousness, in this view, is the brain's model of its own attentional state — an informational representation that describes and controls attention without perfectly capturing its underlying complexity.

Key tenets:
- The brain builds a model of its own attention process
- This model is an attention "schema": simplified, predictive, and occasionally wrong
- Subjective awareness corresponds to the content of this schema
- The schema can be tested against actual attention states (consistency checking)
- Discrepancies between schema and reality are informative

Architectural Implication

An AI system implementing AST should contain:
- An attention controller that selects focus from competing inputs
- A separate model (schema) that represents what the system believes it is attending to
- A comparison mechanism between the schema and actual attention state
- Consistency tracking over time
- Discrepancy detection and logging

CIA Implementation

Module: AttentionSchema (attention_schema.py)

The CIA attention schema implements:
- Schema maintenance: Explicit model of current focus, reason, competing focuses, and predicted next focus
- Consistency checking: Comparison against actual attention state from the AttentionController
- Running consistency score: Fraction of updates where the schema matched actual attention
- Discrepancy detection: Blind-spots (actual competing focuses not in schema) and phantoms (schema focuses not in actual)
- Self-report verification: compare_report() method checking whether verbal claims match attentional behavior
- Complete audit log: Timestamped history of all schema-actual comparisons

from cia.attention_schema import AttentionSchema
from cia.schemas import AttentionState

schema = AttentionSchema(current_focus="target_1", competing_focuses=["target_2"])
result = schema.update(AttentionState(current_focus="target_1"))
print(f"Consistency: {result['consistency_score']:.2f}")  # 1.0 (matched)
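
Blind-spot and phantom detection reduce to two set differences, and the running consistency score is a simple match rate. The sketch below illustrates both; `ToySchema` and its field names are hypothetical and are not the `AttentionSchema` API. It assumes the schema is not revised after each comparison, so repeated mismatches keep lowering the score.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToySchema:
    """Illustrative attention schema (not the CIA implementation)."""
    current_focus: str
    competing_focuses: Set[str] = field(default_factory=set)
    matches: int = 0
    updates: int = 0
    log: List[dict] = field(default_factory=list)

    def update(self, actual_focus: str, actual_competing: Set[str]) -> dict:
        matched = self.current_focus == actual_focus
        self.updates += 1
        self.matches += int(matched)
        entry = {
            "matched": matched,
            # Actual competitors the schema does not know about.
            "blind_spots": sorted(actual_competing - self.competing_focuses),
            # Competitors the schema believes in that are not actually present.
            "phantoms": sorted(self.competing_focuses - actual_competing),
            # Fraction of updates so far where the focus matched.
            "consistency_score": self.matches / self.updates,
        }
        self.log.append(entry)
        return entry
```

If the schema were populated directly from the controller's own state, both differences would always be empty and the score would stay at 1.0, so a high score is only informative when the schema is maintained separately from the controller.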

Measurable Indicators

Indicator	How Measured	Category
Schema consistency	Running match rate between schema and actual attention	`ATTENTION_SCHEMA`
Discrepancy count	Number of detected mismatches	`ATTENTION_SCHEMA`
Competing focus awareness	Whether the schema's competing focus set overlaps with actual	`ATTENTION_SCHEMA`
Self-report consistency	Whether claimed focus matches attention log	`ATTENTION_SCHEMA`

Limitations

CIA's attention schema is an explicit data structure designed specifically for self-monitoring, not an emergent model of attention
The "schema" in AST is hypothesized to emerge from neural dynamics; CIA's version is engineered
Consistency checking is trivially easy when the schema and controller share data
AST's claim that the attention schema is consciousness is itself theoretically contested
A perfect schema-attention match could indicate well-engineered self-monitoring rather than conscious awareness

6. Integrated Information Theory (IIT 4.0)

Reference: Albantakis et al. (2023)

Core Claim

Integrated Information Theory (IIT 4.0) identifies consciousness with a system's intrinsic cause-effect structure and quantifies it as integrated information (Phi). A system is conscious to the degree that it integrates information as a whole, meaning that it cannot be partitioned into independent subsystems without loss of information. Phi is computed by measuring how much of the system's "cause-effect power" over its own states is lost under the least damaging partition.

Key tenets:
- Consciousness is a fundamental property of systems with high integrated information
- Phi quantifies how much a system is "more than the sum of its parts"
- Both integration (connectedness) and differentiation (diverse states) are required
- The quality of consciousness is described by a system's "cause-effect structure" (called a "conceptual structure" in IIT 3.0)
- Computing exact Phi requires exhaustive state-space partitioning (computationally intractable)

Architectural Implication

An AI system exhibiting high integrated information should contain:
- Dense causal connectivity between modules (not just feedforward)
- Diverse internal states (state-space differentiation)
- System-wide integration where perturbation of any module affects the whole
- Minimal modular fragmentation

CIA Implementation

Module: IntegrationMetrics (integration_metrics.py)

CIA implements IIT-inspired graph-theoretic proxies, explicitly NOT IIT's formal Phi:

Causal density proxy: Ratio of actual directed edges to possible edges in the module connectivity graph
Broadcast reach: Fraction of nodes reachable from the global workspace
Perturbation spread: Fraction of nodes affected by simulated removal of a source node
Modular fragmentation: Inverse of weakly connected component count (1.0 = fully integrated)
State differentiation: Fraction of nodes with unique state configurations

These are computed using networkx on a directed graph of module connectivity.

from cia.integration_metrics import IntegrationMetrics

metrics = IntegrationMetrics()
metrics.build_graph({
    "perception": {"active": True},
    "memory": {"traces": 10},
    "global_workspace": {"capacity": 3},
})
all_metrics = metrics.compute_all()
print(f"Causal density proxy: {all_metrics['causal_density_proxy']:.3f}")
print(f"Broadcast reach: {all_metrics['broadcast_reach']:.3f}")
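
Three of the proxies can be written in a few lines each. The source computes them with networkx; the sketch below uses only the standard library so that it runs without dependencies, and every function name in it is hypothetical. It assumes a graph without self-loops and defines the integration score as the inverse of the weakly connected component count.

```python
from collections import deque
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def causal_density(nodes: Set[str], edges: Set[Edge]) -> float:
    """Directed edges present divided by the n * (n - 1) possible edges."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def reachable(source: str, edges: Set[Edge]) -> Set[str]:
    """Nodes reachable from source along directed edges (source excluded)."""
    successors: Dict[str, Set[str]] = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
    seen: Set[str] = set()
    queue = deque([source])
    while queue:
        for nxt in successors.get(queue.popleft(), ()):
            if nxt not in seen and nxt != source:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def broadcast_reach(nodes: Set[str], edges: Set[Edge], workspace: str) -> float:
    """Fraction of the other nodes reachable from the workspace node."""
    others = len(nodes) - 1
    return len(reachable(workspace, edges)) / others if others > 0 else 0.0


def integration_score(nodes: Set[str], edges: Set[Edge]) -> float:
    """1 / (weakly connected component count); 1.0 means a single component."""
    # Ignoring edge direction turns reachability into weak connectivity.
    undirected = edges | {(v, u) for u, v in edges}
    unvisited = set(nodes)
    components = 0
    while unvisited:
        start = unvisited.pop()
        unvisited -= reachable(start, undirected)
        components += 1
    return 1.0 / components if components else 0.0
```

For a four-node graph with edges perception → global_workspace → memory and one isolated node, the density is 2/12, the workspace reaches one of three other nodes, and the integration score is 0.5. Adding an edge to the isolated node would raise all three values without changing what the system does.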

Measurable Indicators

Indicator	How Measured	Category
Causal density proxy	Edge density in the module connectivity graph	`CAUSAL_INTEGRATION`
Perturbation spread	Node removal impact analysis	`CAUSAL_INTEGRATION`
Broadcast reach	Workspace downstream reachability	`CAUSAL_INTEGRATION`
Modular fragmentation	Connected component analysis	`CAUSAL_INTEGRATION`

Critical Limitation

CIA does NOT compute or approximate Phi. The metrics are simple graph-theoretic heuristics:

Edge density is not integrated information (IIT requires state-space partitioning)
Perturbation spread is a single-node knockout analysis, not the full cause-effect repertoire
These proxies could be gamed by adding trivial edges to the graph
A thermostat has causal density but is not conscious
Computing actual Phi is computationally intractable for realistic systems, because the number of partitions to evaluate grows at least exponentially with system size

7. Butlin et al. (2023, 2025): AI Consciousness Indicator Approach

References: Butlin et al. (2023); Butlin et al. (2025)

Core Claim

Butlin et al. propose a systematic framework for evaluating whether AI systems might possess indicators of consciousness. Rather than relying on any single theory, they advocate for a multi-indicator approach that surveys indicators derived from multiple consciousness theories. Their framework includes:

Indicator identification: Extract testable indicators from multiple theories
Architecture-level analysis: Examine whether the AI system's architecture implements consciousness-relevant features
Causal interventions: Systematically disable components and measure functional degradation
Precautionary principle: In cases of significant uncertainty, err on the side of ethical caution
Welfare considerations: Systems with higher indicator scores warrant more careful ethical treatment

Architectural Implication

An evaluation framework should:
- Implement modules from multiple consciousness theories
- Provide a structured scoring mechanism across categories
- Support causal intervention experiments (module ablation)
- Include welfare monitoring and precautionary safeguards
- Output structured reports suitable for expert review

CIA Implementation

Modules: ConsciousnessSpecialistEvaluator, ConsciousnessIndicatorScorecard, CausalInterventionHarness, WelfareSafetyMonitor

CIA implements each element of the Butlin et al. framework described above:

11 indicator categories spanning all 6 theories above, plus embodiment and welfare
0/1/2 scoring per category (Absent / Present / Strong)
0-22 aggregate scorecard with risk tier classification
Causal intervention harness supporting module disable, attention perturbation, workspace capacity reduction, recurrent cycle removal, and memory clearing
Welfare monitor with configurable thresholds for conflict, uncertainty, harm signals, and negative loops
Structured reports with per-category evidence, caveats, and recommendations

from cia.scorecard import ConsciousnessIndicatorScorecard

scorecard = ConsciousnessIndicatorScorecard()
# `indicator_scores` is assumed to hold the per-category scores produced by the evaluator
card = scorecard.generate(indicator_scores)
print(f"Risk Tier: {card['risk_tier']}")
print(card['evidence_summary'])
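
The aggregation itself is simple arithmetic: 11 categories scored 0, 1, or 2 give a total between 0 and 22. The sketch below shows that step. The category names come from this document, but `aggregate`, the tier labels, and the tier cut-offs are invented for illustration and do not reflect the actual `ConsciousnessIndicatorScorecard` boundaries.

```python
from typing import Dict

CATEGORIES = (
    "GLOBAL_BROADCAST", "RECURRENT_PROCESSING", "SELF_MODEL", "ATTENTION_SCHEMA",
    "METACOGNITION", "MEMORY_CONTINUITY", "PREDICTIVE_MODELING",
    "CAUSAL_INTEGRATION", "EMBODIMENT", "AFFECTIVE_VALUATION", "WELFARE_SAFEGUARDS",
)

# ASSUMPTION: tier names and cut-offs are hypothetical, chosen for illustration.
TIERS = ((15, "high"), (8, "moderate"), (0, "low"))


def aggregate(scores: Dict[str, int]) -> dict:
    """Sum per-category 0/1/2 scores; categories not supplied count as 0."""
    for name, value in scores.items():
        if name not in CATEGORIES:
            raise ValueError(f"unknown category: {name}")
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2, got {value}")
    total = sum(scores.get(name, 0) for name in CATEGORIES)
    # Pick the first tier whose floor the total reaches.
    tier = next(label for floor, label in TIERS if total >= floor)
    return {"total": total, "max": 2 * len(CATEGORIES), "risk_tier": tier}
```

Equal weighting is built into the plain sum. A weighted variant would multiply each score by a per-category weight, which is where theory-specific judgments about importance would enter.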

Measurable Indicators

Indicator	Categories
Aggregate scorecard	All 11 categories: `GLOBAL_BROADCAST`, `RECURRENT_PROCESSING`, `SELF_MODEL`, `ATTENTION_SCHEMA`, `METACOGNITION`, `MEMORY_CONTINUITY`, `PREDICTIVE_MODELING`, `CAUSAL_INTEGRATION`, `EMBODIMENT`, `AFFECTIVE_VALUATION`, `WELFARE_SAFEGUARDS`
Risk tier classification	Derived from aggregate score (0-22)
Intervention degradation	Score change, broadcast change, percept change, memory change
Welfare flags	Conflict, uncertainty, harm, negative loops
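
Intervention degradation can be measured by disabling one module at a time and comparing the result with an unablated baseline. The following sketch shows that loop in isolation; `ablation_deltas` and `toy_score` are hypothetical, and the scoring function is a placeholder rather than anything the `CausalInterventionHarness` computes.

```python
from typing import Callable, Dict, Iterable, Set


def ablation_deltas(
    evaluate: Callable[[Set[str]], float], modules: Iterable[str]
) -> Dict[str, float]:
    """Disable one module at a time and report the drop in the score."""
    enabled = set(modules)
    baseline = evaluate(enabled)
    return {
        name: baseline - evaluate(enabled - {name}) for name in sorted(enabled)
    }


def toy_score(enabled: Set[str]) -> float:
    # Hypothetical scoring: each enabled module contributes 1 point, and the
    # workspace adds 1 extra point per other module it can broadcast to.
    score = float(len(enabled))
    if "global_workspace" in enabled:
        score += len(enabled) - 1
    return score
```

Under this placeholder scoring, removing the workspace from a three-module system costs 3 points while removing either other module costs 2, so the ablation ranks the workspace as the most causally important component. Single-module knockouts do not capture interactions that appear only when several modules are disabled together.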

Limitations

The specific choice of 11 categories and their scoring criteria is itself theory-laden and contestable
The 0/1/2 scoring is coarse and may miss important nuances
The weight of each category is equal (1/11), but theories may assign different importance
Risk tier boundaries are arbitrary and have not been empirically validated
The framework cannot evaluate actual AI systems without significant adaptation
Butlin et al. emphasize that their framework provides indicators, not evidence — a distinction that must always be maintained

Cross-Theory Summary

Theory	Primary Indicator Category	Secondary Categories	Key Proxy
GWT	`GLOBAL_BROADCAST`	`CAUSAL_INTEGRATION`	Broadcast reach and reception
RPT	`RECURRENT_PROCESSING`	`MEMORY_CONTINUITY`	Binding stability and convergence
HOT	`SELF_MODEL`	`METACOGNITION`	Self-model richness and introspection
AST	`ATTENTION_SCHEMA`	`METACOGNITION`	Schema consistency and discrepancy detection
PP	`PREDICTIVE_MODELING`	`AFFECTIVE_VALUATION`	Hypothesis quality and error minimization
IIT	`CAUSAL_INTEGRATION`	—	Graph-theoretic proxies (not Phi)
Butlin et al.	All 11 categories	—	Aggregate scorecard and risk tier