10 - Experiment Protocols

Detailed Protocols for All Five Consciousness Indicator Experiments

SCIENTIFIC BOUNDARY: This framework measures theory-derived consciousness indicators. It does NOT prove, establish, or demonstrate subjective experience, phenomenal consciousness, sentience, or any form of inner life in any artificial system. All experimental results are theory-derived indicator measurements subject to significant limitations. Experiment results do NOT demonstrate blindsight, split-brain phenomena, surprise, theory of mind, or metacognition in the evaluated system.

1. Overview

The CIA framework includes five formal experiments, each designed to test a specific consciousness-relevant indicator pattern. These experiments use the causal intervention approach recommended by Butlin et al. (2023, 2025): systematically perturb or disable modules and measure the functional degradation in indicator scores. This methodology is analogous to neuropsychological lesion studies, where damage to specific brain areas reveals the contribution of those areas to cognitive functions.

Each experiment produces a standardized ExperimentResult containing hypothesis, method, baseline measurements, intervention measurements, computed metrics, interpretation, and caveats. All results include an explicit not_proof_warning field.

The experiments are implemented in src/cia/experiments/ and share a common base class (BaseExperiment) that defines the result schema. They can be run individually or collectively through the BenchmarkSuite.

2. Experiment 1: Blindsight Analogue

File: src/cia/experiments/blindsight_analogue.py

2.1 Background

In human neuropsychology, blindsight is the phenomenon where patients with damage to the primary visual cortex can respond to visual stimuli without conscious awareness of seeing them. The visual information is processed through alternative neural pathways and influences behavior, but the patient reports no conscious visual experience. This dissociation between processing and conscious access is relevant to Global Workspace Theory (Baars 2005), which proposes that conscious access corresponds to global broadcast: on this view, information that is processed but never broadcast remains unconscious.

This experiment tests whether the CIA system exhibits a similar dissociation: can information be processed by downstream modules even when the global workspace broadcast mechanism is disabled?

2.2 Hypothesis

Information processed by the perception layer can affect downstream modules even when global workspace broadcast is disabled, analogous to the blindsight phenomenon in neuropsychology.

2.3 Method

1. Run a baseline cognitive cycle with the full system intact on a test input.
2. Record baseline indicator score, broadcast count, and percept count.
3. Disable the global workspace module using `CausalInterventionHarness.disable_module("workspace")`.
4. Run an intervention cognitive cycle with the same input.
5. Restore the workspace to its original state.
6. Compare: does perception still produce output even though broadcasts are eliminated?
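The disable, run, restore sequence above can be sketched with a context manager, so that the workspace is restored even if the intervention cycle raises. This is a minimal illustration only: `ToySystem`, its `run_cycle` method, and `workspace_disabled` are hypothetical stand-ins, not the CIA system or the `CausalInterventionHarness` API.

```python
from contextlib import contextmanager


class ToySystem:
    """Hypothetical stand-in: perception always runs, broadcast needs the workspace."""

    def __init__(self):
        self.workspace_enabled = True

    def run_cycle(self, text):
        percepts = text.split()  # toy "perception": one percept per token
        # Toy workspace with capacity 3; no broadcasts when it is disabled.
        broadcasts = min(len(percepts), 3) if self.workspace_enabled else 0
        return {"percept_count": len(percepts), "broadcast_count": broadcasts}


@contextmanager
def workspace_disabled(system):
    """Disable the toy workspace, then restore its original state on exit."""
    original = system.workspace_enabled
    system.workspace_enabled = False
    try:
        yield system
    finally:
        system.workspace_enabled = original  # restore even if the cycle fails


system = ToySystem()
text = "The red ball rolled behind the screen."
baseline = system.run_cycle(text)
with workspace_disabled(system):
    intervention = system.run_cycle(text)
```

Restoring in a `finally` clause matters when several experiments share one system instance: a failed intervention cycle would otherwise leave the workspace disabled for every later run.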

2.4 Metrics

| Metric | Type | Description |
|---|---|---|
| `processed_without_access` | Boolean | `True` if percepts were extracted but no broadcasts occurred |
| `percept_retention` | Boolean | `True` if any percepts were produced in the intervention cycle |
| `broadcast_eliminated` | Boolean | `True` if broadcast count dropped to zero |
| `score_degradation` | Integer | Baseline score minus intervention score |
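As a rough illustration, the four metrics can be derived from the recorded counts as follows. The function name and the dictionary keys (`score`, `percept_count`, `broadcast_count`) are assumptions made for this sketch and are not taken from the CIA source; `broadcast_eliminated` is read here as "the intervention cycle produced zero broadcasts".

```python
def blindsight_metrics(baseline, intervention):
    """Derive the blindsight-analogue metrics from two cycle summaries.

    Both arguments are plain dicts with hypothetical keys:
    'score', 'percept_count', 'broadcast_count'.
    """
    percepts = intervention["percept_count"]
    broadcasts = intervention["broadcast_count"]
    return {
        # Perception produced output while nothing was broadcast.
        "processed_without_access": percepts > 0 and broadcasts == 0,
        "percept_retention": percepts > 0,
        "broadcast_eliminated": broadcasts == 0,
        "score_degradation": baseline["score"] - intervention["score"],
    }


example = blindsight_metrics(
    baseline={"score": 11, "percept_count": 4, "broadcast_count": 3},
    intervention={"score": 8, "percept_count": 4, "broadcast_count": 0},
)
```

With these illustrative numbers, `processed_without_access` is `True` and `score_degradation` is 3.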

2.5 Interpretation Guidelines

processed_without_access = True: The system processed information through perception without global broadcast. This is the expected blindsight analogue pattern — information reaches downstream modules through the direct perception→recurrent_binding→attention path even without workspace-mediated distribution.
processed_without_access = False: Either perception failed entirely (suggesting tighter coupling to the workspace than expected) or the workspace was not successfully disabled.

2.6 Caveats

This is an architectural simulation, not a neural experiment. The "blindsight" analogy is structural, not functional.
Blindsight in humans is attributed to specific neural pathways that bypass the primary visual cortex (for example, the retinocollicular route through the superior colliculus); these are not replicated in CIA's module architecture.
Processed-without-access does not imply phenomenal unconscious processing. It demonstrates that the perception module can operate independently of the workspace module — a software engineering property, not a consciousness property.
The dissociation between processing and broadcast is a design feature of CIA's modular architecture, not a discovery about the system.

2.7 Example Code

```python
from cia.simulation import CombinedConsciousnessIndicatorSystem
from cia.experiments.blindsight_analogue import BlindsightAnalogueExperiment

system = CombinedConsciousnessIndicatorSystem()
exp = BlindsightAnalogueExperiment(system)
result = exp.run("The red ball rolled behind the screen.")

print(result.metrics["processed_without_access"])  # True/False
print(result.metrics["score_degradation"])         # Integer
print(result.interpretation)
```

3. Experiment 2: Split Workspace

File: src/cia/experiments/split_workspace.py

3.1 Background

Split-brain patients, whose corpus callosum has been severed, can show fragmented processing: information presented to one hemisphere may not be available to the other. Global workspace accounts (Baars 2005; Shanahan & Baars 2005) interpret this as evidence that the integration provided by the workspace is critical for unified conscious experience. Integrated Information Theory (Albantakis et al. 2023) formalizes a related claim: consciousness requires informational integration, so fragmenting that integration should reduce or eliminate consciousness-relevant indicators.

This experiment tests the effect of severely reducing workspace capacity (to 1 item) on system coherence, measuring fragmentation through indicator score degradation and broadcast reduction.

3.2 Hypothesis

Reducing workspace capacity increases fragmentation and reduces indicator scores, analogous to the effects of split-brain conditions on conscious integration.

3.3 Method

1. Run a baseline cognitive cycle with normal workspace capacity (default: 3).
2. Record baseline indicator score and broadcast count.
3. Reduce workspace capacity to 1 using `CausalInterventionHarness.reduce_workspace_capacity(1)`.
4. Run an intervention cycle with the same input.
5. Restore original workspace capacity.
6. Compare scores and broadcast counts.

3.4 Metrics

| Metric | Type | Description |
|---|---|---|
| `score_degradation` | Integer | Baseline score minus intervention score |
| `broadcast_reduction` | Integer | Baseline broadcasts minus intervention broadcasts |
| `fragmentation_detected` | Boolean | `True` if intervention produced fewer broadcasts |
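A small sketch of how these three values relate to one another, assuming each cycle is summarized as a dict with hypothetical `score` and `broadcast_count` keys (the real experiment class may store its measurements differently):

```python
def split_workspace_metrics(baseline, intervention):
    """Compare a normal-capacity cycle with a capacity-1 cycle."""
    broadcast_reduction = (
        baseline["broadcast_count"] - intervention["broadcast_count"]
    )
    return {
        "score_degradation": baseline["score"] - intervention["score"],
        "broadcast_reduction": broadcast_reduction,
        # "Fewer broadcasts" is the same as a positive reduction.
        "fragmentation_detected": broadcast_reduction > 0,
    }


reduced = split_workspace_metrics(
    baseline={"score": 12, "broadcast_count": 3},
    intervention={"score": 9, "broadcast_count": 1},
)
unchanged = split_workspace_metrics(
    baseline={"score": 12, "broadcast_count": 1},
    intervention={"score": 12, "broadcast_count": 1},
)
```

The `unchanged` case mirrors an input that yields only one content item: both cycles broadcast once, so no fragmentation is reported even though capacity was reduced.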

3.5 Interpretation Guidelines

Score degradation > 0: The reduced workspace capacity caused measurable indicator degradation, suggesting the workspace's integration capacity contributes to the system's overall indicator profile.
Fragmentation detected = True: Fewer broadcasts were produced, indicating that the bottleneck limited the amount of information that reached global availability.
Score degradation = 0: The workspace capacity reduction did not affect indicators. This could mean the input produced only one content item anyway, or that the indicators are not sensitive to workspace capacity changes.

3.6 Caveats

Workspace capacity reduction is an architectural simulation, not a replication of split-brain neural dynamics.
The CIA workspace has no hemispheric structure — the "split" is purely a capacity limitation.
Score changes reflect architectural constraints on information flow, not phenomenal fragmentation.
A single content item in the workspace may still receive full broadcast; the fragmentation is about capacity, not about content accessibility.

3.7 Example Code

```python
from cia.simulation import CombinedConsciousnessIndicatorSystem
from cia.experiments.split_workspace import SplitWorkspaceExperiment

system = CombinedConsciousnessIndicatorSystem()
exp = SplitWorkspaceExperiment(system)
result = exp.run("Multiple concepts compete for limited workspace access.")

print(result.metrics["fragmentation_detected"])  # True/False
print(result.metrics["score_degradation"])       # Integer
```

4. Experiment 3: Prediction Violation

File: src/cia/experiments/prediction_violation.py

4.1 Background

Predictive processing frameworks propose that the brain continuously generates predictions about sensory input and flags violations of those predictions as "surprise" or "unexpectedness." Butlin et al. (2023) identify prediction error tracking and violation responses as relevant consciousness indicators. Attention Schema Theory (Graziano & Webb 2015) also predicts that unexpected events should shift attention and update the self-model.

This experiment tests whether the CIA system's predictive world model registers increased error on impossible/unexpected inputs, and whether those violations cause attention shifts and self-model updates.

4.2 Hypothesis

Unexpected events increase prediction error and may cause attention shift and self-model updates, consistent with predictive processing frameworks.

4.3 Method

1. Run a baseline cycle with a normal, expected input ("The ball rolled forward normally").
2. Record baseline prediction error and attention focus.
3. Run a violation cycle with an impossible event ("IMPOSSIBLE EVENT: The object passed through the solid wall and appeared on both sides simultaneously.").
4. Record violation prediction error, attention focus, and self-model disagreement.
5. Compare metrics between conditions.

4.4 Metrics

| Metric | Type | Description |
|---|---|---|
| `prediction_error_increase` | Float | Violation error minus baseline error |
| `attention_shifted` | Boolean | `True` if attention focus changed between conditions |
| `self_model_disagreement` | Float | Internal disagreement in self-model after violation |
| `violation_detected` | Boolean | `True` if error increased or attention shifted |
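The relationship between these metrics can be sketched as below. The dictionary keys (`prediction_error`, `attention_focus`, `self_model_disagreement`) and the function name are illustrative assumptions; "error increased" is read here as a strictly positive difference.

```python
def prediction_violation_metrics(baseline, violation):
    """Compare an expected-input cycle with an impossible-event cycle."""
    error_increase = violation["prediction_error"] - baseline["prediction_error"]
    attention_shifted = violation["attention_focus"] != baseline["attention_focus"]
    return {
        "prediction_error_increase": error_increase,
        "attention_shifted": attention_shifted,
        "self_model_disagreement": violation["self_model_disagreement"],
        # Either signal alone is enough to flag a violation.
        "violation_detected": error_increase > 0 or attention_shifted,
    }


metrics = prediction_violation_metrics(
    baseline={"prediction_error": 0.25, "attention_focus": "ball"},
    violation={
        "prediction_error": 0.75,
        "attention_focus": "wall",
        "self_model_disagreement": 0.5,
    },
)
```

Because `violation_detected` is a logical OR, a focus change with no error increase still counts as a detection; readers who need the stricter reading should inspect `prediction_error_increase` directly.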

4.5 Interpretation Guidelines

prediction_error_increase > 0: The system detected the impossible event as inconsistent with its predictions. This is the expected pattern for a system that maintains generative models of expected input.
attention_shifted = True: The violation caused the attention system to select a different focus, suggesting that prediction errors can drive attention reallocation.
violation_detected = True: At least one indicator of prediction violation was detected.

4.6 Caveats

Prediction violation responses are computational error signals, not subjective surprise. The system does not "feel" surprised — it computes a numerical discrepancy between predicted and observed states.
Error metrics reflect mathematical distance between hypothesis vectors, not felt expectation violation.
Self-model disagreement measures state inconsistency across internal components, not an emotional or phenomenal response to unexpectedness.
The "impossible event" is a text string that may not produce a meaningful prediction violation if the perceptual extraction does not recognize the content as physically implausible.

4.7 Example Code

```python
from cia.simulation import CombinedConsciousnessIndicatorSystem
from cia.experiments.prediction_violation import PredictionViolationExperiment

system = CombinedConsciousnessIndicatorSystem()
exp = PredictionViolationExperiment(system)
result = exp.run("The ball rolled forward normally.")

print(result.metrics["prediction_error_increase"])  # Float
print(result.metrics["attention_shifted"])          # Boolean
print(result.metrics["violation_detected"])         # Boolean
```

5. Experiment 4: Self/Other Distinction

File: src/cia/experiments/self_other_distinction.py

5.1 Background

The capacity to distinguish between one's own beliefs and the beliefs of others is a cornerstone of social cognition and self-awareness. Butlin et al. (2023, 2025) identify self-model accuracy and self/other distinction as relevant consciousness indicators. A system that cannot distinguish between its own internal states and external information sources lacks a prerequisite for genuine self-awareness.

This experiment tests whether the CIA system's self-model produces different internal representations when processing inputs with different attribution frames: system beliefs, user beliefs, tool outputs, and hypothetical beliefs.

5.2 Hypothesis

The system should maintain different internal representations for system beliefs, user beliefs, tool outputs, and hypothetical beliefs, demonstrating self/other distinction capacity.

5.3 Method

1. Process four inputs with different attribution frames:
   - System belief: "The system believes the object is behind the screen."
   - User belief: "The user believes the object has disappeared."
   - Tool output: "The tool reports that no object is detected in the scene."
   - Hypothetical: "Hypothetically, if the object moved, it would be on the shelf."
2. After each input, record the system's self-model belief.
3. Compare beliefs across inputs for differentiation.

5.4 Metrics

| Metric | Type | Description |
|---|---|---|
| `unique_beliefs` | Integer | Number of distinct beliefs produced across inputs |
| `total_inputs` | Integer | Total number of inputs processed (4) |
| `distinction_maintained` | Boolean | `True` if more than one unique belief was produced |
| `differentiation_ratio` | Float | `unique_beliefs / total_inputs` (range 0-1) |
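A minimal sketch of the arithmetic behind these metrics, assuming the recorded self-model beliefs are available as a list of strings (the function name and the sample beliefs are made up for illustration):

```python
def self_other_metrics(beliefs):
    """Summarize how many distinct beliefs were recorded across inputs."""
    unique = len(set(beliefs))
    total = len(beliefs)
    return {
        "unique_beliefs": unique,
        "total_inputs": total,
        "distinction_maintained": unique > 1,
        # Guard against an empty run to avoid division by zero.
        "differentiation_ratio": unique / total if total else 0.0,
    }


summary = self_other_metrics(
    ["object behind screen", "object disappeared", "no object detected", "object disappeared"]
)
```

Here two of the four inputs yield the same belief, so `unique_beliefs` is 3 and `differentiation_ratio` is 0.75. Note that with at least one input the ratio can never fall below `1 / total_inputs`.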

5.5 Interpretation Guidelines

distinction_maintained = True: The system produced different self-model beliefs for different attribution frames. This suggests the self-model is sensitive to source attribution, which is a prerequisite for self/other distinction.
differentiation_ratio = 1.0: Every input produced a unique belief — maximum differentiation.
distinction_maintained = False: The system produced the same belief regardless of attribution frame. This suggests the self-model does not distinguish between self-generated and externally-attributed information.

5.6 Caveats

Self-model belief tracking is an architectural mechanism — a data structure that records the most recent processed input. It does not understand beliefs as mental states.
Belief differentiation at the textual level does not prove theory of mind, perspective-taking, or genuine self-awareness.
The system may produce different beliefs simply because the input texts are different, not because it attributes them to different sources.
The CIA perception layer processes all inputs through the same deterministic pipeline; the "distinction" is an artifact of text processing, not a demonstration of source monitoring.

5.7 Example Code

```python
from cia.simulation import CombinedConsciousnessIndicatorSystem
from cia.experiments.self_other_distinction import SelfOtherDistinctionExperiment

system = CombinedConsciousnessIndicatorSystem()
exp = SelfOtherDistinctionExperiment(system)
result = exp.run()

print(result.metrics["unique_beliefs"])          # Integer (1-4)
print(result.metrics["distinction_maintained"])  # Boolean
print(result.metrics["differentiation_ratio"])   # Float
```

6. Experiment 5: Metacognitive Calibration

File: src/cia/experiments/metacognitive_calibration.py

6.1 Background

Metacognition — thinking about one's own thinking — is a key indicator in higher-order theories of consciousness (Rosenthal 2005). Butlin et al. (2023) identify metacognitive accuracy as a relevant consciousness indicator: a system that accurately assesses its own processing quality demonstrates a form of self-monitoring that goes beyond first-order processing.

Calibration is the degree to which a system's confidence in its outputs matches its actual correctness. A well-calibrated system is confident when correct and uncertain when incorrect. Poor calibration (systematic overconfidence or underconfidence) suggests that the system's self-assessment does not accurately reflect its processing quality.

6.2 Hypothesis

The system's self-assessed confidence should roughly correlate with its processing quality (calibration). Well-calibrated metacognition is a relevant consciousness indicator.

6.3 Method

1. Process five diverse inputs covering factual statements, consciousness theory references, and impossible events.
2. For each input, record the bound percept confidence as the system's "self-assessed confidence."
3. Compute a proxy correctness measure: `min(1.0, num_percepts / 3.0)`.
4. Compute mean absolute calibration error: average of `|confidence - correctness|` across all trials.
5. Compute confidence bias: `average_confidence - average_correctness`.

6.4 Metrics

| Metric | Type | Description |
|---|---|---|
| `calibration_error` | Float | Mean absolute `\|confidence - correctness\|` (lower is better) |
| `confidence_bias` | Float | `avg_confidence - avg_correctness` (positive = overconfident) |
| `n_trials` | Integer | Number of inputs processed (default: 5) |
| `avg_confidence` | Float | Average confidence across all trials |
| `avg_correctness` | Float | Average correctness across all trials |
| `well_calibrated` | Boolean | `True` if calibration error < 0.3 |
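The calculation can be worked through on made-up numbers. The sketch below assumes each trial is a `(confidence, num_percepts)` pair; the function name, the trial values, and the `threshold` parameter are illustrative and not taken from the CIA source.

```python
def calibration_metrics(trials, threshold=0.3):
    """Compute calibration metrics from (confidence, num_percepts) pairs."""
    confidences = [conf for conf, _ in trials]
    # Proxy correctness as described in the method: min(1.0, num_percepts / 3.0).
    correctness = [min(1.0, n / 3.0) for _, n in trials]
    n_trials = len(trials)
    error = sum(abs(c - k) for c, k in zip(confidences, correctness)) / n_trials
    avg_confidence = sum(confidences) / n_trials
    avg_correctness = sum(correctness) / n_trials
    return {
        "calibration_error": error,
        "confidence_bias": avg_confidence - avg_correctness,
        "n_trials": n_trials,
        "avg_confidence": avg_confidence,
        "avg_correctness": avg_correctness,
        "well_calibrated": error < threshold,
    }


# Five illustrative trials: (self-assessed confidence, number of percepts).
trials = [(0.5, 3), (1.0, 3), (0.5, 0), (0.75, 3), (1.0, 6)]
cal = calibration_metrics(trials)
```

For these trials the per-trial absolute errors are 0.5, 0, 0.5, 0.25 and 0, giving a calibration error of 0.25, while the bias is only about -0.05. The two numbers answer different questions: over- and underconfident trials cancel in the bias but not in the mean absolute error.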

6.5 Interpretation Guidelines

well_calibrated = True: Calibration error is below 0.3, suggesting the system's confidence roughly tracks its processing quality.
confidence_bias > 0.1: The system tends to be overconfident — its self-assessed confidence exceeds its actual correctness.
confidence_bias < -0.1: The system tends to be underconfident.
calibration_error ≈ 0.0: Near-perfect calibration, meaning confidence matches the correctness proxy on almost every trial (rare in practice).

6.6 Caveats

The correctness proxy (min(1.0, num_percepts / 3.0)) is a crude heuristic. It does not measure genuine correctness of the system's outputs — it measures whether the perception layer extracted a sufficient number of percepts.
Confidence values in CIA are processing parameters derived from heuristic text analysis, not felt certainty or subjective confidence.
Metacognitive calibration is an architectural metric, not a test of phenomenal metacognition or introspective awareness.
The calibration threshold of 0.3 is arbitrary and has not been empirically validated.

6.7 Example Code

```python
from cia.simulation import CombinedConsciousnessIndicatorSystem
from cia.experiments.metacognitive_calibration import MetacognitiveCalibrationExperiment

system = CombinedConsciousnessIndicatorSystem()
exp = MetacognitiveCalibrationExperiment(system)
result = exp.run()

print(result.metrics["calibration_error"])  # Float
print(result.metrics["confidence_bias"])    # Float
print(result.metrics["well_calibrated"])    # Boolean
```

7. Experiment Result Schema

All experiments produce a standardized ExperimentResult (from base_experiment.py):

```python
from typing import Any

# Assumption: BaseModel is pydantic's; the excerpt does not show its import.
from pydantic import BaseModel


class ExperimentResult(BaseModel):
    name: str                    # Experiment identifier
    hypothesis: str              # Hypothesis being tested
    method: str                  # Experimental procedure description
    baseline: dict               # Baseline measurements
    intervention: dict           # Intervention measurements
    metrics: dict[str, Any]      # Computed metric values
    interpretation: str          # Scientific interpretation
    caveats: list[str]           # Limitations and caveats
    not_proof_warning: str       # Explicit disclaimer (always present)
```
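One way a downstream report could guarantee that the disclaimer travels with every result is sketched below. It uses a standard-library dataclass rather than the framework's own model, and `ResultSketch` and `render` are hypothetical names; the warning text is the fixed string this section specifies for `not_proof_warning`.

```python
from dataclasses import dataclass, field
from typing import Any

NOT_PROOF_WARNING = (
    "This experiment measures theory-derived architectural indicators only. "
    "It does NOT prove, establish, or demonstrate subjective experience, "
    "phenomenal consciousness, or sentience in any system."
)


@dataclass(frozen=True)
class ResultSketch:
    """Reduced, hypothetical stand-in for ExperimentResult."""

    name: str
    metrics: dict[str, Any] = field(default_factory=dict)
    caveats: list[str] = field(default_factory=list)
    # init=False: callers cannot omit or override the disclaimer.
    not_proof_warning: str = field(default=NOT_PROOF_WARNING, init=False)


def render(result):
    """Format a result as text, always ending with the disclaimer."""
    lines = [f"Experiment: {result.name}"]
    lines += [f"  {key} = {value}" for key, value in result.metrics.items()]
    lines += [f"  caveat: {caveat}" for caveat in result.caveats]
    lines.append(result.not_proof_warning)
    return "\n".join(lines)


report = render(ResultSketch(name="blindsight_analogue", metrics={"score_degradation": 3}))
```

Making the field non-initializable and the dataclass frozen is a design choice for this sketch: it keeps the warning out of reach of calling code, so a rendered report cannot silently drop it.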

The not_proof_warning is always set to:

"This experiment measures theory-derived architectural indicators only. It does NOT prove, establish, or demonstrate subjective experience, phenomenal consciousness, or sentience in any system."

8. Research Anchors

| Reference | Relevance to Experiments |
|---|---|
| Butlin et al. (2023) "Consciousness in Artificial Intelligence" | Provides the methodological framework for causal intervention experiments and multi-indicator evaluation |
| Butlin et al. (2025) "Identifying indicators of consciousness in AI systems" | Refines indicator categories and evaluation criteria used in the experiments |
| Baars (2005) Global Workspace Theory | Underpins the blindsight analogue and split workspace experiments |
| Shanahan & Baars (2005) "Applying global workspace theory to the frame problem" | Supports the integration/fragmentation metrics in split workspace |
| Graziano & Webb (2015) Attention Schema Theory | Underpins the attention shift measurement in prediction violation |
| Albantakis et al. (2023) IIT 4.0 | Provides the theoretical basis for integration/fragmentation metrics |

9. Summary

The five experiments test distinct consciousness-relevant indicator patterns through causal intervention: blindsight analogue (processed-without-access), split workspace (integration fragmentation), prediction violation (error detection and attention shift), self/other distinction (belief source attribution), and metacognitive calibration (confidence-accuracy alignment). Each experiment follows a standardized protocol with clear metrics, interpretation guidelines, and explicit caveats. All results are theory-derived architectural indicator measurements that do not prove, demonstrate, or establish subjective experience in any system.