
15 - OpenNeuro and BIDS Usage

Using Public EEG Datasets with the CIA Neuroadaptive Extension


SCIENTIFIC BOUNDARY: EEG does not read thoughts directly. EEG does not capture or transfer consciousness. OpenNeuro datasets contain scalp electrical activity recordings for research purposes only. This repository does not automatically download any data — all downloads are performed manually by the researcher.


1. What Is OpenNeuro?

OpenNeuro (https://openneuro.org) is a free and open platform for sharing neuroimaging data, primarily EEG, MEG, and MRI datasets. It is maintained by the Stanford Center for Reproducible Neuroscience and the scientific community, and provides a centralized repository where researchers can share their datasets under open-access or restricted-access licenses. OpenNeuro hosts hundreds of datasets across a wide range of experimental paradigms, populations, and imaging modalities.

Key Features

  • Public data access: Many datasets are available for immediate download without registration. Some restricted datasets require the user to accept data use terms.
  • BIDS format: OpenNeuro requires all uploaded datasets to conform to the Brain Imaging Data Structure (BIDS) standard, ensuring consistent organization and metadata.
  • Dataset metadata: Each dataset includes a description, contributors, licensing information, task descriptions, and subject demographics (where available).
  • Versioning: Datasets are versioned, allowing researchers to cite specific versions for reproducibility.
  • CLI tool: The community-maintained openneuro-py package provides a command-line client and a Python API for programmatic dataset download.
  • GraphQL API: For advanced users, OpenNeuro exposes a GraphQL API for dataset browsing and download.

Relevance to CIA

OpenNeuro is a valuable source of real EEG data for testing and validating the CIA neuroadaptive extension. Instead of relying solely on synthetic data (the MockSignalStream), researchers can download real EEG recordings and process them through the full ingestion, preprocessing, feature extraction, and neural-state encoding pipeline. This enables more realistic evaluation of the proxy estimation heuristics and provides data for exploring individual variability across subjects and tasks.


2. What Is BIDS / EEG-BIDS?

The Brain Imaging Data Structure (BIDS) is a community-developed standard for organizing and describing neuroimaging datasets (https://bids.neuroimaging.io). BIDS was originally developed for MRI data and has been extended to support EEG (EEG-BIDS), MEG, iEEG, and other modalities. The standard specifies a hierarchical directory structure, file naming conventions, and required metadata files (JSON sidecars, TSV files).

BIDS Directory Structure

A BIDS-compliant EEG dataset follows this general structure:

dataset-name/
├── dataset_description.json    # Required: describes the dataset
├── participants.tsv            # Recommended: participant demographics
├── participants.json           # Optional: column descriptions
├── README                      # Recommended: dataset overview
├── CHANGES                     # Optional: version history
├── LICENSE                     # Optional: license information
├── code/                       # Optional: analysis code
├── derivatives/                # Optional: processed data
└── sub-01/                     # Subject directory
    ├── sub-01_sessions.tsv     # Optional: session metadata
    ├── anat/                   # Optional: anatomical data
    └── ses-01/                 # Session directory
        └── eeg/                # EEG data directory
            ├── sub-01_ses-01_task-rest_eeg.vhdr      # BrainVision header
            ├── sub-01_ses-01_task-rest_eeg.vmrk      # BrainVision marker
            ├── sub-01_ses-01_task-rest_eeg.eeg       # BrainVision data
            ├── sub-01_ses-01_task-rest_events.tsv    # Event markers
            ├── sub-01_ses-01_task-rest_channels.tsv  # Channel info
            ├── sub-01_ses-01_electrodes.tsv          # Electrode positions (no task entity)
            ├── sub-01_ses-01_task-rest_eeg.json      # EEG-specific metadata
            └── sub-01_ses-01_coordsystem.json        # Coordinate system (no task entity)

BIDS File Naming Convention

BIDS file names follow a strict pattern with entity-label pairs separated by underscores:

sub-<label>[_ses-<label>][_task-<label>][_acq-<label>][_run-<label>][_proc-<label>][_space-<label>]_<suffix>.<extension>

Key entities for EEG:

  • sub-: Subject identifier (required)
  • ses-: Session identifier (optional)
  • task-: Task name (required for EEG recordings)
  • acq-: Acquisition parameter variant (optional)
  • run-: Run index (optional)
  • proc-: Processing pipeline label (optional, typically in derivatives)

Common suffixes: eeg, events, channels, electrodes, coordsystem
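
The naming pattern above can be illustrated with a small parser. This is a minimal sketch, not part of the CIA codebase: the function name parse_bids_filename is hypothetical, and it only checks the general entity-label shape rather than the full BIDS rules for entity order.

```python
import re

# One "key-label" entity such as "sub-01" or "task-rest".
ENTITY_RE = re.compile(r"^([a-zA-Z]+)-([a-zA-Z0-9]+)$")


def parse_bids_filename(filename):
    """Split a BIDS file name into (entities, suffix, extension).

    Hypothetical helper for illustration; it does not validate entity order.
    """
    # Everything after the first dot is treated as the extension (covers ".tsv.gz").
    stem, dot, rest = filename.partition(".")
    extension = dot + rest
    parts = stem.split("_")
    suffix = parts[-1]  # last underscore-separated chunk, e.g. "eeg" or "events"
    entities = {}
    for part in parts[:-1]:
        match = ENTITY_RE.match(part)
        if match is None:
            raise ValueError(f"Not a BIDS entity: {part!r}")
        entities[match.group(1)] = match.group(2)
    if "sub" not in entities:
        raise ValueError("BIDS data file names must contain a sub- entity")
    return entities, suffix, extension


entities, suffix, extension = parse_bids_filename("sub-01_ses-01_task-rest_eeg.vhdr")
print(entities)   # {'sub': '01', 'ses': '01', 'task': 'rest'}
print(suffix)     # eeg
print(extension)  # .vhdr
```

For real work, MNE-BIDS offers a more complete treatment of BIDS paths; the sketch only shows how the underscore-separated pairs map to a dictionary.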

EEG-BIDS Specific Files

File                  Format  Purpose
*_eeg.json            JSON    EEG-specific metadata (sampling frequency, reference, filtering)
*_events.tsv          TSV     Event markers (onset, duration, trial_type, value)
*_channels.tsv        TSV     Channel information (name, type, units, sampling_frequency)
*_electrodes.tsv      TSV     Electrode positions (name, x, y, z)
*_coordsystem.json    JSON    Coordinate system definition (e.g., CapTrak, EEGLAB)
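
Because the sidecar tables are plain tab-separated text, they can be read with the Python standard library alone. The sketch below parses an events table; the example rows are invented for illustration and the helper names are hypothetical, not part of cia.neuro.

```python
import csv
import io

# Invented contents of a *_events.tsv file (illustration only).
EXAMPLE_EVENTS_TSV = (
    "onset\tduration\ttrial_type\tvalue\n"
    "0.0\t60.0\teyes_closed\t1\n"
    "60.0\tn/a\teyes_open\t2\n"
)


def _to_float(value):
    """BIDS marks missing values as "n/a"; map those to None."""
    return None if value in (None, "", "n/a") else float(value)


def read_events(handle):
    """Parse a BIDS events table into a list of dicts with numeric onset/duration."""
    events = []
    for row in csv.DictReader(handle, delimiter="\t"):
        events.append({
            "onset": _to_float(row.get("onset")),        # seconds from recording start
            "duration": _to_float(row.get("duration")),  # seconds, or None if "n/a"
            "trial_type": row.get("trial_type") or "n/a",
        })
    return events


events = read_events(io.StringIO(EXAMPLE_EVENTS_TSV))
for event in events:
    print(event)
```

To read a file on disk, open it with newline="" and encoding="utf-8" and pass the handle to read_events in the same way.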

Why BIDS Matters for CIA

BIDS compliance ensures that EEG datasets are:

  • Self-describing: All metadata (sampling rate, channel names, task labels, event markers) is embedded in standardized sidecar files rather than scattered across READMEs and code.
  • Tool-interoperable: BIDS datasets can be read by MNE-Python, MNE-BIDS, EEGLAB, FieldTrip, BrainStorm, and many other tools without custom parsing code.
  • Shareable: BIDS datasets can be uploaded to OpenNeuro and other platforms for sharing and reproducibility.
  • Machine-readable: Automated tools (including cia.neuro.bids_utils.BIDSUtils) can validate, summarize, and navigate BIDS datasets.


3. Using the OpenNeuro CLI Externally

The CIA repository does NOT include automatic download functionality. All data downloads must be performed manually by the researcher. The OpenNeuro CLI (openneuro-py) is the recommended tool for downloading datasets from OpenNeuro.

Installing the OpenNeuro CLI

pip install openneuro-py

Downloading a Dataset

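# Download entire dataset
openneuro-py download --dataset=ds000001 --target-dir=./datasets/ds000001

# Download specific subject
openneuro-py download --dataset=ds000001 --target-dir=./datasets/ds000001 --include=sub-01

# Download files for a specific task (glob pattern; needs an openneuro-py version with glob support)
openneuro-py download --dataset=ds000001 --target-dir=./datasets/ds000001 --include='*task-rest*'

# Download a specific snapshot version (replace 1.0.0 with a tag listed on the dataset page)
openneuro-py download --dataset=ds000001 --target-dir=./datasets/ds000001 --tag=1.0.0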

Alternative Download Methods

  • Direct download: Visit https://openneuro.org/datasets/ds000001 in a browser and download individual files or the entire dataset.
  • wget/curl: For datasets with direct download links.
  • AWS CLI: Some OpenNeuro datasets are available via AWS S3.

Verifying the Download

After downloading, check that the basic BIDS structure is in place:

# Check directory structure
ls -R ./datasets/ds000001

# Verify dataset_description.json exists
cat ./datasets/ds000001/dataset_description.json

# Check for subject directories
ls ./datasets/ds000001/sub-*/
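
The same checks can be scripted. The following is a minimal sketch using only the standard library; quick_structure_check is a hypothetical helper, not the CIA validator, and it covers only the description file and the presence of subject directories.

```python
import json
from pathlib import Path


def quick_structure_check(dataset_dir):
    """Return a list of problems found; an empty list means the basics are present."""
    root = Path(dataset_dir)
    problems = []

    description = root / "dataset_description.json"
    if not description.is_file():
        problems.append("missing dataset_description.json")
    else:
        try:
            fields = json.loads(description.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("dataset_description.json is not valid JSON")
        else:
            # Name and BIDSVersion are the two fields the BIDS spec requires.
            for key in ("Name", "BIDSVersion"):
                if key not in fields:
                    problems.append(f"dataset_description.json lacks {key!r}")

    if not any(path.is_dir() for path in root.glob("sub-*")):
        problems.append("no sub-* directories found")
    return problems
```

Calling quick_structure_check("./datasets/ds000001") returns an empty list when these basics are present. It is not a substitute for a full BIDS validator.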

Licensing Considerations

Before using any OpenNeuro dataset, review its licensing terms. Common licenses include:

License              Requirements
CC0 / Public Domain  No legal restrictions; citing the dataset is still expected
CC-BY                Credit the authors
CC-BY-NC             Credit the authors; non-commercial use only
Restricted           Requires data use agreement acceptance

Some datasets may require the user to accept data use terms on the OpenNeuro website before download is permitted. Always check the dataset_description.json for license information and the OpenNeuro dataset page for any additional requirements.
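
Reading the license field can be automated as a first check. This is a small sketch under the assumption that the dataset follows the BIDS convention of storing the license in the License key of dataset_description.json; read_license is a hypothetical helper name.

```python
import json
from pathlib import Path


def read_license(dataset_dir):
    """Return the License string from dataset_description.json, or None if absent."""
    description_path = Path(dataset_dir) / "dataset_description.json"
    with description_path.open(encoding="utf-8") as handle:
        description = json.load(handle)
    license_name = description.get("License")
    # Treat empty or whitespace-only values the same as a missing field.
    if isinstance(license_name, str) and license_name.strip():
        return license_name.strip()
    return None
```

A return value of None means the file does not state a license, in which case the dataset page is the place to look before using the data.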


4. Placing Downloaded Data Under datasets/

The recommended directory structure for locally downloaded EEG datasets is:

consciousness-indicator-architecture/
├── datasets/                    # Local EEG datasets (NOT in git)
│   ├── ds000001/               # OpenNeuro dataset
│   │   ├── dataset_description.json
│   │   ├── participants.tsv
│   │   ├── sub-01/
│   │   ├── sub-02/
│   │   └── ...
│   ├── ds003405/               # Another dataset
│   │   └── ...
│   └── my-local-recording/     # Your own BIDS-formatted data
│       └── ...
├── src/
├── docs/
├── tests/
└── ...

Best Practices

  1. Keep datasets outside git: EEG data files are typically large (tens to hundreds of MB). Add datasets/ to .gitignore to prevent accidental commits.
  2. One dataset per directory: Each dataset should have its own subdirectory under datasets/.
  3. Use dataset IDs as directory names: For OpenNeuro datasets, use the dataset ID (e.g., ds000001) as the directory name for consistency.
  4. Validate after download: Run cia neuro bids-summary on each downloaded dataset to verify its BIDS compliance.
  5. Document the source: Keep a record of where each dataset was obtained, when it was downloaded, and any license or access restrictions.
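
One way to follow the last practice is to keep a small provenance record beside each dataset. The sketch below is only a suggestion: the helper name, the record fields, and the .provenance.json file name are all hypothetical, and the record is written next to the dataset folder rather than inside it so that the BIDS tree stays unchanged.

```python
import json
from datetime import date
from pathlib import Path


def write_provenance(dataset_dir, source_url, version, license_name, downloaded_on=None):
    """Write a JSON provenance record as a sibling of the dataset folder."""
    dataset_path = Path(dataset_dir)
    record = {
        "dataset": dataset_path.name,
        "source_url": source_url,
        "version": version,
        "license": license_name,
        # Default to today's date in ISO format when no date is supplied.
        "downloaded_on": downloaded_on or date.today().isoformat(),
    }
    # e.g. datasets/ds000001 -> datasets/ds000001.provenance.json
    target = dataset_path.with_name(dataset_path.name + ".provenance.json")
    target.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return target
```

For example, write_provenance("datasets/ds000001", "https://openneuro.org/datasets/ds000001", "1.0.0", "CC0") would create datasets/ds000001.provenance.json, which is covered by the same .gitignore rule as the data itself.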

5. Summarizing Local BIDS-Like Folders

The BIDSUtils class in cia.neuro.bids_utils provides utilities for validating, summarizing, and navigating local BIDS-like EEG dataset folders. No network calls are made by these utilities.

Generating a Summary

The BIDSUtils.summarize_dataset() method produces a comprehensive summary of a BIDS-like folder:

from cia.neuro.bids_utils import BIDSUtils

summary = BIDSUtils.summarize_dataset("datasets/ds000001")

print(f"Dataset path: {summary['path']}")
print(f"Exists: {summary['exists']}")
print(f"Subjects: {summary['subjects']}")
print(f"Tasks: {summary['tasks']}")
print(f"Sessions: {summary['sessions']}")
print(f"Total files: {summary['file_count']}")
print(f"Data files: {summary['data_files']}")
print(f"Valid BIDS: {summary['validation']['valid']}")
print(f"Issues: {summary['validation']['issues']}")

Validating BIDS Structure

The BIDSUtils.validate_bids_like_folder() method checks whether a folder appears BIDS-compliant:

validation = BIDSUtils.validate_bids_like_folder("datasets/ds000001")

print(f"Valid: {validation['valid']}")
print(f"Subject dirs: {validation['structure']['subject_dirs']}")
print(f"Data files: {validation['structure']['data_files']}")
print(f"Sidecars found: {validation['structure']['sidecars_found']}")
for issue in validation['issues']:
    print(f"  Issue: {issue}")

The validation checks for:

  • Presence of sub-* directories (required)
  • Presence of recognized data files (.vhdr, .edf, .bdf, .set, .fif, etc.)
  • Presence of BIDS sidecar files (channels.tsv, *_eeg.json, *_events.json)

Listing Subjects and Tasks

subjects = BIDSUtils.list_subjects("datasets/ds000001")
# Returns: ['sub-01', 'sub-02', 'sub-03', ...]

tasks = BIDSUtils.list_tasks("datasets/ds000001")
# Returns: ['task-rest', 'task-nback', ...]
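
To make the idea concrete, listings like these can be derived from directory and file names alone. The following is an independent sketch using pathlib, not the actual BIDSUtils implementation, whose internals may differ.

```python
import re
from pathlib import Path

# Captures the label of a task entity inside a file name, e.g. "_task-rest_".
TASK_RE = re.compile(r"_task-([a-zA-Z0-9]+)")


def list_subjects(dataset_dir):
    """Return sorted names of sub-* directories directly under the dataset root."""
    return sorted(p.name for p in Path(dataset_dir).glob("sub-*") if p.is_dir())


def list_tasks(dataset_dir):
    """Return sorted task labels found in file names below the subject directories."""
    tasks = set()
    for subject in list_subjects(dataset_dir):
        for path in (Path(dataset_dir) / subject).rglob("*_task-*"):
            match = TASK_RE.search(path.name)
            if match:
                tasks.add("task-" + match.group(1))
    return sorted(tasks)
```

Only the top-level subject directories are scanned, so files under derivatives/ or code/ do not contribute task labels in this sketch.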

Using the CLI

The CIA CLI provides a neuro bids-summary command for quick dataset inspection from the command line:

cia neuro bids-summary --path datasets/ds000001

This outputs a human-readable summary of the dataset structure, subjects, tasks, files, and validation status.


6. The Repository Does NOT Download Automatically

This is an important design principle: the CIA repository and its neuroadaptive extension never initiate network requests to download data. All data acquisition is under the explicit control of the researcher.

Why No Automatic Downloads?

  1. Data governance: EEG data is biometric and may be subject to licensing, consent, and data protection requirements. Automatic downloads could violate these constraints.
  2. Reproducibility: Automatic downloads that depend on external services can break when services change. Manual downloads with documented versions are more reproducible.
  3. Transparency: Researchers should explicitly choose which datasets to use and document their provenance.
  4. Bandwidth and storage: EEG datasets can be very large. Automatic downloads could consume significant bandwidth and disk space unexpectedly.

What the Repository Does Provide

The repository provides tools for:

  • Download guidance: BIDSUtils.generate_openneuro_download_note() produces example CLI commands for downloading specific OpenNeuro datasets.
  • Local validation: BIDSUtils.validate_bids_like_folder() checks whether a locally downloaded dataset is BIDS-compliant.
  • Local summarization: BIDSUtils.summarize_dataset() produces a summary of a local dataset.
  • BIDS ingestion: EEGIngestion.from_bids_folder() reads BIDS-formatted EEG data into the CIA pipeline (requires mne and mne-bids).

Example Download Note

from cia.neuro.bids_utils import BIDSUtils

note = BIDSUtils.generate_openneuro_download_note("ds000001")
print(note)

Output:

OpenNeuro Dataset: ds000001

Download command: openneuro download ds000001 ./datasets/ds000001

Alternative: wget or browser from https://openneuro.org/datasets/ds000001

Place the downloaded dataset in: ./datasets/ds000001

Then run: cia neuro bids-summary --path ./datasets/ds000001

Notes:
  - No automatic downloads are performed by this software.
  - Verify dataset licensing terms before use.
  - OpenNeuro datasets may require acceptance of data use terms.
  - EEG data is biometric; follow institutional data governance policies.

SCIENTIFIC BOUNDARY:
  EEG does not read thoughts directly. EEG does not capture or transfer
  consciousness. This dataset contains scalp electrical activity recordings
  for research purposes only.

7. Example Workflow

This section demonstrates a complete workflow for downloading an OpenNeuro dataset, validating it, and processing it through the CIA neuroadaptive pipeline.

Step 1: Identify a Suitable Dataset

Browse https://openneuro.org and find an EEG dataset that is relevant to your research. Look for:

  • EEG modality (not MRI or MEG alone)
  • BIDS format (most OpenNeuro datasets are BIDS-compliant)
  • Task paradigm that produces measurable cognitive-state variations (resting state, n-back, attention tasks)
  • License compatible with your intended use

Example: ds000001, used as a placeholder ID throughout this guide, is an fMRI dataset and contains no EEG recordings. Search for datasets that specifically include EEG data, and check the dataset page for the list of included modalities and tasks.

Step 2: Download the Dataset

# Install the OpenNeuro CLI
pip install openneuro-py

# Download the dataset
openneuro-py download --dataset=dsXXXXXX --target-dir=./datasets/dsXXXXXX

# Verify the download
ls ./datasets/dsXXXXXX/dataset_description.json

Step 3: Review Licensing Terms

python -m json.tool ./datasets/dsXXXXXX/dataset_description.json | head -30

Check the License field and any Acknowledgements or ReferencesAndLinks that specify citation requirements.

Step 4: Validate and Summarize

from cia.neuro.bids_utils import BIDSUtils

# Validate
validation = BIDSUtils.validate_bids_like_folder("./datasets/dsXXXXXX")
print(f"Valid: {validation['valid']}")

# Summarize
summary = BIDSUtils.summarize_dataset("./datasets/dsXXXXXX")
print(f"Subjects: {summary['subjects']}")
print(f"Tasks: {summary['tasks']}")
print(f"Files: {summary['file_count']}")

Or via the CLI:

cia neuro bids-summary --path ./datasets/dsXXXXXX

Step 5: Ingest into the Pipeline

from cia.neuro.eeg_ingestion import EEGIngestion

ingestion = EEGIngestion(window_seconds=2.0, step_seconds=1.0)

# Ingest from BIDS folder
metadata, windows = ingestion.from_bids_folder(
    path="./datasets/dsXXXXXX",
    subject="01",          # First subject
    task="rest",           # Resting-state task
)

print(f"Channels: {metadata.channel_names}")
print(f"Duration: {metadata.duration_seconds:.1f}s")
print(f"Windows: {len(windows)}")
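
The window count printed above depends on the window length and step. As a rough guide, the sketch below computes window start times for a recording, assuming that only full-length windows are kept and that the first window starts at 0 s; the actual EEGIngestion behavior may differ, and window_starts is a hypothetical helper.

```python
def window_starts(duration_seconds, window_seconds=2.0, step_seconds=1.0):
    """Return start times (in seconds) of all full-length sliding windows."""
    if window_seconds <= 0 or step_seconds <= 0:
        raise ValueError("window_seconds and step_seconds must be positive")
    if duration_seconds < window_seconds:
        return []  # recording is shorter than a single window
    # Number of steps that still leave room for a complete window, plus the first window.
    count = int((duration_seconds - window_seconds) // step_seconds) + 1
    return [index * step_seconds for index in range(count)]


starts = window_starts(10.0)  # 2 s windows, 1 s step, 10 s recording
print(len(starts))            # 9
print(starts[0], starts[-1])  # 0.0 8.0
```

With a 2 s window and a 1 s step, consecutive windows overlap by 50%, so a 10 s recording yields 9 windows rather than 5.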

Step 6: Preprocess and Extract Features

from cia.neuro.eeg_preprocessing import EEGPreprocessor
from cia.neuro.eeg_feature_extraction import EEGFeatureExtractor
from cia.neuro.neural_state_encoder import NeuralStateEncoder

preprocessor = EEGPreprocessor()
extractor = EEGFeatureExtractor()
encoder = NeuralStateEncoder()

# Process each window
for window in windows[:5]:  # Process first 5 windows as example
    preprocessed = preprocessor.clean(window)
    features = extractor.extract(preprocessed)
    state = encoder.encode(features)

    print(f"Window {window.start_time_seconds:.1f}s: "
          f"attention={state.attention_proxy:.3f}, "
          f"fatigue={state.fatigue_proxy:.3f}, "
          f"quality={features.signal_quality:.3f}")

Step 7: Build a Subject Profile

from cia.neuro.subject_profile import SubjectProfileManager

profile_manager = SubjectProfileManager()
profile = profile_manager.create_profile(subject_id="sub-01")

# Process multiple windows and update the profile
# (reuses windows from Step 5 and preprocessor, extractor, encoder from Step 6)
all_features = []
for window in windows[:20]:
    preprocessed = preprocessor.clean(window)
    features = extractor.extract(preprocessed)
    state = encoder.encode(features)
    all_features.append(features)
    profile_manager.update_profile(profile, features, state)

# Summarize the profile
print(profile_manager.summarize_profile(profile))

# Save the profile
profile_manager.save_profile("profiles/sub-01_profile.json", profile)

Step 8: Run Neuroadaptive Conditioning

from cia.neuro.neuroadaptive_conditioning import NeuroadaptiveConditioner
from cia.simulation import CombinedConsciousnessIndicatorSystem

# Load profile
profile = profile_manager.load_profile("profiles/sub-01_profile.json")

# Create system and conditioner
system = CombinedConsciousnessIndicatorSystem()
conditioner = NeuroadaptiveConditioner()

# Get a neural state estimate (from a new window)
preprocessed = preprocessor.clean(windows[25])
features = extractor.extract(preprocessed)
state = encoder.encode(features)

# Convert to control signal (with individual calibration)
signal = conditioner.convert_state_to_control_signal(state, subject_profile=profile)

print(f"Attention bias: {signal.attention_bias:.4f}")
print(f"Workspace priority: {signal.workspace_priority_bias:.4f}")
print(f"Uncertainty adj: {signal.uncertainty_adjustment:.4f}")
print(f"Prediction sens: {signal.prediction_sensitivity_adjustment:.4f}")

# Apply to system components
conditioner.apply_to_attention_controller(system.attention_controller, signal)
conditioner.apply_to_global_workspace(system.global_workspace, signal)

# Run a cognitive cycle with the conditioned system
report = system.run_cycle("Process the sensory input from the current environment.")

# Restore original parameters
conditioner.restore_attention_controller_weights(system.attention_controller)

References

Resource              URL / Citation
OpenNeuro             https://openneuro.org
BIDS Specification    https://bids.neuroimaging.io
BIDS EEG Extension    https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/05-electroencephalography.html
MNE-BIDS              https://mne.tools/mne-bids/
openneuro-py CLI      https://github.com/hoechenberger/openneuro-py
MNE-Python            https://mne.tools