Download Data

Download the complete ExpressionGenesis dataset as a CSV file for use in your own analyses, pipelines, or research.

All disease annotations are validated against the Disease Ontology. You can reuse these annotations in meta-analyses, benchmarking, machine learning, or data integration workflows.

Terms of Use

The ExpressionGenesis dataset is freely available for academic and commercial use. If you use this data in your research, please cite the ExpressionGenesis article:

Spohn, D. R. (2026). ExpressionGenesis: Automated disease annotation and metadata generation for the Gene Expression Omnibus using large language models. [Manuscript in preparation]. Brandeis University.

See the About page for the full citation and DOI (when available).

📄 Read the manuscript (PDF)

GEO Series to Disease Mapping

This CSV file links Gene Expression Omnibus (GEO) Series accessions to Disease Ontology terms. Each row represents one unique combination of a GEO Series and a disease, so a Series annotated with several diseases appears on several rows. Series without disease tags are excluded.

Schema:

  • gse_id: Accession number of the GEO Series (e.g., GSE123456)
  • disease: Disease name from the Disease Ontology, as identified by a large language model
  • doid: Corresponding Disease Ontology ID (e.g., DOID:1234)
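
Because a Series can appear on more than one row, it is often convenient to group the rows by accession before further analysis. The following is a minimal sketch using only the Python standard library. It assumes the file is comma-delimited with a header row whose column names match the schema above (gse_id, disease, doid). The sample rows and the function name load_series_to_diseases are hypothetical, for illustration only, and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows that follow the documented schema; not real data.
SAMPLE_CSV = """gse_id,disease,doid
GSE123456,example disease A,DOID:1234
GSE123456,example disease B,DOID:5678
GSE000001,example disease A,DOID:1234
"""


def load_series_to_diseases(handle):
    """Group rows by GEO Series accession.

    Returns a dict mapping each gse_id to a list of (disease, doid) tuples.
    Assumes a header row with the columns gse_id, disease, and doid.
    """
    mapping = defaultdict(list)
    for row in csv.DictReader(handle):
        mapping[row["gse_id"]].append((row["disease"], row["doid"]))
    return dict(mapping)


series_to_diseases = load_series_to_diseases(io.StringIO(SAMPLE_CSV))

for gse_id, diseases in series_to_diseases.items():
    print(gse_id, diseases)
```

To read the downloaded file instead of the inline sample, pass an open file handle, for example open("geo_series_to_disease.csv", newline="", encoding="utf-8"). The UTF-8 encoding is an assumption and may need adjusting.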

Filename:

  • geo_series_to_disease.csv
Download CSV