RegulomeDB Help

How do I submit my data?

Data can be entered in the query box or uploaded from a file on your computer in any of the following formats:

dbSNP IDs
0-based coordinates (as chr#[tab]min_coord[tab]max_coord or in a BED file format)
1-based coordinates (as chr#:min_coord..max_coord or in a VCF or GFF3 file format)
BED format
VCF format
GFF3 format
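BED and the tab-delimited form count positions from 0, while the chr#:min_coord..max_coord form counts from 1, so it is easy to be off by one base when switching between them. The Python sketch below converts between the two text forms. It assumes the tab-delimited form follows the BED convention (0-based start, exclusive end) and that min_coord..max_coord is inclusive on both ends; the function names are hypothetical and not part of RegulomeDB.

```python
import re

# Hypothetical helpers (not a RegulomeDB API). Assumptions:
#   1-based form  "chr#:min_coord..max_coord" is inclusive on both ends.
#   0-based form  "chr#<TAB>min_coord<TAB>max_coord" is BED-style (end exclusive).
_ONE_BASED = re.compile(r"^(chr\w+):(\d+)\.\.(\d+)$")


def one_based_to_zero_based(query: str) -> str:
    """Convert 'chr#:min..max' (1-based, inclusive) to 'chr#\\tstart\\tend' (0-based, half-open)."""
    match = _ONE_BASED.match(query.strip())
    if match is None:
        raise ValueError(f"not a 1-based coordinate: {query!r}")
    chrom, lo, hi = match.group(1), int(match.group(2)), int(match.group(3))
    if lo < 1 or hi < lo:
        raise ValueError(f"invalid range: {query!r}")
    # Only the start shifts: an inclusive 1-based end equals an exclusive 0-based end.
    return f"{chrom}\t{lo - 1}\t{hi}"


def zero_based_to_one_based(line: str) -> str:
    """Convert 'chr#\\tstart\\tend' (0-based, half-open) to 'chr#:min..max' (1-based, inclusive)."""
    chrom, start, end = line.strip().split("\t")[:3]
    start, end = int(start), int(end)
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval: {line!r}")
    return f"{chrom}:{start + 1}..{end}"
```

Under these assumptions, a single SNP written as chr11:5248050..5248050 in the 1-based form corresponds to chr11[tab]5248049[tab]5248050 in the 0-based form.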

What is displayed on the Summary of SNP Analysis page?

The page first summarizes the total number of rows analyzed and coordinates searched, along with any errors found in the input file. The rest of the page lists each nucleotide entered in the query together with its associated data. The table contains the following columns:

dbSNP ID: If available, the dbSNP ID for that coordinate is displayed.
Coordinates: The chromosomal coordinates of the queried nucleotide.
RegulomeDB Score: A computed score based on the integration of multiple high-throughput datasets. Additional details are described in the next question.
Other Resources: Links to external resources that provide additional information about the genomic region or dbSNP entry.

What does the RegulomeDB score represent?

Each score indicates which of the following types of supporting data are available for a single coordinate:

Score	Supporting data
1a	eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak
1b	eQTL + TF binding + any motif + DNase Footprint + DNase peak
1c	eQTL + TF binding + matched TF motif + DNase peak
1d	eQTL + TF binding + any motif + DNase peak
1e	eQTL + TF binding + matched TF motif
1f	eQTL + TF binding / DNase peak
2a	TF binding + matched TF motif + matched DNase Footprint + DNase peak
2b	TF binding + any motif + DNase Footprint + DNase peak
2c	TF binding + matched TF motif + DNase peak
3a	TF binding + any motif + DNase peak
3b	TF binding + matched TF motif
4	TF binding + DNase peak
5	TF binding or DNase peak
6	other
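One way to read the table is as an ordered checklist: a coordinate receives the first score whose supporting data are all present. The Python sketch below encodes that reading. The evidence flag names are hypothetical, and it assumes that rows are checked from top to bottom, that a matched motif or footprint also counts as "any" motif or footprint, and that "TF binding / DNase peak" in row 1f means either one. It illustrates the table and is not RegulomeDB's own scoring code.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical per-coordinate evidence flags (names are illustrative only)."""
    eqtl: bool = False
    tf_binding: bool = False
    matched_motif: bool = False      # motif matches the TF observed binding here
    any_motif: bool = False          # some motif is present
    matched_footprint: bool = False  # DNase footprint matching the TF motif
    any_footprint: bool = False      # some DNase footprint is present
    dnase_peak: bool = False


def regulome_score(e: Evidence) -> str:
    """Return the first score in the table whose required data are all present."""
    # Assumption: a matched motif/footprint also satisfies "any motif"/"DNase Footprint".
    motif = e.any_motif or e.matched_motif
    footprint = e.any_footprint or e.matched_footprint
    rules = [
        ("1a", e.eqtl and e.tf_binding and e.matched_motif and e.matched_footprint and e.dnase_peak),
        ("1b", e.eqtl and e.tf_binding and motif and footprint and e.dnase_peak),
        ("1c", e.eqtl and e.tf_binding and e.matched_motif and e.dnase_peak),
        ("1d", e.eqtl and e.tf_binding and motif and e.dnase_peak),
        ("1e", e.eqtl and e.tf_binding and e.matched_motif),
        ("1f", e.eqtl and (e.tf_binding or e.dnase_peak)),  # assumed: "/" means "or"
        ("2a", e.tf_binding and e.matched_motif and e.matched_footprint and e.dnase_peak),
        ("2b", e.tf_binding and motif and footprint and e.dnase_peak),
        ("2c", e.tf_binding and e.matched_motif and e.dnase_peak),
        ("3a", e.tf_binding and motif and e.dnase_peak),
        ("3b", e.tf_binding and e.matched_motif),
        ("4", e.tf_binding and e.dnase_peak),
        ("5", e.tf_binding or e.dnase_peak),
    ]
    for score, satisfied in rules:
        if satisfied:
            return score
    return "6"  # "other": none of the combinations above apply
```

Checking rows in order means a coordinate with every data type lands in 1a rather than in any weaker category that its data would also satisfy.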

What details are provided for the datatypes supporting a SNP?

This page lists all the DNA features and regulatory regions identified as containing the input coordinate, drawn from the following data types:

Transcription factor binding sites
Position-Weight Matrix for TF binding (PWM)
DNase Footprinting
DNase sensitivity
Chromatin States
eQTLs
Differentially methylated regions
Manually curated regions
Validated functional SNPs

What data is currently available at RegulomeDB?

RegulomeDB currently queries the following data types:

Transcription factor binding sites
ChIP factors: 740 unique datasets, including the most recent ENCODE data release (2012 freeze) and data from Xie et al. (2013) and Boyle et al. (2014).

Position-Weight Matrix for TF binding (PWM)

JASPAR CORE
TRANSFAC
UniPROBE
Jolma et al.

DNase sensitivity
204 unique datasets, including the most recent ENCODE data release (ENCODE Project Consortium).

Chromatin States
127 standard epigenomes from the Roadmap Epigenomics Consortium.

eQTLs / dsQTLs
Tissue types:

Cerebellum
Cortex
Fibroblasts
Frontal-Cortex
Liver
Lymphoblastoid
Monocytes
Pons
T-cells
Temporal-Cortex

DNase Footprinting

Differentially methylated regions
Kuleshov et al.

Manually curated regions

Validated functional SNPs

What version of dbSNP is RegulomeDB querying?

RegulomeDB currently queries dbSNP build 141. See NCBI for additional information about dbSNP build 141.

Which version of the human genome sequence are the RegulomeDB data mapped to?

All RegulomeDB data are currently mapped to hg19 (GRCh37). Additional information about the human reference genome is available from the Genome Reference Consortium.

Why is there no data for my chromosomal region?

Entering a chromosomal region identifies all common SNPs (with an allele frequency > 1%) in that region, and these SNPs are then used to query RegulomeDB. If the uploaded genomic regions contain no common SNPs, no data will be returned. However, you can upload the region split into single-nucleotide positions to query each nucleotide individually.
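Splitting a region into single nucleotides takes only a few lines of Python. The sketch below is a hypothetical helper, not a RegulomeDB tool. It assumes the region is given in inclusive 1-based coordinates and emits one 0-based, tab-delimited line per base (chr#[tab]min_coord[tab]max_coord, with the end exclusive as in BED), which could then be pasted into the query box or saved as an upload file.

```python
def split_region(chrom: str, start_1based: int, end_1based: int) -> list[str]:
    """Split an inclusive 1-based region into one 0-based line per nucleotide.

    Hypothetical helper. Assumes the 0-based tab format is BED-style (end
    exclusive), so the base at 1-based position p becomes 'chrom<TAB>p-1<TAB>p'.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return [f"{chrom}\t{pos - 1}\t{pos}" for pos in range(start_1based, end_1based + 1)]


if __name__ == "__main__":
    # Five consecutive bases; each printed line is one single-nucleotide query.
    print("\n".join(split_region("chr11", 5248048, 5248052)))
```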

Alternatively, the region you entered may lie in a protein-coding region of the genome. RegulomeDB currently integrates and curates high-throughput data only for non-coding and intergenic regions of the human genome.