18 August, 2008

GWAS Data in Ensembl

Ensembl has begun to incorporate data from genome-wide association studies (GWAS). These data are being added in coordination with the European Genotype Archive (EGA), a new database resource at the EBI designed to provide a permanent archive for human variation data that cannot be released publicly without restriction for ethical or individual privacy reasons. The EGA recently launched with the raw data from the Wellcome Trust Case Control Consortium (WTCCC. 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661-678). In the future, the EGA will provide additional array-based genotype data as well as data from re-sequencing and CNV studies. The EGA will also contain phenotype data.

Ensembl is incorporating summary data from genome-wide association studies represented in the EGA. The data generally consist of one p-value per tested SNP (single nucleotide polymorphism), measuring that SNP's association with the given phenotype.
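As a rough illustration of what such summary data look like, the sketch below models one record per SNP and phenotype and converts the p-value to a -log10 score, which is a common way to display association strength. The class name, field names, SNP identifiers and p-values are all hypothetical and made up for this example; they are not WTCCC results and do not reflect the actual Ensembl or EGA data format.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpAssociation:
    """One summary record: a SNP, a phenotype, and the association p-value."""
    snp_id: str      # hypothetical identifier, not a real rsID
    phenotype: str   # phenotype abbreviation, e.g. "T1D"
    p_value: float   # must be greater than 0 for score() to work

    def score(self) -> float:
        """Return -log10(p), so smaller p-values give larger scores."""
        return -math.log10(self.p_value)


# Made-up records for illustration only; not WTCCC results.
records = [
    SnpAssociation("rs_example_1", "T1D", 1e-8),
    SnpAssociation("rs_example_2", "T1D", 0.5),
    SnpAssociation("rs_example_3", "T2D", 1e-3),
]


def strongest(records, phenotype):
    """Return the record with the smallest p-value for one phenotype."""
    matches = [r for r in records if r.phenotype == phenotype]
    return min(matches, key=lambda r: r.p_value)
```

Calling strongest(records, "T1D") picks the record with the smallest p-value among the T1D entries.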

The WTCCC summary data are now available in Ensembl as DAS tracks, selectable from the "DAS Sources" menu on the CytoView and ContigView pages. The following menu items provide access to data for bipolar disorder (BD), coronary artery disease (CAD), Crohn's disease (CD), hypertension (HT), type 1 diabetes (T1D) and type 2 diabetes (T2D):

WTCCC BD
WTCCC CAD
WTCCC CD
WTCCC HT
WTCCC T1D
WTCCC T2D

In future releases, GWAS data will be integrated into the Ensembl variation databases.

We will be adding additional data to both Ensembl and the European Genotype Archive as the data become available. We hope you find these new data resources useful.

1 comment:

Paul Flicek said...

As a follow-up for anyone currently looking for these data in Ensembl:

We removed (I hope temporarily) the WTCCC data at the request of the Wellcome Trust. This was in response to the development of a method that allows some identifiability of individuals in genome-wide association studies, even from summary data. The method was published in PLoS Genetics on 29 August, but we had advance notice and so pulled the data before that.

The UCSC Genome Browser has also pulled their GWAS data pending appropriate review.

If you have a GenomeWeb login, more information about the response at the NIH and other places is here: http://www.genomeweb.com/issues/news/149097-1.html

The paper describing the method is here: http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000167