28 January, 2011

New genebuild summaries now available

We are pleased to announce new documentation that specifically describes the gene annotation methodology and results for individual species.

Ensembl gene annotation, termed the genebuild, is a multi-step process that usually takes several months to complete for a single species. To give our users more information on the data resources used and the decisions made during the genebuilding process, we are introducing a genebuild summary PDF document for each new genebuild, starting in early February 2011 with Ensembl release 61. Each document includes not only details of the alignment programs and data filtering parameters used, but also statistics on the number of protein, cDNA and EST sequences used at different stages of the genebuild. For example, users will be able to find out how many protein sequences were retrieved from public repositories (RefSeq and UniProt) at the beginning of the genebuilding process, how many of these proteins were aligned to the genome by the various algorithms at different stages of the build, and how many remain in the final gene set as supporting evidence for genes. For human, mouse and zebrafish, the documents also explain how the Ensembl and Havana annotations are merged.

Genebuild summaries will initially be available for six species: the Anole lizard, Marmoset, Mouse, Panda, Turkey and Zebrafish. More summaries will follow as the genebuilds of existing species are updated and as new species are annotated. You can download each document via a link near the bottom of the "Description" page for that species. To open a species' description page, simply click on the species of interest on the home page.
