28 May, 2008

Ensembl 50

Ensembl is now busy with preparations for our next release, Ensembl 50! We're working hard and we'll keep you updated on what's in store for this release. Our biggest new development will be our revamped website. As usual, we have updated some species and provided new data for other species. Keep reading for an outline of what we aim to provide in Ensembl 50.

New web interface:
The most exciting change in Ensembl 50 will be a new web interface: Simpler, Better, Faster is what we're aiming for. Not only will pages take less time to load, but they will also look a little different. We hope to have improved the navigability and discoverability of the site so that you can make the best possible use of the data we provide. We have taken into account your messages to the helpdesk and your feedback in our courses. Let us know what you think by emailing helpdesk@ensembl.org!

Genebuild team:
In terms of new data for Ensembl 50, we have constructed new gene sets for tetraodon and cow. Vega/Havana (manual annotation) has released new gene sets for human and mouse so these will be displayed on our website alongside Ensembl genes.

For human, you may know that Ensembl and Havana merge identical transcripts. We have improved the Vega/Havana merge using the latest Havana gene set. Because untranslated regions are notoriously difficult to determine, we've used ditags when predicting UTRs for human. Finally, we have removed some dodgy-looking gene models that were highlighted by the Alpheus project.

For low-coverage genomes, gene models are predicted by projecting the human gene models down onto the 2x genomes. In this release, cat and pika have been updated by projecting current human gene models onto the existing assembly.

We've also updated the gene sets for C. elegans and chimp. Release notes for C. elegans can be found on the WormBase website. Chimp has an updated gene set to include more chimp-specific predictions, and genes projected from human onto chimp are updated.

The horse genomic assembly (EquCab2) has recently been updated such that chromosome 27 has been shortened. This is not a new genebuild as such, but we have modified our data to reflect this change. Zebrafish Agilent V2 Arrays have been mapped to cDNA and genomic sequences.

Canonical transcripts (the longest translations) have been labeled for all species in the database, though this will not appear in the browser. As usual, non-coding RNA genes have also been updated for most species, and cDNA alignments have been redone for human and mouse.
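
To make the "longest translation" rule concrete, here is a minimal sketch of how a canonical transcript could be picked per gene. It is an illustration only, not the Ensembl pipeline or API: the function name, the tuple layout and the identifiers are hypothetical, and keeping the first transcript seen when two translations tie in length is an assumption.

```python
def pick_canonical(transcripts):
    """Map each gene to the transcript with the longest translation.

    `transcripts` is an iterable of (gene_id, transcript_id, translation_length)
    tuples. This layout is hypothetical, chosen only for the illustration.
    """
    best = {}
    for gene_id, transcript_id, translation_length in transcripts:
        current = best.get(gene_id)
        # Assumption: on a tie, the transcript seen first is kept.
        if current is None or translation_length > current[1]:
            best[gene_id] = (transcript_id, translation_length)
    return {gene_id: tid for gene_id, (tid, _length) in best.items()}


# Made-up identifiers and lengths, purely for demonstration.
example = [
    ("geneA", "tx1", 120),
    ("geneA", "tx2", 340),
    ("geneB", "tx3", 95),
]
canonical = pick_canonical(example)
```

With the made-up input above, `canonical` maps `geneA` to `tx2` and `geneB` to `tx3`.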

Variation and Functional Genomics teams:
Our Variation team plans to provide updated single nucleotide polymorphisms (SNPs) for tetraodon, cow, human, chimp and orangutan. Our Functional Genomics team will provide promoter cis-regulatory motifs from here. They will also update the current regulatory build on human.

Comparative Genomics team:
Our Comparative Genomics team is extending their multiple alignments with new species and low-coverage (2x) genomes to include:
* 4-species: catarrhini primates EPO (Enredo-Pecan-Ortheus) alignments (human, chimp, orangutan, macaque)
* 12-species: amniote vertebrates Mercator-Pecan alignments (current 10-species alignments + Pongo pygmaeus and Equus caballus)
* 23-species: eutherian mammals EPO (Enredo-Pecan-Ortheus) alignments (all 2x genomes + current 7-species alignments + Pongo pygmaeus and Equus caballus)

GERP scores (% conservation on a basepair level for the 23-species eutherian mammals alignments) will be released.

The Compara (comparative genomics) team is working hard! They're also providing new pairwise alignments:
* All the pairwise (between two species), whole-genome alignments (using tBLAT) will be updated using a new pipeline that follows a best-in-genome approach to filter spurious hits.
* The pairwise alignments for more closely related species (using BLASTz-net) will be updated for the following species so that the reference species is human:
. human vs Pongo pygmaeus
. human vs Loxodonta africana
. human vs Echinops telfairi
. human vs Oryctolagus cuniculus
. human vs Dasypus novemcinctus
. human vs Myotis lucifugus
. human vs Bos taurus
. human vs Ochotona princeps
. human vs Felis catus
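
One plausible reading of the "best-in-genome" filtering mentioned above for the whole-genome alignments is that, for each query region, only the highest-scoring hit anywhere in the target genome is kept. The sketch below illustrates that idea only. It is not the Compara pipeline: the function name, the hit layout and the scores are hypothetical, and treating each query as a single unit (rather than handling partially overlapping query ranges) is a simplifying assumption.

```python
def best_in_genome(hits):
    """Keep only the top-scoring hit for each query.

    `hits` is an iterable of (query_id, target_location, score) tuples.
    This layout is hypothetical, chosen only for the illustration.
    """
    best = {}
    for hit in hits:
        query_id, _target_location, score = hit
        # Assumption: a strictly higher score is needed to replace a hit,
        # so the first hit wins when scores tie.
        if query_id not in best or score > best[query_id][2]:
            best[query_id] = hit
    return list(best.values())


# Made-up hits, purely for demonstration.
hits = [
    ("q1", "chr1:100-200", 55.0),
    ("q1", "chr7:900-1000", 91.5),
    ("q2", "chr2:10-80", 40.0),
]
filtered = best_in_genome(hits)
```

Here the lower-scoring `q1` hit on `chr1` is dropped, leaving one hit per query.
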
Sitewise dN/dS values will be provided in our gene trees to detect positions in the alignments that are under different evolutionary pressure.

Web team:
Last but not least, please note that from Release 50 we will no longer be providing the 'ssaha' sequence search. If you wish to run your own 'ssaha' sequence search, you can download the files to generate the search hashes from our FTP site. Alternatively, use BLAT (the BLAST-like Alignment Tool), which is equally fast and also demands exact matches.

That's it for now! Any questions, just email helpdesk@ensembl.org. We will be posting more information as the release date gets closer (we are aiming for the end of July!).
