22 October, 2009

Ensembl Events in November 2009

October was quite a busy month for the Ensembl Outreach team, but November is even busier:

2 Nov: Developers workshop at the German Cancer Research Center (DKFZ), Heidelberg, Germany
4-10 Nov: Ensembl module in the Computational & Comparative Genomics course, Cold Spring Harbor, NY, US
5-6 Nov: Ensembl Genomes module in the EBI Roadshow at the University of Szeged, Hungary
9 Nov: Browser workshop at the German Cancer Research Center (DKFZ), Heidelberg, Germany
10 Nov: Browser workshop at the German Cancer Research Center (DKFZ), Heidelberg, Germany
12-13 Nov: Browser workshop at the University of Cambridge, Cambridge, UK
12 Nov: Browser workshop at the Jackson Laboratory, Bar Harbor, ME, US
13 Nov: Browser workshop at the Jackson Laboratory, Bar Harbor, ME, US
13 Nov: Presentation at the Jackson Laboratory, Bar Harbor, ME, US
16 Nov: Browser workshop at Harvard Medical School, Boston, MA, US
17-18 Nov: Ensembl module in The Genome Access Course, Cold Spring Harbor, NY, US
18 Nov: Presentation for EBI/EMBL Ph.D. students, Hinxton, UK
24 Nov: Browser workshop at the Centro Nacional de Investigaciones Oncológicas, Madrid, Spain
26-27 Nov: Developers workshop at the Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
26 Nov: Browser workshop at the Royal Veterinary College, London, UK
28 Nov: Browser workshop at CEINGE Biotecnologie Avanzate, Naples, Italy
30 Nov - 1 Dec: Ensembl module in the Wellcome Trust Open Door Workshop - Working with the Human Genome Sequence, Hinxton, UK
30 Nov - 2 Dec: Developers workshop at the University of Cambridge, Cambridge, UK

For details about these and other upcoming events, please have a look at the complete list of Ensembl training events.

19 October, 2009

Aloha ASHG

From this little corner of the world Ensembl will be delivering an Interactive Workshop on Friday (October 23rd) from noon (12.00) in Room 315 in the Convention Center. If you want to attend, let us know as seating is restricted and we are allocating seats until the room is full. You must bring a laptop with a wireless card (and a fully charged battery).

You can also visit us at booth 432, where we will be happy to help you and to hear any feedback.

08 October, 2009

Ensembl Genomes Release 3

We are pleased to announce the third release of Ensembl Genomes, which includes the first release of two new Ensembl-based portals, Ensembl Plants and Ensembl Fungi.

These complete the span of Ensembl Genomes portals across the taxonomic space, complementing the coverage of vertebrate genomes available through Ensembl.

  • Ensembl Plants has been built in collaboration with Gramene and includes the genomes of six monocots and two dicots. Variation databases are available for four of these species.
  • Ensembl Fungi includes a new build of the Saccharomyces cerevisiae genome using the latest data from SGD, including variation data derived from the Saccharomyces Genome Resequencing Project; and Ensembl databases for Schizosaccharomyces pombe (built in collaboration with GeneDB_Spombe) and eight species of Aspergillus (built in collaboration with the Central Aspergillus Database Repository, CADRE).
  • User upload databases are now operational for Ensembl Protists, Fungi, Plants and Metazoa, allowing users to visualise their own data in the Ensembl environment.
Ensembl Genomes release 3 has been built using Ensembl 55 software. We aim to synchronise with Ensembl with our next release (Ensembl Genomes 4/Ensembl 57), and to stay synchronised thereafter.

01 October, 2009

Genomewide comparative displays

When we changed our look and feel almost a year ago, we "left behind" our two main graphical genome-wide comparative genomics displays (our textual comparative genomics displays remained, as did some of the gene-centric ones). These were some of the most complex displays, not only in graphical layout but also in aspects such as configuration. With comparative genomics tracks covering up to 30 species, one potentially has the union of all tracks in each species, and handling this consistently required reworking how we thought about the "same" or "different" tracks across species.

It's taken longer than we thought it would, but finally in release 56 these displays are back and better than ever. With more aggressive caching of data items as they head to the web (and, in addition, if you are on the west coast of the US or the Pacific Rim, check out the US west mirror at uswest.ensembl.org) they go far faster, making them far more useable.

We have two fundamentally different ways of thinking about genomic alignments.

In "Multi Sequence View", which works fundamentally as a set of pairwise alignments, we maintain the linear sequence of each genome, and then draw regions which are conserved between them. Check out displays like:


And make sure you hit "Configure Page" and in the Comparative Genomics section, switch on blastz. I also like to have genes in "Collapsed, labels" (so alternative splicing doesn't produce excessive displays) and also switch on Regulatory Features.

Now you get a nice picture of this region in human and mouse. The orthologous gene (PECI) has conserved exons, and the regulatory feature at the start of this gene is conserved in human and mouse and in both cases classified as a promoter. All as expected.

But a closer look shows that the transcript going by the catchy name of AC123437.5 in mouse, on the opposite strand, has some of its exons overlapping human PECI, and that human PECI is duplicated into two local genes here. This is perhaps easier to see as one zooms out in this display (notice you can drag-and-select in the upper panels, or use the + and - bars to change the zoom in the lower panels).

Zoom Out

In contrast, the alignment (Image) view asks you to choose one species as the co-linear reference, and then the other species are organised specifically by their alignment to that reference. This is ideal in more linear, orthologous regions. I like using the 10-way EPO alignment for visualisation/gene model comparison, although to do things like conservation analysis, you want to use the 31-way mammalian alignment with the low coverage data.

This is a gene that is well conserved across mammals.

Co Linear

We can look at precisely the same alignment from the perspective of Mouse, Rat, Dog, Horse, Human or Pig. In each case, the alignment is unbiased with respect to each species. For example, the Mouse-Rat portion of this multiple alignment still aligns the unique rodent portions.
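To make the idea of viewing one alignment from several perspectives concrete, here is a minimal Python sketch. It is not how the Ensembl displays are implemented; the species rows, sequences and the `view_from` helper are all made up for illustration. It assumes a gapped multiple alignment held as equal-length strings and simply keeps the columns in which the chosen reference has a base.

```python
# Toy alignment block: every row has the same length and "-" marks a gap.
# Species rows and sequences are invented for illustration only.
ALIGNMENT = {
    "human": "ACG-TTGA",
    "mouse": "ACGATT-A",
    "rat":   "ACGATT-A",
}


def view_from(alignment, reference):
    """Return the alignment restricted to columns where `reference` has a base.

    The underlying alignment is not recomputed; only the set of columns shown
    changes with the choice of reference (hypothetical helper, not Ensembl API).
    """
    ref_row = alignment[reference]
    keep = [i for i, base in enumerate(ref_row) if base != "-"]
    return {
        species: "".join(row[i] for i in keep)
        for species, row in alignment.items()
    }


human_view = view_from(ALIGNMENT, "human")
mouse_view = view_from(ALIGNMENT, "mouse")

for name, view in (("human", human_view), ("mouse", mouse_view)):
    print(f"Reference: {name}")
    for species, row in view.items():
        print(f"  {species:6s} {row}")
```

In the mouse view of this toy block, the column that exists only in mouse and rat is kept and the two rodent rows stay aligned to each other, while in the human view that column is dropped.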

Here is that same region from the perspective of Cow:


Notice that when you go to human you have a choice not only of 4 different multiple alignments (a 4-way primate alignment, a 10-way mammalian alignment, a 12-way alignment including chicken and a 31-way mammalian alignment) but also of 40-odd individual pairwise alignments.

In each case, you can get the alignment out as text - here's a 4-way primate alignment:

Text alignment

or the same region in its 31-way glory:

31 one way text

Of course, all this information is also available to download or access through our Perl API. A particularly interesting thing in these alignments is the ability to switch on the ancestral sequence as well (go to the configuration panel).

More on the use and power of comparative genomics later I hope, but for the moment, do enjoy these displays being back, and do both browse around and download/script against them.
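As one small illustration of scripting against a text alignment once you have it out of the browser, the Python sketch below computes a crude per-column identity score. It does not use the Ensembl Perl API and is not the conservation measure used in the displays; the rows are invented, and `column_identity` is a hypothetical helper that assumes the alignment has already been parsed into equal-length strings with "-" for gaps.

```python
from collections import Counter

# Hypothetical aligned rows (one per species), already parsed from a text
# alignment; real exports will need format-specific parsing first.
ROWS = [
    "ACGTTGA",
    "ACGTT-A",
    "ACCTT-A",
]


def column_identity(rows):
    """For each column, return the fraction of rows sharing the commonest base.

    Gaps never count as the commonest base, but they do count in the
    denominator, so gappy columns score low.
    """
    scores = []
    for column in zip(*rows):
        bases = [b for b in column if b != "-"]
        if not bases:
            scores.append(0.0)
            continue
        top_count = Counter(bases).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


scores = column_identity(ROWS)
for position, score in enumerate(scores, start=1):
    print(f"column {position}: {score:.2f}")
```

A score like this is only a rough stand-in for proper conservation analysis, but it shows the kind of quick check that becomes possible once the alignment is available as text.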