01 October, 2009

Genomewide comparative displays

When we changed our look and feel almost a year ago, we "left behind" our two main graphical genome-wide comparative genomics displays (our textual comparative genomics displays remained, as did some of the gene-centric ones). These were some of the most complex displays, not only in the graphics layout but also in aspects such as configuration. With comparative genomics tracks covering up to 30 species, one potentially has the union of all tracks in each species, and doing this consistently required reworking how we thought about the "same" or "different" tracks across species.

It's taken longer than we thought it would, but finally in release 56 these displays are back and better than ever. With more aggressive caching of data items as they head to the web, they go far faster, making them far more usable (and if you are on the west coast of the US or the Pacific Rim, check out the US west mirror at uswest.ensembl.org).

We have two fundamentally different ways of thinking about genomic alignments.

In "Multi Sequence View", which works fundamentally as a set of pairwise alignments, we maintain the linear sequence of each genome, and then draw regions which are conserved between them. Check out displays like:


And make sure you hit "Configure Page" and in the Comparative Genomics section, switch on blastz. I also like to have genes in "Collapsed, labels" (so alternative splicing doesn't produce excessive displays) and also switch on Regulatory Features.

Now - you get a nice picture of this region in human and mouse. The orthologous gene (PECI) has conserved exons, and the regulatory feature at the start of this gene is conserved in human and mouse and in both cases classified as a promoter. All as expected.

But a closer look shows that the transcript going by the catchy name of AC123437.5 in mouse, which lies on the opposite strand, has some of its exons overlapping human PECI, and that human PECI is duplicated into two local genes here. This is perhaps easier to see as one zooms out in this display (notice you can drag-and-select in the upper panels, or use the + and - bars to zoom in the lower panels).

Zoom Out

In contrast, the alignment (Image) view asks you to choose one species as the co-linear reference, and then the other species are organised specifically by their alignment to that reference. This is ideal in more linear, orthologous regions. I like using the 10-way EPO alignment for visualisation/gene model comparison, although to do things like conservation analysis, you want to use the 31-way mammalian alignment with the low coverage data.

This is a gene that is well conserved across mammals.

Co Linear

We can look at precisely the same alignment from the perspective of Mouse, Rat, Dog, Horse, Human or Pig. In each case, the alignment is not biased towards any one species. For example, the Mouse-Rat portion of this multiple alignment still aligns the unique rodent portions.
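To make the "same alignment, different reference" idea concrete, here is a minimal Python sketch. It is not how the Ensembl displays or the Perl API are implemented; the function name project_onto_reference and the toy sequences are invented purely for illustration. It assumes a multiple alignment stored as equal-length gapped strings, and "viewing from a species" is modelled as keeping only the columns where that species has a base.

```python
GAP = "-"

def project_onto_reference(alignment, reference):
    """Keep only the alignment columns where the reference species has a base.

    `alignment` maps a species name to its gapped row; all rows must have
    the same length. This is a hypothetical helper, not an Ensembl call.
    """
    ref_row = alignment[reference]
    keep = [i for i, base in enumerate(ref_row) if base != GAP]
    return {
        species: "".join(row[i] for i in keep)
        for species, row in alignment.items()
    }

# Toy, made-up alignment: mouse and rat share a three-base stretch
# (columns 4-6) that has no counterpart in human.
toy_alignment = {
    "human": "ACGT---ACGT",
    "mouse": "ACGTTTGAC-T",
    "rat":   "ACGTTTGACGT",
}

human_view = project_onto_reference(toy_alignment, "human")
mouse_view = project_onto_reference(toy_alignment, "mouse")

for name, view in (("human", human_view), ("mouse", mouse_view)):
    print(f"reference = {name}")
    for species, row in view.items():
        print(f"  {species:6s} {row}")
```

With human as the reference the rodent-only columns drop out of the picture, while with mouse as the reference they are kept and rat still lines up against them. The underlying alignment is untouched in both cases; only the choice of co-linear reference changes.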

Here is that same region from the perspective of Cow:


Notice that when you go to human you have a choice of not only 4 different multiple alignments (a 4-way primate alignment, a 10-way mammalian alignment, a 12-way alignment including chicken and a 31-way mammalian alignment), but also 40-odd other individual pairwise alignments.

In each case, you can get the alignment out as text - here's a 4-way primate alignment:

Text alignment

or the same region in its 31-way glory:

31 one way text

Of course, all this information is also available to download or access through our Perl API. A particularly interesting thing in these alignments is the ability to switch on the ancestral sequence as well (go to the configuration panel).

More on the use and power of comparative genomics later I hope, but for the moment, do enjoy these displays being back, and do both browse around and download/script against them.

