28 May, 2010

GERP constrained elements via DAS

It is now possible to get the GERP constrained elements via the DAS protocol. For instance, the DAS request for all the GERP elements overlapping the BRCA2 gene (human chromosome 13: 32889611-32973347) is:
http://www.ensembl.org/das/Homo_sapiens.GRCh37.constrained_element/features?segment=13:32889611,32973347

By default, you obtain both the constrained elements derived from our 16-way amniote alignments and those from the 33-way placental mammal alignments (the latter include all the low-coverage genomes). You can select the elements you want with the type argument:
http://www.ensembl.org/das/Homo_sapiens.GRCh37.constrained_element/features?segment=13:32889611,32973347;type=33_eutherian_mammals
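
The two requests above can also be built and read from a script. The following is a minimal sketch in Python, not an official client: the helper names das_features_url and parse_das_features are made up for illustration, and the parser assumes the server replies with a standard DAS 1.5 "features" document (FEATURE elements containing TYPE, START and END) in which the TYPE id carries the alignment name used for filtering.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET

# Server and source name taken from the URLs in this post.
DAS_BASE = "http://www.ensembl.org/das"
SOURCE = "Homo_sapiens.GRCh37.constrained_element"


def das_features_url(chrom, start, end, feature_type=None,
                     base=DAS_BASE, source=SOURCE):
    """Build a DAS 'features' request URL for one segment (hypothetical helper)."""
    url = f"{base}/{source}/features?segment={chrom}:{start},{end}"
    if feature_type:
        # Same ';type=' form as the filtered URL shown above.
        url += f";type={quote(feature_type)}"
    return url


def parse_das_features(xml_text):
    """Return (type_id, start, end) tuples from a DAS features document.

    Assumes the DAS 1.5 layout: each FEATURE has TYPE, START and END children.
    """
    root = ET.fromstring(xml_text)
    features = []
    for feat in root.iter("FEATURE"):
        type_el = feat.find("TYPE")
        type_id = type_el.get("id") if type_el is not None else None
        features.append((type_id,
                         int(feat.findtext("START")),
                         int(feat.findtext("END"))))
    return features


if __name__ == "__main__":
    # BRCA2 region, restricted to the 33-way placental mammal elements.
    print(das_features_url(13, 32889611, 32973347, "33_eutherian_mammals"))
```

Fetching the URL (for example with urllib.request.urlopen) and passing the response text to parse_das_features is left out so that the sketch runs without network access.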

Read more on DAS or on the multiple alignments and constrained elements.

25 May, 2010

Ensembl Events in June 2010

In June we will have the following Ensembl events:

2-4 June: Ensembl module in the EBI Bioinformatics Roadshow at Charles University, Prague, Czech Republic
10 June: Browser workshop at Gothenburg University, Sweden
11 June: Browser workshop at Gothenburg University, Sweden (satellite of the European Human Genetics Conference 2010)
12-15 June: Presentation at the European Human Genetics Conference 2010
14-18 June: Ensembl module in the Joint EBI - Wellcome Trust Bioinformatics Summer School, Hinxton, UK
22-23 June: Browser workshops at the German Cancer Research Center (DKFZ), Heidelberg, Germany
23 June: Browser workshop at Trinity College Dublin, Ireland
25 June: Browser workshop at the Ludwig-Maximilians University, Munich, Germany
30 June - 1 July: Browser workshop at the University of Ljubljana, Slovenia

For details about these and other upcoming events, please have a look at the complete list of Ensembl training events.

24 May, 2010

Which whole-genome multiple aligner? Pecan!

Comparing and assessing the quality of whole-genome multiple alignments is a difficult task. In the protein world, many mathematical models are available. They are based on synonymous and non-synonymous substitutions and the physicochemical similarities among the amino acids. None of this can be applied to non-coding sequences, which make up 99% of the human genome.

There are two main trends for whole-genome alignments. Authors have either used genomic features like ancestral repeats or have developed phylogenetic models to generate synthetic sequences for which the "real" alignment is known.

Two articles have been published recently, one proposing a new method based on artificial sequences (Kim & Sinha, BMC Bioinformatics 2010, 11:54) and the other one looking at the coverage, agreement and accuracy of the alignments in the ENCODE pilot regions (Chen & Tompa, Nature Biotechnology 2010, doi:10.1038/nbt.1637).

According to both studies, Pecan is the strongest contender, showing the clear advantage of using a consistency-based approach (see Paten et al., Genome Res. 2008, 18:1814-28) to align the sequences.

21 May, 2010

Musing about Fish Alignments.

Release 57 saw the release of a 5-way EPO alignment across the teleost fish: Zebrafish, Stickleback, Medaka, Tetraodon and Fugu. I've recently spent some time browsing through the alignments. They are very interesting, with the ancestral duplication in fish showing more complex homology relationships than in mammals. Here's a nice, clean example:

Simple Fish multiple alignments

Here is a far more complex region, where ENREDO has clearly picked up the ancestral duplication but is struggling to make it colinear across the entire region:

Complex Fish multiple alignments

One thing which I don't think most people appreciate is the incredible phylogenetic depth in the teleost lineage. In terms of "millions of years of evolution" or "sequence divergence", the deepest splits in the teleosts - such as Zebrafish to Stickleback - are almost as deep as teleosts to mammals, and certainly deeper than birds to mammals. So it is asking a lot to find good, clearly colinear stretches, particularly given the draft nature of these genomes.

After chatting to Javier, I think it might be much better to also look at a 4-way EPO on the "Stickleback" side of the teleost lineage, in other words Medaka/Stickleback/Fugu/Tetraodon. I think this will come together better (in fact most of the "nice" regions in the 5-way EPO are actually regions without Zebrafish), and we might then be able to take that ancestral chromosome ordering, and perhaps sequence, into comparisons with Zebrafish.

14 May, 2010

Ensembl, Xfam and HMMs

This week, Albert Vilella and I participated in the Xfam consortium meeting. The meeting focussed on protein, domain and ncRNA classification, and on the new developments of the HMMER package.

Although Ensembl is not part of Xfam, we share many interests. We are getting increasingly interested in the use of HMMER models, especially since the release of HMMER3.0. Also, in the forthcoming release (version 58), Ensembl will provide gene trees for ncRNAs. Most of these ncRNA genes are annotated using Rfam models.

Stay tuned for more!

11 May, 2010

Neandertal Genome Browser

In collaboration with the Neandertal Genome Project, we have created an Ensembl-style browser of the Neandertal data available at http://projects.ensembl.org/neandertal. A draft sequence of the Neandertal genome was published in the May 7 issue of Science.

The Neandertal browser includes the ability to visualise the Neandertal data using the new Resembl code developed in collaboration with Illumina. The Resembl code will be introduced in the 1000 Genomes browser later this month and in Ensembl over the summer.

Data include:
- Neandertal sequencing reads from all 6 Neandertal fossils
- Neandertal contigs/consensus from all individuals combined
- Modern human sequencing reads to put the divergence of the Neandertal genomes into perspective
- Selective sweep scan to detect positive selection in early modern humans
- A catalog of changes consisting of Neandertal alleles for positions of non-synonymous difference between human and chimpanzee

Full details of the data types and instructions for using our new display tools are available on the data information page.

Links are also provided from the Neandertal Browser home page to the raw sequence data stored at the EBI for the Neandertal genome project and the modern human genome data.


Further information about the project is available from the project page at the Max Planck Institute for Evolutionary Anthropology, from the genome paper and from other companion papers in the same issue of Science.

We thank Janet Kelso, Ed Green and Udo Stenzel at the MPI for assistance and Eugene Kulesha at the EBI for work to create the Neandertal browser.

06 May, 2010

SNPedia in Ensembl


Ensembl is always extending the variation pages to include more information. Did you know that the latest data from SNPedia is now available?

SNPedia is a wiki-style resource for human genetics with public annotation of over 11,000 SNPs, released under a Creative Commons-style license. We have integrated it into Ensembl, so you can view these SNP reports alongside our other information: variations, genotype and allele frequencies from dbSNP; SNPs from other sources, including UniProt and the Affymetrix and Illumina chipsets; and phenotype annotations from several genome-wide association studies.

You need to configure the page to view SNPedia. From a variation page, e.g. rs1333049, click on "Configure this page" and then on "External Data", and select SNPedia so that it appears in the left-hand side menu of all variation pages. As this information comes directly from SNPedia via DAS, it is always up-to-date.