24 May, 2010

Which whole-genome multiple aligner? Pecan!

Comparing and assessing the quality of whole-genome multiple alignments is a difficult task. In the protein world, many mathematical models are available. They are based on synonymous and non-synonymous substitutions and on the physicochemical similarities among the amino acids. None of this can be applied to non-coding sequences, which make up about 99% of the human genome.

There are two main approaches to benchmarking whole-genome alignments. Authors have either used genomic features such as ancestral repeats, or have developed phylogenetic models to generate synthetic sequences for which the "real" alignment is known.
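When the true alignment of synthetic sequences is known, a predicted alignment can be scored by counting how many of its aligned residue pairs also appear in the reference. The sketch below shows one common form of this pair-based comparison. It assumes alignments are given as equal-length gapped strings with '-' for gaps. The function names are hypothetical, and this is not necessarily the exact metric used in either of the papers discussed below.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of homologous residue pairs implied by a multiple alignment.

    `alignment` is a list of equal-length gapped strings ('-' marks a gap).
    Each pair is ((seq_a, pos_a), (seq_b, pos_b)), using ungapped positions.
    """
    length = len(alignment[0])
    assert all(len(row) == length for row in alignment), "rows must be equal length"
    counters = [0] * len(alignment)  # next ungapped position in each sequence
    pairs = set()
    for col in range(length):
        # Collect every residue present in this column, in sequence order.
        present = []
        for s, row in enumerate(alignment):
            if row[col] != '-':
                present.append((s, counters[s]))
                counters[s] += 1
        # Every two residues sharing a column are declared homologous.
        for a, b in combinations(present, 2):
            pairs.add((a, b))
    return pairs


def pair_scores(predicted, reference):
    """Sensitivity and precision of predicted aligned pairs versus the reference."""
    pred = aligned_pairs(predicted)
    ref = aligned_pairs(reference)
    shared = len(pred & ref)
    sensitivity = shared / len(ref) if ref else 0.0
    precision = shared / len(pred) if pred else 0.0
    return sensitivity, precision
```

For example, `pair_scores(["ACGT", "AG-T"], ["ACGT", "A-GT"])` recovers two of the three true pairs and makes one wrong call, giving a sensitivity and precision of 2/3 each.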

Two articles have been published recently: one proposes a new benchmarking method based on artificial sequences (Kim & Sinha, BMC Bioinformatics 2010, 11:54), and the other examines the coverage, agreement and accuracy of the alignments in the ENCODE pilot regions (Chen & Tompa, Nature Biotechnology 2010, doi:10.1038/nbt.1637).

According to both studies, Pecan is the strongest contender, showing the clear advantage of using a consistency-based approach (see Paten et al., Genome Res. 2008, 18:1814-28) to align the sequences.
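The core idea of consistency is that the evidence for aligning residue i of sequence x with residue j of sequence y is reinforced when both residues align well to the same residue of a third sequence z. The sketch below shows a probabilistic consistency transformation in the form popularized by ProbCons. With posterior matrices P_xz, where P_xx is the identity, it computes P'_xy = (1/N) Σ_z P_xz · P_zy over all N sequences. This is only an illustration of the general technique. Pecan's actual implementation may differ in its details; for example, it has to handle long genomic sequences, where dense matrices like these would be impractical. The input format and function names are hypothetical.

```python
def matmul(a, b):
    """Dense matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def get_posterior(post, lengths, x, y):
    """P_xy: identity when x == y, otherwise stored directly or as a transpose."""
    if x == y:
        return identity(lengths[x])
    if (x, y) in post:
        return post[(x, y)]
    return transpose(post[(y, x)])


def consistency_transform(post, lengths):
    """Re-estimate each pairwise posterior matrix using every third sequence.

    `post` maps (x, y) to a len(x) x len(y) matrix of match probabilities;
    `lengths` maps each sequence name to its length.
    """
    names = list(lengths)
    n = len(names)
    new = {}
    for x, y in post:
        acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in names:  # z ranges over all sequences, including x and y
            prod = matmul(get_posterior(post, lengths, x, z),
                          get_posterior(post, lengths, z, y))
            for i in range(lengths[x]):
                for j in range(lengths[y]):
                    acc[i][j] += prod[i][j] / n
        new[(x, y)] = acc
    return new
```

For example, with three single-residue sequences where P_ab = 0.2 but both a and b align confidently to c (P_ac = P_bc = 1.0), the transformed P'_ab becomes (0.2 + 0.2 + 1.0) / 3 ≈ 0.47. The indirect evidence through c strengthens a weak direct signal.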
