28 May, 2009

We would like to know how our users are finding the current Ensembl web browser, so we have opened a survey in the current release (version 54). The survey will close in one week's time (Friday, 5 June).

If you have not yet replied to the survey, please spare 10 minutes or so to do so. You can give us your thoughts and feedback by clicking on the link below:


The feedback centers on the web browser, specifically the new interface launched in November 2008.

21 May, 2009

Ensembl Events in June 2009

For June we have the following Ensembl events:

4-5 June: Browser workshop at the University of Cambridge, Cambridge, UK
11 June: Demo for the National Genetics Reference Lab, Manchester, UK
15-16 June: Browser workshop at the Facultad de Ciencias de la Universidad de Los Andes, Mérida, Venezuela
18-19 June: Developers workshop at the Facultad de Ciencias de la Universidad de Los Andes, Mérida, Venezuela
22 June: Browser workshop for NHS Molecular Genetics Laboratories at Liverpool Women's Hospital, Liverpool, UK
24-26 June: Ensembl module in the Bioinformatics for Vascular Biology course at the EBI, Hinxton, UK
29 June: Browser workshop at the Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany

For details about these and other upcoming events, please have a look at the complete list of Ensembl training events.

15 May, 2009

Pre, Archive, and Vega downtime

Next week, 18-24 May, an upgrade is scheduled that will affect the following sites:

Archive sites versions 48-53 (Dec 2007-May 2009) (Downtime will start Monday 18 May)

BLAST and User Upload on Pre! sites (Wednesday 20 May)

BLAST and User Upload on the Vega site

14 May, 2009

Ensembl 55

We are currently working on our next release which is due at the end of June 2009 and will contain the following:


Human GRCh37
We will be releasing a new genebuild for human based on the latest assembly, GRCh37, from the Genome Reference Consortium. A preliminary version of this assembly is available now in Ensembl Pre! Because of the new assembly we will have:
  • Updated repeat masking
  • New probeset mappings
  • cDNA update
  • A new ensembl-vega merge delivering a new gene set

Ensembl 55 includes the 2X genome for Tammar Wallaby (Macropus eugenii); this will be a projection build similar to our other 2X species.

C. elegans
We will also include an import of the WormBase release WS200 database for C. elegans.

Anole lizard - A gene patch incorporating the gene set provided by Chris Ponting at Oxford University means that we have a new gene set for the green anole lizard (Anolis carolinensis).

Mouse - The mouse cDNA alignments have been updated.

Zebrafinch - There will be an updated gene set for the 6X zebra finch genome.

Zebrafish - Non-coding RNAs will be added to the Zv8 zebrafish assembly. There will also be some changes to protein-coding gene models, along with new repeats and expression patterns.


Schema Changes
  • Patch to update versions (patch_54_55_a.sql).
  • Add the missing types to go_xref (patch_54_55_b.sql).
  • Add new table dependent_xref, which will hold the dependencies for the xrefs, i.e. if an EMBL entry comes from a UniProt entry, this relationship will be in the table (patch_54_55_d.sql).
  • Add new tables for alternative splicing/transcript events (patch_54_55_c.sql).
  • Add new column 'is_constitutive' to the exon table (patch_54_55_e.sql)

Xrefs will be run for Human, Macaque, Opossum, Chimp, Chicken, Dog and Mouse (including Fantom Xrefs).

Ontology database schema and tools
The ensembl_go_NN databases are no longer being built. They are being replaced by the ensembl_ontology_NN database, which can be accessed through the core API.

Assembly mapping
Some of the databases will contain mapping coordinates between current and previous assemblies:
  • human: mapping from current GRCh37 to NCBI36, NCBI35 and NCBI34
  • mouse: mapping from current NCBIM37 to NCBIM36, NCBIM35 and NCBIM34

Other changes
  • API support for alternative transcripts/splicing events will be added
  • API support for constitutive exons will be added
  • Deprecated API modules will be removed
  • All slices will be created using the new_fast method from the SliceAdaptor to improve performance
  • seq_region seq edit support will be added. Seq_edits can already be stored and retrieved, but they were not applied when fetching sequence data. This will change so that "_rna_edit" attributes in the seq_region_attrib table are applied to the returned sequence.
  • MySQL and FASTA dumps will be copied to the Amazon Public Datasets project
  • Gene name and xref projections
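
As a rough illustration of the seq_region seq edit item above, the following Python sketch applies a set of edits to a plain sequence string. It is not the Ensembl API (which is written in Perl). It assumes each "_rna_edit" attribute value has the form "start end replacement" with 1-based inclusive coordinates, and that an insertion is written with start = end + 1. The function names parse_edit and apply_edits are hypothetical.

```python
from typing import Iterable, Tuple


def parse_edit(value: str) -> Tuple[int, int, str]:
    """Parse an edit string assumed to look like 'start end replacement'."""
    parts = value.split()
    start, end = int(parts[0]), int(parts[1])
    # A missing third field is treated as a pure deletion.
    replacement = parts[2] if len(parts) > 2 else ""
    return start, end, replacement


def apply_edits(seq: str, values: Iterable[str]) -> str:
    """Apply edits from the highest start to the lowest.

    Working right to left means an edit that changes the sequence length
    does not shift the coordinates of edits that are still to be applied.
    Overlapping edits are not handled in this sketch.
    """
    edits = sorted((parse_edit(v) for v in values), key=lambda e: e[0], reverse=True)
    for start, end, replacement in edits:
        # Convert 1-based inclusive coordinates to Python slice positions.
        seq = seq[: start - 1] + replacement + seq[end:]
    return seq
```

For example, apply_edits("ACGT", ["3 2 AA"]) inserts "AA" between the second and third bases under the insertion convention assumed here.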

  • New functional genomics mart
  • A new Probe section added to Ensembl mart
  • New ontology mart
  • Constitutive exon information will be re-added to Ensembl mart

  • There will be a new human variation database generated by mapping NCBI36 coordinates to GRCh37 (using dbSNP 129)
  • Illumina array data for SNP/CNV is to be added
  • Transcript variations for Zebrafish and Zebrafinch will be recalculated to include information from the new gene sets
  • Schema change - added a call to get consequence_type

Functional genomics
  • Human Regulatory Build will be updated using the GRCh37 assembly
  • Probe alignment and transcript annotation for all species will migrate from the core databases to the functional genomics databases; this includes Affymetrix, Illumina, Codelink and Phalanx
  • Schema change: an is_current field is to be added to the coord_system table

Comparative genomics

Alignments - The new human assembly means that the following alignments will be regenerated:
  • 9 eutherian mammals EPO multiple alignments
  • 31 eutherian mammals EPO multiple alignments
  • 12 amniota vertebrates Pecan multiple alignments
  • 4 catarrhini primate EPO multiple alignments
  • Pairwise BLASTZ-NET alignments of human against each of the other 9 and 31 eutherian mammals
  • Additional pairwise BLASTZ-NET alignments will be run for human-opossum, human-platypus, human-chicken and human-wallaby
  • Translated BLAT-NET will be regenerated for human against fugu, X.tropicalis, C.intestinalis, C.savignyi, stickleback, medaka, chicken, zebrafish, tetraodon, zebrafinch and anole lizard

Synteny will be recalculated for: rat vs. human, chicken vs. human and human vs. macaque, dog, chimpanzee, platypus, opossum, mouse, orangutan, horse and cow

Homologies and families
  • 50 way GeneTrees and homologies with new/updated genebuilds and assemblies
  • Clustering using hcluster_sg
  • Multiple Sequence Alignments using consistency-based MCoffee meta-aligner (mafftgins + muscle + kalign + probcons) and new exon-skipping aware "skipper" algorithm.
  • New 'putative gene split' and 'distant paralog' homology types
  • Pairwise gene-based dN/dS calculations for high coverage species pairs
  • Updated MCL families including all Ensembl transcript isoforms and newest Uniprot Metazoa
  • Multiple sequence alignments with MAFFT
  • Stable IDs for GeneTrees (ENSGT00550NNNNNNNNN) and MCL Families (ENSFM00550NNNNNNNNN).
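
The stable ID formats in the last item above can be checked with a small pattern. This Python sketch assumes that each "N" in the announced formats stands for a single decimal digit; the helper name classify_stable_id is hypothetical and not part of any Ensembl tool.

```python
import re
from typing import Optional

# "ENS", then GT (GeneTree) or FM (MCL Family), then the fixed "00550"
# prefix from the announcement, then nine digits (assumed meaning of "N").
STABLE_ID = re.compile(r"^ENS(GT|FM)00550(\d{9})$")


def classify_stable_id(identifier: str) -> Optional[str]:
    """Return 'GeneTree' or 'MCL Family', or None if the ID does not match."""
    match = STABLE_ID.match(identifier)
    if match is None:
        return None
    return "GeneTree" if match.group(1) == "GT" else "MCL Family"
```

An identifier with a different prefix or the wrong number of digits returns None rather than raising an error, so the function can be used to filter mixed lists of IDs.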

12 May, 2009

SNPs and ancestral alleles

The new Ensembl release includes a new view for SNPs and other genomic variations. It shows the alignment of the polymorphic position together with 10 base pairs of sequence up- and downstream. The user can choose among all available multiple alignments. Polymorphic positions in the other species are also shown.

This is very useful for looking at ancestral alleles, especially in combination with our EPO alignments, as they include the inferred ancestral sequence. Although dbSNP provides predicted ancestral alleles for human SNPs, these are based on the chimp sequence only. In several cases, the ancestral sequence inferred from the multiple alignment disagrees with the chimp sequence, as in this example. Using multiple alignments gives better results and more confidence in the calls.
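
The new view centres each polymorphic position in a window of 10 base pairs on either side. As a toy illustration of that windowing on a plain sequence string (not the browser's implementation), the following Python sketch splits a sequence into upstream flank, variant base and downstream flank. The function name flank_window and the 1-based position convention are assumptions made for this example.

```python
from typing import Tuple


def flank_window(seq: str, pos: int, flank: int = 10) -> Tuple[str, str, str]:
    """Return (upstream, base, downstream) around a 1-based position.

    Flanks are truncated when the position lies close to either end
    of the sequence.
    """
    if not 1 <= pos <= len(seq):
        raise ValueError("position lies outside the sequence")
    i = pos - 1  # convert to a 0-based index
    upstream = seq[max(0, i - flank):i]
    downstream = seq[i + 1:i + 1 + flank]
    return upstream, seq[i], downstream
```

Applying the same function to each row of an alignment, at the same alignment column, would give the per-species windows that the view places one above the other.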

08 May, 2009

Release 54 and pre.ensembl.org

The Ensembl project is pleased to announce release 54 of Ensembl. Highlights of this release are:

  • New Zv8 zebrafish assembly;
  • Comparative alignment text displays for variations and regions;
  • Ability to add personal notes to any Gene or Transcript.

For more information visit:

Alongside this release we are also releasing a new version of the pre site. This now includes:

05 May, 2009

The eFG Array Mapping Environment

The Ensembl Functional Genomics (eFG) environment has been expanded to incorporate array mapping functionality. Historically, arrays from different vendors have been processed in similar, but non-identical, ways due to differing array designs, with the output being stored in the core database. The 'arrays' environment unifies this process within the eFG database to provide a new standardised array mapping procedure for all array formats. This is a two-step process: probe sequences are first aligned to both genomic and transcript sequences, and transcripts are then annotated with xrefs (DBEntries) depending on the quality of the probe alignments around a given transcript locus.

The 'arrays' environment provides easily accessible and interactive command line functions to help run and administer the array mapping pipeline. Recent developments include broader array format support and multi-species capability, along with capture of much more detailed mapping information. This data has yet to be seen in the Ensembl browser, but from release 55 we will start redirecting the web displays to use the eFG data, with a view to developing a more detailed 'Probe' panel at some point later in the year.

We will endeavour to provide alignments and mappings of all popular arrays; for all others we invite you to try out the eFG 'arrays' environment. For more information check out (literally):


Or see it online here.

If you have any questions, please mail ensembl-dev@ebi.ac.uk

02 May, 2009

Browser Training in Vienna on 22 May

The Ensembl Genome Browser project is pleased to announce a workshop on 22 May as a satellite meeting of the European Human Genetics Conference in Vienna, Austria. This full-day workshop is aimed at geneticists and life scientists, and will explore genes, variations, and comparative information using the browser's new interface, released in November 2008. An introduction to large-scale data retrieval with BioMart will be included. We will also feature brief introductions to the European Genotype Archive (EGA) and the 1000 Genomes Project. The format of our browser workshops is described on our outreach page.

The course on 22 May will be held at a central location: the Vienna University Computer Service.

The workshop is free; however, places are limited. Please register if you will be attending.