27 February, 2009

Ensembl 54

We are already working on our next release (due out in late April 2009), which will come with the following:

Data

Zebrafish
We will be releasing a new genebuild for zebrafish (with updated repeat masking) based on the latest assembly Zv8. Thus, we'll have a new gene set (with new probeset mappings).

Horse
We will apply a gene patch (fixing split genes) based on human/mouse 1:1 orthologues, which gives us a new gene set.

Human

  • cDNA update
  • New ensembl-vega merge delivering a "new gene set".
Mouse
  • cDNA update
  • New ensembl-vega comparison, delivering a "new gene set".
New gene sets (ncRNA genes) for several low coverage genomes:
Sloth (Choloepus hoffmanni), armadillo (Dasypus novemcinctus), kangaroo rat (Dipodomys ordii), elephant (Loxodonta africana), hyrax (Procavia capensis), megabat (Pteropus vampyrus), tarsier (Tarsius syrichta), dolphin (Tursiops truncatus) and alpaca (Vicugna pacos).

Mart
  • New functional genomics mart
Core
Minor schema changes

  • cDNA update
  • Update versions (patch_53_54_a.sql)
  • Increase size of oligo_probe.name (patch_53_54_b.sql)
  • Increase size of external_db.db_name (patch_53_54_c.sql)
  • Move analysis_id from identity_xref to object_xref (patch_53_54_d.sql)
  • Increase size of analysis.logic_name (patch_53_54_e.sql)

Variation and Functional Genomics

  • Schema change to source table to add description column for web display
  • Updated zebrafish database
  • Import Illumina data whenever available
  • Recalculate consequence types for mouse regulatory features
  • eFG array mapping: Human, Mouse, Rat, Drosophila
  • Affymetrix (UTR/IVT + ST), Illumina (WG)
New mouse DNase data to support the first Mouse RegulatoryBuild

Code Other

  • Amazon EC2 public datasets updated
  • New GO database (ensembl_ontology_54) and API
  • Changing default behaviour of TranscriptAdaptor
  • Translation attribs modified
  • Remove entries with spaces from species.classification
  • Gene name and xref projections

Pairwise alignments

Update the pairwise alignments for zebrafish (Danio rerio):

  • human-zebrafish translated BLAT-NET
  • mouse-zebrafish translated BLAT-NET
  • rat-zebrafish translated BLAT-NET
  • chicken-zebrafish translated BLAT-NET
  • frog-zebrafish translated BLAT-NET
  • tetraodon-zebrafish translated BLAT-NET
  • fugu-zebrafish translated BLAT-NET
  • medaka-zebrafish translated BLAT-NET
  • stickleback-zebrafish translated BLAT-NET
  • Ciona savignyi-zebrafish translated BLAT-NET
  • Ciona intestinalis-zebrafish translated BLAT-NET
Add new alignments for medaka:
  • human-medaka BLASTZ-NET (imported from UCSC)
  • mouse-medaka BLASTZ-NET (imported from UCSC)

The following files will be available for download:

  • EMF dumps for GeneTrees
  • EMF dumps for EPO and PECAN multiple alignments
  • BED files for 31 way GERP constrained elements
  • BED files for 12 way GERP constrained elements
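
For readers who plan to work with the constrained-element BED files, here is a minimal Python sketch of reading such a file and totalling the bases covered per sequence. It assumes only the standard BED layout (sequence name, 0-based start, exclusive end in the first three columns); the exact columns in the Ensembl dumps are not described in this post, and the records below are made up for illustration. The function names are hypothetical, not part of any Ensembl API.

```python
import io


def parse_bed(handle):
    """Yield (chrom, start, end) tuples from BED-formatted lines.

    Assumes the standard BED convention: 0-based start, exclusive end.
    """
    for line in handle:
        line = line.strip()
        # Skip blank lines, comments and optional track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def covered_bases_per_chrom(handle):
    """Sum element lengths per sequence (assumes elements do not overlap)."""
    totals = {}
    for chrom, start, end in parse_bed(handle):
        totals[chrom] = totals.get(chrom, 0) + (end - start)
    return totals


# Made-up records standing in for a downloaded constrained-elements file.
example = io.StringIO("chr1\t100\t150\nchr1\t300\t320\nchr2\t10\t40\n")
totals = covered_bases_per_chrom(example)
print(totals)
```

To use it on a real download, pass an open file handle instead of the in-memory example.
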
Homologies and families
  • 49-way GeneTrees and Homologies, with new/updated gene sets and assemblies.
  • Multiple Sequence Alignments with the consistency-based MCoffee meta-aligner (mafftgins+muscle+kalign+probcons).
  • Pairwise gene-based dN/dS calculations for high coverage species pairs.
  • Updated MCL families including all Ensembl AS isoforms and latest UniProt Metazoa.
  • Multiple Sequence Alignments with MAFFT


Comments:

Giulietta said...

Keep in mind, these are updates coming in April, 2009! Just this week, we'll have the Ensembl 53 release going live:

http://ensembl.blogspot.com/2009/01/ensembl-53.html

Also, species with new assemblies that are undergoing a genebuild, like zebrafish (Zv8 assembly), are available on our Pre! site:

http://pre.ensembl.org/index.html