28 January, 2011

New genebuild summaries now available

We are pleased to announce new documentation describing the gene annotation methodology and results for particular species.

Ensembl gene annotation is a multi-step process which usually takes several months to complete for one species, and is termed the genebuild. In order to provide our users with more information on the data resources used and decisions made during the genebuilding process, we are introducing a new genebuild summary PDF document for each new genebuild, starting from early February 2011 with Ensembl release 61. Each document includes details on not only the alignment programs and data filtering parameters used, but also statistics on the number of protein/cDNA/EST sequences used at different stages of the genebuild. For example, users will be able to find out how many protein sequences were retrieved from public repositories (RefSeq and UniProt) at the beginning of the genebuilding process, how many of these proteins aligned to the genome by various algorithms at different stages of the build, and how many remain in the final gene set as supporting evidence for genes. For human, mouse and zebrafish, the process of merging Ensembl and Havana annotations is also explained.

The genebuild summary will be available for six species: the Anole lizard, Marmoset, Mouse, Panda, Turkey and Zebrafish. More genebuild summaries will become available as genebuilds of existing species are updated and as new species are annotated. You can download the document via a link near the bottom of the "Description" page for each species. Just click on the species of interest on the home page to open its description page.

18 January, 2011

Ensembl Events in February 2011

These are the Ensembl events for February:

10 Feb: Developers workshop at the Korea Genome Organization (KOGO) 2011 Winter Symposium, YongPyong Ski Resort, South Korea
11 Feb: Browser workshop at the Zentrum fuer Humangenetik und Laboratoriumsmedizin Dr. Klein und Dr. Rost, Martinsried, Germany
14 Feb: Developers workshop at Seoul National University, Seoul, South Korea
16 Feb: Developers workshop at the Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, South Korea
17-18 Feb: Browser workshop at the University of Cambridge, Cambridge, UK
23-25 Feb: Ensembl module in the Bioinformatics Roadshow at the Research Centre for Biodiversity and Genetic Resources (CIBIO), Porto, Portugal

For details about these and other upcoming events, please have a look at the complete list of Ensembl training events.

New Search - Lucene


Ensembl is in the process of moving its site search to the open source Apache Lucene framework. This change should bring several advantages, both to us and to all users, the main one being added flexibility. In the short term it will have little impact on website users, except for making life easier for those maintaining local instances.

From Ensembl release 62 (due out this spring) we will incorporate more data into the search (for example help and documentation) and start to improve how we display results. For developers, note that whilst we are not releasing the webcode for Lucene immediately, we are aiming to do so for release 62.

This powerful platform allows searching of over 3 million genes and gene symbols, over 6 million oligo probes, and over 67 million variations! Our implementation utilises software designed and developed by our colleagues at the European Bioinformatics Institute (used in the EB-eye) which has proven to be fast and flexible.

Lucene is open-source technology that has also been implemented to provide searches of our mailing lists (i.e. announce and dev), thanks to our colleagues at the Wellcome Trust Sanger Institute.

We hope these improvements will help make browsing Ensembl a more user-friendly experience. Please give your feedback at helpdesk@ensembl.org.

13 January, 2011

New Ensembl mirror in Asia

We are pleased to announce the public availability of an Ensembl mirror in Asia. It can be found at http://asia.ensembl.org/. This provides a fully functional Ensembl website, but there are a few things to note, which I've listed below.

Redirection

We don't automatically redirect users to the new mirror, although we have plans for this in future. So for now you'll need to explicitly visit http://asia.ensembl.org/ to access it.

User logins

If you use the login functionality, your existing login will work on http://asia.ensembl.org/, and configuration changes will be shared between sites.

Other services

We don't yet offer the BioMart or BLAST/BLAT services on the new mirror; these will come in the near future. We currently have no plans to offer an Asia-based MySQL mirror, so you should continue to use ensembldb.ensembl.org for MySQL queries.

We're very keen to hear your experiences with this new mirror; please use the Helpdesk in the first instance, or contact me directly.

12 January, 2011

Survey Feedback - Thank You

Many thanks to the Ensembl browser users who have given us feedback in our recent survey entitled "Tell Us What You Think"! We learned some valuable points that are being addressed to improve our discoverability, functionality, and overall usability.

We heard back from scientists all over the world; the majority of you were in the UK, the Netherlands, the US, and Germany. Represented fields include bioinformatics, basic research, clinical and genetics research, biotechnology and immunology. 50% of respondents work mainly with the computer, while the other half of you do at least some wet-lab biology. We even got responses from mainly wet-lab scientists (15% of respondents). This is useful to us, as we strive to make Ensembl usable to the largest possible community.

So what did we learn? The use of BioMart and the Perl API by website users has increased since our last survey a year and a half ago. We have more infrequent users, visiting our browser monthly or less often, though the majority of our users are Ensembl masters (frequent users). We believe that this reflects the fact that an ever greater percentage of biological research involves at least some bioinformatics tools, and we hope it also reflects a simpler, more straightforward website that does not need extensive study to use. Finally, 65% of our users take a genome-wide approach, while 20% focus on fewer than 10 genes.

So what did people like? Our tools are popular, especially the Variant Effect Predictor. The recent addition of sortable columns is also a hit. When we asked what other tools you would like, we were pleased to find that some (history) were already being implemented, while others exist but seem to be hidden. On that note:

For those of you who asked for a record of recent actions in Ensembl: if you log in (registration is free), a history of the genes, transcripts, variations and locations you have recently visited will appear in the tabs. Give it a try!

Many of you asked for tools and functionality that already exist, such as CpG islands (available as a track in Location view), a map of gene structure for all isoforms of a gene, and SyntenyView. To aid in the discoverability of these tools, our main search will be configured to also yield results from help pages. This should help people find what they're looking for without relying on browsing alone. We will also make more use of this blog by posting "Did You Know?" tips to help you learn about functionalities of Ensembl and BioMart that may not be obvious. The archive (older) sites in particular don't appear to be easy to find (the link is a small one, at the bottom of each Ensembl page); we address this in our FAQ section.

As for other requests for functionality we don't yet have, these are being taken on board, and will hopefully lead to exciting new developments in the future.

Thanks again for your feedback!

The Ensembl Team