21 July, 2009

West Coast Mirror

This year we've invested in our own mirror, maintained by us, on the west coast of the US. We did this mainly because measuring page response times for our users showed a consistent additional 3 to 4 seconds if you were lucky enough to live out on the west coast (worse still if you are in Australia!). Although we did a lot last year to improve the general response time of our web pages (for example, compressing our CSS and JavaScript down to single files for the whole site, so these are only loaded once and then cached locally), the Ensembl site delivers a lot of dynamic content, and nothing but getting closer to the users can help with this.

You can reach the site directly at uswest.ensembl.org. Alternatively, there is a little "world" icon at the top right of the page, which switches to the Stars and Stripes when you're on the west coast mirror. Having the mirror not only helps our users on the west coast but also provides resilience when our main site goes down. As we're responsible for provisioning it in sync with our main site (it's part of our release process), this mirror will stay current with the main site.

In some sense the mirror should be low cost "per user" for us: if users go to the mirror, it means less load on the main site, so it's really a question of how we distribute geographically the "web farm" that sits behind Ensembl. However, there are overheads, from hiring rack space in the US to making our own release cycle more complex. This means we will need to assess whether running a US mirror makes sense in the long term. Our instinct is yes, but we need hard data on this.

These things need time to pick up, but we'd already be interested in feedback. For US users, is this site faster for you? We are particularly interested in East coast people, who we think are probably still best off on the main site. Does it change with time of day? For Pacific rim users - Japan, Singapore, Korea, Australia - is the west coast site snappier for you? We'll be putting in place our own monitoring schemes, but user feedback is always good...
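If you'd like to send us numbers rather than impressions, here is a minimal Python sketch of one way to compare the two sites from your own machine. It is only an illustration, not our monitoring scheme: the function names (fetch, median_seconds, compare) and the choice of five repeats are assumptions made for this example, and timing the front page alone says little about the heavier dynamic pages.

```python
import statistics
import time
import urllib.request


def fetch(url, timeout=30):
    """Download the page body once (hypothetical helper for this sketch)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def median_seconds(url, repeats=5, fetcher=fetch, clock=time.perf_counter):
    """Return the median wall-clock time, in seconds, of `repeats` fetches."""
    samples = []
    for _ in range(repeats):
        start = clock()
        fetcher(url)
        samples.append(clock() - start)
    return statistics.median(samples)


def compare(urls=("http://www.ensembl.org/", "http://uswest.ensembl.org/")):
    """Print the median fetch time for each site, one line per URL."""
    for url in urls:
        print(url, round(median_seconds(url), 2), "s")
```

Calling compare() at a few different times of day would give a rough answer to the questions above. The median is used rather than the mean so that one slow request does not dominate the result.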

20 July, 2009

Ensembl browser workshops Western US January 2010

The Ensembl Outreach team is currently looking into the possibility of giving some 1-day Ensembl browser workshops in the Western US in the period directly after the Plant and Animal Genome Conference in San Diego, which takes place from 9-13 January 2010. Hosting institutions would only have to pay the instructor's expenses (accommodation and subsistence) and would share the domestic travel costs, but we would not otherwise charge for the workshops.
For more information about our browser workshops, please have a look here.
Interested? Please contact me for more details at bert@ebi.ac.uk.

15 July, 2009

Release 55

The Ensembl project is pleased to announce release 55 of Ensembl. Highlights of this release are:

* New GRCh37 human assembly;
* Wallaby 2x genome;
* Ability to display uploaded data on individual chromosomes or whole karyotype.

For more information visit our What's New page.

14 July, 2009

Ensembl Regulatory Features - now Martable

Release 55 has lots of goodies - not least the new, coordinated, GRCh37 assembly (more on that later) - but one addition is the Martability of Ensembl Regulatory Features. Regulatory features are on by default on Human and Mouse, and each gene has a specific page for the regulatory features (for example http://www.ensembl.org/Homo_sapiens/Gene/Regulation?g=ENSG00000139618). Regulatory Features are developing fast, and the Martability is bringing out the richer information in the functional genomics database - for example, the classification of features into "promoter", "gene associated" and "unclassified". Next release we're hoping to release a more graphical view for each feature, but the presence of the regulatory features in Mart allows the large scale users - from Perl, Java, R or just plain tab-delimited text - to use them.

We're expecting a lot of development in this area - the addition of Mouse DNaseI sites has allowed us to develop a Mouse build, and of course, the ENCODE project, which is now online in production mode, will provide a far richer, deeper dataset to work against.

So - watch this space.

08 July, 2009

More about the world of Ensembl

Who is using us?

During the long flight from London to Seoul, where I'm now giving workshops, I had time to do some analysis.

We have recently changed the way we analyse traffic on our website. Urchin Software allows us to pinpoint access to our site with more accuracy. (In a previous post, I showed data where some domains were excluded in the representation).

This month's data (June 2009) is more comprehensive and can be shown with the following heatmap. Again, dark-coloured countries show more use than lightly shaded ones. We can now detect some low level traffic in the African continent missed in previous analyses.

Following a recent thread, I normalised these data taking into account population. (Did I mention how long the flight from London to Seoul actually is?) This puts the Netherlands as the country with the highest ratio of Ensembl users normalised by population, followed by Sweden, Switzerland, the United Kingdom, Finland, Belgium, Iceland, Germany, Denmark and Singapore (the first non-European country in the list). The United States (excluding traffic from the US mirror) is pushed down to 15th position.
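For anyone curious about the arithmetic, normalising by population just means dividing each country's traffic by its number of inhabitants before ranking. Here is a minimal Python sketch of that idea. The function name rank_per_capita, the "per million inhabitants" scaling and the two placeholder countries with their figures are all made up for illustration; they are not the Urchin data behind the ranking above.

```python
def rank_per_capita(visits, population, per=1_000_000):
    """Rank countries by visits per `per` inhabitants, highest first.

    Countries without a positive population figure are skipped.
    """
    rates = {}
    for country, count in visits.items():
        people = population.get(country, 0)
        if people > 0:
            rates[country] = count * per / people
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures, for illustration only.
example_visits = {"Smallland": 1_000, "Bigland": 5_000}
example_population = {"Smallland": 1_000_000, "Bigland": 50_000_000}

for country, rate in rank_per_capita(example_visits, example_population):
    print(f"{country}: {rate:.0f} visits per million inhabitants")
```

In this made-up example the smaller country comes out on top despite having fewer visits overall, which is the same effect that moves small countries with active user communities up the list.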

On another matter, I'm glad to see that our user base in this country is well established. Apart from the obvious Seoul, we can also see access from Taejon, Kwangju, Taegu, Pusan, Inchon, Pohang and up to 29 locations in the country, and following these workshops we hope this will increase further.

As they say over here 안녕


02 July, 2009

Go, go, go .... the new Ensembl ontology database and API

Starting with release 55 of Ensembl we provide an ensembl_ontology database. It replaces the older ensembl_go database which used to be loaded straight from the public table dumps provided by the Gene Ontology group (and hence wasn't really an Ensembl database to start with). The associated API is now part of the Ensembl Core API, which should make working with GO terms in Ensembl more straightforward than it was in the past. Available methods include, amongst others, fetching all parent or child terms of a given GO term and fetching all genes, transcripts or translations annotated with a given GO term.

More detailed documentation on both database and API can be found at ensembl/misc-scripts/ontology/README.

Credit for developing the ensembl_ontology database and API goes to Andreas Kahari of the Ensembl Software team.