16 November, 2010

Beds, Wigs and BAMs

The Ensembl 60 release brings two changes to our data upload capabilities.

First off, Ensembl can now "attach" a BAM file. BAM is the compressed binary form of SAM (Sequence Alignment/Map), which has become the dominant way to package up next-generation sequencing data. A BAM (or SAM) file holds both the sequence and the alignment of a set of reads in a compact form, and BAM makes it more compact still. Critically, you can index a BAM file, allowing programs rapid access to particular "Slices" of the reads by genomic position. Alignment tools such as Maq, BWA and SOAP can produce BAM files, a variety of analysis tools are written around BAM files, and now Ensembl can view BAM files.
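To make the idea of a positional "Slice" concrete, here is a minimal sketch in plain Python that pulls out the reads overlapping a region from SAM text. It is only an illustration under stated assumptions: it reads uncompressed SAM lines and scans every one of them, which is exactly the work a BAM index lets a browser skip. In practice you would let the index do this, for example with `samtools view MyGreatExperiment.bam chr1:200-300` or a library such as pysam. The helper names (`parse_sam_line`, `ref_end`, `reads_overlapping`) are hypothetical and not part of Ensembl or samtools.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator

# CIGAR operations that consume reference bases (per the SAM spec): M, D, N, =, X.
_REF_CONSUMING = set("MDN=X")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class SamRecord:
    qname: str   # read name
    flag: int    # bitwise flag; 0x4 means "unmapped"
    rname: str   # reference (chromosome) name
    pos: int     # 1-based leftmost mapping position
    cigar: str   # alignment description, e.g. "5M2D5M"


def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory columns this sketch needs from one SAM alignment line."""
    f = line.rstrip("\n").split("\t")
    return SamRecord(qname=f[0], flag=int(f[1]), rname=f[2], pos=int(f[3]), cigar=f[5])


def ref_end(pos: int, cigar: str) -> int:
    """1-based inclusive reference position of the last aligned base."""
    consumed = sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in _REF_CONSUMING)
    return pos + consumed - 1


def reads_overlapping(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[SamRecord]:
    """Yield mapped reads overlapping chrom:start-end (1-based, inclusive).

    This scans every line; a BAM index answers the same question
    without reading the whole file.
    """
    for line in lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        rec = parse_sam_line(line)
        if rec.flag & 0x4 or rec.cigar == "*":  # unmapped read: no position to test
            continue
        if rec.rname == chrom and rec.pos <= end and ref_end(rec.pos, rec.cigar) >= start:
            yield rec
```

Note that overlap is computed from the CIGAR string, so a read with a deletion (`5M2D5M`) covers 12 reference bases even though it has only 10 read bases.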

To make a BAM file viewable, you need access to a website where you can put files (like your local web space, or perhaps an institutional server). Say your file is called MyGreatExperiment.bam. You then need to index the BAM file with one of the available tools (samtools is the usual one), which creates MyGreatExperiment.bam.bai, the BAM index. Put the index right alongside the BAM file: the Ensembl code assumes the index is called filename.bai and sits in the same place. Then click the "Manage your Data" button on any Ensembl page and go to the "Attach BAM" section. And then browse your RNA-seq, ChIP-seq and exome data to your heart's content!
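Because the index has to be found purely by name, a quick pre-flight check can save a confusing failure. The sketch below is a hypothetical helper, not an Ensembl tool: it derives the index URL by appending `.bai` to the BAM path and checks that a local BAM/index pair is present before you copy both to your web space. The index itself would be built beforehand with `samtools index MyGreatExperiment.bam`. Restricting URLs to http(s) and appending `.bai` to the path only (leaving any query string alone) are assumptions of this sketch; the post does not say how Ensembl treats URLs with query strings.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlsplit, urlunsplit


def expected_index_url(bam_url: str) -> str:
    """Return where the index is assumed to live: the BAM path plus ".bai"."""
    parts = urlsplit(bam_url)
    if parts.scheme not in ("http", "https"):  # assumption: web-served files only
        raise ValueError(f"expected an http(s) URL, got {bam_url!r}")
    if not parts.path.endswith(".bam"):
        raise ValueError(f"expected a path ending in .bam, got {parts.path!r}")
    # Append ".bai" to the path only, so any query string is left as it was.
    return urlunsplit(parts._replace(path=parts.path + ".bai"))


def check_ready_to_attach(bam_path: Path) -> list[str]:
    """List problems that would stop the local BAM/index pair being used together."""
    problems = []
    bai_path = bam_path.with_name(bam_path.name + ".bai")  # e.g. MyGreatExperiment.bam.bai
    if not bam_path.is_file():
        problems.append(f"missing BAM file: {bam_path}")
    if not bai_path.is_file():
        problems.append(f"missing index: {bai_path.name} (try: samtools index {bam_path.name})")
    elif bam_path.is_file() and bai_path.stat().st_mtime < bam_path.stat().st_mtime:
        # An index built before the BAM was last rewritten will point at the wrong offsets.
        problems.append("index is older than the BAM file; re-run samtools index")
    return problems
```

An empty list from `check_ready_to_attach` means both files are present and the index is at least as new as the BAM.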

In addition, we've spruced up our functionality and documentation for the UCSC file formats Bed, BedGraph and Wig. Take a look at the "File Upload" and "Attach URL" forms, and at the documentation, which now states precisely which attributes you can use in each of these formats. Our goal is to make Ensembl as useful as possible to as broad a set of users as possible, so let us know if you find something confusing or have a Bed/BedGraph/Wig file that works in UCSC but doesn't work in Ensembl.
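A common source of "works in UCSC, looks shifted in Ensembl" confusion is coordinates: Bed and BedGraph use 0-based starts with exclusive ends, the start in a Wig `fixedStep` declaration is 1-based, and Ensembl displays 1-based inclusive positions. The sketch below, with hypothetical function names, parses BedGraph lines and expands a `fixedStep` Wig section into the same interval representation so the conventions can be compared side by side. It deliberately handles only the simplest cases (four-column BedGraph, `fixedStep` Wig); `variableStep` sections and extra Bed columns are not covered.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable

_SKIP_PREFIXES = ("track", "browser", "#")


@dataclass
class Interval:
    chrom: str
    start0: int   # 0-based start, inclusive (Bed/BedGraph convention)
    end: int      # end, exclusive (Bed/BedGraph convention)
    value: float

    def one_based(self) -> tuple[int, int]:
        """The same interval as 1-based, inclusive (start, end)."""
        return self.start0 + 1, self.end


def parse_bedgraph(lines: Iterable[str]) -> list[Interval]:
    """Parse BedGraph data lines: chrom, start, end, value."""
    out = []
    for line in lines:
        if not line.strip() or line.startswith(_SKIP_PREFIXES):
            continue
        chrom, start, end, value = line.split()[:4]
        out.append(Interval(chrom, int(start), int(end), float(value)))
    return out


def fixedstep_to_intervals(lines: Iterable[str]) -> list[Interval]:
    """Expand fixedStep Wig sections into BedGraph-style intervals."""
    out = []
    chrom = None
    pos = step = span = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(_SKIP_PREFIXES):
            continue
        if line.startswith("fixedStep"):
            attrs = dict(kv.split("=", 1) for kv in line.split()[1:])
            chrom = attrs["chrom"]
            pos = int(attrs["start"])            # Wig start is 1-based
            step = int(attrs["step"])
            span = int(attrs.get("span", "1"))   # span is optional, default 1
            continue
        if chrom is None:
            raise ValueError("data line before any fixedStep declaration")
        start0 = pos - 1                          # convert to 0-based
        out.append(Interval(chrom, start0, start0 + span, float(line)))
        pos += step
    return out
```

For example, `chr1 0 100 1.5` in BedGraph covers bases 1 to 100 in 1-based terms, and `fixedStep chrom=chr1 start=1001 step=100 span=5` places its first value on 1-based bases 1001 to 1005.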

This is, of course, available across all 50 species in Ensembl and, in a couple of weeks when Ensembl Genomes 7 is out, across another 50 eukaryotes, from protists to plants, and about 250 different bacteria.

Comments are welcome, either on this blog or by email to helpdesk@ensembl.org.


Malcolm Cook said...

very nice! Thanks.

Any possibility of handling files protected by web authentication?

Manage Data gives me the option to "share" the BAM file I just "attached", but then I'm told "you have not data to share". Huh?

The track seems to get turned off automatically if/when the amount of data overwhelms the browser. This could perhaps (1) be made more obvious when it happens and (2) be reverted if/when the user zooms to a region more suitable for displaying the density of data in the track...

Ewan Birney said...

Malcolm, thanks for your comments.

I'm not sure about web authentication on upload; that's one for Steve and Anne. The problem with "sharing" the BAM file sounds like a bug.

As for large amounts of data, that's again one for Steve Searle, but I think this is harder than you might think: it's only once you're into the data that you realise there is too much :)

Anne Parker said...

Hi Malcolm

We can certainly look at adding the ability to handle authentication - we are always open to feature requests from users :)

Misha Kapushesky said...

It's always useful to have links to the mentioned help pages.

This is the link for the Upload Help: http://www.ensembl.org/info/website/upload/index.html

and here are some WIG examples: http://www.ensembl.org/info/website/upload/wig.html

Also quite useful - if you want to see how to link to Ensembl and attach your data at the same time (see the "Customising via links" section):