02 June, 2010

Zebrafish RNA-seq gene models

We have been developing a pipeline to build gene models using only RNA-seq data. For release 58 we have added a preliminary set of zebrafish RNA-seq gene models, with the intention of integrating this new source of evidence into a full genebuild soon.

Zebrafish transcriptome data from nine tissues were used to build a set of genes and splice variants. For each locus we chose the variant with the highest read support to display; further details on the process are available here.
To display the genes, go to Region in Detail or Region Overview. Use the "Configure this page" button and select "RNASeq Genes" from the "Genes" menu. The "Supporting DNA Alignments" menu contains supporting exon and intron features from each of the nine tissues. Clicking on these features in Ensembl location pages shows a simple read count for the intron features, and RPKM values for transcripts and exons (RPKM: reads per kilobase of exon model per million mapped reads; Mortazavi et al., Nature Methods 2008).
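
For readers unfamiliar with the measure, the RPKM definition above can be written as a small Python function. This is only a sketch of the standard formula, not the code used in our pipeline; the function name, parameter names and the numbers in the comment are made up for illustration.

```python
def rpkm(read_count, model_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count         -- reads mapped to the exon or transcript model
    model_length_bp    -- length of that model in base pairs
    total_mapped_reads -- all mapped reads in the sample (e.g. one tissue)
    """
    if model_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and (total / 1e6) is the same as
    # multiplying the read count by 1e9 before dividing.
    return read_count * 1e9 / (model_length_bp * total_mapped_reads)


# Illustrative numbers only: 100 reads on a 2,000 bp model in a sample
# with 10 million mapped reads.
example_value = rpkm(100, 2000, 10_000_000)
```

Because the value is normalised by both model length and sequencing depth, it can be compared between exons of different sizes and between tissues with different numbers of mapped reads.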

This is a first attempt at visualising tissue-specific read depth and alternative splicing, which we hope to develop further in the future.

5 comments:

Vayuputra said...
This comment has been removed by a blog administrator.
Vayuputra said...

I find this track to be very useful. Can someone comment on the caveats in using this data and/or important points to remember while interpreting this data? How can we extract this information - details of individual exon RPKMs and transcript RPKMs (into a spreadsheet)?

Simon White (Ensembl Genebuild) said...

The RPKM data are not available in BioMart at the moment. Normally the best way to access them would be via the Ensembl API; specifically, you would want the core API.

However, I have already pulled this data out and made a GTF file containing the transcripts and RPKM values (another user had a similar request).
The files are available here.
Hope you find them useful.
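
If you want the values in a spreadsheet, something along these lines should turn a GTF file into CSV. Treat it as a rough sketch: it assumes the RPKM is stored as an attribute called "rpkm" in the ninth GTF column, which may not match the actual files, so check a few lines first and change value_key if needed. The function names are made up for this example.

```python
import csv


def parse_attributes(field):
    """Parse a GTF attribute column such as: gene_id "G1"; rpkm "12.5";"""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def gtf_to_rows(lines, feature="transcript", value_key="rpkm"):
    """Collect one row per matching feature line.

    value_key is an ASSUMED attribute name; adjust it to the real file.
    """
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue  # not a well-formed line of the requested feature type
        attrs = parse_attributes(cols[8])
        if value_key not in attrs:
            continue
        rows.append({
            "seqname": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "transcript_id": attrs.get("transcript_id", ""),
            value_key: float(attrs[value_key]),
        })
    return rows


def write_csv(rows, handle):
    """Write the rows to an open text handle as CSV with a header line."""
    if not rows:
        return
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```

To use it, open the GTF file, pass the file handle to gtf_to_rows, then pass the result to write_csv with an output file opened using newline="". Use feature="exon" to get per-exon values instead of per-transcript ones.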

We are currently in the process of making a similar gene set on the new Zv9 assembly; this should be available later in the autumn.

Vayuputra said...

Thanks Simon. The .gtf files are quite helpful. Are new files available from the Zv9 data? I am especially interested in this because a new stage (14 dpf) seems to have been added to the previous version of RNA-Seq data.

Anil

Simon White (Ensembl Genebuild) said...

I have made a new .gtf file with RPKM values for the Zv9 RNASeq gene set. The file is available here.