Using the topoview Glyph

From GMOD
Revision as of 16:38, 28 March 2015

topoview

Sample topoview track

The TopoView glyph (topoview.pm) was developed for fast, 3D-like display of RNA-seq data consisting of multiple individual subsets. The main goals were to make the presentation as compact as possible (one reasonably sized track) and to make coordinated behavior of the expression profiles of different subsets easy to spot visually. See the note below about normalizing the expression profiles across the whole experiment.

This glyph is derived from the fb_shmiggle glyph (2009-2010, Victor Strelets, FlyBase.org).

Log transformation

Log2 conversion was found to change the perception of expression profiles dramatically and to highlight coordinated behavior of different subsets. Using log2-transformed read counts is recommended (see below for instructions).
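To see why the log2 scale helps, the minimal Python sketch below compresses raw counts spanning three orders of magnitude (0 to 1023) into a range of 0 to 10, so low- and high-expression subsets become comparable on one track. It is not the code used by coverage_to_topoview.pl: the function name log2_counts and the pseudocount of 1 (which keeps zero counts at 0 instead of negative infinity) are assumptions, since this page does not document the exact formula behind the -l option.

```python
import math


def log2_counts(counts, pseudocount=1.0):
    """Log2-transform read counts.

    ASSUMPTION: a pseudocount is added so that zero counts map to 0 rather than
    -infinity; the exact formula behind coverage_to_topoview.pl -l is not documented here.
    """
    return [math.log2(c + pseudocount) for c in counts]


raw = [0, 1, 3, 7, 1023]
print(log2_counts(raw))  # [0.0, 1.0, 2.0, 3.0, 10.0]
```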

Data format and performance

The wiggle binary method was compared with several possible alternatives on a typical task: retrieving several Kbp of data profiles for several subsets of an RNA-seq experiment. One of the alternatives remarkably outperformed the wiggle binary method, although it requires several times more space to store the formatted data. This optimal storage/retrieval method keeps all data for an experiment (all of its subsets) in one text file. The file structure is essentially one of the simplest wiggle (coverage file) formats with some added positioning data: two columns, no run-length specification, and no omission of zero values. This is the only format the glyph can handle.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
# subset =BS107_all_unique chromosome =2LHet
-200000 0
0       0
19955   1
19959   0
19967   2
19972   0
19977   2
20027   0
20031   2
20035   0
20043   1
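The following Python sketch reads this two-column format back into memory, which can help when checking a generated file. It assumes that a '# subset =NAME chromosome =CHROM' line starts a new profile, that each data line is a position followed by a value, and that each value holds from its position up to the next listed position. That last point is a step-function reading of the sample above, not a documented rule. The names parse_topoview and value_at are hypothetical.

```python
import re
from bisect import bisect_right

# Header lines look like: "# subset =BS107_all_unique chromosome =2LHet"
HEADER = re.compile(r"subset\s*=\s*(\S+)\s+chromosome\s*=\s*(\S+)")


def parse_topoview(lines):
    """Return {(subset, chromosome): (positions, values)} from topoview text lines."""
    profiles = {}
    positions = values = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            match = HEADER.search(line)
            if match is None:
                raise ValueError(f"unrecognized header: {line!r}")
            positions, values = [], []
            profiles[(match.group(1), match.group(2))] = (positions, values)
            continue
        if positions is None:
            raise ValueError("data line before any '# subset' header")
        pos, val = line.split()
        positions.append(int(pos))
        values.append(float(val))
    return profiles


def value_at(profile, pos):
    """ASSUMPTION: a value applies from its position up to the next listed position."""
    positions, values = profile
    i = bisect_right(positions, pos) - 1
    return values[i] if i >= 0 else 0.0


sample = """# subset =BS107_all_unique chromosome =2LHet
-200000 0
0       0
19955   1
19959   0
19967   2
"""
profile = parse_topoview(sample.splitlines())[("BS107_all_unique", "2LHet")]
print(value_at(profile, 19957))  # 1.0
```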

Accessory scripts

The Bio::Graphics package includes two scripts that are useful for preparing BAM alignment data from programs such as TopHat for use with this glyph.

bam_coverage_windows.pl takes a sorted BAM file as input and calculates the average read coverage over non-overlapping windows of a user-specified size (default 25). Its output is in WIG/BED4 format, which is the input format used by the coverage_to_topoview.pl script.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
Usage: perl bam_coverage_windows.pl -b bamfile -n 10_000_000 -w 25 | gzip -c > bamfile.wig.gz
    -b name of bam file to read REQUIRED
    -w window size (default 25)
    -n normalized read number -- if you will be comparing multiple bam files
                                 select the read number to normalize against.
                                 All counts will be adjusted by a factor of:
                                 actual readnum/normalized read num
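The windowing step can be pictured with the Python sketch below, which averages per-base read depth over non-overlapping windows and formats each window as a BED4 line (chromosome, start, end, value). It is only an illustration under stated assumptions: it starts from a hypothetical list of per-base depths rather than reading a BAM file, uses 0-based half-open BED coordinates, and averages a shorter final window over its actual length. How bam_coverage_windows.pl handles these details is not described on this page, and the function names are hypothetical.

```python
def window_averages(depth, window=25):
    """Average per-base depth over non-overlapping windows.

    depth: read depth for each base, starting at position 0 (hypothetical input;
    the real script reads a sorted BAM file instead).
    Returns (start, end, mean) tuples with 0-based, half-open coordinates.
    """
    if window <= 0:
        raise ValueError("window size must be positive")
    windows = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        # The final window may be shorter than `window`; average over its real length.
        windows.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    return windows


def to_bed4(chrom, windows):
    """Format (start, end, mean) tuples as tab-separated BED4 lines."""
    return [f"{chrom}\t{start}\t{end}\t{mean:g}" for start, end, mean in windows]


depth = [0] * 10 + [4] * 10 + [2] * 5
for line in to_bed4("2L", window_averages(depth, window=10)):
    print(line)
```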
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             

Note: If you are comparing BAM files with different total read counts, you need to normalize the read counts for each BAM file with the -n option. The number given to -n should be close to the average read count of all BAM files being analyzed in the experiment.
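The arithmetic behind -n can be sketched as follows. The usage text says counts are "adjusted by a factor of actual readnum/normalized read num"; this Python sketch assumes that means each count is divided by that factor, so a BAM file with twice the reference read number has its counts halved. Following the note, it picks the reference as the mean of the total read counts. The function names and the example read totals are hypothetical.

```python
def normalization_factor(actual_reads, normalized_reads):
    """Factor from the -n usage text: actual read number / normalized read number."""
    return actual_reads / normalized_reads


def normalize_counts(counts, actual_reads, normalized_reads):
    """ASSUMPTION: counts are divided by the factor, i.e. rescaled to the -n read number."""
    factor = normalization_factor(actual_reads, normalized_reads)
    return [c / factor for c in counts]


# Hypothetical total read counts for three BAM files in one experiment.
totals = {"shoots-R1.bam": 8_000_000, "shoots-R2.bam": 10_000_000, "roots-R1.bam": 12_000_000}
reference = round(sum(totals.values()) / len(totals))  # near the average, as the note suggests

print(reference)  # 10000000
print(normalize_counts([10, 20], totals["roots-R1.bam"], reference))
```

Using one shared reference value for every BAM file is what makes the resulting subsets comparable in a single topoview track.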

coverage_to_topoview.pl converts a list of coverage files (WIG/BED4) to the indexed format used by this glyph.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
Usage: perl coverage_to_topoview.pl [-o output_dir] [-h] [-l] file1.wig.gz file2.wig.gz
    -o output directory (default 'topoview')
    -l use log2 for read counts (recommended)
    -h this help message
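The text layout described under "Data format and performance" can be approximated from a BED4 file with the Python sketch below. It writes a '# subset =NAME chromosome =CHROM' header per chromosome, one position/value line per interval, and an explicit zero wherever coverage stops, because the format does not omit zero values. This is a hypothetical helper, not the real script: it keeps BED start coordinates unchanged, does not reproduce leading lines such as "-200000 0" seen in the sample, does not apply log2, and does not build the index.bdbhash index that the glyph also reads.

```python
def bed4_to_topoview(bed_lines, subset):
    """Convert sorted BED4 lines (chrom, start, end, value) into topoview-style text lines."""
    out = []
    chrom_now, last_end = None, None
    for raw in bed_lines:
        line = raw.strip()
        # Skip blank lines and WIG/BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        start, end = int(start), int(end)
        if chrom != chrom_now:
            if last_end is not None:
                out.append(f"{last_end}\t0")  # close the previous chromosome with a zero
            out.append(f"# subset ={subset} chromosome ={chrom}")
            chrom_now, last_end = chrom, None
        elif last_end is not None and start > last_end:
            out.append(f"{last_end}\t0")  # explicit zero for an uncovered gap
        out.append(f"{start}\t{value}")
        last_end = end
    if last_end is not None:
        out.append(f"{last_end}\t0")
    return out


bed = ["2L\t0\t25\t1.5", "2L\t25\t50\t2", "2L\t75\t100\t1", "3R\t0\t25\t4"]
print("\n".join(bed4_to_topoview(bed, "shoots-R1")))
```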

Note 1: Each WIG file corresponds to a single BAM file and becomes one subset (subtrack). The base name of the file is used as the subset name. For example, the file shoots-R1.wig.gz will generate a subset named shoots-R1 in the topoview track.
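Because subset names must match the names listed in the 'subset order' configuration option, it can help to derive them the same way when writing the configuration. The Python sketch below assumes the base name is the file name with a trailing .gz and then .wig stripped. It strips those suffixes by name rather than with repeated Path.stem calls, which would also cut the ".25" off a name such as SRR1810778.25. The helper name subset_name is hypothetical.

```python
from pathlib import Path


def subset_name(path):
    """ASSUMPTION: subset name = file name minus a trailing '.gz', then a trailing '.wig'."""
    name = Path(path).name
    for suffix in (".gz", ".wig"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


print(subset_name("shoots-R1.wig.gz"))  # shoots-R1
print(subset_name("/data/SRR1810778.25.wig.gz"))  # SRR1810778.25
```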

Note 2: If you do not specify an output directory name, the default name 'topoview' is used. Any existing contents of the output directory will be overwritten, so give each track its own directory. For example, if you are making two tracks, one using raw counts and the other using log2-transformed counts:

perl coverage_to_topoview.pl -o raw file1.wig.gz file2.wig.gz
perl coverage_to_topoview.pl -o log2 -l  file1.wig.gz file2.wig.gz

...will yield:

├── log2
│   ├── data.cat
│   └── index.bdbhash
└── raw
    ├── data.cat
    └── index.bdbhash
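Each of these output directories is what the datadir option of a track should point to. A quick pre-flight check, sketched in Python below, reports which of the two files shown in the tree are missing from a directory. It assumes data.cat and index.bdbhash are the files the glyph needs, based only on the listing above; the function name is hypothetical.

```python
import sys
from pathlib import Path

# Files shown in each coverage_to_topoview.pl output directory above.
EXPECTED_FILES = ("data.cat", "index.bdbhash")


def missing_topoview_files(datadir):
    """Return the expected files absent from datadir (an empty list means all are present)."""
    directory = Path(datadir)
    if not directory.is_dir():
        return list(EXPECTED_FILES)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    # Usage: python check_topoview_dirs.py raw log2
    for datadir in sys.argv[1:]:
        missing = missing_topoview_files(datadir)
        print(f"{datadir}: " + ("OK" if not missing else "missing " + ", ".join(missing)))
```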

Configuration

Example config stanza

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
[TOPOVIEWLOG2]
database      = scaffolds
feature       = region
glyph         = topoview
autoscale     = local
height        = 200
datadir       = /home/ubuntu/data/bam/log2
subset order  = SRR1810778.25  FF9966
                SRR1810779.25  FF6633
                SRR1810780.25  FF0000
                SRR1810781.25  00CC66
                SRR1810782.25  009933
                SRR1810783.25  006600
key           = TopHat: Normalized Read Coverage (log2)
show max      = 0
x_step        = 2
y_step        = 8
fill opacity  = 0.8

Options

Glyph-specific options

  • feature The full-length feature for the track. This would usually be the feature type you configured for your chromosomes or scaffolds.
  • database The same database you used for the chromosomes.
  • autoscale Options are 'local' and 'global'. 'local' scales to the maximum value on screen; 'global' scales to the global maximum.
  • datadir Location of the indexed coverage data (absolute path).
  • show max Show an extra subset corresponding to the maximum coverage across all subsets.
  • x_step The horizontal offset (pixels) of each plotted subset (cannot be zero).
  • y_step The vertical offset (pixels) of each plotted subset (cannot be zero).
  • fill opacity The degree of transparency of each plotted subset. Translucency aids comparison between subsets.
  • subset order The order and color of each subset (subplot) in the graph (see below).
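The scaling and stacking options above can be illustrated with the Python sketch below. It models each subset as a list of values (one per position), computes the extra profile that 'show max' adds, picks the value mapped to the full track height under 'local' or 'global' autoscale, and computes per-subset pixel offsets. The in-memory layout, the function names, and the assumption that subset i is shifted by i * x_step and i * y_step pixels are illustrative choices, not taken from the topoview.pm source.

```python
def show_max_profile(subsets):
    """Per-position maximum across all subsets (the extra profile 'show max' displays)."""
    return [max(values) for values in zip(*subsets)]


def autoscale_max(subsets, visible, mode="local"):
    """Value mapped to the top of the track.

    subsets: equal-length value lists covering the whole chromosome.
    visible: (start, end) index range currently on screen, end exclusive.
    """
    if mode == "local":
        start, end = visible
        return max(max(values[start:end]) for values in subsets)
    if mode == "global":
        return max(max(values) for values in subsets)
    raise ValueError("autoscale must be 'local' or 'global'")


def subset_offsets(n_subsets, x_step, y_step):
    """ASSUMPTION: subset i is drawn shifted by (i * x_step, i * y_step) pixels."""
    if x_step == 0 or y_step == 0:
        raise ValueError("x_step and y_step cannot be zero")
    return [(i * x_step, i * y_step) for i in range(n_subsets)]


subsets = [[0, 1, 5, 2], [3, 0, 1, 8]]
print(show_max_profile(subsets))                     # [3, 1, 5, 8]
print(autoscale_max(subsets, (0, 3), mode="local"))  # 5
print(autoscale_max(subsets, (0, 3), mode="global")) # 8
print(subset_offsets(3, x_step=2, y_step=8))         # [(0, 0), (2, 8), (4, 16)]
```

With 'local', the tall peak at the last position is off screen, so the visible peak of 5 fills the track height; with 'global', the same view is scaled against 8.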

Subsets

Setting 'subset order' is mandatory. It specifies the subsets and the order in which they will be displayed.

There are three ways to represent the subsets:

Ordered subsets with no color specified. Random colors will be assigned. Hope you are feeling lucky.

                                                                                                                                                                                                                                         
subset order  = SRR1810778.25
                ...

Ordered subsets with color specified (use only hex colors, with the '#' omitted)

                                                                                                                                                                                                                                         
subset order  = SRR1810778.25  FF9966
                SRR1810779.25  FF6633
                ...

Ordered subsets with color and opacity set. Note that the global 'fill opacity' option affects all subsets; specifying an individual opacity is optional.

                                                                                                                                                                                                                                         
subset order  = SRR1810778.25  FF9966 0.8
                SRR1810779.25  FF6633 0.7
                ...
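All three forms can be handled by one small parser, sketched in Python below. It treats the option value as one subset per line in the form NAME [RRGGBB [OPACITY]], assigns a random hex color when none is given, and falls back to a default opacity (for example the stanza's 'fill opacity') when no per-subset opacity is set. The precedence of a per-subset opacity over 'fill opacity' and the function name parse_subset_order are assumptions for illustration, not documented behavior of the glyph.

```python
import random
import re

HEX_COLOR = re.compile(r"[0-9A-Fa-f]{6}")


def parse_subset_order(value, default_opacity=None, rng=None):
    """Parse a 'subset order' value into (name, color, opacity) tuples, in display order."""
    rng = rng or random.Random()
    subsets = []
    for line in value.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) > 3:
            raise ValueError(f"too many fields in subset line: {line!r}")
        name = fields[0]
        if len(fields) > 1:
            color = fields[1]
            if not HEX_COLOR.fullmatch(color):
                # Rejects e.g. '#FF9966': the leading '#' must be omitted.
                raise ValueError(f"color must be 6 hex digits without '#': {color!r}")
        else:
            color = f"{rng.randrange(0x1000000):06X}"  # random color for uncolored subsets
        # ASSUMPTION: a per-subset opacity overrides the default ('fill opacity').
        opacity = float(fields[2]) if len(fields) > 2 else default_opacity
        subsets.append((name, color, opacity))
    return subsets


order = """SRR1810778.25  FF9966 0.8
SRR1810779.25  FF6633 0.7
SRR1810780.25"""
for name, color, opacity in parse_subset_order(order, default_opacity=0.8):
    print(name, color, opacity)
```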