Difference between revisions of "User talk:RobertBuels"

From GMOD
  
 
The code for working with this tiled image format in JBrowse 1.3 is in <code>TiledImageStore/Fixed.js</code>.
 
 
= NEW PAGE [[JBrowse Troubleshooting]] =
 
 
This page collects solutions to problems that people sometimes encounter when installing JBrowse.
 
 
== Installing prerequisites ==
 
 
=== Linux - Ubuntu / Debian ===
 
 
These commands, or similar, should install what you need:
 
 
  sudo apt-get install build-essential libpng-dev zlib1g-dev libgd2-xpm-dev
 
 
=== Linux - Red Hat / Fedora / CentOS ===
 
 
These commands, or similar, should install what you need:
 
 
  sudo yum groupinstall "Development Tools"
 
  sudo yum install libpng-devel gd-devel zlib-devel perl-ExtUtils-MakeMaker
 
 
=== Mac OS X ===
 
 
Use [http://www.macports.org/ MacPorts], [http://www.finkproject.org/ Fink], [http://mxcl.github.com/homebrew/ Homebrew], or another package manager to install a C++ compiler, libpng development headers, GD development headers, and Zlib development headers.
 
 
== Failures of <code>setup.sh</code> ==
 
 
<code>setup.sh</code> creates a log file of debugging information associated with your installation. Email this '''entire file''' to [mailto:gmod-ajax@lists.sourceforge.net gmod-ajax@lists.sourceforge.net] with a request for support.
 
 
As more users try <code>setup.sh</code> and report problems to the mailing list, this wiki will be updated with fixes for common problems they encounter.
 
 
= NEW PAGE [[JBrowse Configuration Guide]] =
 
 
Setting up [[JBrowse]] consists of placing a copy of the jbrowse directory somewhere in the web-servable part of your server's file system, and then running several server-side scripts to create a directory containing JSON-format data files that JBrowse uses to display data.  Both the JBrowse code and these data files must be in a location where the web server can present them to clients.  Then, a user pointing their web browser at the appropriate URL will see the JBrowse interface, including sequence and feature tracks reflecting the data source.
 
 
There is a particular order to follow when adding data to JBrowse: reference sequence data should be added first (using <tt>prepare-refseqs.pl</tt>), followed by annotation data. Once all of the annotation data has been added, it is possible to make the names of each feature searchable. While there is some flexibility in this order of operations (it is possible to add additional reference sequences after feature tracks have been added, for example), the first step will always be to specify a sequence or set of sequences, and the last step will always be to make the named features searchable (assuming you want all feature names to be searchable).
 
 
=User Interface=
 
 
[[File:JBrowseUI.png|800px|center|thumb|
 
'''1. Location Marker:''' Click and drag to move to a different genomic position.<br />
 
'''2. Hidden Tracks:''' Drag a track to this area to hide it.<br />
 
'''3. Window Slider:''' Resize the viewing field.<br />
 
'''4. Scroll Buttons:''' Click to scroll by a fixed amount at a given zoom level.<br />
 
'''5. Viewing Field:''' Drag a track to this area to make it visible. Depending on the track, some zooming may be necessary.<br />
 
'''6. Zoom Buttons:''' Click to zoom. Per click, the larger buttons zoom more than the smaller buttons.<br />
 
'''7. Chromosome Selector:''' Choose which chromosome to view.<br />
 
'''8. Search Bar:''' Browse to a certain region by searching for a location or feature name.<br />
 
]]
 
 
=Reference Sequences=
 
 
The reference sequences are the sequences whose annotations the browser will view, and which therefore provide a co-ordinate system for all other tracks. At a close enough zoom level, the sequence data itself is visible as a special track; this track is hidden once the individual sequence characters become too small to distinguish.
 
 
The exact interpretation of "reference sequence" will depend on how you are using JBrowse, but for a model organism genome database, each reference sequence would typically represent a chromosome (in a perfect assembly) or at least a [http://en.wikipedia.org/wiki/Contig contig]. Before any feature or image tracks can be added to JBrowse, the reference sequences must be loaded; this is done with the prepare-refseqs.pl script.
 
 
==prepare-refseqs.pl==
 
 
This script loads sequence data into JBrowse and must be run before any feature tracks or image tracks are added. The simplest way to use it is with the --fasta option, which reads one or more reference sequences from a [[Glossary#FASTA|FASTA]] file:
 
 
  bin/prepare-refseqs.pl --fasta <fasta file> [options]
 
 
If the file has multiple sequences (e.g. multiple chromosomes), each sequence will become a reference sequence by default. You may switch between these sequences by selecting the sequence of interest via the pull-down menu to the right of the large "zoom in" button.
 
 
You may use any alphabet you wish for your sequences (that is, you are not restricted to the nucleotides A, T, C, and G; any alphanumeric character, as well as several other characters, may be used). Hence, it is possible to browse RNA and protein in addition to DNA. However, some characters should be avoided, because they will cause the sequence to "split": part of the sequence will be cut off and continue on the next line. These characters are the ''hyphen'' and ''question mark''. Unfortunately, this prevents the use of hyphens to represent gaps in a reference sequence.
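Because hyphens and question marks split a sequence, it can help to check a FASTA file before running prepare-refseqs.pl. The following Python sketch is a hypothetical helper, not part of JBrowse: it reads each record (each of which would become a reference sequence) and reports the 1-based positions of those two characters.

```python
# Hypothetical pre-check for prepare-refseqs.pl input (not part of JBrowse).
AVOID = set("-?")  # characters this page says split a sequence


def read_fasta(text):
    """Return (name, sequence) pairs; each record would become one reference sequence.

    Sequence lines that appear before the first '>' header are ignored.
    """
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else ""  # first word of the header line
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def problem_positions(seq):
    """Return 1-based positions of characters that should be avoided."""
    return [i + 1 for i, ch in enumerate(seq) if ch in AVOID]


if __name__ == "__main__":
    example = ">ctgA test contig\nACGT-ACGT\nAC?G\n>ctgB\nACGUACGU\n"
    for name, seq in read_fasta(example):
        bad = problem_positions(seq)
        print(name, "OK" if not bad else f"avoid characters at {bad}")
    # ctgA avoid characters at [5, 12]
    # ctgB OK
```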
 
 
In addition to reading from a fasta file, prepare-refseqs.pl can read sequences from a gff file or a database. To read sequences from a database, a config file must be used.
 
 
Syntax used to import sequences from gff files:
 
  bin/prepare-refseqs.pl --gff <gff file with sequence information> [options]
 
 
Syntax used to import sequences with a config file:
 
  bin/prepare-refseqs.pl --conf <config file that references a database with sequence information> --[refs|refids] <reference sequences> [options]
 
 
{| class="wikitable"
 
|-
 
! Option
 
! Value
 
|-
 
| fasta, gff, or conf
 
| Path to the file that JBrowse will use to import sequences. With the fasta and gff options, the sequence information is imported directly from the specified file. With the conf option, the specified config file includes the details necessary to access a database that contains the sequence information. Exactly one of these three options must be used.
 
|-
 
| out
 
| A path to the output directory (default is 'data' in the current directory).
 
|-
 
| seqdir
 
| The directory where the reference sequences are stored (default: <output directory>/seq)
 
|-
 
| noseq
 
| Causes no reference sequence track to be created. This is useful for reducing disk usage.
 
|-
 
| refs
 
| A comma-delimited list of the names of sequences to be imported as reference sequences. This option (or refids) is required when using the conf option. It is not required when the fasta or gff options are used, but it can be useful with these options, since it can be used to select which sequences JBrowse will import.
 
|-
 
| refids
 
| A comma-delimited list of the database identifiers of sequences to be imported as reference sequences. This option is useful when working with a [[Chado]] database that contains data from multiple different species, and those species have at least one chromosome with the same name (e.g. chrX). In this case, the desired chromosome cannot be uniquely identified by name, so it is instead identified by ID. This ID can be found in the 'feature_id' column of the 'feature' table in a Chado database.
 
|}
 
 
=Feature Tracks=
 
 
Feature tracks can be used to visualize localized annotations on a sequence, such as gene models, transcript alignments, SNPs and so forth. JBrowse has several different ways of importing annotation data into feature tracks:
 
 
* [[#flatfile-to-json.pl|flatfile-to-json.pl]]
 
* [[#bam-to-json.pl|bam-to-json.pl]]
 
* [[#biodb-to-json.pl|biodb-to-json.pl]]
 
* [[#ucsc-to-json.pl|ucsc-to-json.pl]]
 
 
==flatfile-to-json.pl==
 
 
This script adds a single track to JBrowse. To add multiple tracks, run it once for each track.
 
 
Terminology: A ''flat file'' is a database that exists entirely in a single file. For this script, the flat file must be a [[GFF3]], [[GFF2]], or [http://www.ensembl.org/info/website/upload/bed.html BED] file.
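The option table below refers to specific GFF3 columns (type in column 3, phase in column 8, and the Name and Alias attributes in column 9). As a rough aid for checking an input file, this hypothetical Python sketch extracts those fields from a single GFF3 line; it is not how flatfile-to-json.pl itself parses files.

```python
# Hypothetical GFF3 inspection helper; flatfile-to-json.pl uses its own parser.
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments/blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if not pair.strip():
            continue
        key, _, value = pair.partition("=")
        # GFF3 separates multiple values with commas and URL-escapes special characters.
        attributes[key.strip()] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": seqid,
        "type": ftype,                                   # column 3 (getType, type)
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "phase": None if phase == "." else int(phase),   # column 8 (getPhase)
        "name": attributes.get("Name", [None])[0],       # label (getLabel, autocomplete "label")
        "aliases": attributes.get("Alias", []),          # autocomplete "alias"
    }
```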
 
 
Basic syntax:
 
  bin/flatfile-to-json.pl --[gff|gff2|bed] <flat file> --tracklabel <track name> [options]
 
 
Hint: flatfile-to-json.pl simplifies the process of inputting a small number of tracks into JBrowse, since it does not use a config file. If you have many tracks, you will probably want to use a config file, because its structure will make the task of editing tracks easier. In that case, the appropriate script will be [[#biodb-to-json.pl|biodb-to-json.pl]].
 
 
[[File:Flatfile-options.png|600px|thumb|center|flatfile-to-json.pl options.]]
 
 
{| class="wikitable"
 
|-
 
! Option
 
! Value
 
|-
 
| gff, gff2, or bed
 
| The name of the file that contains the feature data. Each option name corresponds to its file type, except that gff expects a [[GFF3]] file rather than a generic [[GFF]] file. Exactly one of these three options must be used.
 
|-
 
| tracklabel
 
| The internal name that JBrowse will give to this feature track. This option requires a value.
 
|-
 
| key
 
| The external, human-readable label seen on the feature track when it is viewed in JBrowse. The value of key defaults to the value of tracklabel.
 
|-
 
| autocomplete
 
| Dictates what the features of the track will be searchable by after running [[#generate-names.pl|generate-names.pl]]. This option can be used with the arguments "label", "alias", "all", or "none". By default, "none" is used.
 
*'''label:''' Make the features searchable by the viewable name that they are associated with in JBrowse. In a gff3 file, this will be the "Name" in the attributes column.
 
*'''alias:''' Make the features searchable by an alternate name defined in the input file. In a gff3 file, this will be the "Alias" in the attributes column.
 
*'''all:''' Make the features searchable by both their labels and their aliases.
 
*'''none:''' Make the features searchable by neither their labels nor their aliases.
 
|-
 
| out
 
| A path to the output directory (default is 'data' in the current directory).
 
|-
 
| [[JBrowseDev/The CssClass Option|cssClass]]
 
| The css class that will be used to create the feature track. This option makes it possible to choose how the feature track will look by selecting a template class from genome.css. The default css class is 'feature'.
 
|-
 
| getType
 
| Causes the 'type' to be included in the output JSON file. The type is the kind of feature (e.g. promoter site, gene). If a gff file is being used, the type will be in column 3.
 
|-
 
| getPhase
 
| Causes the 'phase' to be included in the output JSON file. The phase describes the reading frame of a DNA (or messenger RNA) sequence. If the phase is relevant, it can have the values 0, 1, or 2; otherwise, the value associated with the phase is '.'. If a gff file is being used, the phase will be in column 8.
 
|-
 
| getSubs
 
| Causes subfeature data to be included in the output JSON file.
 
|-
 
| getLabel
 
| Causes the 'Name' attribute associated with each feature to be included in the output JSON file. This will cause a textual name to appear below the features in the track. If a gff3 file is being used, the 'Name' attribute will be in column 9 when it is defined.
 
|-
 
| [[JBrowseDev/The UrlTemplate Option|urlTemplate]]
 
| A URL that your browser will visit when you click on a feature in this track. This is especially useful if you want to link a feature to a page with more information about that feature.
 
|-
 
| arrowheadClass
 
| When this option is used, directional features will be given an arrowhead. The presence and orientation of the arrowhead for each individual feature will depend on data in the input file. Arrowhead classes are defined in genome.css. There is only one that comes with JBrowse (transcript-arrowhead).
 
|-
 
| [[JBrowseDev/The SubfeatureClasses Option|subfeatureClasses]]
 
| The css class(es) that will be used for the subfeatures of a feature track. This option makes it possible to choose how the subfeatures will appear. Any of the classes in genome.css can be used for the subfeatures. This option must be used with getSubs in order for subfeatures to appear.
 
|-
 
| [[JBrowseDev/The ClientConfig Option|clientConfig]]
 
| Any visual additions or edits for the main features of the track (not for subfeatures). These edits must be specified in [[Glossary#JSON|JSON]] syntax.
 
|-
 
| type
 
| The type of feature that will appear in the feature track. This option is useful when the input file contains features of several different types, and you are interested in only having one type of feature (e.g. only having features that are genes) in the feature track. In gff3 files, the type is in the third column.
 
|-
 
| [[JBrowseDev/The ExtraData Option|extraData]]
 
| Use additional information from the input file to create variations in the appearance or behavior of individual features. This option is meant to be used in conjunction with other options. For each feature in the track, a Perl subroutine is used to extract additional information, which is then associated with a variable. The value of this variable can be different for each feature. When the name of this variable is surrounded by curly braces and used in the argument for a different option, such as urlTemplate, the feature-specific data is used.
 
|-
 
| nclChunk
 
| The NCList chunk size in bytes. This option should not be used unless an error such as "json or perl structure exceeds maximum nesting level" is encountered. If this error does occur, lower the chunk size (the default is 50000 bytes).
 
|}
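The {variable} mechanism described for extraData and urlTemplate can be pictured as simple string substitution. The Python sketch below is an illustration under assumptions: the template URL and the feature fields (name, id) are made up, and JBrowse performs its own substitution in the browser, possibly with different escaping rules.

```python
# Illustration only: per-feature substitution of {field} placeholders.
import re
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_template(template, feature):
    """Replace each {field} with the URL-escaped value of feature[field].

    Placeholders with no matching field are left untouched, so missing
    per-feature data is easy to spot in the resulting link.
    """
    def substitute(match):
        key = match.group(1)
        if key not in feature:
            return match.group(0)
        return quote(str(feature[key]), safe="")
    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = "http://example.org/gene?name={name}&db={db}"
    print(fill_template(template, {"name": "EDEN.1"}))
    # http://example.org/gene?name=EDEN.1&db={db}
```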
 
 
==bam-to-json.pl==
 
 
This script inputs a track into JBrowse using a [[Glossary#BAM|BAM]] file. Tracks added with this script are similar in appearance to tracks added by [[#flatfile-to-json.pl|flatfile-to-json.pl]].
 
 
Special dependencies: [[Glossary#SAMtools|SAMtools]], Bio::DB::SAM
 
 
Basic syntax:
 
  bin/bam-to-json.pl --bam <bam file> --tracklabel <track name> [options]
 
 
{| class="wikitable"
 
|-
 
! Option
 
! Value
 
|-
 
| bam
 
| The name of the bam file that contains the feature data. This option requires a value.
 
|-
 
| tracklabel
 
| The internal name that JBrowse will give to this feature track. This option requires a value.
 
|-
 
| key
 
| The external, human-readable label seen on the feature track when it is viewed in JBrowse. The value of key defaults to the value of tracklabel.
 
|-
 
| out
 
| A path to the output directory (default is 'data' in the current directory).
 
|-
 
| [[JBrowseDev/The CssClass Option|cssClass]]
 
| The css class that will be used to create the feature track. This option makes it possible to choose how the feature track will look by selecting a template class from genome.css. The default css class is 'feature'.
 
|-
 
| [[JBrowseDev/The ClientConfig Option|clientConfig]]
 
| Any visual additions or edits for the main features of the track (not for subfeatures). These edits must be specified in [[Glossary#JSON|JSON]] syntax.
 
|-
 
| nclChunk
 
| The NCList chunk size in bytes. This option should not be used unless an error such as "json or perl structure exceeds maximum nesting level" is encountered. If this error does occur, lower the chunk size (the default is 50000 bytes).
 
|-
 
| compress
 
| This option causes the output JSON files for the track (trackData.json and hist-*.json) to be compressed with gzip.
 
|}
 
 
==biodb-to-json.pl==
 
 
This script uses a [[JBrowseDev/Current/Usage/ConfigFiles|config file]] to produce a set of feature tracks in JBrowse. It can be used to obtain information from any database with appropriate [[Glossary#Database Schema|schema]], or from flat files. Because it can produce several feature tracks in a single execution, it is useful for large-scale feature data entry into JBrowse.
 
 
Basic syntax:
 
  bin/biodb-to-json.pl --conf <config file> [options]
 
 
{| class="wikitable"
 
|-
 
! Option
 
! Value
 
|-
 
| conf
 
| The name of the JSON configuration file that will be used. This option must be specified.
 
|-
 
| out
 
| A path to the output directory (default is 'data' in the current directory).
 
|-
 
| track
 
| The identifier of a single track that will be updated or added to JBrowse. In the list of key-value pairs comprising an individual track definition in the config file, the identifier will be the value associated with "track".
 
|-
 
| ref
 
| A comma-delimited list of reference sequence names, used to limit database queries to a subset of JBrowse reference sequences. By default, the database is queried for all reference sequences in JBrowse.
 
|-
 
| refid
 
| A comma-delimited list of reference sequence IDs from a [[Chado]] database, used to limit database queries to a subset of JBrowse reference sequences. By default, the database is queried for all reference sequences in JBrowse.
 
|-
 
| compress
 
| This option causes the output JSON files for the track (trackData.json and hist-*.json) to be compressed with gzip.
 
|}
 
 
==ucsc-to-json.pl==
 
 
This script uses data from the UCSC genome annotation database. To reach this data, go to [http://hgdownload.cse.ucsc.edu/downloads.html hgdownload.cse.ucsc.edu] and click the link for the genome of interest. Next, click the "Annotation Database" link. The data relevant to ucsc-to-json.pl (*.sql and *.txt.gz files) can be downloaded from either this page or the FTP server described on this page.
 
 
Together, a *.sql and *.txt.gz pair of files (such as cytoBandIdeo.txt.gz and cytoBandIdeo.sql) constitute a database table. Ucsc-to-json.pl uses the *.sql file to get the column labels, and it uses the *.txt.gz file to get the data for each row of the table. For the example pair of files above, the name of the database table is "cytoBandIdeo". This will become the name of the JBrowse track that is produced from the data in the table.
 
 
In addition to all of the feature-containing tables that you want to use as JBrowse tracks, you will also need to download the trackDb.sql and trackDb.txt.gz files for the organism of interest.
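Before running the script, it can be useful to confirm that every table you need has both of its files. The Python sketch below is a hypothetical helper that only looks at file names; it assumes, as described above, that a table is a <name>.sql / <name>.txt.gz pair whose --track value is <name>, and that trackDb must be present.

```python
# Hypothetical helper: list UCSC tables in a download directory that have both files.
from pathlib import Path


def complete_tables(filenames):
    """Return sorted table names that have both a .sql and a .txt.gz file."""
    sql = {f[:-len(".sql")] for f in filenames if f.endswith(".sql")}
    data = {f[:-len(".txt.gz")] for f in filenames if f.endswith(".txt.gz")}
    return sorted(sql & data)


def check_ucsc_dir(directory):
    """Return usable --track values for `directory`, raising if trackDb is incomplete."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    tables = complete_tables(names)
    if "trackDb" not in tables:
        raise FileNotFoundError("trackDb.sql and trackDb.txt.gz are both required")
    # trackDb holds UCSC track metadata, so it is not listed as a feature track here.
    return [t for t in tables if t != "trackDb"]
```

A table with only one of its two files (for example, knownGene.sql without knownGene.txt.gz) is simply left out of the list.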
 
 
Basic syntax:
 
  bin/ucsc-to-json.pl --in <directory with files from UCSC> --track <database table name> [options]
 
 
Hint: If you're using this approach, it might be convenient to also download the sequence(s) from UCSC. These are usually available from the "Data set by chromosome" link for the particular genome or from the FTP server.
 
 
{| class="wikitable"
 
|-
 
! Option
 
! Value
 
|-
 
| in
 
| A directory containing all of the *.sql and *.txt.gz data from UCSC. This directory ''must'' contain the trackDb.sql and trackDb.txt.gz files for the organism of interest, as well as all of the feature-containing tables that you wish to use as JBrowse tracks.
 
|-
 
| track
 
| The name of the database table: the file name of the table files without the .sql or .txt.gz extension (for example, cytoBandIdeo).
 
|-
 
| out
 
| A path to the output directory (default is 'data' in the current directory).
 
|-
 
| [[JBrowseDev/The CssClass Option|cssClass]]
 
| The css class that will be used to create the feature track. This option makes it possible to choose how the feature track will look by selecting a template class from genome.css. The default css class is 'feature'.
 
|-
 
| arrowheadClass
 
| When this option is used, directional features will be given an arrowhead. The presence and orientation of the arrowhead for each individual feature will depend on data in the input file. Arrowhead classes are defined in genome.css. There is only one that comes with JBrowse (transcript-arrowhead).
 
|-
 
| [[JBrowseDev/The SubfeatureClasses Option|subfeatureClasses]]
 
| The css class(es) that will be used for the subfeatures of a feature track. This option makes it possible to choose how the subfeatures will appear. Any of the classes in genome.css can be used for the subfeatures.
 
|-
 
| [[JBrowseDev/The ClientConfig Option|clientConfig]]
 
| Any visual additions or edits for the main features of the track (not for subfeatures). These edits must be specified in [[Glossary#JSON|JSON]] syntax.
 
|-
 
| nclChunk
 
| The NCList chunk size in bytes. This option should not be used unless an error such as "json or perl structure exceeds maximum nesting level" is encountered. If this error does occur, lower the chunk size (the default is 50000 bytes).
 
|-
 
| compress
 
| This option causes some of the output JSON files (trackData.json and hist-*.json) to be compressed with gzip.
 
|-
 
| sortMem
 
| The maximum amount of RAM (in bytes) to use for sorting the features. The default value is 536870912 bytes (512MiB).
 
|}
 
 
=Image Tracks=
 
 
As well as feature tracks, JBrowse allows generic "image tracks". These currently include "quantitative tracks" (which are pixel-resolution histograms) and "basepair tracks" (an experimental track type that shows base-pair arcs in RNA structure, and is intended to demonstrate the Perl API for rendering your own tracks from data).
 
 
* [[#wig-to-json.pl|wig-to-json.pl]]
 
* [[#draw-basepair-track.pl|draw-basepair-track.pl]]
 
 
==wig-to-json.pl==
 
 
Using a [http://genome.ucsc.edu/goldenPath/help/wiggle.html WIG] file, this script inputs a single wiggle track into JBrowse. In a wiggle track, a numeric value is associated with each nucleotide position in the reference sequence. This is represented in JBrowse as a track that looks like a histogram, where the horizontal axis is for each nucleotide position, and the vertical axis is for the number associated with that position. The vertical axis currently does not have a scale; rather, the heights for each position are relative to each other.
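To make the "one value per position" idea concrete, the following Python sketch reads the two common WIG layouts (fixedStep and variableStep declaration lines) into (position, value) pairs. It is a minimal illustration, not the wig2png reader: it assigns each value to a single base (ignoring span) and does not handle the BED-style WIG variant.

```python
# Minimal WIG reader for illustration (fixedStep/variableStep only; span ignored).
from collections import defaultdict


def parse_wig(text):
    """Return {chrom: [(position, value), ...]} with 1-based positions."""
    data = defaultdict(list)
    mode = chrom = None
    pos = step = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            words = line.split()
            mode = words[0]
            params = dict(w.split("=", 1) for w in words[1:])
            chrom = params["chrom"]
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params.get("step", 1))
            continue
        if mode == "fixedStep":
            # Values follow one per line, step bases apart.
            data[chrom].append((pos, float(line)))
            pos += step
        elif mode == "variableStep":
            # Each line gives an explicit position and its value.
            start, value = line.split()
            data[chrom].append((int(start), float(value)))
        else:
            raise ValueError("data line before any fixedStep/variableStep declaration")
    return dict(data)
```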
 
 
Special dependencies: [http://www.libpng.org/pub/png/libpng.html libpng]
 
 
In order to use wig-to-json.pl, the code for wig2png must be compiled. This can be done with the following command:
 
 
  make
 
 
'''Note:''' If you are using Mac OS X, it might be necessary to execute 'make' in the following way:
 
 
  make GCC_LIB_ARGS=-L/usr/X11/lib GCC_INC_ARGS=-I/usr/X11/include
 
 
Basic syntax:
 
  bin/wig-to-json.pl --wig <wig file> --tracklabel <track name> [options]
 
 
Hint: If you are using this type of track to plot a measure of a prediction's quality, where the range of possible quality scores is from some lower bound to some upper bound (for instance, between 0 and 1), you can specify these bounds with the max and min options.
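One way to picture the effect of min, max, and height is a linear mapping from values to bar heights, with values outside the bounds clamped. The Python sketch below assumes exactly that; the arithmetic inside wig2png may differ, and the function name is hypothetical.

```python
# Assumed linear scaling of wiggle values to pixel heights (not wig2png's code).


def bar_heights(values, height=100, vmin=None, vmax=None):
    """Return integer pixel heights in [0, height] for each value.

    vmin/vmax play the role of the min/max options; when omitted they
    default to the lowest/highest value, as described in the table.
    """
    if not values:
        return []
    lo = min(values) if vmin is None else vmin
    hi = max(values) if vmax is None else vmax
    if hi <= lo:
        # Degenerate range: this sketch draws every bar at full height.
        return [height] * len(values)
    heights = []
    for v in values:
        clamped = min(max(v, lo), hi)
        heights.append(round((clamped - lo) / (hi - lo) * height))
    return heights
```

With fixed bounds such as vmin=0 and vmax=1, a score of 0.25 always maps to the same height regardless of the other values in the file, which is why fixed bounds suit quality scores.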
 
 
[[File:Wiggle-options.png|600px|center|thumb|Summary of wig-to-json.pl options.]]
 
 
{| class="wikitable"
 
|-
 
! Option
 
! Value
 
|-
 
| wig
 
| The name of the wig file that will be used. This option must be specified.
 
|-
 
| tracklabel
 
| The internal name that JBrowse will give to this track. This option requires a value.
 
|-
 
| key
 
| The external, human-readable label seen on the track when it is viewed in JBrowse. The value of key defaults to the value of tracklabel.
 
|-
 
| out
 
| A path to the output directory (default is 'data' in the current directory).
 
|-
 
| tile
 
| The directory where the tiles, or images corresponding to each zoom level of the track, are stored. Defaults to data/tiles.
 
|-
 
| bgcolor
 
| The color of the track background. Specified as "RED,GREEN,BLUE" in base ten numbers between 0 and 255. Defaults to "255,255,255".
 
|-
 
| fgcolor
 
| The color of the track foreground (i.e. the vertical bars of the wiggle track). Specified as "RED,GREEN,BLUE" in base ten numbers between 0 and 255. Defaults to "105,155,111".
 
|-
 
| width
 
| The width in pixels of each tile. The default value is 2000.
 
|-
 
| height
 
| The height in pixels of each tile. Changing this parameter will cause a corresponding change in the top-to-bottom height of the track in JBrowse. The default value is 100.
 
|-
 
| min
 
| The lower bound to use for the track. By default, this is the lowest value in the wiggle file.
 
|-
 
| max
 
| The upper bound to use for the track. By default, this is the highest value in the wiggle file.
 
|}
 
 
 
==draw-basepair-track.pl==
 
 
This script inputs a single base pairing track into JBrowse. A base pairing track is a distinctive track type that represents base pairing between nucleotides as arcs.
 
 
Basic syntax:
 
  bin/draw-basepair-track.pl --gff <gff file> --tracklabel <track name> [options]
 
 
[[File:Basepair-options.png|600px|center|thumb|Summary of draw-basepair-track.pl options.]]
 
 
{| class="wikitable"
 
|-
 
! Option
 
! Value
 
|-
 
| gff
 
| The name of the gff file that will be used. This option must be specified.
 
|-
 
| tracklabel
 
| The internal name that JBrowse will give to this track. This option requires a value.
 
|-
 
| key
 
| The external, human-readable label seen on the track when it is viewed in JBrowse. The value of key defaults to the value of tracklabel.
 
|-
 
| out
 
| A path to the output directory (default is 'data' in the current directory).
 
|-
 
| tile
 
| The directory where the tiles, or images corresponding to each zoom level of the track, are stored. Defaults to data/tiles.
 
|-
 
| bgcolor
 
| The color of the track background. Specified as "RED,GREEN,BLUE" in base ten numbers between 0 and 255. Defaults to "255,255,255".
 
|-
 
| fgcolor
 
| The color of the track foreground (i.e. the base pairing arcs). Specified as "RED,GREEN,BLUE" in base ten numbers between 0 and 255. Defaults to "0,255,0".
 
|-
 
| width
 
| The width in pixels of each tile. The default value is 2000.
 
|-
 
| height
 
| The height in pixels of each tile. Changing this parameter will cause a corresponding change in the top-to-bottom height of the track in JBrowse. The default value is 100.
 
|-
 
| thickness
 
| The thickness of the base pairing arcs in the track. The default value is 2.
 
|-
 
| nolinks
 
| Disables the use of file system links, which are otherwise used to avoid storing duplicate copies of identical image files.
 
|}
 
 
 
=Searchable Names=
 
 
==generate-names.pl==
 
 
This script makes it possible to search for features by ''label'' (the visible name below a feature in JBrowse) and/or by ''alias'' (a secondary name that is not visible in the web browser, but may be present in the JSON used by the JBrowse client). For tracks that are added using [[#flatfile-to-json.pl|flatfile-to-json.pl]] or [[#biodb-to-json.pl|biodb-to-json.pl]], searchability depends on how the 'autocomplete' option is used. If a track is input with the autocomplete option set to 'alias', for instance, features will be searchable by alias after generate-names.pl is run (provided that alias names are present in the original data source). For tracks added using [[#ucsc-to-json.pl|ucsc-to-json.pl]], features will be searchable by label after running generate-names.pl.
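The relationship between the autocomplete setting and the names that become searchable can be summarized in a few lines. This Python sketch is a hypothetical helper that mirrors the option descriptions given for flatfile-to-json.pl; the feature keys "name" (the label) and "aliases" are made up for the example.

```python
# Hypothetical mapping from the autocomplete setting to indexed names.
VALID = {"label", "alias", "all", "none"}


def searchable_names(feature, autocomplete="none"):
    """Return the names under which `feature` would be findable after generate-names.pl."""
    if autocomplete not in VALID:
        raise ValueError(f"unknown autocomplete setting: {autocomplete!r}")
    names = []
    if autocomplete in ("label", "all") and feature.get("name"):
        names.append(feature["name"])
    if autocomplete in ("alias", "all"):
        # Aliases only help if the original data source defines them.
        names.extend(feature.get("aliases", []))
    return names
```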
 
 
To search for a term, use the text box at the top of the JBrowse window.
 
 
Basic syntax:
 
  bin/generate-names.pl [options]
 
 
Note that generate-names.pl does not require any arguments. However, some options are available:
 
 
{| class="wikitable"
 
|-
 
! Option
 
! Value
 
|-
 
| dir
 
| A path to the output directory (default is 'data/names' in the current directory).
 
|-
 
| thresh
 
| A lower bound on the Patricia trie chunk size. Specifically, the lowest possible chunk size is (thresh + 1). The default value is 200. In this context, a chunk is a group of connected Patricia trie nodes that is treated as a single unit, and the chunk size is the total number of genomic features contained in a chunk. The lower the value of thresh, the more chunks there will be.
 
|-
 
| verbose
 
| This setting causes information about the division of nodes into chunks to be printed to the screen.
 
|}
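The thresh rule can be illustrated with a small radix (Patricia) trie. In the Python sketch below, a subtree is cut off as its own chunk as soon as it holds more than thresh names not already placed in a deeper chunk, so every chunk except the root has at least thresh + 1 names. This is an assumption-based illustration counting one entry per name; generate-names.pl's actual traversal and on-disk layout are not reproduced here.

```python
# Illustrative radix-trie chunking; not generate-names.pl's implementation.


class Node:
    def __init__(self):
        self.children = {}  # edge label (str) -> Node
        self.count = 0      # number of names that end exactly at this node


def _common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def insert(root, word):
    """Insert `word` into the radix trie rooted at `root`, splitting edges as needed."""
    node = root
    while word:
        for label, child in list(node.children.items()):
            common = _common_prefix_len(label, word)
            if common == 0:
                continue
            if common < len(label):
                # Split the edge into label[:common] followed by label[common:].
                middle = Node()
                middle.children[label[common:]] = child
                del node.children[label]
                node.children[label[:common]] = middle
                child = middle
            node, word = child, word[common:]
            break
        else:
            # No edge shares a first character with `word`: add a new leaf.
            leaf = Node()
            node.children[word] = leaf
            node, word = leaf, ""
    node.count += 1


def _cut(node, thresh, prefix, chunks):
    """Post-order pass; returns the number of names still attached to `node`."""
    pending = node.count
    for label in sorted(node.children):
        pending += _cut(node.children[label], thresh, prefix + label, chunks)
    if prefix and pending > thresh:
        # This subtree becomes its own chunk of at least thresh + 1 names.
        chunks.append((prefix, pending))
        return 0
    return pending


def make_chunks(names, thresh=200):
    """Return ([(prefix, size), ...], root_chunk_size) for the given names."""
    root = Node()
    for name in names:
        insert(root, name)
    chunks = []
    root_size = _cut(root, thresh, "", chunks)
    return chunks, root_size
```

For example, with the names apple, apricot, banana, band, and bandana and thresh = 1, the subtrees under "ap" and "band" each become a chunk of two names, and banana stays in the root chunk; lowering thresh produces more, smaller chunks.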
 
 
=Removing Tracks=
 
 
Although JBrowse does not include a script that removes individual tracks, there are several ways to change or remove a track:
 
 
'''1. Overwrite the unwanted track with a new track.''' This is useful when a mistake was made in preparing a track, and you want to remove the track only so that you can replace it with a correct track that has the same tracklabel (the 'tracklabel' is a track's internal name). To overwrite a track, run the script again with the corrected input and the same tracklabel value.
 
 
'''2. Remove the entire data directory.''' This is useful when you want to completely remove a track (i.e., remove a tracklabel). This is the easiest and fastest way to remove a track, provided that regenerating all of the tracks you want to keep is simple.
 
 
'''3. Remove the information about the specific tracks from the data directory.''' This is the most precise option, but it is not always the easiest or best option. In order to delete a specific track while leaving all other tracks intact, it will be necessary to find the entry for the track in the 'data/trackInfo.js' file and remove it. The entry will be enclosed in curly braces, and will contain the track's label and key, in addition to other data. Remove the entire entry, including the curly braces. To adhere to [[Glossary#JSON|JSON]] syntax, be sure to also remove any trailing commas. This will completely remove the track from view in JBrowse. However, the track's data will still be present. In order to remove it, remove the 'data/tracks/<reference sequence name>/<tracklabel>/' directory. You might also want to remove the 'data/names' directory and run generate-names.pl again, if any features from the track you removed were searchable.
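The manual edit in option 3 can be scripted. The Python sketch below rests on assumptions that may not hold for every JBrowse version: that data/trackInfo.js is a JavaScript assignment whose right-hand side is a JSON array of track entries with a "label" key, and that track data lives in data/tracks/<reference sequence name>/<tracklabel>/. The helper names are hypothetical; back up the data directory before trying it.

```python
# Hypothetical sketch of option 3; assumes trackInfo.js is "<prefix>[ ...JSON... ]<suffix>".
import json
import shutil
from pathlib import Path


def remove_track_entry(js_text, label):
    """Return (new_text, removed_count) with every entry for `label` removed."""
    start, end = js_text.index("["), js_text.rindex("]") + 1
    entries = json.loads(js_text[start:end])
    kept = [e for e in entries if e.get("label") != label]
    # Re-serialising with json.dumps guarantees no stray or trailing commas remain.
    new_text = js_text[:start] + json.dumps(kept, indent=2) + js_text[end:]
    return new_text, len(entries) - len(kept)


def remove_track(data_dir, label):
    """Edit trackInfo.js in place and delete data/tracks/*/<label>/ directories."""
    data = Path(data_dir)
    info = data / "trackInfo.js"
    new_text, removed = remove_track_entry(info.read_text(), label)
    if removed == 0:
        raise KeyError(f"no track with label {label!r} in {info}")
    info.write_text(new_text)
    for track_dir in data.glob(f"tracks/*/{label}"):
        if track_dir.is_dir():
            shutil.rmtree(track_dir)
```

If any features from the removed track were searchable, the names index still needs to be rebuilt with generate-names.pl afterwards.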
 
 
=URL Control=
 
 
JBrowse provides a number of options for changing the current view in the browser by adding parameters, such as a genomic location, to the URL.
 
 
Basic syntax:
 
  http://<server>/<path to jbrowse>?loc=<location string>&tracks=<tracks to show>
 
 
==loc==
 
The loc parameter sets the genomic position that will be visible in the viewing field. Possible input structures are:
 
 
'''"Chromosome"+":"+ start point + ".." + end point'''
 
 
A chromosome name/ID followed by “:”, the start position, “..”, and the end position of the region to be viewed. The chromosome ID can be a string or a mix of letters and numbers, and the “chr” prefix is optional. Names are not case-sensitive. If the chromosome ID matches one of the reference sequences, that chromosome is shown from the start position to the end position given in the URL.
 
  example) ctgA:100..200
 
Chromosome ctgA will be displayed from position 100 to 200.
 
 
OR '''start point + ".." + end point'''
 
 
Two numbers separated by “..”. JBrowse displays the currently selected chromosome from the first number (the start point) to the second number (the end point).
 
  example) 200..600
 
 
OR '''center base'''
 
 
If only one number is given, JBrowse treats it as the center position and displays a region of the currently selected chromosome centered on that base.
 
 
  example) 200
 
 
OR '''feature name/ID'''
 
 
If a string, or a mix of letters and numbers, is given, JBrowse treats the input as a feature name/ID. If a matching reference sequence exists, JBrowse displays a region of it starting from position 0, the start of the sequence.
 
 
  example) ctgA
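The four loc forms above can be told apart with a couple of regular expressions. The Python sketch below is a hypothetical parser for illustration; JBrowse's own client-side parsing may accept more variants. The resolve_ref helper applies the rules stated above (names are not case-sensitive, and the “chr” prefix is optional) against a list of reference sequence names.

```python
# Hypothetical loc= parser following the four forms described on this page.
import re
from dataclasses import dataclass
from typing import Optional

REF_RANGE = re.compile(r"^(?P<ref>[^:]+):(?P<start>\d+)\.\.(?P<end>\d+)$")
RANGE = re.compile(r"^(?P<start>\d+)\.\.(?P<end>\d+)$")


@dataclass
class Location:
    kind: str                   # "ref_range", "range", "center", or "name"
    ref: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None
    name: Optional[str] = None


def parse_loc(loc):
    loc = loc.strip()
    m = REF_RANGE.match(loc)
    if m:
        return Location("ref_range", ref=m["ref"], start=int(m["start"]), end=int(m["end"]))
    m = RANGE.match(loc)
    if m:
        return Location("range", start=int(m["start"]), end=int(m["end"]))
    if loc.isdigit():
        # A single number is the center base; the window width is left to the viewer.
        return Location("center", start=int(loc), end=int(loc))
    return Location("name", name=loc)


def resolve_ref(ref, refseq_names):
    """Case-insensitive lookup that tolerates an optional 'chr' prefix."""
    def norm(s):
        s = s.lower()
        return s[3:] if s.startswith("chr") else s
    wanted = norm(ref)
    for name in refseq_names:
        if norm(name) == wanted:
            return name
    return None
```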
 
 
== tracks ==
 
The tracks parameter is a comma-delimited list of the tracks to display in the viewing field. Each name should match the "label" element of a track's entry in the track information file.
 
The track labels can be found in data/trackInfo.js in the jbrowse-1.2.1 folder.
 
 
  example) DNA,knownGene,ccdsGene,snp131,pgWatson,simpleRepeat
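Because every name in the tracks parameter has to match a label, it can help to check the list before sharing a URL. This Python sketch assumes that data/trackInfo.js consists of <code>trackInfo =</code> followed by a JSON array whose entries each have a "label" field; <code>tracks_param</code> is a hypothetical helper.

```python
import json


def tracks_param(trackinfo_text, wanted):
    """Return a comma-joined tracks= value, rejecting labels not in trackInfo.js.

    Assumption: the file is "trackInfo =" followed by a JSON array of entries.
    """
    body = trackinfo_text.partition("=")[2].strip().rstrip(";")
    labels = {entry.get("label") for entry in json.loads(body)}
    unknown = [name for name in wanted if name not in labels]
    if unknown:
        raise ValueError("unknown track labels: " + ", ".join(unknown))
    return ",".join(wanted)
```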
 
 
=See also=
 
 
* [[JBrowseDev/Current/Installation | Installation notes for Mac OS X and Linux]]
 
* [[JBrowseDev/Current/Usage/Database | Using a database with JBrowse]]
 
* [[JBrowseDev/Current/Usage/ConfigFiles | Using configuration files]]
 
* [[JBrowse Tutorial]]
 
 
=External Links=
 
 
* [http://genome.cshlp.org/content/19/9/1630.full JBrowse: A Next Generation Genome Browser]
 
* [http://jbrowse.org/code/jbrowse-master/docs/ Documentation from the JBrowse Package]
 
* [http://biowiki.org/view/JBrowse/QuickTutorial Quick JBrowse Tutorial from BioWiki]
 

Revision as of 21:22, 13 April 2012

Rob working on JBrowse docs

This is essentially a branch of the docs. Having to do this is making me wonder whether a wiki is really the best place for them.

NEW PAGE Template:JBrowseResourcesBoxItem

Resources

Home Page
Tutorial
Blog
Configuration
Demo
Mailing Lists

NEW PAGE JBrowse Advanced Topics

Data Format Specification: Lazy Nested Containment List (LazyNCList) Feature Store

JBrowse uses lazily-loaded nested containment lists (LazyNCLists) as an efficient format for storing feature data in pre-generated static files. A nested containment list is a tree data structure in which the nodes of the tree are intervals (here, the features themselves), and edges connect features that lie within the bounds of (but are not subfeatures of) another feature. It has some similarities to an R-tree. For more on NCLists, see the Alekseyenko paper.

This data format is currently used in JBrowse 1.3 for tracks of type FeatureTrack, and the code that actually reads this format is in SeqFeatureStore/NCList.js and ArrayRepr.js.

The LazyNCList format can be broken down into two distinct subformats: the LazyNCList itself, and the array-based JSON representation of the features themselves.

Array Representation (ArrayRepr)

For speed and memory efficiency, JBrowse feature JSON represents features as arrays instead of objects. The array-based JSON representation is much more compact (saving a lot of disk space), and many browsers optimize JavaScript Array objects significantly better than more general objects.

Each feature is represented as an array of the form [ class, data, data, ... ], where the class is an integer index into the store's classes array (more on that in the next section). Each of the elements in the classes array is an array representation that defines the meaning of each of the elements in the feature array.

An array representation specification is encoded in JSON as (comments added):

{
  "attributes": [                   // array of attribute names for this representation
     "AttributeNameForIndex1",
     "AttributeNameForIndex2",
     ...
  ],
  "isArrayAttr": {                  // list of which attributes are themselves arrays
     "AttributeNameForIndexN": 1,
     ...
  }
}
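To make the index arithmetic concrete, here is a minimal Python sketch of reading attributes by name from array-represented features. It assumes, consistent with the "AttributeNameForIndex1" naming above and the nclist example in the next section, that the attribute at position i of a class's "attributes" list is stored at index i + 1 of the feature array, since index 0 holds the class. <code>make_accessor</code> and <code>to_dict</code> are illustrative names and do not mirror the API of ArrayRepr.js.

```python
def make_accessor(classes):
    """Return get(feature, name) for features stored as [class, data, data, ...]."""
    # For each class, map attribute name -> position in the feature array.
    positions = [
        {name: i + 1 for i, name in enumerate(cls["attributes"])}
        for cls in classes
    ]

    def get(feature, name):
        pos = positions[feature[0]].get(name)
        if pos is None or pos >= len(feature):
            return None  # attribute not defined for this class, or not stored
        return feature[pos]

    return get


def to_dict(feature, classes):
    """Expand an array-represented feature into an ordinary dict."""
    names = classes[feature[0]]["attributes"]
    return {name: value for name, value in zip(names, feature[1:])}
```

Precomputing the name-to-position maps once per class keeps each lookup to a dictionary access, which matters when a track holds thousands of features.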

Lazy Nested-Containment Lists (LazyNCList)

A JBrowse LazyNCList is a nested containment list tree structure stored as one JSON file that contains the root node of the tree, plus zero or more "lazy" JSON files that contain subtrees of the main tree. These subtree files are lazily fetched: that is, they are only fetched by JBrowse when they are needed to display a certain genomic region.

On disk, the files in a LazyNCList feature store look like this:

 # stats, metadata, and nclist root node
 data/tracks/<track_label>/<refseq_name>/trackData.json
 # lazily-loaded nclist subtrees
 data/tracks/<track_label>/<refseq_name>/lf-<chunk_number>.json
 # precalculated feature densities
 data/tracks/<track_label>/<refseq_name>/hist-<bin_size>-<chunk_number>.json
 ...
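The histogram files hold precalculated feature counts per bin, split into chunks. Assuming each file holds <code>chunkSize</code> consecutive bins (see the "arrayParams" in the trackData.json example below) and that <code>{Chunk}</code> in "urlTemplate" is replaced by the chunk number, the file and offset for a given coordinate follow from integer division, as in this hypothetical Python helper:

```python
def hist_location(coord, bases_per_bin, chunk_size, url_template):
    """Return (file name, offset in that file) for the histogram bin holding coord."""
    bin_index = coord // bases_per_bin             # which bin the coordinate falls in
    chunk, offset = divmod(bin_index, chunk_size)  # which file, and where in it
    return url_template.replace("{Chunk}", str(chunk)), offset
```

Note that "basesPerBin" is stored as a string in trackData.json, so convert it with <code>int()</code> first. With the example values below, coordinate 90303842 (the "maxEnd") falls in bin 903 of hist-100000-0.json, which is consistent with the array "length" of 904.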

Where the trackData.json file is formatted as (comments added):

{
   "featureCount" : 4293,          // total number of features in this store
   "histograms" : {                // information about precalculated feature-frequency histograms
      "meta" : [
         {                         // description of each available bin-size for precalculated feature frequencies
            "basesPerBin" : "100000",
            "arrayParams" : {
               "length" : 904,
               "chunkSize" : 10000,
               "urlTemplate" : "hist-100000-{Chunk}.json"
            }
         },
         ...                       // and so on for each bin size
      ],
      "stats" : [
         {                           // stats about each precalculated set of binned feature frequencies
           "basesPerBin" : "100000", // bin size in bp  
           "max" : 51,               // max features per bin
           "mean" : 4.93030973451327 // mean features per bin
         },
         ...
      ]
   },
   "intervals" : {
      "classes" : [                // classes: array representations used in this feature data (see ArrayRepr section above)
         {
            "isArrayAttr" : {
               "Subfeatures" : 1
            },
            "attributes" : [
               "Start",
               "End",
               "Strand",
               "Source",
               "Phase",
               "Type",
               "Id",
               "Name",
               "Subfeatures"
            ]
         },
         ...
         {                        // the last arrayrepr class is the "lazyClass": fake features that point to other files
            "isArrayAttr" : {
               "Sublist" : 1
            },
            "attributes" : [
               "Start",
               "End",
               "Chunk"
            ]
         }
      ],
      "nclist" : [
         [
            2,                    // arrayrepr class 2
            12962,                // "Start" minimum coord of features in this subtree
            221730,               // "End"   maximum coord of features in this subtree
            1                     // "Chunk" (indicates this subtree is in lf-1.json)
         ],
         [
            2,                    // arrayrepr class 2
            220579,               // "Start" minimum coord of features in this subtree
            454457,               // "End"   maximum coord of features in this subtree
            2                     // "Chunk" (indicates this subtree is in lf-2.json)
         ],
         ...
      ],
      "lazyClass" : 2,            // index of arrayrepr class that points to a subtree
      "maxEnd" : 90303842,               // maximum coordinate of features in this store
      "urlTemplate" : "lf-{Chunk}.json", // format for lazily-fetched subtree files
      "minStart" : 12962                 // minimum coordinate of features in this store
   },
   "formatVersion" : 1
}
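The lazy loading can be illustrated with a recursive overlap query. This Python sketch is an illustration under stated assumptions, not a port of NCList.js: it reads attributes by name using the index-plus-one convention from the ArrayRepr section, treats nodes whose class is the "lazyClass" as pointers to a subtree returned by a caller-supplied <code>load_chunk</code> function (for example, one that fetches the file named by "urlTemplate"), assumes half-open coordinates, and assumes ordinary features keep any nested children in an array attribute named "Sublist". Because siblings in a nested containment list are sorted by start coordinate, the scan can stop at the first sibling that starts past the query region.

```python
def query_nclist(nodes, classes, lazy_class, start, end, load_chunk):
    """Yield features from an array-represented NCList overlapping [start, end)."""

    def attr(node, name):
        # Attribute i of the node's class is stored at node[i + 1].
        names = classes[node[0]]["attributes"]
        if name not in names:
            return None
        pos = names.index(name) + 1
        return node[pos] if pos < len(node) else None

    for node in nodes:
        if attr(node, "Start") >= end:
            break  # siblings are sorted by Start, so no later sibling overlaps
        if attr(node, "End") <= start:
            continue  # this node ends before the query region
        if node[0] == lazy_class:
            # Placeholder node: fetch the subtree it points to and search it.
            subtree = load_chunk(attr(node, "Chunk"))
            yield from query_nclist(subtree, classes, lazy_class, start, end, load_chunk)
        else:
            yield node
            children = attr(node, "Sublist")  # nested features, if any (assumed name)
            if children:
                yield from query_nclist(children, classes, lazy_class, start, end, load_chunk)
```

In practice <code>load_chunk</code> would cache the files it fetches (for example with <code>functools.lru_cache</code>), so each lf-<chunk_number>.json is requested at most once.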

Data Format Specification: Fixed-Resolution Tiled Image Store

JBrowse can display tracks composed of precalculated image tiles, stretching the tile images horizontally when necessary. The JBrowse Volvox example data has a wiggle data track that is converted to image tiles using the included wig2png program, but any sort of image tiles can be displayed if they are laid out in this format.

The files for a tiled image track are structured by default like this:

  data/tracks/<track_label>/<refseq_name>/trackData.json
  data/tracks/<track_label>/<refseq_name>/<zoom_level_urlPrefix>/<index>.png
  ... (and so on, for many more PNG image files)

Where the PNG files are the image tiles themselves, and trackData.json contains metadata about the track in JSON format, including available zoom levels, the width and height of the image tiles, their base resolution (number of reference sequence base pairs per image tile), and statistics about the data (such as the global minimum and maximum of wiggle data).

The structure of the trackData.json file is:

{
  "tileWidth": 2000,            // width of all image tiles, in pixels
  "stats" : {                   // any statistics about the data being represented
      "global_min": 100,
      "global_max": 899
   },
  "zoomLevels" : [              // array describing what resolution levels are available
     {                          // in the precalculated image tiles
        "urlPrefix" : "1/",
        "height" : 100,
        "basesPerTile" : 2000
     },
     ... (and so on, for zoom levels in order of decreasing resolution / increasing bases per tile )
  ]
}
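To show how the zoom levels and tile indices fit together, the Python sketch below picks a zoom level for a view and lists the tile files covering a region. Both the selection rule (use the coarsest level whose tiles have at least as many pixels per base as the view, otherwise stretch the finest level) and the assumption that tile i covers bases [i * basesPerTile, (i + 1) * basesPerTile) are hypothetical; TiledImageStore/Fixed.js is the authority on the actual behavior.

```python
def pick_zoom_level(track_data, bases_per_pixel):
    """Choose a zoom level for a view drawn at bases_per_pixel (hypothetical rule)."""
    width = track_data["tileWidth"]
    levels = track_data["zoomLevels"]  # ordered finest first (fewest bases per tile)
    chosen = levels[0]  # fallback: the finest level, stretched if necessary
    for level in levels:
        # Later (coarser) levels replace earlier ones as long as their tiles
        # have no more bases per pixel than the view.
        if level["basesPerTile"] / width <= bases_per_pixel:
            chosen = level
    return chosen


def tile_paths(level, start, end):
    """Paths of the PNG tiles covering bases [start, end) at one zoom level."""
    per_tile = level["basesPerTile"]
    # Assumes tile i covers bases [i * per_tile, (i + 1) * per_tile).
    first, last = start // per_tile, (end - 1) // per_tile
    return [f"{level['urlPrefix']}{i}.png" for i in range(first, last + 1)]
```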


For a working example, see the contents of sample_data/json/volvox/tracks/volvox_microarray.wig/ctgA after the Volvox wiggle sample data has been formatted.

The code for working with this tiled image format in JBrowse 1.3 is in TiledImageStore/Fixed.js.