Difference between revisions of "GBrowse"

From GMOD
(Add our new database to the GBrowse.conf)
(Net-based Installer Script)
{{SessionHead}}
+
{{ImageCenter|GBrowseLogo.png|GBrowse Logo|400|http://gmod.org/wiki/GBrowse#Logo}}
{| class="tutorialheader"
+
__NOTITLE__
| {{TutorialTitleLine|GBrowse}}<br />
+
[[2011 GMOD Spring Training]]<br />
+
8-12 March 2011<br />
+
[[User:Scott|Scott Cain]]
+
| align="right" | {{#icon: GBrowseLogo.png|GBrowse|200|gmod:GBrowse}}
+
|}
+
  
 +
{{ComponentBox|{{GBrowseResourcesBoxItem}}|<!--{{ComponentBoxEventsHeader}}|{{GMODAmericas2011BoxItem|2011 GMOD Spring Training|GMOD Spring Training|March 8-12}}-->|||| }}
  
{{TocRight}}
+
The Generic Genome Browser (GBrowse) is a genome [[Visualization|viewer]] and is [[GMOD]]'s most popular [[GMOD Components|component]]. For a demo of its features, see [http://wormbase.org/db/gb2/gbrowse/c_elegans/ WormBase], [http://flybase.org/cgi-bin/gbrowse/dmel FlyBase], the [http://projects.tcag.ca/cgi-bin/duplication/dupbrowse/human_b35 Human Genome Segmental Duplication Database], and [[GMOD_Users|others]].
=Prerequisites=
+
  
The prerequisites were installed beforehand using apt or cpan.
+
==Description==
 +
[[image:gbrowse_screenshot1.gif|right|thumb|350px|GBrowse running on [http://hapmap.org/downloads/index.html HapMap.org] [[Media:gbrowse_screenshot1.gif|Click to view at full resolution]]]]
  
=Install GBrowse=
+
GBrowse is a combination of database and interactive web pages for manipulating and displaying annotations on genomes. Some of its features include:
  
GBrowse is easily installed via the cpan shell:
+
* Simultaneous bird's eye and detailed views of the genome.
  <span class="enter">sudo cpan</span>
+
* Scroll, zoom, center.
  cpan> <span class="enter">install Bio::Graphics::Browser2</span>
+
* Use a variety of [[GBrowse Configuration HOWTO#Glyphs and Glyph Options|premade glyphs]] or create your own.
 +
* Attach arbitrary URLs to any annotation.
 +
* Order and appearance of tracks are customizable by administrator and end-user.
 +
* Search by annotation ID, name, or comment.
 +
* Supports third party annotation using [[GFF]] formats.
 +
* Settings persist across sessions.
 +
* DNA and [[GFF]] dumps.
 +
* Connectivity to different databases, including [[BioSQL]] and [[Chado]].
 +
* Multi-language support.
 +
* Third-party feature loading.
 +
* Customizable [[GBrowse Plugins|plug-in]] architecture (e.g. run [[wp:BLAST|BLAST]], dump & import many formats, find oligonucleotides, [[PrimerDesigner.pm|design primers]], create restriction maps, edit features)
  
This also fetches any prerequisites that aren't already installed on the machine.
+
==GBrowse Versions ==
  
=Tutorial=
+
'''GBrowse 1.X''' (currently 1.70) is the older series that has been in use since 2002. It is recommended for applications which use a single database only and which must support legacy browsers.
  
Go to http://localhost/gbrowse2
+
'''GBrowse 2.0''' is a rewrite of the original GBrowse to add dynamic updating via AJAX and a smoother user experience. In addition, it provides administrators with the ability to attach a different genome database to each GBrowse track, making it much easier to manage and update tracks. It also provides a distributed backend system of "slave" renderers, allowing each track to be rendered in parallel on a different machine and significantly increasing performance. GBrowse 2.0 is considered stable, but does not have full internationalization support. In addition, there may be issues with older browsers that do not support newer JavaScript features.
  
=Basic [[Chado]] Configuration (if we have time)=
+
==Installation==
  
{{CPAN|Bio::DB::Das::Chado}} was installed when we created the image. Sample configuration files are available with GBrowse, and we'll get the sample Chado file:
+
GBrowse is [[Glossary#Perl|Perl]]-based. It can be installed using the standard Perl module build procedure, or automated using a network-based install script. To use the net installer, you will need Perl 5.8.6 or higher and the Apache web server installed. See the HOWTOs below for step-by-step instructions:
  
  <span class="enter">wget http://gmod.svn.sourceforge.net/viewvc/gmod/Generic-Genome-Browser/trunk/contrib/conf_files/07.chado.conf -O pythium.conf</span>
+
* [[GBrowse Install HOWTO]]
  
 +
* [[GBrowse_MacOSX_HOWTO|Install on MacOSX]]
 +
* [[GBrowse_Windows_HOWTO|Install on Windows]]
 +
* [[GBrowse_Ubuntu_HOWTO|Install on Ubuntu and other Debian-based systems]]
 +
* [[GBrowse_RPM_HOWTO|Install on Fedora Core and other RPM-based systems]]
 +
* [[GBrowse_Gentoo_HOWTO|Install on Gentoo Linux system]]
 +
* [[GBrowse_Install_HOWTO|Source Code Install (for other Linux systems)]]
  
 +
==Documentation==
 +
===On-line documentation===
 +
{{GB doc box}}
  
Some simple tweaks and additions:
 
  
*Change description
+
===POD documentation===
*Get rid of <tt>database = main</tt>
+
There are many useful POD documents included with the distribution.  These are converted to HTML files when you install the package, and can be found in /gbrowse/docs/pod:
*Remove or change examples (yeast examples don't help anybody)
+
*Add initial landmark (<tt>initial landmark = scf1117875582023</tt>)
+
  
==DB connection info==
+
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/BIOSQL_ADAPTER_HOWTO.pod|BIOSQL_ADAPTER_HOWTO.pod}}
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/GENBANK_HOWTO.pod|GENBANK_HOWTO.pod}}
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/PLUGINS_HOWTO.pod|PLUGINS_HOWTO.pod}}
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/INSTALL.MacOSX.pod|INSTALL.MacOSX.pod}}
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/DAS_HOWTO.pod|DAS_HOWTO.pod}}
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/INSTALL.pod|INSTALL.pod}}
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/README-chado.pod|README-chado.pod}}
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/FAQ.pod|FAQ.pod}}
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/MAKE_IMAGES_HOWTO.pod|MAKE_IMAGES_HOWTO.pod}}
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/README-gff-files.pod|README-gff-files.pod}} (see also [[GFF]])
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/GBROWSE_IMG.pod|GBROWSE_IMG.pod}}
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/ORACLE_AND_POSTGRESQL.pod|ORACLE_AND_POSTGRESQL.pod}}
 +
* {{SF_SVN|Generic-Genome-Browser/trunk/docs/pod/README-lucegene.pod|README-lucegene.pod}}
  
[annotation:database]
+
Because these files are in Perl POD format, they may show formatting codes when viewed in a web browser.
db_adaptor    = Bio::DB::Das::Chado
+
db_args      = -dsn dbi:Pg:dbname=chado
+
                -user gmod
+
                -inferCDS 1
+
                -srcfeatureslice 1
+
search options = default
+
  
==Add a BAM data source==
+
==Downloads==
  
[bam_sample:database]
+
=== Source Code Download (tar.gz file) ===
db_adaptor    = Bio::DB::Sam
+
db_args        = -fasta /var/www/gbrowse2/databases/pythium/scf1117875582023.fasta
+
                  -bam  /var/www/gbrowse2/databases/pythium/simulated-sorted.bam
+
search options = default
+
  
==Add track defaults==
+
Download the source from the [http://sourceforge.net/project/showfiles.php?group_id=27707 SourceForge download page].
  
[TRACK DEFAULTS]
+
=== Net-based Installer Script ===
glyph      = generic
+
database    = annotation
+
height      = 8
+
bgcolor    = cyan
+
fgcolor    = black
+
label density = 25
+
bump density  = 100
+
  
Note the "database" entry in particular: most tracks will use the annotation database, but the bam_sample data source is available when we want it.
+
The net installer script, {{GitHub|GBrowse|master/bin/gbrowse_netinstall.pl|gbrowse_netinstall.pl at the GBrowse GitHub repository}}, will automatically download and install GBrowse and its Perl libraries for you. See [[#Installation|Installation]] for details on using this script.
  
==Add some tracks==
+
=== SVN ===
  
[Genes]
+
There are often new features and bug fixes in the current development version which have not yet been released. To get the latest version, please use [[Subversion]] (SVN). The recommended branch to use is ''trunk'', which is usually stable:
feature      = gene
+
glyph        = gene
+
ignore_sub_part = polypeptide
+
#bgcolor      = yellow
+
forwardcolor = yellow
+
reversecolor = turquoise
+
label        = sub { my $f = shift;
+
                    my $name = $f->display_name;
+
                    my @aliases = sort $f->attributes('Alias');
+
                    $name .= " (@aliases)" if @aliases;
+
                    $name;
+
  }
+
height      = 6
+
description  = 0
+
key          = Named gene
+
+
[CDS]
+
feature      = mRNA
+
glyph        = cds
+
description  = 0
+
ignore_sub_part = polypeptide exon
+
height      = 26
+
sixframe    = 1
+
label        = sub {shift->name . " reading frame"}
+
key          = CDS
+
citation    = This track shows CDS reading frames.
+
+
[repeats]
+
feature      = match:repeatmasker
+
glyph        = generic
+
bgcolor      = black
+
key          = Repeats
+
+
[ests]
+
feature      = expressed_sequence_match
+
glyph        = segments
+
stranded    = 1
+
bgcolor      = green
+
key          = EST matches
+
+
[proteins]
+
feature      = protein_match
+
glyph        = segments
+
stranded    = 1
+
bgcolor      = pink
+
fgcolor      = red
+
key          = protein matches
+
+
[CoverageXyplot]
+
feature        = coverage
+
glyph          = wiggle_xyplot
+
database      = bam_sample
+
height        = 50
+
fgcolor        = black
+
bicolor_pivot  = 20
+
pos_color      = blue
+
neg_color      = red
+
key            = Coverage (xyplot)
+
+
[Reads]
+
feature        = match
+
glyph          = segments
+
draw_target    = 1
+
show_mismatch  = 1
+
mismatch_color = red
+
database      = bam_sample
+
bgcolor        = blue
+
fgcolor        = white
+
height        = 5
+
label density  = 50
+
bump          = fast
+
key            = Reads
+
+
[Pair]
+
feature      = read_pair
+
glyph        = segments
+
database      = bam_sample
+
draw_target  = 1
+
show_mismatch = 1
+
bgcolor      = sub {
+
                my $f = shift;
+
                return $f->attributes('M_UNMAPPED') ? 'red' : 'green';
+
                }
+
fgcolor      = green
+
height        = 3
+
label        = sub {shift->display_name}
+
label density = 50
+
bump          = fast
+
connector    = dashed
+
balloon hover = sub {
+
                my $f    = shift;
+
                return '' unless $f->type eq 'match';
+
                return 'Read: '.$f->display_name.' : '.$f->flag_str;
+
                }
+
key          = Read Pairs
+
  
==Add our new database to the GBrowse.conf==
+
  svn co https://gmod.svn.sourceforge.net/svnroot/gmod/Generic-Genome-Browser/trunk Generic-Genome-Browser
  
To let GBrowse know that there is a new database available, we have to add a few lines to GBrowse.conf.  Add this to the bottom:
+
Once you have successfully checked out the Generic-Genome-Browser distribution, fetch recent changes by executing <code>svn update</code> inside the <code>Generic-Genome-Browser</code> directory.
  
[pythium]
+
You can also browse the {{SF_SVN|Generic-Genome-Browser|GBrowse SVN}}.
description  = Pythium ultimum
+
path          = pythium.conf
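The append step above can be scripted so that it is safe to run more than once. The following is a minimal sketch, not part of the tutorial. The function name <tt>register_source</tt> is hypothetical, and the location of GBrowse.conf is passed in as an argument because the tutorial does not state where the file lives.

```shell
#!/bin/sh
# Hypothetical helper: append the [pythium] stanza to a GBrowse.conf file,
# but only if the stanza is not already there.
register_source() {
    conf=$1
    if grep -q '^\[pythium\]' "$conf"; then
        echo "pythium already registered in $conf"
        return 0
    fi
    # The leading blank line keeps the new stanza separate from earlier ones.
    cat >> "$conf" <<'EOF'

[pythium]
description   = Pythium ultimum
path          = pythium.conf
EOF
    echo "pythium registered in $conf"
}
```

Call it as <tt>register_source /path/to/GBrowse.conf</tt>; a second call leaves the file unchanged.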
+
  
===Updating SAMtools===
+
==== 1.x Development Version ====
  
The version of SAMtools may need to be updated. Get the samtools release:
+
The link above will get you to the GBrowse2 development version. To get to the GBrowse 1.x development branch, use stable:
  
   cd ~/Documents/Software
+
   svn co https://gmod.svn.sourceforge.net/svnroot/gmod/Generic-Genome-Browser/branches/stable Generic-Genome-Browser
  wget -O samtools-0.1.13.tar.bz2 http://sourceforge.net/projects/samtools/files/samtools/0.1.13/samtools-0.1.13.tar.bz2/download
+
  tar jxvf samtools-0.1.13.tar.bz2
+
  cd samtools-0.1.13
+
  make
+
  
Install Bio::DB::Sam:
+
== About Databases ==
  
  sudo cpan
+
{{:GBrowse Adaptors}}
  cpan> install Bio::DB::Sam
+
  
When asked "Please enter the location of the bam.h and compiled libbam.a files:", answer:
+
==Contacts==
  
  /home/gmod/Documents/Software/samtools-0.1.13
+
Please report bugs to the SourceForge [http://sourceforge.net/tracker/?func=add&group_id=27707&atid=391291 Bug Tracker] (select 'Category: Gbrowse').
  
==Add semantic zooming for the BAM tracks==
+
{{MailingListsFor|GBrowse}}
  
Skipping semantic zooming for very dense data (like BAM) is probably the number one performance killer for GBrowse: drawing a track that has thousands of glyphs is time-consuming and, ultimately, probably not very informative.
+
== Logo ==
  
[Reads:5001]
+
The [[:Image:GBrowseLogo.png|GBrowse logo]] was created by [mailto:alexisnb1@yahoo.com Alex Read], a participant in the [[Spring 2010 Logo Program]], while a design student at [http://www.linn-benton.edu Linn-Benton Community College].
feature        = coverage
+
glyph          = wiggle_density
+
height        = 15
+
+
[Pair:5001]
+
feature        = coverage
+
glyph          = wiggle_density
+
height        = 15
+
bgcolor        = purple
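The two override stanzas above differ only in the track name and one color, so they can be generated rather than typed. This is a minimal sketch under one assumption: that the number after the colon is the region length in base pairs above which the override applies. The function name <tt>zoom_stanza</tt> is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a semantic-zoom override stanza for one track.
# $1 = track name, $2 = threshold (assumed: region length in bp),
# $3 = optional bgcolor.
zoom_stanza() {
    printf '[%s:%s]\n' "$1" "$2"
    printf 'feature        = coverage\n'
    printf 'glyph          = wiggle_density\n'
    printf 'height         = 15\n'
    if [ -n "${3:-}" ]; then
        printf 'bgcolor        = %s\n' "$3"
    fi
    printf '\n'
}

# Print the two stanzas used in this section.
zoom_stanza Reads 5001
zoom_stanza Pair 5001 purple
```

Redirect the output with <tt>>></tt> to append it to pythium.conf.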
+
  
==Add "show summary" functionality==
+
==References==
  
For other tracks, performance can similarly suffer when zoomed way out (100 kb or 1 Mb), while the information content of the display decreases.  Newer versions of GBrowse can automatically generate density plots when zoomed out.  This functionality is available with the Chado and {{CPAN|Bio::DB::SeqFeature::Store}} data adaptors.  To prepare our Chado database for this kind of semantic zooming, we need to run a script that comes with Bio::DB::Das::Chado:
+
==See Also==
  
  cd ~/Documents/Software/gbrowse-adaptors/Chado
+
{{GBrowse}}
  svn update
+
  perl bin/gmod_create_summary_statistics.pl
+
  
Then add the following to the pythium.conf file, somewhere near the top (i.e., not in the track definitions):
+
<references/>
  
  show summary = 99999
+
[[Category:GBrowse]]
 
+
[[Category:GMOD Components]]
==Enabling full text searching==
+
 
+
If we try searching for "<tt>gene 7.92</tt>", we'll get "Not Found" as a result, even though genemark-scf1117875582023-abinit-gene-7.92 does exist.  To look for partial strings, we need to enable full text searching.  To do so, we need to run another script that comes with Bio::DB::Das::Chado:
+
 
+
  perl /home/gmod/Documents/Software/gbrowse-adaptors/Chado/bin/gmod_chado_fts_prep.pl
+
 
+
This script does several things, including creating materialized views with a tool provided by [[gmod:Category:SGN|SOL Genomics Network (SGN)]]. (It also estimates, poorly, how long it will take to finish.)  In practice, it is a good idea to read the documentation of <tt>gmod_materialized_view_tool.pl</tt> for information on keeping the views up to date.
+
 
+
We also have to tell GBrowse that this Chado database can now do full text searching, by adding this to the Chado database stanza:
+
 
+
  -fulltext 1
+
 
+
Now we can search for "<tt>gene 7.92</tt>" and we'll find our gene (plus its mRNA and exons), and we can click on the gene to see it in GBrowse.
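Adding the flag by hand is easy to get wrong because <tt>db_args</tt> spans several lines. The sketch below is one way to script the edit. It assumes the stanza is named <tt>[annotation:database]</tt> and contains a <tt>-srcfeatureslice</tt> line, as in the example earlier in this session; the function name <tt>add_fulltext</tt> is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: add "-fulltext 1" to the db_args of the
# [annotation:database] stanza in a GBrowse data source file.
add_fulltext() {
    conf=$1
    # Do nothing if the option is already present anywhere in the file.
    if grep -q -e '-fulltext' "$conf"; then
        return 0
    fi
    awk '
        # Track whether the current stanza is the Chado database stanza.
        /^\[/ { in_db = ($0 == "[annotation:database]") }
        { print }
        # Insert the new argument right after the -srcfeatureslice line.
        in_db && /-srcfeatureslice/ { print "                -fulltext 1" }
    ' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}
```

Run it as <tt>add_fulltext pythium.conf</tt>, then reload the page in GBrowse.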
+
 
+
= Evaluation =
+
 
+
{{Feedback}}
+
 
+
{{NextSession|Apollo|Apollo}}
+

Revision as of 20:04, 19 May 2011

The Generic Genome Browser (GBrowse) is a genome viewer and is GMOD's most popular component. For a demo of its features, see WormBase, FlyBase, the Human Genome Segmental Duplication Database, and others.

Description

GBrowse is a combination of database and interactive web pages for manipulating and displaying annotations on genomes. Some of its features include:

  • Simultaneous bird's eye and detailed views of the genome.
  • Scroll, zoom, center.
  • Use a variety of premade glyphs or create your own.
  • Attach arbitrary URLs to any annotation.
  • Order and appearance of tracks are customizable by administrator and end-user.
  • Search by annotation ID, name, or comment.
  • Supports third party annotation using GFF formats.
  • Settings persist across sessions.
  • DNA and GFF dumps.
  • Connectivity to different databases, including BioSQL and Chado.
  • Multi-language support.
  • Third-party feature loading.
  • Customizable plug-in architecture (e.g. run BLAST, dump & import many formats, find oligonucleotides, design primers, create restriction maps, edit features)

GBrowse Versions

GBrowse 1.X (currently 1.70) is the older series that has been in use since 2002. It is recommended for applications which use a single database only and which must support legacy browsers.

GBrowse 2.0 is a rewrite of the original GBrowse to add dynamic updating via AJAX and a smoother user experience. In addition, it provides administrators with the ability to attach a different genome database to each GBrowse track, making it much easier to manage and update tracks. It also provides a distributed backend system of "slave" renderers, allowing each track to be rendered in parallel on a different machine and significantly increasing performance. GBrowse 2.0 is considered stable, but does not have full internationalization support. In addition, there may be issues with older browsers that do not support newer JavaScript features.

Installation

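GBrowse is Perl-based. It can be installed using the standard Perl module build procedure, or automated using a network-based install script. To use the net installer, you will need Perl 5.8.6 or higher and the Apache web server installed. See the following HOWTOs for step-by-step instructions:

  • GBrowse Install HOWTO
  • Install on MacOSX
  • Install on Windows
  • Install on Ubuntu and other Debian-based systems
  • Install on Fedora Core and other RPM-based systems
  • Install on Gentoo Linux system
  • Source Code Install (for other Linux systems)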

Documentation

On-line documentation

GBrowse 1.x GBrowse 2.0
Usage OpenHelix Tutorial
Install Wiki Wiki
Configure Wiki / Tutorial Wiki / Tutorial


POD documentation

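There are many useful POD documents included with the distribution. These are converted to HTML files when you install the package, and can be found in /gbrowse/docs/pod:

  • BIOSQL_ADAPTER_HOWTO.pod
  • GENBANK_HOWTO.pod
  • PLUGINS_HOWTO.pod
  • INSTALL.MacOSX.pod
  • DAS_HOWTO.pod
  • INSTALL.pod
  • README-chado.pod
  • FAQ.pod
  • MAKE_IMAGES_HOWTO.pod
  • README-gff-files.pod (see also GFF)
  • GBROWSE_IMG.pod
  • ORACLE_AND_POSTGRESQL.pod
  • README-lucegene.pod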

Because these files are in Perl POD format, they may show formatting codes when viewed in a web browser.

Downloads

Source Code Download (tar.gz file)

Download the source from the SourceForge download page.

Net-based Installer Script

The net installer script, gbrowse_netinstall.pl (available from the GBrowse GitHub repository), will automatically download and install GBrowse and its Perl libraries for you. See Installation for details on using this script.
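The Installation section states that the net installer needs Perl 5.8.6 or higher and the Apache web server. A quick check before running the script might look like the sketch below. The function names are hypothetical, and the three Apache binary names are guesses at common ones; on some systems the binary lives in a directory such as /usr/sbin that is not on an ordinary user's PATH.

```shell
#!/bin/sh
# Return 0 if dotted version $1 is at least dotted version $2.
# Compares the first three numeric fields; missing fields count as 0.
version_ge() {
    for i in 1 2 3; do
        a=$(printf '%s.0.0\n' "$1" | cut -d. -f"$i")
        b=$(printf '%s.0.0\n' "$2" | cut -d. -f"$i")
        if [ "$a" -gt "$b" ]; then return 0; fi
        if [ "$a" -lt "$b" ]; then return 1; fi
    done
    return 0
}

# Hypothetical pre-flight check for the net installer's stated requirements.
check_netinstall_prereqs() {
    status=0
    if command -v perl >/dev/null 2>&1; then
        have=$(perl -e 'printf "%vd", $^V')
        if version_ge "$have" 5.8.6; then
            echo "perl $have: ok"
        else
            echo "perl $have: older than 5.8.6"
            status=1
        fi
    else
        echo "perl: not found"
        status=1
    fi
    # Assumed binary names; adjust for your distribution.
    if command -v apache2 >/dev/null 2>&1 ||
       command -v httpd >/dev/null 2>&1 ||
       command -v apachectl >/dev/null 2>&1; then
        echo "apache: found"
    else
        echo "apache: not found on PATH"
        status=1
    fi
    return $status
}
```

Running <tt>check_netinstall_prereqs</tt> prints one line per requirement and returns non-zero if either is missing.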

SVN

There are often new features and bug fixes in the current development version which have not yet been released. To get the latest version, please use Subversion (SVN). The recommended branch to use is trunk, which is usually stable:

 svn co https://gmod.svn.sourceforge.net/svnroot/gmod/Generic-Genome-Browser/trunk Generic-Genome-Browser

Once you have successfully checked out the Generic-Genome-Browser distribution, fetch recent changes by executing svn update inside the Generic-Genome-Browser directory.
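The checkout-then-update routine described above can be wrapped in one function. This is a sketch only: the names <tt>run</tt>, <tt>sync_gbrowse</tt>, and <tt>DRY_RUN</tt> are hypothetical, and the URL is the trunk address given above.

```shell
#!/bin/sh
# Hypothetical wrapper: check out GBrowse trunk on first use, update afterwards.
# Set DRY_RUN=1 to print the svn command instead of running it.
GBROWSE_SVN_URL=https://gmod.svn.sourceforge.net/svnroot/gmod/Generic-Genome-Browser/trunk

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

sync_gbrowse() {
    dir=${1:-Generic-Genome-Browser}
    if [ -d "$dir/.svn" ]; then
        # Existing working copy: fetch recent changes.
        run svn update "$dir"
    else
        # No working copy yet: do the initial checkout.
        run svn co "$GBROWSE_SVN_URL" "$dir"
    fi
}
```

With <tt>DRY_RUN=1</tt> set, <tt>sync_gbrowse</tt> only prints the command it would run, which is a cheap way to confirm the target directory first.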

You can also browse the GBrowse SVN.

1.x Development Version

The link above will get you to the GBrowse2 development version. To get to the GBrowse 1.x development branch, use stable:

 svn co https://gmod.svn.sourceforge.net/svnroot/gmod/Generic-Genome-Browser/branches/stable Generic-Genome-Browser

About Databases

GBrowse has a flexible adaptor (yes, it is spelled that way and is not "adapter") system for running off various types of databases/sources. A common question is "which adaptor should I be using?" This attempts to answer that question.

Adaptor | Other required software | Roughly how many users | Pros | Cons
Bio::DB::SeqFeature::Store (use bp_seqfeature_load.pl) | MySQL, PostgreSQL, SQLite, BerkeleyDB | Many and growing fast. | Roughly 4X faster than Bio::DB::GFF for the same data; designed to work with GFF3 | Developed for use with GFF3; about 2X slower than Bio::DB::GFF to load a database
Bio::DB::GFF (use bp_load_gff.pl, bp_bulk_load_gff.pl, bp_fast_load_gff.pl) | A relational database server: MySQL, PostgreSQL, Oracle, or BerkeleyDB | Lots! (Especially MySQL) | Quite fast; large user base; Have to use this if your data is in the (now deprecated) GFF2 format. | Does not work well with GFF3 formatted data
Bio::DB::Sam (available from CPAN) | SAMtools | Growing (particularly with GBrowse2) | Very fast access to NextGen sequencing data | Difficult to use with GBrowse 1.70
Bio::DB::BigWig and Bio::DB::BigWigSet (available from CPAN) | UCSC Formats | Growing (particularly with GBrowse2) | Very fast access to data in bigWig format | Difficult to use with GBrowse 1.70
Bio::DB::BigBed (available from CPAN) | UCSC Formats | Growing (particularly with GBrowse2) | Very fast access to data in bigBed format | Difficult to use with GBrowse 1.70
Bio::DB::Das::Chado (available from CPAN) | PostgreSQL and a Chado schema | Relatively few due to the specialized nature of Chado | Allows 'live' viewing of the features in a Chado database | Slow compared to Bio::DB::GFF
Bio::DB::Das::BioSQL (available from CPAN) | MySQL and a BioSQL schema | Relatively few due to the small number of BioSQL users | Allows 'live' viewing of the features in a BioSQL database | Slow compared to Bio::DB::GFF
Memory (i.e., flat file database using either Bio::DB::GFF or SeqFeature::Store) | None | For real servers, none | Easy for rapid development and testing | Very slow for more than a few thousand features
LuceGene | Lucene (searches indexed flat files) | Relatively few | |
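
As a rough summary of the table, the choice of adaptor mostly follows from the format or database your data is already in. The lookup below is a sketch of that reading, not an official rule; the function name <tt>suggest_adaptor</tt> and the format keywords are hypothetical.

```shell
#!/bin/sh
# Hypothetical lookup: map a data format or schema to the adaptor that the
# table above associates with it. Input is matched case-insensitively.
suggest_adaptor() {
    case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
        gff3)   echo "Bio::DB::SeqFeature::Store" ;;
        gff2)   echo "Bio::DB::GFF" ;;
        bam)    echo "Bio::DB::Sam" ;;
        bigwig) echo "Bio::DB::BigWig" ;;
        bigbed) echo "Bio::DB::BigBed" ;;
        chado)  echo "Bio::DB::Das::Chado" ;;
        biosql) echo "Bio::DB::Das::BioSQL" ;;
        *)      echo "unknown format: $1" >&2
                return 1 ;;
    esac
}
```

For example, <tt>suggest_adaptor GFF3</tt> prints <tt>Bio::DB::SeqFeature::Store</tt>. The table's pros and cons still apply: the Chado and BioSQL adaptors give a live view but are slower than loading the same features into Bio::DB::GFF.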

Email Threads

There have been some useful email threads on adaptor choices and tradeoffs.

Contacts

Please report bugs to the SourceForge Bug Tracker (select 'Category: Gbrowse').

Mailing List | Link | Description | Archive(s)
GBrowse & GBrowse_syn | gmod-gbrowse | GBrowse and GBrowse_syn users and developers. | Gmane, Nabble (2010/05+), Sourceforge
GBrowse & GBrowse_syn | gmod-gbrowse-cmts | Code updates. | Sourceforge

Logo

The GBrowse logo was created by Alex Read, a participant in the Spring 2010 Logo Program, while a design student at Linn-Benton Community College.

See Also

Installation and Setup

GBrowse 2.0 Install HOWTO, Advanced, Install Paths, Cygwin, Gentoo, Mac OS X, Ubuntu, Windows, Migrating from GBrowse 1.X to 2.X, GBrowse 1.X Install HOWTO

Configuration

GBrowse 2.0 HOWTO, GBrowse Configuration HOWTO, Authentication, Balloons, DAS, Feature Frequency Histograms, Glyphs, I18n, Images, URL schema, Subtracks, GBrowse Adaptors, GBrowse Backends

Development

GBrowse Persistent Variables

Other

Balloon Tips, GBrowse img, Rubber Band Selection, Glyphs and Glyph Options, Gbrowse Benchmarking, GBrowse User Uploads, GBrowse FAQ

See also Category:GBrowse