Difference between revisions of "GBrowse syn Tutorial new"


Latest revision as of 22:24, 3 October 2012


GBrowse syn

This GBrowse syn tutorial was presented by Sheldon McKay, iPlant Collaborative, University of Arizona, at the Arthropod Genomics Symposium 2011, June 2011. The most recent GBrowse syn tutorial can be found at the GBrowse syn Tutorial page.

GBrowse_syn is a GBrowse-based synteny browser designed to display multiple genomes, with a central reference species compared to two or more additional species. It is included with the standard GBrowse package (version 1.69 and later).

  • Working examples of GBrowse_syn can be seen at TAIR and WormBase.

GBrowse_syn Introduction

  • An introductory talk will be presented using the slides below. Click the section to open.


Installing GBrowse_syn

GBrowse_syn is part of the GBrowse 2.0 package and was pre-installed when you went through the GBrowse 2.0 installation.

Update: We will need to update GBrowse from source to pick up features and bug fixes that are not in the CPAN distribution. First, update Bio::Graphics from CPAN:

$ sudo cpan -i Bio::Graphics
  • Then check out a fresh copy of the current GBrowse 2 source code via subversion (svn).
$ cd /home/gmod/Downloads/sources
$ svn co https://gmod.svn.sourceforge.net/svnroot/gmod/Generic-Genome-Browser/trunk Generic-Genome-Browser
$ cd Generic-Genome-Browser
$ sudo perl Build.PL
$ sudo ./Build install

NOTE: use the default options when prompted.

Now point your browser to http://localhost/cgi-bin/gb2/gbrowse_syn

This is the welcome screen you should see after installing a new copy of GBrowse_syn with no configured data sources. It contains instructions on how to set up the example data source provided with the distribution.


Setting up the sample data

  • Sample data and configuration information for GBrowse_syn come pre-packaged with GBrowse.
  • The example we will use is a two-species comparison of rice (Oryza sativa) and one of its wild relatives*

*Data courtesy of Bonnie Hurwitz; sequences and names have been obfuscated to protect unpublished data

Setting up the Alignment Database

The alignment, or joining, database will contain the sequence alignments between the two rice species. It will be stored in a MySQL database.

1) Create a MySQL database to hold the alignment data

$ mysql -u root -p
Enter password: ****************
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 37
Server version: 5.1.37-1ubuntu5.1 (Ubuntu)
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database rice_synteny;
Query OK, 1 row affected (0.00 sec)
mysql>

2) Give read-only access (SELECT privileges in SQL) to the default Apache user, www-data. We can do this for all of the MySQL databases at once, since they are all used by web applications.

mysql> GRANT SELECT on *.* TO 'www-data'@'localhost';
Query OK, 0 rows affected (0.00 sec)
mysql> quit

3) Decompress the sample alignment data and load the database. You need to have root-level access (be a sudoer) for some of the steps below.

$ cd /var/www/gbrowse2/databases/gbrowse_syn/alignments/
$ sudo gunzip rice.aln.gz

Have a look at the first few lines of the data:

$ head -20 rice.aln
CLUSTAL W(1.81) multiple sequence alignment W(1.81)


rice-3(+)/16598648-16600199      ggaggccggccgtctgccatgcgtgagccagacggggcgggccggagacaggccacgtgg
wild_rice-3(+)/14467855-14469373 gggggccgg------------------------------------agacaggccacgtgg
                                 ** ******                                    ***************


rice-3(+)/16598648-16600199      ccctgccccgggctgttgacccactggcacccctgtcccgggttgtcgccctcctttccc
wild_rice-3(+)/14467855-14469373 ccctgccccgggctgttgacccactggcacccctgtcccgggttgtcgccctcctttccc
                                 ************************************************************


rice-3(+)/16598648-16600199      cgccatgctctaagtttgctcctcttctcgaacttctctctttgattcttcacgtcctct
wild_rice-3(+)/14467855-14469373 cgccatgctctaagtttgctcctcttctcgaacttctctctttgattcttcacgtcctct
                                 ************************************************************


rice-3(+)/16598648-16600199      tggagcctccccttctagctcgatcacgctctgctcttccgcttggaggctggcaaaact
wild_rice-3(+)/14467855-14469373 tggagcctccccttctagctcgatcgcgctctgctcttccgcttggaggctggcaaaact

The format is CLUSTALW. This is a formatting convention; it does not mean CLUSTALW was used to generate the alignment data. See Further Reading below for more information on data loading and the meta-data in the sequence names.
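
The sequence names in the sample carry the coordinate meta-data. As a rough sketch, assuming every name follows the pattern species-sequence(strand)/start-end seen in the rows above, the fields can be pulled apart with sed. The function name parse_aln_name is made up for this illustration.

```shell
# Split a GBrowse_syn-style sequence name into its fields.
# Assumption: names look like species-seqid(strand)/start-end,
# as in the sample rows shown above.
parse_aln_name() {
  printf '%s\n' "$1" |
    sed -E 's|^(.+)-([^-(]+)\(([+-])\)/([0-9]+)-([0-9]+)$|species=\1 seqid=\2 strand=\3 start=\4 end=\5|'
}

parse_aln_name 'rice-3(+)/16598648-16600199'
parse_aln_name 'wild_rice-3(+)/14467855-14469373'
```

The species part is whatever precedes the last hyphen before the strand, so names that themselves contain underscores (such as wild_rice) are kept whole.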

4) Load the database with the script gbrowse_syn_load_alignments_msa.pl, which is automatically installed along with GBrowse. See the GBrowse_syn scripts page for details on the options for the script.

$ gbrowse_syn_load_alignments_msa.pl -u root -p gmodamericas2010 -d rice_synteny -c -v rice.aln

There are 1800 alignment blocks, so this will take a little while to run.
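
To get a feel for the size of the job before loading, the blocks can be counted first. This is a minimal sketch that assumes every alignment block in the file starts with its own CLUSTAL header line, as the first block in the sample does; the helper name and the two-block demo file are invented for illustration.

```shell
# Count alignment blocks by counting header lines.
# Assumption: each block begins with a line starting with "CLUSTAL".
count_aln_blocks() {
  grep -c '^CLUSTAL' "$1"
}

# Demonstration on a tiny made-up file with two blocks (not the real rice.aln).
demo=$(mktemp)
cat > "$demo" <<'EOF'
CLUSTAL W(1.81) multiple sequence alignment W(1.81)

rice-3(+)/1-10      acgtacgtac
wild_rice-3(+)/1-10 acgtacgtac

CLUSTAL W(1.81) multiple sequence alignment W(1.81)

rice-3(+)/21-30      ttttacgtac
wild_rice-3(+)/31-40 ttttacgtac
EOF
count_aln_blocks "$demo"
```

If the assumption holds for the tutorial data, running count_aln_blocks on rice.aln should report a figure in line with the block count quoted above.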

Setting up the Configuration Files

  • The configuration files required for this data source are pre-installed with GBrowse, in /etc/gbrowse2/synteny/.
  • There are two species' config files, rice_synteny.conf and wild_rice_synteny.conf, and the joining config file, oryza.synconf. The latter file has been disabled by appending a '.disabled' extension to the file name.

The joining config file, oryza.synconf:

[GENERAL]
description =  BLASTZ alignments for Oryza sativa

# The synteny database
join        = dbi:mysql:database=rice_synteny;host=localhost

# This option maps the relationship between the species data sources, names and descriptions
# The value for "name" (the first column) is the symbolic name that gbrowse_syn uses to identify each species.
# This value is also used in two other places in the gbrowse_syn configuration:
# the species name in the "examples" directive and the species name in the .aln file
# The value for "conf. file" is the basename of the corresponding gbrowse .conf files.
# This value is also used to identify the species configuration stanzas at the bottom of the configuration file.

#                 name          conf. file            Description
source_map =      rice          rice_synteny          "Domestic Rice (O. sativa)"
                  wild_rice     wild_rice_synteny     "Wild Rice"

tmpimages     = /tmp/gbrowse2
imagewidth    = 800
stylesheet    = /gbrowse2/css/gbrowse_transparent.css
cache time    = 1

config_extension = conf

# example searches to display
examples = rice 3:16050173..16064974
           wild_rice 3:1..400000

zoom levels = 5000 10000 25000 50000 100000 200000 400000

# species-specific databases
[rice_synteny]
tracks    = EG
color     = blue

[wild_rice_synteny]
tracks    = EG
color     = red
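
The comments in the file above note that the symbolic names in source_map must also be the species names used in the .aln file. The sketch below checks that on scratch copies. It assumes the name pattern species-sequence(strand)/start-end from the sample data and a source_map laid out as shown above; the function names and demo files are hypothetical.

```shell
# First-column names declared in the source_map option of a .synconf file.
synconf_names() {
  awk '
    /^source_map[[:space:]]*=/ {
      in_map = 1
      sub(/^source_map[[:space:]]*=/, "")
      if (NF) print $1
      next
    }
    in_map && /^[[:space:]]+[^[:space:]]/ { print $1; next }  # continuation rows
    { in_map = 0 }                                            # anything else ends the map
  ' "$1"
}

# Distinct species names used in an alignment file.
# Assumption: names look like species-seqid(strand)/start-end.
aln_species() {
  sed -n -E 's|^(.+)-[^-(]+\([+-]\)/[0-9]+-[0-9]+[[:space:]].*$|\1|p' "$1" | sort -u
}

# Report any species in the alignment that source_map does not declare.
check_names() {
  names=$(synconf_names "$1")
  status=0
  for sp in $(aln_species "$2"); do
    if ! printf '%s\n' "$names" | grep -qxF -- "$sp"; then
      echo "species '$sp' in $2 is not listed in source_map"
      status=1
    fi
  done
  return $status
}

# Demonstration on made-up scratch files.
tmp=$(mktemp -d)
cat > "$tmp/oryza.synconf" <<'EOF'
[GENERAL]
join        = dbi:mysql:database=rice_synteny;host=localhost
source_map =      rice          rice_synteny          "Domestic Rice (O. sativa)"
                  wild_rice     wild_rice_synteny     "Wild Rice"

tmpimages     = /tmp/gbrowse2
EOF
cat > "$tmp/demo.aln" <<'EOF'
CLUSTAL W(1.81) multiple sequence alignment W(1.81)

rice-3(+)/1-10      acgtacgtac
wild_rice-3(+)/1-10 acgtacgtac
EOF
check_names "$tmp/oryza.synconf" "$tmp/demo.aln" && echo "all species names match"
```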

A sample species config file, rice_synteny.conf:

[GENERAL]
description   = Domestic rice chromosome 3
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor memory
                -dir    /var/www/gbrowse2/databases/gbrowse_syn/rice


# Web site configuration info
tmpimages   = /tmp/gbrowse2

[EG]
feature      = gene:ensembl
glyph        = gene
height       = 10
bgcolor      = peachpuff
fgcolor      = hotpink
description  = 0
label        = 0
category     = Transcripts
key          = ensembl gene
balloon hover = Hello, my name is $name!

Note: at this point the species databases are read directly from the GFF3 flat files using the in-memory adaptor.

Activating the Oryza Data Source

1) Make sure the temporary image directory specified in the config files exists and is world-writable.

$ sudo mkdir /var/www/tmp
$ sudo mkdir /var/www/tmp/gbrowse2
$ sudo chmod 777 /var/www/tmp/gbrowse2
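
A quick way to confirm the result is to test for the directory and its world-write bit. This is only a sketch: the helper name is invented, and the demonstration runs on a scratch directory rather than the real image directory.

```shell
# Succeeds only if the path is a directory whose "other" write bit is set.
is_world_writable_dir() {
  [ -d "$1" ] && [ -n "$(find "$1" -maxdepth 0 -perm -002 2>/dev/null)" ]
}

# Demonstration on a scratch directory.
scratch=$(mktemp -d)
mkdir "$scratch/gbrowse2"
chmod 755 "$scratch/gbrowse2"
is_world_writable_dir "$scratch/gbrowse2" || echo "not world-writable yet"
chmod 777 "$scratch/gbrowse2"
is_world_writable_dir "$scratch/gbrowse2" && echo "world-writable"
```

On the tutorial machine the equivalent call would be is_world_writable_dir /var/www/tmp/gbrowse2.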

2) Rename the joining configuration file to remove the '.disabled' extension

$ cd /etc/gbrowse2/synteny
$ sudo mv oryza.synconf.disabled oryza.synconf

3) Point your browser to http://localhost/cgi-bin/gb2/gbrowse_syn/oryza. You should see:

[Image: GBrowse synWe made it1.png]


4) Click on the first example. You should (eventually) see:

[Image: GBrowse synWe made it2.png]


5) Try out a few user interface features:

  • Mouse over one of the genes:

[Image: Gbrowse synBubble1.png]

  • Click on one of the bold blue highlighted section titles. This takes you to a contextual help page on the GMOD wiki.
  • Click and drag on the overview panel. This triggers rubber-band selection, which recenters or resizes the displayed region.

Speeding up the Browser

You can speed up the image loading time by putting your species' GFF3 data into relational MySQL databases.

1) Create a database for each of the GFF data files (rice.gff3 and wild_rice.gff3).

$ mysql -uroot -pgmodamericas2010
mysql> create database rice;
Query OK, 1 row affected (0.00 sec)
mysql> create database wild_rice;
Query OK, 1 row affected (0.00 sec)
mysql> quit

2) Populate the databases using the loading script bp_seqfeature_load.pl (pre-installed as part of BioPerl with GBrowse). This will load the GFF3 data into a MySQL relational database. Note that the MySQL user will need root-level privileges.

$ cd /var/www/gbrowse2/databases/gbrowse_syn/rice
$ bp_seqfeature_load.pl -u root -p gmodamericas2010 -d rice -c -f rice.gff3
loading rice.gff3...
Building object tree...
0.55s0s
Loading bulk data into database... 0.73s
load time: 11.99s
$ cd ../wild_rice
$ bp_seqfeature_load.pl -u root -p gmodamericas2010 -d wild_rice -c -f wild_rice.gff3
loading wild_rice.gff3...
Building object tree...
0.55s7a
Loading bulk data into database... 0.69s
load time: 12.02s
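
Steps 1 and 2 repeat the same commands for each species, so they can be wrapped in a loop. The sketch below reuses only the options shown above; the run and load_species_dbs helpers, the DRY_RUN switch and the MYSQL_PW placeholder are assumptions made for illustration. By default it only prints the commands it would run.

```shell
# Print the command (DRY_RUN=1, the default here) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create and load one database per species, using the same options as above.
# $1 is the data root; the remaining arguments are species/database names.
load_species_dbs() {
  data_root=$1; shift
  for sp in "$@"; do
    run mysql -u root -p -e "create database $sp"
    run bp_seqfeature_load.pl -u root -p "$MYSQL_PW" -d "$sp" -c -f "$data_root/$sp/$sp.gff3"
  done
}

MYSQL_PW=${MYSQL_PW:-changeme}   # placeholder, not a real password
load_species_dbs /var/www/gbrowse2/databases/gbrowse_syn rice wild_rice
```

Setting DRY_RUN=0 would execute the commands instead of printing them; the mysql call then prompts for the root password.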

3) Modify the following stanza in the file rice_synteny.conf. This will convert your data source from a flat file database to a MySQL relational database.

# from
db_args       = -adaptor memory
                -dir    /var/www/gbrowse2/databases/gbrowse_syn/rice
# to
db_args       = -dsn dbi:mysql:rice

4) Repeat for wild_rice_synteny.conf, pointing it at the wild_rice database.
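
The same edit can be scripted with sed. This is a sketch on a scratch copy, assuming the db_args stanza is laid out exactly as in the sample config shown earlier (one db_args line followed by an indented -dir line); switch_to_mysql is a made-up helper name.

```shell
# Work on a scratch copy so the real config is untouched.
workdir=$(mktemp -d)
cat > "$workdir/rice_synteny.conf" <<'EOF'
[GENERAL]
description   = Domestic rice chromosome 3
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor memory
                -dir    /var/www/gbrowse2/databases/gbrowse_syn/rice

tmpimages   = /tmp/gbrowse2
EOF

# $1 = config file, $2 = MySQL database name.
# Rewrites the db_args line and drops the indented -dir continuation line.
switch_to_mysql() {
  sed -e "s|^db_args .*-adaptor memory.*|db_args       = -dsn dbi:mysql:$2|" \
      -e '/^[[:space:]]*-dir[[:space:]]/d' "$1" > "$1.new" && mv "$1.new" "$1"
}

switch_to_mysql "$workdir/rice_synteny.conf" rice
grep '^db_args' "$workdir/rice_synteny.conf"
```

For the second species the call would be switch_to_mysql with the wild_rice_synteny.conf file and the database name wild_rice. Keeping a backup of the real files before editing them in place is advisable.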

Using Non-alignment Data

This example uses gene orthology-based synteny blocks* created by OrthoCluster for three nematode species, C. elegans, C. briggsae and P. pacificus.

*Data courtesy of Jack Chen and Ismael Vergera

1) Download and unpack the data archive file orthocluster.tar.gz.

$ cd ~/Documents/Data/gbrowse_syn
$ rm orthocluster.tar.gz
$ wget ftp://ftp.gmod.org/pub/gmod/GBrowse_syn/orthocluster.tar.gz
$ tar zxf orthocluster.tar.gz

2) Examine the contents of the ORTHOCLUSTER directory tree using the Unix tree command. It is not installed by default, so we will have to get it first.

$ sudo apt-get install tree
[sudo] password for gmod:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  tree
0 upgraded, 1 newly installed, 0 to remove and 37 not upgraded.
Need to get 31.1kB of archives.
After this operation, 98.3kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com karmic/universe tree 1.5.2.2-1 [31.1kB]
Fetched 31.1kB in 0s (37.0kB/s)
Selecting previously deselected package tree.
(Reading database ... 135915 files and directories currently installed.)
Unpacking tree (from .../tree_1.5.2.2-1_i386.deb) ...
Processing triggers for man-db ...
Setting up tree (1.5.2.2-1) ...

Now we can use it:

$ tree ORTHOCLUSTER/
ORTHOCLUSTER/
|-- conf
|   |-- bri.conf
|   |-- ele.conf
|   |-- orthocluster.synconf
|   `-- ppa.conf
|-- gff
|   |-- bri.gff
|   |-- ele.gff
|   `-- ppa.gff
`-- orthocluster.txt
2 directories, 8 files

In the conf directory, there are configuration files for the joining database and each of the three species. They are similar in structure to the examples shown above, except that the database adapter Bio::DB::GFF and a gene aggregator are used because the GFF is version 2. For example:

[GENERAL]
description   = C. briggsae
db_adaptor    = Bio::DB::GFF
db_args       = -dsn dbi:mysql:bri

# This is the GFF2 aggregator that assembles gene models
# from coding exon features with the same parent
aggregators = gene{coding_exon}


The gff directory contains gene annotations for each of the three species, derived from WormBase (release WS204). The files are in GFF2 format, which is why the Bio::DB::GFF adapter is required. A sample is shown here:

##gff-version 2
##sequence-region I 1 15072421
##sequence-region II 1 15279324
##sequence-region III 1 13783685
##sequence-region IV 1 17493784
##sequence-region V 1 20924143
##sequence-region X 1 17718854
I	curated	coding_exon	11641	11689	.	+	0	CDS "Y74C9A.2"
I	curated	coding_exon	14951	15160	.	+	2	CDS "Y74C9A.2"
I	curated	coding_exon	16473	16585	.	+	2	CDS "Y74C9A.2"
I	curated	coding_exon	43733	43961	.	+	0	CDS "Y74C9A.1"
I	curated	coding_exon	44030	44234	.	+	2	CDS "Y74C9A.1"
I	curated	coding_exon	44281	44324	.	+	1	CDS "Y74C9A.1"
I	curated	coding_exon	44372	44468	.	+	2	CDS "Y74C9A.1"
I	curated	coding_exon	44521	44677	.	+	1	CDS "Y74C9A.1"
I	curated	coding_exon	47472	47610	.	+	0	CDS "Y48G1C.12"
I	curated	coding_exon	47696	47858	.	+	2	CDS "Y48G1C.12"
I	curated	coding_exon	48348	48530	.	+	1	CDS "Y48G1C.12"
I	curated	coding_exon	49251	49416	.	+	1	CDS "Y48G1C.12"
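
As a quick sanity check before loading, it can help to count how many coding_exon features each CDS group has, since these are the features the gene{coding_exon} aggregator will assemble. This is only a sketch: the function name count_exons is hypothetical, and it assumes tab-separated GFF2 with the group column written as CDS "NAME", as in the sample above.

```shell
# Hypothetical helper: count coding_exon features per CDS name in a GFF2 file.
# Usage: count_exons FILE.gff   (prints NAME<TAB>COUNT, sorted by name)
count_exons() {
  awk -F'\t' '
    $3 == "coding_exon" {
      name = $9
      sub(/^CDS "/, "", name)   # strip the leading CDS " prefix
      sub(/".*$/, "", name)     # strip the closing quote and anything after it
      n[name]++
    }
    END { for (k in n) print k "\t" n[k] }
  ' "$1" | LC_ALL=C sort
}
```

Header lines such as ##gff-version 2 have no third tab-separated column, so they are ignored.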

The file orthocluster.txt contains the synteny data. The first few lines are shown below. The first 12 fields in each row describe the synteny block in each of the two species. The series of numbers that follows lists orthologous gene coordinate pairs, which are used to link orthologs with grid lines in the GBrowse_syn display. See 'Alignment Data' under Further Reading below for more details of this loading format.

bri	chrI	176154	183558	+	.	ppa	Ppa_Contig88	27212	30786	+	.	176154	27212	177594	30786	182118	27212	183558	30786	|	30786	183558	27212	182118	30786	177594	27212	176154
bri	chrI	778780	799223	+	.	ppa	Ppa_Contig88	533454	542961	-	.	778780	539924	786778	542961	789497	533454	799223	538726	|	538726	799223	533454	789497	542961	786778	539924	778780
bri	chrI	986150	994698	+	.	ppa	Ppa_Contig77	29481	45600	-	.	986150	37055	989649	45600	991428	29481	994698	36608	|	36608	994698	29481	991428	45600	989649	37055	986150
bri	chrI	1453793	1461931	+	.	ppa	Ppa_Contig132	156183	165414	-	.	1453793	163110	1456404	165414	1456712	160849	1457637	162712	1458361	160204	1459245	160815	1459468	159346	1459854	160000	1459962	156183	1461931	159022	|	159022	1461931	156183	1459962	160000	1459854	159346	1459468	160815	1459245	160204	1458361	162712	1457637	160849	1456712	165414	1456404	163110	1453793
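
To get a feel for the file before loading it, each row can be reduced to a one-line summary. The sketch below is an illustration only. The function name summarize_blocks is hypothetical, and the field meanings are an assumption read off the sample rows: species, sequence, start, end and strand for each of the two species, then coordinate pairs up to the | separator.

```shell
# Hypothetical helper: summarize each synteny block in orthocluster.txt.
# Usage: summarize_blocks orthocluster.txt
# Prints: BLOCK_IN_SPECIES_1<TAB>BLOCK_IN_SPECIES_2<TAB>NUMBER_OF_COORDINATE_PAIRS
summarize_blocks() {
  awk -F'\t' '
    {
      # Coordinate pairs start at field 13 and run until the "|" separator.
      pairs = 0
      for (i = 13; i <= NF && $i != "|"; i += 2) pairs++
      printf "%s:%s:%s..%s(%s)\t%s:%s:%s..%s(%s)\t%d\n",
             $1, $2, $3, $4, $5, $7, $8, $9, $10, $11, pairs
    }
  ' "$1"
}
```

Only the pairs before the | separator are counted. The values after it appear to repeat the same pairs from the second species' point of view.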


3) Set the $TMP environment variable so that the database loading script knows where to put its temporary files.

$ export TMP=/tmp

4) Create and load a Bio::DB::GFF database for C. elegans (ele). Run the time-consuming loading script inside screen, then press Ctrl-A D to detach the session so that it keeps running in the background while we move on to other steps.

$ cd ORTHOCLUSTER/gff
$ mysql -uroot -pgmodamericas2010 -e 'create database ele'
$ screen bp_fast_load_gff.pl -u root -p gmodamericas2010 -d ele -c ele.gff

5) Repeat step 4 for the other two species (bri and ppa).
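
Steps 4 and 5 can be wrapped in a small loop. This is a sketch rather than part of the tutorial's tooling: the run and load_species functions and the DRY_RUN variable are hypothetical, while the mysql and bp_fast_load_gff.pl invocations are the ones from step 4. Unlike the screen approach, the loop loads the species one after another and waits for each load to finish.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: create and load one Bio::DB::GFF database per species code.
# Expects SPECIES.gff in the current directory, as in ORTHOCLUSTER/gff.
load_species() {
  for sp in "$@"; do
    run mysql -uroot -pgmodamericas2010 -e "create database $sp" || return 1
    run bp_fast_load_gff.pl -u root -p gmodamericas2010 -d "$sp" -c "$sp.gff" || return 1
  done
}
```

Running `DRY_RUN=1; export DRY_RUN; load_species bri ppa` prints the commands without executing them, which is a cheap way to check them before unsetting DRY_RUN and running the loads for real.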


6) Create and load the alignment database. The gbrowse_syn_load_alignment_database.pl script is pre-installed with GBrowse.

$ cd ..
$ mysql -uroot -pgmodamericas2010 -e 'create database orthocluster'
$ gbrowse_syn_load_alignment_database.pl -u root -p gmodamericas2010 -d orthocluster -c -v orthocluster.txt

7) Copy the configuration files to the required location.

$ cd conf
$ sudo cp *conf /etc/gbrowse2/synteny

8) Go back to your browser and reload the rice page. There should now be a second data source in a pull-down menu.

GBrowse synPulldown1.png

9) Select the other data source and start browsing!

Gbrowse synEtfinit.png

Further Reading

A Note on Whole Genome Alignments

The focus of this section of the course is on working with alignment or synteny data in GBrowse_syn. However, generating whole genome alignments and identifying orthologous regions are subjects of considerable interest, so some background reading is listed below:

Documentation

There is detailed documentation on the GMOD wiki for how to install, configure and use GBrowse_syn. To get started, browse these pages:


IPlant.png
</div>