Difference between revisions of "Apollo Tutorial"

== VMware ==

{|
| valign="top" |This tutorial was taught using a [[VMware]] system image as a starting point.  If you want to start with that same system, download and install the ''Starting'' image.
|}

== Caveats ==

{{TutorialCaveats}}

__TOC__
 +
 +
 +
==Introduction==
 +
 +
===Overview===
 +
Once we have a sequence assembled, we need to annotate it, that is, add features such as genes, pseudogenes, ncRNAs, etc.  Otherwise we just have sequence and can't make much sense of the data.  Computational analyses, such as Genscan, FGeneSH and tRNAscanSE, are a great way to start the annotation process, as they help us localize regions of interest.  However, these automated tools are far from perfect and the results often require manual updating by expert biologists.  That is where Apollo comes in.  Apollo is a sequence annotation editor that allows you to create and edit annotations.
 +
[[Image:Annotation-workflow.jpg|center|Annotation workflow]]
 +
 +
===Architecture===
 +
Apollo is set up in a 3-tier architecture, with presentation (GUI), logic and data layers.  It is highly configurable, with most users configuring the presentation and data layers.
 +
[[Image:Architecture.jpg|Apollo architecture|center]]
 +
====Presentation Layer====
 +
The presentation layer (GUI) handles displaying the data and gives the user an interface for creating and editing annotations.  Customization of this layer usually entails setting up how features are displayed (e.g., color, shape, labels).
 +
====Logic Layer====
 +
The logic layer handles how data is represented and the various operations you can perform on the data (e.g., creating, editing, adding information to annotations).
 +
====Data Layer====
 +
The data layer takes care of interfacing with the different data sources.  Customization of the data layer ranges from setting up access to different databases (e.g., your own [[Chado]] instance) to creating new adapters to read new data formats or schemas.
 +
 +
==Installation==
 +
You can install Apollo from a pre-built installer package, or get the source code from either CVS or a tarball, both of which require building the application.
 +
 +
===Pre-built Installers===
 +
You can download OS-specific pre-built installer packages from the [http://apollo.berkeleybop.org/apollo-1.11.2/install.html Apollo installer page].  We provide the following installers:
 +
 +
<table class="wikitable" width="50%">
<tr>
<th width="50%">Platform</th>
<th>Optionally bundled JRE</th>
</tr>
<tr>
<td>Windows</td>
<td>Yes</td>
</tr>
<tr>
<td>Mac OS X</td>
<td>No</td>
</tr>
<tr>
<td>Linux</td>
<td>Yes</td>
</tr>
<tr>
<td>Solaris</td>
<td>Yes (x86 version)</td>
</tr>
<tr>
<td>Unix</td>
<td>No</td>
</tr>
</table>
 +
 +
Since we're installing it on our Linux virtual machine, we'll download the Linux version.  Java has already been set up on these machines, so we'll get the installer without the bundled JRE.  The bundled JRE option is a great solution for users who don't have Java installed, or who have multiple versions installed and are not sure how to select the correct one.  Download the Linux installer and save it to <code>~/to_be_installed</code>.
 +
 +
The Linux installer is a shell script.  If we want to install Apollo to the common application locations (such as /usr/local/bin), we'll need root access.  We'll do everything from the command line since it's easier that way.  Open up a terminal window and type the following:
 +
 +
<bash>
$ cd ~/to_be_installed
$ sudo /bin/sh Apollo_unix.sh
</bash>
 +
 +
We'll just install everything with the default options.
 +
 +
===Checking Out the Code From CVS===
 +
You can also check out the code from SourceForge CVS.  You'll need a CVS client to do so.  This will guarantee that you'll get the latest Apollo code.  Note that you'll be getting the development version, which might not be fully stable.  The following commands apply to Unix-based command line CVS clients for an anonymous checkout.
 +
 +
<bash>
$ cvs -d:pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod login
$ cvs -z3 -d:pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod co -P apollo
</bash>
 +
 +
If you're using an IDE, chances are that your IDE will have CVS support (or have a plugin available).
 +
 +
===Getting the Source Tarball===
 +
You can also download the source tarball from [http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=43774 SourceForge].  You can get the source code for both the current and previous public releases.  You won't get the latest development code as in CVS, but you should be getting code that is more stable.
 +
 +
==Using Apollo==
 +
We'll start off by looking at some of the things that Apollo can do.  We'll be connecting to our local Chado instance.  A customized Apollo Chado configuration has been set up for this.  Don't worry, we'll cover the details on how we did that once we talk about [[#Setting Up Custom Chado Configurations|setting up custom Chado configurations]].
 +
 +
<div class="emphasisbox">
 +
The following only applies to students who are having issues starting Fluxbox without root (sudo) privileges (apparently only an issue on Vista).  For those students, we'll copy the custom configurations to the freshly installed Apollo.  Type the following in the terminal (this will all make sense later on):
 +
<bash>
$ cd ~/.apollo
$ sudo cp gmod_summer_school.* chado-adapter.xml /usr/local/Apollo/conf
</bash>
 +
</div>
 +
 +
Let's first start by launching Apollo.  Type the following in your terminal:
 +
 +
<bash>
$ apollo
</bash>
 +
 +
We'll see an option for which data we want to load.  We'll choose <code>Chado database</code> as our data source.  Since we still don't have annotations on our data, we can't use the <code>gene</code> option.  Select <code>contig</code> in <code>Select a region to display</code>.  Let's look at <code>scf1117875581239</code> in the region between <code>102000</code> and <code>110000</code>.
 +
[[Image:Chado-adapter.jpg|Chado adapter|center]]
 +
Once loading is complete, we'll see the main Apollo window.
 +
[[Image:Apollo-main-window.jpg|Apollo main window|center]]
 +
The panels with the aqua background are for annotations and those with the black background are for computational results.  The white box in the middle with the ruler represents the genomic region, with the numbers being coordinates.  The annotation and result panels above the genomic coordinate window are for the plus strand and the ones on the bottom are for the minus strand.  The "Zoom" buttons will allow you to zoom in and out of the currently loaded region.  The panels below provide information on the currently selected feature.
 +
 +
We only have results loaded and no annotations.  So let's create a gene.  We'll select <code>maker-106017-102976</code>.  We want to create a transcript with all of the exons.  So to select all of the exons, we just need to double left-click on one.  You'll notice that all the exons have a red border around them.
 +
[[Image:Apollo-no-genes.jpg|Apollo with evidence model selected|center]]
 +
Now that they're all selected, to create a new gene it's as easy as just dragging and dropping into the annotation panel.  Voila!  We have a new gene, <code>GMOD:temp1</code>, with transcript <code>GMOD:temp1-transcript1</code>.
 +
[[Image:Apollo-with-genes.jpg|Apollo with a single gene model|center]]
 +
OK, let's try this again with another model.  Let's do the same with <code>fgenesh_masked-scf1117875581239-abinit-gene-1.48-mRNA-1</code> (man, that's a long name!).  We can see that the transcript belongs to a new gene, <code>GMOD:temp2</code>.  Makes sense, it's obviously a separate gene.
 +
[[Image:Apollo-with-two-genes.jpg|Apollo with two gene models|center]]
 +
But what if we were to create a new feature from <code>genemark_masked-scf1117875581239-abinit-gene-1.91-mRNA-1</code>?  Let's find out.  Whoa!  We can see that this new transcript was created as part of <code>GMOD:temp2</code>.
 +
[[Image:Apollo-with-two-genes-splice-variants.jpg|Apollo with splice variants|center]]
 +
That's great, as it looks like a splice variant rather than a whole new gene.  So is this always the case?  After all, there are overlapping genes, right?  Apollo looks to see if there are any existing transcripts with in-frame overlaps to the new transcript.  If that's the case, the new transcript is considered a splice variant.  Otherwise, a new gene is created.
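The splice variant rule above can be sketched in a few lines of Python.  This is only a simplified illustration, not Apollo's actual implementation: it assumes plus-strand features, and the <code>CodingExon</code> type, the <code>phase</code> field and the function names are all made up for this example.  Two coding exons are treated as overlapping in-frame when they share at least one base and their codons start at positions with the same remainder modulo 3.

```python
from typing import Dict, List, NamedTuple, Optional


class CodingExon(NamedTuple):
    """Hypothetical coding exon on the plus strand (not an Apollo class)."""
    start: int  # inclusive genomic start
    end: int    # inclusive genomic end
    phase: int  # bases to skip before the first complete codon (0, 1 or 2)


def codon_register(exon: CodingExon) -> int:
    # Position of the first complete codon, reduced modulo 3.
    return (exon.start + exon.phase) % 3


def overlaps_in_frame(a: CodingExon, b: CodingExon) -> bool:
    shares_bases = a.start <= b.end and b.start <= a.end
    return shares_bases and codon_register(a) == codon_register(b)


def assign_gene(new_transcript: List[CodingExon],
                genes: Dict[str, List[List[CodingExon]]]) -> Optional[str]:
    """Return the ID of the gene the new transcript would join, or None
    if it should become a new gene."""
    for gene_id, transcripts in genes.items():
        for transcript in transcripts:
            if any(overlaps_in_frame(x, y)
                   for x in new_transcript for y in transcript):
                return gene_id
    return None
```

With one existing gene holding a coding exon at 100-200, a new exon at 151-260 with phase 0 shares the same codon register and joins that gene, while an exon at 150-260 with phase 2 overlaps out of frame and would start a new gene.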
 +
 +
You'll notice that the newly created genes have an ID of the form <code>GMOD:temp#</code> and the transcripts have an ID of the form <code>GMOD:temp#-transcript#</code>.  Apollo uses naming adapters to define how newly created features should be named.  For example, FlyBase uses <code>FBgn:temp#</code> for genes and <code>FBgn:temp#:chromosome#:chromosome_start-chromosome_end-R?</code> for transcripts, where <code>?</code> is <code>A</code>, <code>B</code>, <code>C</code> and so on.
 +
 +
Let's make sure that our Chado connectivity is working.  Let's save our work using <code>File &rarr; Save as...</code>.
 +
[[Image:File-save-as.jpg|File &rarr; Save as...|center]]
 +
Make sure that <code>Chado database</code> is selected as the data source (should already be).
 +
[[Image:Chado-save-dialog.jpg|Chado save dialog|center]]
 +
You'll notice that the IDs have changed.  This is because the GMOD naming adapter follows the convention that all newly created features should have an ID of <code>PREFIX:FEATURE_ID</code> for a Chado database.  This only occurs with newly created features.  If you modify the ID of an existing feature and save that, this ID replacement will not take place.
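To make the two naming patterns concrete, here is a small sketch of how such IDs could be built.  The function names are hypothetical and this is not the naming adapter's real code.  The zero padding to 8 digits is an assumption, inferred only from the example ID <code>GMOD:00014333</code> used later in this guide.

```python
def temp_gene_id(prefix: str, gene_number: int) -> str:
    # Temporary ID for a gene that has not been saved yet, e.g. GMOD:temp1.
    return f"{prefix}:temp{gene_number}"


def temp_transcript_id(prefix: str, gene_number: int, transcript_number: int) -> str:
    # Temporary transcript ID, e.g. GMOD:temp1-transcript1.
    return f"{temp_gene_id(prefix, gene_number)}-transcript{transcript_number}"


def saved_id(prefix: str, feature_id: int, width: int = 8) -> str:
    # PREFIX:FEATURE_ID form used after saving to Chado.
    # The padding width is an assumption, not a documented rule.
    return f"{prefix}:{feature_id:0{width}d}"
```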
 +
 +
Let's reload the data with <code>File &rarr; Open new...</code>.
 +
[[Image:File-open-new.jpg|File &rarr; Open new...|center]]
 +
Again, make sure that <code>Chado database</code> is the selected data source (should already be).  You'll notice that all the information we put in when we first loaded from the database is already there.  Apollo keeps a history of loading and saving so that you can easily access previous data sources.  Click <code>Ok</code>.  Great, the data is there!
 +
 +
Speaking of which, let's say that we actually want to change the gene ID to something more interesting.  We can do so by selecting an exon in our feature, right clicking it, and choosing <code>Annotation Info Editor...</code> from the popup menu.
 +
[[Image:Annotation-editor-popup-menu.jpg|Annotation editor popup menu|center]]
 +
We can see that we can add lots of interesting information for our gene and transcript.
 +
[[Image:Annotation-editor.jpg|Annotation editor|center]]
 +
Let's go ahead and change the gene symbol to something else.  We can see that this change affects all the transcripts.  Just as we'd expect.
 +
 +
As for the popup menu, you can see there are a lot of things you can do to your existing annotation.  You can merge and split transcripts and exons, move exons from one transcript to another, and lots of other cool stuff.  Let's take a look at merging exons.  Select the 2 exons that you want to merge (hold down <code>shift</code> to select multiple items), right click, and choose <code>Merge exons</code>.
 +
[[Image:Merge-exons-popup-menu.jpg|Merge exons popup menu|center]]
 +
Alright, it does what we'd expect it to do.
 +
[[Image:Apollo-with-merged-exons.jpg]]
 +
But let's say that upon closer inspection, this is not what we wanted.  Previously, you'd need to either manually split the exon again or remove the merged exon and then add both exons again.  Kind of a pain, so traditionally users would save constantly so they could revert to a previous state.  However, we have now added a much sought after feature, <code>undo</code>!  So instead of doing all that work, we can undo our merge with <code>Edit &rarr; Undo</code>.
 +
[[Image:Edit-undo.jpg|Edit &rarr; Undo|center]]
 +
Wow, lookie here, it split the exons again.  Although this looks to be a trivial operation, it's actually very complex, as one single change can lead to multiple cascading changes.  In the case of the merged exons, we can see that it changed the coding region frame for the downstream exons, thus affecting the CDS.  So a single change caused other implicit changes to occur.
 +
 +
One cool recent addition to Apollo is the ability to run remote analyses.  We currently support <code>BLAST</code> and <code>Primer-BLAST</code> (primer identification tool) over at NCBI.  Let's look at how the <code>BLAST</code> support works.
 +
 +
Select the first model we created (with ID <code>GMOD:00014333</code> in this guide - the ID in your data might be different).  Double-click on an exon to select the whole model.  Right-click on the selected feature and choose <code>Analyze region</code>.
 +
[[Image:Analyze-region-popup-menu.jpg|Analyze region popup menu|center]]
 +
The <code>Run analysis</code> window will show up.
 +
[[Image:Run-analysis.jpg|Run analysis|center]]
 +
We see there is a tab for <code>NCBI-BLAST</code> and <code>NCBI Primer-BLAST</code>.  We'll just run <code>BLAST</code> for now.  We have a pull-down menu for <code>BLAST type</code> and can select <code>blastn</code>, <code>blastx</code>, and <code>tblastx</code>.
 +
[[Image:Blast-types.jpg|BLAST types|center]]
 +
Let's run a <code>blastn</code> search.  There are a number of options for running <code>BLAST</code> and for post processing the results.  The post processing options are particularly useful: since we're searching against NCBI's nr database (which is very large), we'll get A LOT of results back.  We'll check the following options:
 +
*Run options
 +
**Filter out low complexity sequence
 +
**Filter out masked sequence
 +
*Post processing options
 +
**Remove hits with an expect above threshold
 +
**Remove hits with a score below threshold
 +
**Remove HSPs with a percent identity below threshold
 +
We can leave the default values for those options.
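The three post processing options amount to a simple threshold filter.  The sketch below shows the idea in Python.  It is an illustration only: the <code>Hsp</code> type is made up, the threshold values are placeholders rather than Apollo's defaults, and it treats every result as a single HSP instead of distinguishing hits from HSPs the way the dialog does.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hsp:
    """Hypothetical container for one BLAST HSP (not an Apollo class)."""
    expect: float
    score: float
    percent_identity: float


def post_process(hsps: List[Hsp],
                 max_expect: float = 1e-5,      # placeholder threshold
                 min_score: float = 50.0,       # placeholder threshold
                 min_identity: float = 80.0) -> List[Hsp]:
    """Keep only results that pass all three thresholds."""
    return [
        h for h in hsps
        if h.expect <= max_expect          # drop expect above threshold
        and h.score >= min_score           # drop score below threshold
        and h.percent_identity >= min_identity  # drop low percent identity
    ]
```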
 +
Click <code>Run</code> to run the analysis.  After a few seconds, a popup window will appear.
 +
[[Image:Analysis-expected-time.jpg|Analysis expected submission time|center]]
 +
This gives us the estimated time before our analysis starts running (as estimated by the NCBI servers).  Note that this is the estimated time for the analysis to start, '''NOT''' the expected time for the analysis to be completed.
 +
Checking for analysis completion takes place in the background, so feel free to continue working as usual.  You will be notified when the analysis is complete.
 +
[[Image:Analysis-complete.jpg|Analysis complete|center]]
 +
The new analysis will appear in the results panel and since we ran <code>blastn</code> against the nr database, the type for the result is <code>blastn:nr</code>.
 +
[[Image:Blastn-results.jpg|blastn results|center]]
 +
One last thing worth mentioning is the <code>exon detail editor</code>.  It allows you to make edits to your models at the base level.  We have also recently added the <code>sequence aligner</code>, which allows you to make the same types of edits that the <code>exon detail editor</code> supports, but in reference to multiple alignment data.  We'll come back and talk about the <code>sequence aligner</code> if we have time.
 +
 +
Unfortunately we don't have the time to go over all the sophisticated editing features for Apollo, but you can get more information on all the powerful editing features from the [http://apollo.berkeleybop.org/current/userguide.html Apollo user's guide].
 +
 +
==Configuring Apollo==
 +
Ok, now that we have some idea of what Apollo can do, let's talk about how to configure Apollo.  First of all, be aware that all configuration files can live in two places:
 +
* The global Apollo configuration directory in $APOLLO_ROOT/conf where $APOLLO_ROOT is where Apollo was installed
 +
* User specific configurations, stored in ~/.apollo where ~ is the user home directory (different OS's handle it differently)
 +
 +
The configurations in the user directory take precedence over the global ones.  Depending on the configuration, it will either fully overwrite the global configuration or just overwrite/append to the global one.
 +
 +
There are 3 sets of general configurations we'll discuss: apollo.cfg, data_source.style, data_source.tiers.  You can check out the [http://apollo.berkeleybop.org/current/userguide.html#Configuration Apollo configuration] section from the user guide for a more detailed description of the supported options.
 +
 +
===apollo.cfg===
 +
This is the main Apollo configuration.  Options are composed of columns delimited by white space, where the first column is the option parameter and the following columns are the specific options for the parameter.  <code>//</code> is used for comments and everything following it (up to the end of the line) will be ignored.  Out of all the options, the most interesting one is <code>DataAdapterInstall</code>, which is used to install data adapters for handling new types of data.  We'll talk about it in more detail in the [[#Writing Custom Data Adapters|writing custom data adapters]] section.  You can just add any new options, or ones you wish to override, to your custom apollo.cfg file.  The global apollo.cfg options will be used for any options absent in your custom file.
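To illustrate the format and the override rule, here is a rough Python sketch that reads apollo.cfg-style text and overlays a user file on the global one.  It is not how Apollo itself parses the file.  The option names in the usage note are placeholders, and the sketch assumes that <code>//</code> only starts a comment at the beginning of a line or after white space, so that URLs are left alone.  It also does not handle quoted values that contain spaces.

```python
import re
from typing import Dict, List

# Assumption: "//" starts a comment only at line start or after white space.
COMMENT = re.compile(r"(^|\s)//.*$")


def parse_cfg(text: str) -> Dict[str, List[List[str]]]:
    """Map each option parameter to the list of value rows given for it."""
    options: Dict[str, List[List[str]]] = {}
    for line in text.splitlines():
        columns = COMMENT.sub("", line).split()
        if columns:
            # First column is the parameter, the rest are its values.
            options.setdefault(columns[0], []).append(columns[1:])
    return options


def effective_options(global_text: str, user_text: str) -> Dict[str, List[List[str]]]:
    """User options replace global ones; absent options fall back to global."""
    merged = parse_cfg(global_text)
    merged.update(parse_cfg(user_text))
    return merged
```

For example, if the global text sets <code>OptionA 1 2</code> and <code>OptionB x</code> and the user text only sets <code>OptionA 9</code>, the effective configuration has the user's <code>OptionA</code> and the global <code>OptionB</code>.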
 +
 +
===data_source.style===
 +
Each data source has a style file associated with it.  The style file contains options that are data source specific and should be shared amongst every feature.  Like the apollo.cfg file, it is also composed of columns delimited by white space, where the first column is the option parameter and the following columns are the specific options for the parameter.  <code>//</code> is also used for comments and everything following it (up to the new line) will be ignored.  We've recently added a GUI for setting up the most common options.  You can access it from <code>Edit -> Preferences</code>.
 +
[[Image:Edit-preferences.jpg|Edit -> Preference|center]]
 +
Make sure that the <code>Style</code> tab is selected.
 +
[[Image:Style-wizard.jpg|Style wizard|center]]
 +
Be aware that the GUI only supports a subset of all the available options.  This was done so as not to overwhelm users with overly complex GUIs.  If you need to change anything that is not supported by the GUI, you'll need to do so by manually editing the file.  Of particular interest is the <code>Canned annotation/transcript comments</code> section.  It allows you to add predefined comments that users can add to their top level annotations and transcripts from a pull down menu in the <code>annotation info editor</code>.
 +
 +
===data_source.tiers===
 +
Each data source has a tiers file associated with it.  The tiers file contains options on how to display specific features.  It has a completely different format from both the apollo.cfg and data_source.style files.  <code>#</code> is used for comments.  A tiers file contains a set of <code>Tier</code> and <code>Type</code> records.
 +
 +
A tier record defines a set of feature types that will always be displayed together as a group. They will be displayed in the same row if possible when the features are expanded but as close together as possible if they overlap.  A <code>Tier</code> record will look something like this:
 +
[Tier]
tiername : Annotation
visible : true
expanded : true
maxrows : 0
labeled : true
curated : true
warnonedit : false
 +
 +
Following the <code>Tier</code> record are one or more <code>Type</code> records.  A <code>Type</code> record specifies the different types that should appear in the given <code>Tier</code>.  The <code>Type</code> record will look something like this:
 +
[Type]
tiername : Gene Prediction
typename : Genscan
resulttype : genscan:dummy
resulttype : genscan
color : 204,153,255
usescore : true
minscore : -1
maxscore : 50
glyph : DrawableResultFeatureSet
column : SCORE
column : GENOMIC_RANGE
column : query_frame
sortbycolumn : GENOMIC_RANGE
weburl : <code><nowiki>http://genes.mit.edu/GENSCAN.html#</nowiki></code>
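If you ever need to inspect a large tiers file outside of Apollo, the record structure is simple enough to read with a short script.  The following Python sketch is not Apollo's parser, just an illustration under a few assumptions: comments are only recognized when <code>#</code> starts the line (values such as the <code>weburl</code> above may end in <code>#</code>), the first colon on a line separates key from value, and keys such as <code>resulttype</code> or <code>column</code> may repeat, so values are collected in lists.

```python
from typing import Dict, List, Tuple

# A record is its kind ("Tier" or "Type") plus key -> list of values.
Record = Tuple[str, Dict[str, List[str]]]


def parse_tiers(text: str) -> List[Record]:
    records: List[Record] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        if line.startswith("[") and line.endswith("]"):
            records.append((line[1:-1], {}))  # start a new record
        elif records and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            records[-1][1].setdefault(key.strip(), []).append(value.strip())
    return records
```

Grouping the <code>Type</code> records by their <code>tiername</code> value then tells you which types are drawn in which tier.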
 +
 +
Again, there are many options supported by the tiers file and it can get quite overwhelming.  The current fly.tiers file is over 1500 lines long!  Craziness.  Luckily we've also recently added a GUI for setting the most useful options.  You can access it by clicking <code>Edit -> Preferences</code> and selecting the <code>Types</code> tab.
 +
[[Image:Types-wizard.jpg|Types wizard|center]]
 +
If however you need to change something not supported by the GUI, you'll have to edit the file by hand.  You can learn more about the configuration wizards in the [http://apollo.berkeleybop.org/current/userguide.html#Preferences Preferences] section from the Apollo user guide.
 +
 +
==Setting Up Custom Chado Configurations==
 +
Ok, so we connected to our local Chado instance before with an already existing configuration file.  Now we're going to go into detail on how to set that up.  The file that contains the Chado database configuration is <code>chado-adapter.xml</code>.
 +
 +
===chado-adapter.xml===
 +
Like all other configuration files, it resides in $APOLLO_ROOT/conf for the global configuration and ~/.apollo for the user configurations.  As you can guess from the file extension, this configuration is in XML format (nice how all the formats between the configurations are so consistent, huh? =P).  It contains a <code><chado-adapter></code> root element, with at least one <code>chadoInstance</code> child element and at least one <code>chadodb</code> element.  The skeleton for the XML file will look something like this:
 +
 +
<xml>
<?xml version="1.0" encoding="UTF-8"?>
<chado-adapter>
  <chadoInstance>
    ...
  </chadoInstance>
  ...
  <chadodb>
    ...
  </chadodb>
</chado-adapter>
</xml>
 +
 +
====chadoInstance Element====
 +
You'll need at least one <code>chadoInstance</code> element.  It will look something like this:
 +
 +
<xml>
<chadoInstance id="gmodSummerSchoolInstance" default="true">

  <!-- associated Java class with this instance -->
  <clsName>apollo.dataadapter.chado.jdbc.RiceChadoInstance</clsName>

  <!-- database fields corresponding to top-level entries - will appear in the pulldown menu -->
  <sequenceTypes>
    <type>gene</type>
    <type>
      <name>contig</name>
      <!-- give start and end input box for this region -->
      <useStartAndEnd>true</useStartAndEnd>
      <!-- query the database for valid ids for contigs -->
      <queryForValueList>true</queryForValueList>
      <!-- whether the feature is top level -->
      <isTopLevel>true</isTopLevel>
    </type>
  </sequenceTypes>

  <!-- CV information stored in the Chado instance -->
  <partOfCvTerm>part_of</partOfCvTerm>
  <featureCV>sequence</featureCV>
  <relationshipCV>relationship</relationshipCV>
  <propertyTypeCV>feature_property</propertyTypeCV>

  <!-- list of gene predictions to retrieve -->
  <genePredictionPrograms>
    <program>maker</program>
  </genePredictionPrograms>

  <!-- list of search hits to retrieve -->
  <searchHitPrograms>
    <program>blastn</program>
    <program>blastx</program>
    <program>tblastx</program>
    <program>est2genome</program>
    <program>protein2genome</program>
    <program>repeatmasker</program>
    <program>fgenesh</program>
    <program>fgenesh_masked</program>
    <program>genemark</program>
    <program>genemark_masked</program>
    <program>snap</program>
    <program>snap_masked</program>
  </searchHitPrograms>

  <!-- should always be set to true - will remove this option in the future -->
  <searchHitsHaveFeatLocs>true</searchHitsHaveFeatLocs>

  <!-- list of one-level annotations to retrieve -->
  <oneLevelAnnotTypes>
    <type>promoter</type>
    <type>transposable_element</type>
    <type>remark</type>
    <type>repeat_region</type>
  </oneLevelAnnotTypes>

  <!-- list of three-level annotations to retrieve -->
  <threeLevelAnnotTypes>
    <type>gene</type>
    <type>pseudogene</type>
    <type>tRNA</type>
    <type>snRNA</type>
    <type>snoRNA</type>
    <type>ncRNA</type>
    <type>rRNA</type>
    <type>miRNA</type>
  </threeLevelAnnotTypes>

</chadoInstance>
</xml>
 +
 +
====chadodb Element====
 +
You'll need at least one <code><chadodb></code> element.  It contains information to connect to the database.  Each <code><chadodb></code> element will have a <code><chadoInstance></code> associated with it.  You'll need one <code><chadodb></code> element for each database you want to connect to (you can have multiple ones).  The XML will look something like this:
 +
 +
<xml>
<chadodb>

  <!-- label that will appear in the dropdown list of databases -->
  <name>GMOD Summer School</name>

  <!-- the Apollo class to use for your database -->
  <adapter>apollo.dataadapter.chado.jdbc.PostgresChadoAdapter</adapter>

  <!-- the URL for the database server -->
  <url>jdbc:postgresql://localhost:5432/chado</url>

  <!-- database name -->
  <dbName>chado</dbName>

  <!-- database user / login -->
  <dbUser>gmod</dbUser>

  <!-- identifies the type of Chado database -->
  <dbInstance>gmodSummerSchoolInstance</dbInstance>

  <!-- style configuration for this database -->
  <style>gmod_summer_school.style</style>

  <!-- if set to true, will be database used when launching Apollo using command line arguments -->
  <default-command-line-db>true</default-command-line-db>

</chadodb>
</xml>
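Since each <code><chadodb></code> points at a <code><chadoInstance></code> through <code><dbInstance></code>, a typo in either ID is an easy mistake to make.  The Python sketch below shows one way to check a <code>chado-adapter.xml</code> for dangling references before launching Apollo.  It is a helper written for this tutorial, not part of Apollo, and the function name is made up.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def unresolved_instances(xml_text: str) -> List[Tuple[Optional[str], str]]:
    """Return (database name, dbInstance) pairs whose dbInstance does not
    match the id of any chadoInstance element in the file."""
    root = ET.fromstring(xml_text)
    known_ids = {inst.get("id") for inst in root.findall("chadoInstance")}
    missing = []
    for db in root.findall("chadodb"):
        ref = (db.findtext("dbInstance") or "").strip()
        if ref not in known_ids:
            missing.append((db.findtext("name"), ref))
    return missing
```

An empty list means every database entry refers to a defined instance.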
 +
 +
==Setting Up a Custom WebStart Instance==
 +
One of the benefits of having Apollo as a Java application is that we can make use of Java WebStart.  This is a great way to deploy Apollo with your custom modifications.  If any modifications are made (either source code or configuration), they will be automatically deployed through WebStart.  To set up our own WebStart instance, we'll need to compile the code ourselves.  See the [[#Installation | installation section]] for information on how to check out the code.
 +
 +
We've already checked out the code from CVS in our virtual machines.  The code is located in <code>~/software/apollo</code>.  The first thing we'll do is run a CVS update to make sure that we have the most up to date code.
 +
 +
<bash>
$ cd ~/software/apollo
$ cvs update
</bash>
 +
 +
Once the CVS update is done, we'll need to create our Apollo jar file for deployment.  Before we do that, we want to make sure that our custom configurations are in the <code>conf</code> directory (we want it to be globally deployed, not locally).  So let's copy our modified <code>chado-adapter.xml</code> and the style and tiers files to the <code>conf</code> directory.
 +
 +
<bash>
$ cp ~/.apollo/chado-adapter.xml ~/.apollo/gmod_summer_school.* conf
</bash>
 +
 +
Now we're ready to build our updated Apollo jar.  We'll use [http://ant.apache.org Apache Ant] to do so.  <code>ant</code> is similar in many ways to <code>make</code> but has a lot of native support for Java.  Like <code>make</code>, we can have multiple targets.  We're interested in the <code>jar</code> target.
 +
 +
<bash>
$ cd src/java
$ ant jar
</bash>
 +
 +
So traditionally, setting up a WebStart instance is quite a bit of work.  Luckily, we have a very nice Perl script that does a lot of the magic for us!  Before we can use this script, we'll need to look at the template XML file that is used for this script.
 +
 +
<xml>
<?xml version="1.0" encoding="UTF-8"?>
<webstart>

  <!-- all this stuff is required for signing jars, shouldn't take too long to run -->
  <jarsigner>
    <alias>apollo</alias>
    <keypass>apollo</keypass>
    <storepass>apollo</storepass>
    <keystore>apollo_store</keystore>
    <validity>700</validity>
    <!-- you might want to put your name -->
    <commonName>GMOD Summer School 2009</commonName>
    <!-- you might want to put your department name -->
    <organizationUnit>GMOD Summer School 2009</organizationUnit>
    <!-- you might want to put your organization's name -->
    <organizationName>GMOD</organizationName>
    <!-- you might want to put your organization's city -->
    <localityName>Durham/Oxford</localityName>
    <!-- you might want to put your organization's state -->
    <stateName>NC/Oxford</stateName>
    <!-- you might want to put your organization's country -->
    <country>USA/UK</country>
  </jarsigner>

  <!-- now we need to populate our jnlp information -->
  <jnlp spec="1.0+">
    <information>
      <title>Apollo</title>
      <vendor>GMOD Summer School 2009</vendor>
      <description>Apollo Webstart</description>
      <!-- location of your project's web page -->
      <homepage href="http://localhost/apollo" />
      <!-- if you want to have WebStart add a specific image as your icon,
            point to the location of the image -->
      <icon href="images/head-of-apollo.gif" kind="shortcut" />
      <!-- create a shortcut on your desktop -->
      <shortcut online="true">
        <desktop />
      </shortcut>
      <!-- allow users to launch Apollo when offline -->
      <offline-allowed />
    </information>
    <!-- request all permissions - might be needed since Apollo will access the local
          file system -->
    <security>
      <all-permissions />
    </security>
    <!-- we require at least Java 1.5, set to start using 64m and up to 500m -->
    <resources>
      <j2se version="1.5+" initial-heap-size="64m" max-heap-size="500m" />
    </resources>
    <!-- where the main method is located - don't change this -->
    <application-desc main-class="apollo.main.Apollo">
        <!-- we can add arguments when launching Apollo - these particular ones load
              scf1117875581239 from 102000 to 110000 using the Chado database adapter -
              great way to have Apollo load specific regions -->
        <argument>-i</argument>
        <argument>chadodb</argument>
        <argument>-l</argument>
        <argument>scf1117875581239:102000-110000</argument>
    </application-desc>
  </jnlp>
  <webserver>
    <!-- URL where the webstart instance will reside -->
    <url>http://localhost/apollo/webstart</url>
    <!-- relative location to <url> where jars are located -->
    <jar_location>jars</jar_location>
  </webserver>
</webstart>
</xml>
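The <code>-l</code> argument in the template takes a location string of the form <code>id:start-end</code>.  If you generate templates or launch links for many regions, it can help to validate those strings first.  Here is a small Python sketch of such a check.  The function is hypothetical and only mirrors the format seen in the template, so it may not cover every location syntax Apollo accepts.

```python
from typing import Tuple


def parse_location(spec: str) -> Tuple[str, int, int]:
    """Split 'id:start-end' into (id, start, end)."""
    # Split on the last colon so sequence IDs containing colons still work.
    seq_id, colon, span = spec.rpartition(":")
    start_text, dash, end_text = span.partition("-")
    if not (seq_id and colon and dash):
        raise ValueError(f"expected id:start-end, got {spec!r}")
    # int() raises ValueError if the coordinates are not whole numbers.
    return seq_id, int(start_text), int(end_text)
```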
 +
 +
The nice thing about this template is that you only need to set it up once (assuming you're not changing the URL or any other option).
 +
 +
The Apache web pages reside at <code>/var/www</code>.  We'll create an Apollo directory.  The directory is only writable by root, so we'll need to be root.
 +
 +
<bash>
$ sudo -s
$ cd /var/www
$ mkdir -p apollo/webstart
$ cd apollo/webstart
</bash>
 +
 +
Let's create the file <code>apollo_webstart.xml</code> in the <code>webstart</code> directory we just created.  Now we'll run the magical script.
 +
 +
<bash>
$ ~/software/apollo/bin/webstart_generator.pl -i apollo_webstart.xml -d ~/software/apollo/jars -o apollo.jnlp -D jars
</bash>
 +
 +
Voila, it was THAT easy.  This script took care of signing all the jars and generating the appropriate <code>jnlp</code> file.  Next time, when you make a change to one of the configurations, you'll just need to recompile the <code>apollo.jar</code> and re-run this script.

Lastly, we'll just create a simple web page to link to the Apollo WebStart instance.

<html>
  <body>
    <a href="apollo.jnlp">Launch Apollo!!!</a>
  </body>
</html>

One last note: you'll want to make sure that your web server has support for <code>jnlp</code> files.  With our Apache install, you'll need to make sure that you have the following line in your <code>/etc/mime.types</code> file:

application/x-java-jnlp-file    jnlp

==Writing Custom Data Adapters==
There's a bit of work involved with writing a custom data adapter.  It will require you to have knowledge of Java and the data model used in Apollo.  We won't have enough time to write one ourselves, but we can briefly discuss how the process works.

All data adapters belong to the <code>apollo.dataadapter</code> package.  To build our own custom data adapter, we need to implement two specific interfaces: <code>ApolloDataAdapterI.java</code> and <code>org.bdgp.io.DataAdapterUI</code>.  There are abstract classes that implement common methods for both of those interfaces: <code>AbstractApolloAdapter.java</code> and <code>org.bdgp.swing.AbstractDataAdapterUI</code> respectively.

<code>ApolloDataAdapterI</code> does the work of parsing the input data.  The most important method for the interface is <code>getCurationSet()</code>, which does the work of returning the data to the logic layer.

<code>ApolloDataAdapterGUI</code> provides the GUI that we see in the <code>Apollo: load data</code> window.  It implements the necessary interfaces and extends JPanel, so you can build your GUI directly in the class.

We'll take a look at how to get the sample adapter to work.  The code for the sample adapter is located in the <code>apollo.dataadapter.sample</code> package.  Since we just finished building the jar, we have already compiled the code.  So all we need to do is tell Apollo to load the data adapter plugin.  We'll add the following to <code>apollo.cfg</code>:

DataAdapterInstall      "apollo.dataadapter.sample.SampleAdapter"  "gmod_summer_school.style"    "Sample data adapter"

The first column tells Apollo that we'll be loading a new data adapter.  The second column points to the class for the data adapter.  The third column tells which style to associate with this data.  The fourth column provides the text that will be displayed in the data adapter pull down menu.

Now if we run Apollo, we'll see "Sample data adapter" as one of the options.  Lovely!

There's a roughly written tutorial on how to create your own data adapter.  It covers some information on the data adapter API and the data model.  You'll want to read that if you're interested in creating one.  It's located in <code>$APOLLO_ROOT/doc/html/dataadapter_cookbook.html</code>.  You can also view the [http://apollo.berkeleybop.org/current/javadoc Apollo Javadoc API].

Revision as of 16:33, 28 September 2009

Under Construction

This page or section is under construction.


Apollo Session

2009 GMOD Summer School - Europe & Americas
July & August 2009
Ed Lee



This tutorial walks you through how to use the Apollo genome annotation editor. This tutorial was originally taught by Ed Lee at the 2009 GMOD Summer School - Europe & Americas. The notes and VMware image used on this page are from the Europe course.



VMware

This tutorial was taught using a VMware system image as a starting point. If you want to start with that same system, download and install the Starting image.

See VMware for what software you need to use a VMware system image, and for directions on how to get the image setup and running on your machine.

Download
Starting Image

Ending Image


Username: gmod
Password: gmod

Caveats

Important Note

This tutorial describes the world as it existed on the day the tutorial was given. Please be aware that things like CPAN modules, Java libraries, and Linux packages change over time, and that the instructions in the tutorial will slowly drift over time. Newer versions of tutorials will be posted as they become available.



Introduction

Overview

Once we have a sequence assembled, we need to annotate that sequence, that is, add features such as genes, pseudogenes, ncRNAs, etc. Otherwise we just have sequence and can't make much sense of the data. Computational analysis tools, such as Genscan, FGeneSH and tRNAscanSE, are a great way to start the annotation process, as they help us localize regions of interest. However, these automated tools are far from perfect and the results often require manual updating by expert biologists. That is where Apollo comes in. Apollo is a sequence annotation editor and will allow you to create and edit annotations.

Annotation workflow

Architecture

Apollo is set up in a 3-tier architecture, with a presentation (GUI), logic and data layer. It is highly configurable, with most users configuring the presentation and data layers.

Apollo architecture

Presentation Layer

The presentation layer (GUI) handles displaying and gives the user an interface for creating and editing these annotations. Customization of this layer usually entails setting up how features are displayed (e.g., color, shape, labels).

Logic Layer

The logic layer handles how data is represented and the various operations you can perform on the data (e.g., creating, editing, adding information to annotations).

Data Layer

The data layer takes care of interfacing with the different data sources. Customization of the data layer ranges from setting up access to different databases (e.g., your own Chado instance) to creating new adapters to read new data formats or schemas.

Installation

You can install Apollo from a pre-built installer package, or get the source code from either CVS or a tarball. Both source options require building the application.

Pre-built Installers

You can download OS-specific pre-built installer packages from the Apollo installer page. We provide the following installers:

{| class="wikitable"
! Platform !! Optionally bundled JRE
|-
| Windows || Yes
|-
| Mac OS X || No
|-
| Linux || Yes
|-
| Solaris || Yes (x86 version)
|-
| Unix || No
|}

Since we're installing it on our Linux virtual machine, we'll download the Linux version. Java has already been setup in these machines, so we'll get the installer without the bundled JRE. The bundled JRE option is a great solution for users who don't have Java installed or have multiple ones installed and are not sure how to select the correct one to be used. Download the Linux installer and save it to ~/to_be_installed.

The Linux installer is a shell script. If we want to install Apollo to the common application locations (such as /usr/local/bin), we'll need root access. We'll do everything from the command line since it's easier that way. Open up a terminal window and type the following:

<bash>
$ cd to_be_installed
$ sudo /bin/sh Apollo_unix.sh
</bash>

We'll just install everything with the default options.

Checking Out the Code From CVS

You can also check out the code from SourceForge CVS. You'll need a CVS client to do so. This will guarantee that you'll get the latest Apollo code. Note that you'll be getting the development version, which might not be fully stable. The following commands apply to Unix-based command line CVS clients for an anonymous checkout.

<bash>
$ cvs -d:pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod login
$ cvs -z3 -d:pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod co -P apollo
</bash>

If you're using an IDE, chances are that your IDE will have CVS support (or have a plugin available).

Getting the Source Tarball

You can also download the source tarball from SourceForge. You can get the source code for both the current and previous public releases. You won't get the latest development code as in CVS, but you should be getting code that is more stable.

Using Apollo

We'll start off by looking at some of the things Apollo can do. We'll be connecting to our local Chado instance. A customized Apollo Chado configuration has been set up for this. Don't worry, we'll cover the details on how we did that once we talk about setting up custom Chado configurations.

The following only applies to students who are having issues starting Fluxbox without root (sudo) privileges (apparently only an issue on Vista). For those students, we'll copy the custom configurations to the freshly installed Apollo. Type the following in the terminal (this will all make sense later on):

<bash>
$ cd ~/.apollo
$ sudo cp gmod_summer_school.* chado-adapter.xml /usr/local/Apollo/conf
</bash>

Let's first start by launching Apollo. Type the following in your terminal:

<bash> $ apollo </bash>

We'll see an option for which data we want to load. We'll choose Chado database as our data source. Since we still don't have annotations on our data, we can't use the gene option. Select contig in Select a region to display. Let's look at scf1117875581239 in the region between 102000 and 110000.

Chado adapter

Once loading is complete, we'll see the main Apollo window.

Apollo main window

The panels with the aqua background are for annotations and those with the black background are for computational results. The white box in the middle with the ruler represents the genomic region, with the numbers being coordinates. The annotation and result panels above the genomic coordinate window are for the plus strand and the ones below it are for the minus strand. The "Zoom" buttons allow you to zoom in and out of the currently loaded region. The panels at the bottom provide information on the currently selected feature.

We only have results loaded and no annotations. So let's create a gene. We'll select maker-106017-102976. We want to create a transcript with all of the exons. So to select all of the exons, we just need to double left-click on one. You'll notice that all the exons have a red border around them.

Apollo with evidence model selected

Now that they're all selected, to create a new gene it's as easy as just dragging and dropping into the annotation panel. Voila! We have a new gene, GMOD:temp1, with transcript GMOD:temp1-transcript1.

Apollo with a single gene model

OK, let's try this again with another model. Let's do the same with fgenesh_masked-scf1117875581239-abinit-gene-1.48-mRNA-1 (man, that's a long name!). We can see that the transcript belongs to a new gene, GMOD:temp2. Makes sense, it's obviously a separate gene.

Apollo with two gene models

But what if we were to create a new feature from genemark_masked-scf1117875581239-abinit-gene-1.91-mRNA-1? Let's find out. Whoa! We can see that this new transcript was created as part of GMOD:temp2.

Apollo with splice variants

That's great, as it looks like a splice variant rather than a whole new gene. So is this always the case? After all, there are overlapping genes, right? Apollo looks to see if there are any existing transcripts with in-frame overlaps to the new transcript. If that's the case, the new transcript is considered a splice variant of the existing gene. Otherwise, a new gene is created.

You'll notice that the newly created genes have an ID of the form GMOD:temp# and the transcripts have an ID of the form GMOD:temp#-transcript#. Apollo uses naming adapters to define how newly created features should be named. For example, FlyBase uses FBgn:temp# for genes and FBgn:temp#:chromosome#:chromosome_start-chromosome_end-R? for transcripts, where ? is A, B, C and so on.

Let's make sure that our Chado connectivity is working. Let's save our work using File → Save as....

File → Save as...

Make sure that Chado database is selected as the data source (it should already be).

Chado save dialog

You'll notice that the IDs have changed. This is because the GMOD naming adapter follows the convention that all newly created features should have an ID of PREFIX:FEATURE_ID for a Chado database. This only occurs with newly created features. If you modify the ID of an existing feature and save that, this ID replacement will not take place.

Let's reload the data with File → Open new....

File → Open new...

Again, make sure that Chado database is the selected data source (should already be). You'll notice that all the information we put in when we first loaded from the database is already there. Apollo keeps a history of loading and saving so that you can easily access previous data sources. Click Ok. Great, the data is there!

Speaking of which, let's say that we actually want to change the gene ID to something more interesting. We can do so by selecting an exon in our feature, right clicking it, and choosing Annotation Info Editor... from the popup menu.

Annotation editor popup menu

We can see that we can add lots of interesting information for our gene and transcript.

Annotation editor

Let's go ahead and change the gene symbol to something else. We can see that this change affects all the transcripts. Just as we'd expect.

Back to the popup menu: you can see there are a lot of things you can do to your existing annotation. You can merge and split transcripts and exons, move exons from one transcript to another, and lots of other cool stuff. Let's take a look at merging exons. Select the 2 exons that you want to merge (hold down shift to select multiple items), right click, and choose Merge exons.

Merge exons popup menu

Alright, it does what we'd expect it to do.

Apollo with merged exons

But let's say that upon closer inspection, this is not what we wanted. Previously, you'd need to either manually split the exon again or remove the merged exon and then add both exons again. Kind of a pain, so traditionally users would save constantly so they could revert to a previous state. However, we have now added a much sought after feature, undo! So instead of doing all that work, we can undo our merge with Edit → Undo.

Edit → Undo

Wow, lookie here, it split the exons again. Although this looks to be a trivial operation, it's actually very complex, as one single change can lead to multiple cascading changes. In the case of the merged exons, we can see that it changed the coding region frame for the downstream exons, thus affecting the CDS. So a single change caused other implicit changes to occur.

One cool recent addition to Apollo is the ability to run remote analyses. We currently support BLAST and Primer-BLAST (a primer identification tool) over at NCBI. Let's look at how the BLAST support works.

Select the first model we created (with ID GMOD:00014333 in this guide - the ID in your data might be different). Double-click on an exon to select the whole model. Right-click on the selected feature and choose Analyze region.

Analyze region popup menu

The Run analysis window will show up.

Run analysis

We see there are tabs for NCBI-BLAST and NCBI Primer-BLAST. We'll just run BLAST for now. We have a pull-down menu for BLAST type and can select blastn, blastx, and tblastx.

BLAST types

Let's run a blastn search. There are a number of options for running BLAST and for post processing the results. The post processing options are particularly useful: since we're searching against NCBI's nr database (which is very large), we'll get A LOT of results back. We'll check the following options:

  • Run options
    • Filter out low complexity sequence
    • Filter out masked sequence
  • Post processing options
    • Remove hits with an expect above threshold
    • Remove hits with a score below threshold
    • Remove HSPs with a percent identity below threshold

We can leave the default values for those options. Click Run to run the analysis. After a few seconds, a popup window will appear.

Analysis expected submission time

This gives us the estimated time before our analysis starts running (as estimated by the NCBI servers). Note that this is the estimated time for the analysis to start, NOT the expected time for the analysis to complete. Checking for analysis completion takes place in the background, so feel free to continue working as usual. You will be notified when the analysis is complete.

Analysis complete

The new analysis will appear in the results panel and since we ran blastn against the nr database, the type for the result is blastn:nr.

blastn results

One last thing worth mentioning is the exon detail editor. It allows you to make edits to your models at the base level. We have also recently added the sequence aligner, which allows you to make the same types of edits the exon detail editor supports, but in reference to multiple alignment data. We'll come back and talk about the sequence aligner if we have time.

Unfortunately we don't have the time to go over all the sophisticated editing features for Apollo, but you can get more information on all the powerful editing features from the Apollo user's guide.

Configuring Apollo

Ok, now that we have some idea of what Apollo can do, let's talk about how to configure Apollo. First of all, be aware that all configuration files can live in two places:

  • The global Apollo configuration directory in $APOLLO_ROOT/conf where $APOLLO_ROOT is where Apollo was installed
  • User specific configurations, stored in ~/.apollo where ~ is the user home directory (different OS's handle it differently)

The configurations in the user directory take precedence over the global ones. Depending on the configuration, it will either fully overwrite the global configuration or just overwrite/append to the global one.
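
The "user directory first, then global" lookup can be sketched in shell. This is only an illustration of the precedence order: the function name find_apollo_conf is hypothetical, and the sketch does not model the overwrite/append merging that some configurations get.

```shell
#!/bin/sh
# Hypothetical helper (not part of Apollo): print the path of the
# configuration file that would take precedence, checking the user
# directory before the global one.
# Usage: find_apollo_conf FILE USER_DIR GLOBAL_DIR
find_apollo_conf() {
    file=$1
    user_dir=$2
    global_dir=$3
    if [ -f "$user_dir/$file" ]; then
        printf '%s\n' "$user_dir/$file"
    elif [ -f "$global_dir/$file" ]; then
        printf '%s\n' "$global_dir/$file"
    else
        # Neither location has the file.
        return 1
    fi
}
```

For example, find_apollo_conf chado-adapter.xml ~/.apollo "$APOLLO_ROOT/conf" would print the user copy if there is one and the global copy otherwise.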

There are 3 sets of general configurations we'll discuss: apollo.cfg, data_source.style, data_source.tiers. You can check out the Apollo configuration section from the user guide for a more detailed description of the supported options.

apollo.cfg

This is the main Apollo configuration. Options are composed of columns delimited by white space, where the first column is the option parameter and the following columns are the specific options for the parameter. // is used for comments and everything following it (up to the new line) will be ignored. Out of all the options, the most interesting one is DataAdapterInstall, which is used to install data adapters for handling new types of data. We'll talk about it in more detail in the writing custom data adapters section. You can just add any new options or ones you wish to override in your custom apollo.cfg file. The global apollo.cfg options will be used for any options absent in your custom file.
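
To make the file format concrete, here is a small sketch that lists which option parameters a given apollo.cfg sets. It relies only on the two rules above (white space delimited columns, // comments). The function name list_cfg_options is hypothetical, and the sketch naively cuts at the first // on a line, which is harmless here because only the first column is printed.

```shell
#!/bin/sh
# Hypothetical helper: print the option parameter (first column) of every
# non-empty, non-comment line in an apollo.cfg-style file.
list_cfg_options() {
    # Drop everything from // to the end of the line, then print
    # column 1 of the lines that still have content.
    sed 's|//.*$||' "$1" | awk 'NF > 0 { print $1 }'
}
```

Running it on both ~/.apollo/apollo.cfg and $APOLLO_ROOT/conf/apollo.cfg is a quick way to see which global options your custom file overrides.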

data_source.style

Each data source has a style file associated with it. The style file contains options that are data source specific and should be shared amongst every feature. Like the apollo.cfg file, it is also composed of columns delimited by white space, where the first column is the option parameter and the following columns are the specific options for the parameter. // is also used for comments and everything following it (up to the new line) will be ignored. We've recently added a GUI for setting up the most common options. You can access it from Edit -> Preferences.

Edit -> Preference

Make sure that the Style tab is selected.

Style wizard

Be aware that the GUI only supports a subset of all the available options. This was done so as not to overwhelm users with overly complex GUIs. If you need to change anything that is not supported by the GUI, you'll need to do so by manually editing the file. Of particular interest is the Canned annotation/transcript comments section. It allows you to define comments that users can then add to their top level annotations and transcripts from a pull down menu in the annotation info editor.

data_source.tiers

Each data source has a tiers file associated with it. The tiers file contains options on how to display specific features. It has a completely different format from both the apollo.cfg and data_source.style files. # is used for comments. A tiers file contains a set of Tier and Type records.

A tier record defines a set of feature types that will always be displayed together as a group. They will be displayed in the same row if possible when the features are expanded but as close together as possible if they overlap. A Tier record will look something like this:

[Tier]
tiername : Annotation
visible : true
expanded : true
maxrows : 0
labeled : true
curated : true
warnonedit : false

Following the Tier record are one or more Type records. A Type record specifies one of the feature types that should appear in the given Tier. The Type record will look something like this:

[Type]
tiername : Gene Prediction
typename : Genscan
resulttype : genscan:dummy
resulttype : genscan
color : 204,153,255
usescore : true
minscore : -1
maxscore : 50
glyph : DrawableResultFeatureSet
column : SCORE
column : GENOMIC_RANGE
column : query_frame
sortbycolumn : GENOMIC_RANGE
weburl : http://genes.mit.edu/GENSCAN.html#
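
Since tiers files get long, it can help to list which types are assigned to which tier. The sketch below does that with awk. It assumes records look like the samples above ("key : value" lines, with tiername appearing before typename inside each [Type] record), and the function name list_tier_types is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "tiername<TAB>typename" for every [Type]
# record in a tiers file laid out like the samples above.
list_tier_types() {
    awk -F' *: *' '
        /^#/        { next }                          # skip comment lines
        /^\[Type\]/ { in_type = 1; tier = ""; next }  # a new Type record starts
        /^\[Tier\]/ { in_type = 0; next }             # Tier records are ignored
        in_type && $1 == "tiername" { tier = $2 }
        in_type && $1 == "typename" { print tier "\t" $2 }
    ' "$1"
}
```

For the Genscan record above, this would print the tier "Gene Prediction" and the type "Genscan" on one tab-separated line.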

Again, there are many options supported by the tiers file and it can get quite overwhelming. The current fly.tiers file is over 1500 lines long! Craziness. Luckily we've also recently added a GUI for setting the most useful options. You can access it by clicking Edit -> Preferences and selecting the Types tab.

Types wizard

If however you need to change something not supported by the GUI, you'll have to edit the file by hand. You can learn more about the configuration wizards in the Preferences section from the Apollo user guide.

Setting Up Custom Chado Configurations

Ok, so we connected to our local Chado instance before with an already existing configuration file. Now we're going to go into detail on how to set that up. The file that contains the Chado database configuration is chado-adapter.xml.

chado-adapter.xml

Like all other configuration files, it resides in $APOLLO_ROOT/conf for the global configuration and ~/.apollo for the user configurations. As you can guess from the file extension, this configuration is in XML format (nice how all the formats between the configurations are so consistent, huh? =P). It contains a <chado-adapter> root element, with at least one <chadoInstance> child element and at least one <chadodb> child element. The skeleton for the XML file will look something like this:

<xml>
<?xml version="1.0" encoding="UTF-8"?>
<chado-adapter>

 <chadoInstance>
   ...
 </chadoInstance>
  ...
 <chadodb>
   ...
 </chadodb>

</chado-adapter>
</xml>

chadoInstance Element

You'll need at least one chadoInstance element. It will look something like this:

<xml>
<chadoInstance id="gmodSummerSchoolInstance" default="true">

 <clsName>apollo.dataadapter.chado.jdbc.RiceChadoInstance</clsName>
 <sequenceTypes>
   <type>gene</type>
   <type>
     <name>contig</name>
     <useStartAndEnd>true</useStartAndEnd>
     <queryForValueList>true</queryForValueList>
     <isTopLevel>true</isTopLevel>
   </type>
 </sequenceTypes>
 <partOfCvTerm>part_of</partOfCvTerm>
 <featureCV>sequence</featureCV>
 <relationshipCV>relationship</relationshipCV>
 <propertyTypeCV>feature_property</propertyTypeCV>
 <genePredictionPrograms>
   <program>maker</program>
 </genePredictionPrograms>
 <searchHitPrograms>
   <program>blastn</program>
   <program>blastx</program>
   <program>tblastx</program>
   <program>est2genome</program>
   <program>protein2genome</program>
   <program>repeatmasker</program>
   <program>fgenesh</program>
   <program>fgenesh_masked</program>
   <program>genemark</program>
   <program>genemark_masked</program>
   <program>snap</program>
   <program>snap_masked</program>
 </searchHitPrograms>
 <searchHitsHaveFeatLocs>true</searchHitsHaveFeatLocs>
 <oneLevelAnnotTypes>
   <type>promoter</type>
   <type>transposable_element</type>
   <type>remark</type>
   <type>repeat_region</type>
 </oneLevelAnnotTypes>
 <threeLevelAnnotTypes>
   <type>gene</type>
   <type>pseudogene</type>
   <type>tRNA</type>
   <type>snRNA</type>
   <type>snoRNA</type>
   <type>ncRNA</type>
   <type>rRNA</type>
   <type>miRNA</type>
 </threeLevelAnnotTypes>

</chadoInstance>
</xml>

chadodb Element

You'll need at least one <chadodb> element. It contains information to connect to the database. Each <chadodb> element will have a <chadoInstance> associated with it. You'll need one <chadodb> element for each database you want to connect to (you can have multiple ones). The XML will look something like this:

<xml>
<chadodb>

 <name>GMOD Summer School</name>
 <adapter>apollo.dataadapter.chado.jdbc.PostgresChadoAdapter</adapter>
 <url>jdbc:postgresql://localhost:5432/chado</url>
 <dbName>chado</dbName>
 <dbUser>gmod</dbUser>
 <dbInstance>gmodSummerSchoolInstance</dbInstance>
 <style>gmod_summer_school.style</style>
 <default-command-line-db>true</default-command-line-db>

</chadodb>
</xml>
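
Before pointing Apollo at a database, it can be worth checking the connection details outside of Apollo. The sketch below splits the JDBC URL from the <url> element into host, port and database name so they can be handed to psql. The helper name parse_jdbc_url is hypothetical and it only handles the exact jdbc:postgresql://HOST:PORT/DB shape used above.

```shell
#!/bin/sh
# Hypothetical helper: split a jdbc:postgresql://HOST:PORT/DB URL into
# "HOST PORT DB" on a single line. Prints nothing for any other shape.
parse_jdbc_url() {
    printf '%s\n' "$1" |
        sed -n 's|^jdbc:postgresql://\([^:/]*\):\([0-9]*\)/\(.*\)$|\1 \2 \3|p'
}

# Possible usage with the values from the <chadodb> element above
# (assumes psql is installed and the gmod user may connect):
#   set -- $(parse_jdbc_url 'jdbc:postgresql://localhost:5432/chado')
#   psql -h "$1" -p "$2" -U gmod -d "$3" -c 'SELECT count(*) FROM feature;'
```

If the psql query fails, Apollo will most likely not be able to connect with the same settings either.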

Setting Up a Custom WebStart Instance

One of the benefits of having Apollo as a Java application is that we can make use of Java WebStart. This is a great way to deploy Apollo with your custom modifications. If any modifications are made (either source code or configuration), they will be automatically deployed through WebStart. To set up our own WebStart instance, we'll need to compile the code ourselves. See the installation section for information on how to check out the code.

We've already checked out the code from CVS in our virtual machines. The code is located in ~/software/apollo. The first thing we'll do is update the CVS checkout to make sure that we have the most up to date code.

<bash>
$ cd ~/software/apollo
$ cvs update
</bash>

Once the CVS update is done, we'll need to create our Apollo jar file for deployment. Before we do that, we want to make sure that our custom configurations are in the conf directory (we want it to be globally deployed, not locally). So let's copy our modified chado-adapter.xml and the style and tiers files to the conf directory.

<bash> $ cp ~/.apollo/chado-adapter.xml ~/.apollo/gmod_summer_school.* conf </bash>

Now we're ready to build our updated Apollo jar. We'll use Apache Ant to do so. ant is similar in many ways to make but has a lot of native support for Java. Like make, we can have multiple targets. We're interested in the jar target.

<bash>
$ cd src/java
$ ant jar
</bash>

So traditionally, setting up a WebStart instance is quite a bit of work. Luckily, we have a very nice Perl script that does a lot of the magic for us! Before we can use this script, we'll need to look at the template XML file that is used for this script.

<xml>
<?xml version="1.0" encoding="UTF-8"?>
<webstart>

 <jarsigner>
   <alias>apollo</alias>
   <keypass>apollo</keypass>
   <storepass>apollo</storepass>
   <keystore>apollo_store</keystore>
   <validity>700</validity>
   <commonName>GMOD Summer School 2009</commonName>
   <organizationUnit>GMOD Summer School 2009</organizationUnit>
   <organizationName>GMOD</organizationName>
   <localityName>Durham/Oxford</localityName>
   <stateName>NC/Oxford</stateName>
   <country>USA/UK</country>
 </jarsigner>
 <jnlp spec="1.0+">
   <information>
     <title>Apollo</title>
     <vendor>GMOD Summer School 2009</vendor>
     <description>Apollo Webstart</description>
     <homepage href="http://localhost/apollo" />
     <icon href="images/head-of-apollo.gif" kind="shortcut" />
     <shortcut online="true">
       <desktop />
     </shortcut>
     <offline-allowed />
   </information>
   <security>
     <all-permissions />
   </security>
   <resources>
     <j2se version="1.5+" initial-heap-size="64m" max-heap-size="500m" />
   </resources>
   <application-desc main-class="apollo.main.Apollo">
       <argument>-i</argument>
       <argument>chadodb</argument>
       <argument>-l</argument>
       <argument>scf1117875581239:102000-110000</argument>
   </application-desc>
 </jnlp>
 <webserver>
   <url>http://localhost/apollo/webstart</url>
   <jar_location>jars</jar_location>
 </webserver>

</webstart>
</xml>

The nice thing about this template is that you only need to set it up once (assuming you're not changing the URL or any other option).

The Apache web pages reside at /var/www. We'll create an Apollo directory. The directory is only writable by root, so we'll need to be root.

<bash>
$ sudo -s
$ cd /var/www
$ mkdir -p apollo/webstart
$ cd apollo/webstart
</bash>

Let's create the file apollo_webstart.xml in the webstart directory we just created. Now we'll run the magical script.

<bash> $ ~/software/apollo/bin/webstart_generator.pl -i apollo_webstart.xml -d ~/software/apollo/jars -o apollo.jnlp -D jars </bash>

Voila, it was THAT easy. This script took care of signing all the jars and generating the appropriate jnlp file. Next time, when you make a change to one of the configurations, you'll just need to recompile the apollo.jar and re-run this script.

Lastly, we'll just create a simple web page to link to the Apollo WebStart instance.

<html>
  <body>
    <a href="apollo.jnlp">Launch Apollo!!!</a>
  </body>
</html>

One last note, you'll want to make sure that your web server has support for jnlp files. With our Apache install, you'll need to make sure that you have the following line in your /etc/mime.types file:

application/x-java-jnlp-file    jnlp
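
A quick way to confirm the mapping is in place is to look for it in the mime.types file. The sketch below is a hypothetical helper (has_jnlp_mime is not an Apache or Apollo tool) that succeeds only if the given file maps the jnlp extension to application/x-java-jnlp-file.

```shell
#!/bin/sh
# Hypothetical check: exit 0 if the given mime.types file maps the
# jnlp extension to application/x-java-jnlp-file, exit 1 otherwise.
has_jnlp_mime() {
    awk '$1 == "application/x-java-jnlp-file" {
             # The extensions follow the MIME type on the same line.
             for (i = 2; i <= NF; i++) if ($i == "jnlp") found = 1
         }
         END { exit found ? 0 : 1 }' "$1"
}
```

Calling has_jnlp_mime /etc/mime.types tells you whether the line needs to be added. Once the page is deployed, requesting the headers with curl -sI http://localhost/apollo/webstart/apollo.jnlp should show that MIME type in the Content-Type header, assuming the file is served from the URL configured in the template.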

Writing Custom Data Adapters

There's a bit of work involved with writing a custom data adapter. It will require you to have knowledge of Java and the data model used in Apollo. We won't have enough time to write one ourselves, but we can briefly discuss how the process works.

All data adapters belong to the apollo.dataadapter package. To build our own custom data adapter, we need to implement two specific interfaces: ApolloDataAdapterI.java and org.bdgp.io.DataAdapterUI. There are abstract classes that implement common methods for both of those interfaces: AbstractApolloAdapter.java and org.bdgp.swing.AbstractDataAdapterUI respectively.

ApolloDataAdapterI does the work of parsing the input data. The most important method for the interface is getCurationSet() which does the work of returning the data to the logic layer.

ApolloDataAdapterGUI provides the GUI that we see in the Apollo: load data window. It implements the necessary interfaces and extends JPanel, so you can build your GUI directly in the class.

We'll take a look at how to get the sample adapter to work. The code for the sample adapter is located in the apollo.dataadapter.sample package. Since we just finished building the jar, we have already compiled the code. So all we need to do is tell Apollo to load the data adapter plugin. We'll add the following to apollo.cfg:

DataAdapterInstall      "apollo.dataadapter.sample.SampleAdapter"   "gmod_summer_school.style"     "Sample data adapter"

The first column tells Apollo that we'll be loading a new data adapter. The second column points to the class for the data adapter. The third column tells which style to associate with this data. The fourth column provides the text that will be displayed in the data adapter pull down menu.
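
If you script your setup, the four columns can be written out by a small helper. The sketch below appends the line to a given apollo.cfg only when no line mentioning the same adapter class is already there. The function name install_adapter is hypothetical, and the "already there" check is a plain text match, so a commented-out entry would also count.

```shell
#!/bin/sh
# Hypothetical helper: append a DataAdapterInstall line to an apollo.cfg
# file unless the quoted adapter class already appears in it.
# Usage: install_adapter CFG_FILE CLASS STYLE LABEL
install_adapter() {
    cfg=$1
    class=$2
    style=$3
    label=$4
    if grep -F -q "\"$class\"" "$cfg" 2>/dev/null; then
        return 0    # nothing to do, the class is already mentioned
    fi
    # Columns are separated by tabs, which count as white space.
    printf 'DataAdapterInstall\t"%s"\t"%s"\t"%s"\n' \
        "$class" "$style" "$label" >> "$cfg"
}
```

For the sample adapter this would be called as install_adapter ~/.apollo/apollo.cfg apollo.dataadapter.sample.SampleAdapter gmod_summer_school.style "Sample data adapter".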

Now if we run Apollo, we'll see "Sample data adapter" as one of the options. Lovely!

There's a roughly written tutorial on how to create your own data adapter. It covers some information on the data adapter API and the data model. You'll want to read that if you're interested in creating one. It's located in $APOLLO_ROOT/doc/html/dataadapter_cookbook.html. You can also view the Apollo Javadoc API.