Difference between revisions of "BioMart Tutorial"

From GMOD
(Completely replaced contents of this page with the contents of the BioMart Tutorial 2011 Page)
(content removed at the request of Arek)
{| class="tutorialheader"
 
| align="right" | {{#icon: Biomart250.png|BioMart|200|BioMart}}<br /><br />{{#icon: 170px-October2011Logo.png|October 2011 - Toronto||2009 GMOD Summer School - Toronto, Canada}}
 
| {{TutorialTitleLine|[[BioMart]]}}<br />
 
[http://gmod.org/wiki/October_2011_GMOD_Meeting 2011 GMOD Community Meeting] & <br />[http://oicr.on.ca/ Ontario Institute for Cancer Research]<br />Toronto, Canada<br />
 
14 October 2011<br />
 
[[User:Junjun|Junjun Zhang]], [[User:Elena%20Rivkin|Elena Rivkin]] and [[User:Anthony%20Cros|Anthony Cros]]
 
|}
 
 
__NOTITLE__
 
  
 
+
<center>{{#icon: Biomart250.png|BioMart||http://www.biomart.org/}}</center>
This tutorial walks you through how to install and configure a local installation of [[BioMart]].
+
 
+
 
+
 
+
= 1. Setting up the virtual machine =
+
 
+
== 1.1 Import and start the VM ==
+
* We have created a virtual machine (VM) image using Oracle's VirtualBox software. The image file is in OVF/OVA format, so you should be able to set up the VM using either VirtualBox or VMware Player. At this time, you should have one of them installed already.
+
* We will be passing around USB memory sticks that contain the image file; please copy it to your laptop.
+
* Import and start the VM using VirtualBox:
+
** Start VirtualBox
+
** Import image with: (from the menu bar) File --> Import Appliance
+
** Navigate to the .ova file you just copied from the USB key and follow the on-screen instructions
+
** Once the import has finished, start the VM by clicking the ''Start'' button
+
 
+
== 1.2 Accounts that you will need for this tutorial ==
+
* Linux:
+
** username: biomart
+
** password: biomart
+
 
+
* MySQL:
+
** username: biomart
+
** password: biomart
+
 
+
In order to free up some disk space you will need to drop some unneeded databases:
+
mysql -ubiomart -pbiomart -e "drop database vega_mart_58_template"
+
mysql -ubiomart -pbiomart -e "drop database homo_sapiens_vega_58_37c"
+
mysql -ubiomart -pbiomart -e "drop database vega_mart_63"
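The three drop commands above fail if a database has already been removed, for example when the step is repeated. A minimal sketch of a repeatable variant follows; the helper name <code>drop_statements</code> is made up for illustration, and it relies on MySQL's <code>DROP DATABASE IF EXISTS</code> form.

```shell
# Hypothetical helper: print one DROP statement per database name.
# IF EXISTS turns a missing database into a no-op instead of an error.
drop_statements() {
    for db in "$@"; do
        printf 'DROP DATABASE IF EXISTS `%s`;\n' "$db"
    done
}

# Print the statements for the three databases named in this tutorial.
drop_statements vega_mart_58_template homo_sapiens_vega_58_37c vega_mart_63

# To apply them, pipe the output into the MySQL client:
#   drop_statements vega_mart_63 | mysql -ubiomart -pbiomart
```

Printing the statements first lets you review them before anything is deleted.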
+
 
+
= 2. Downloading & Installing BioMart =
+
 
+
Two components are necessary for this tutorial: MartBuilder, from an older BioMart release (0.7), and MartConfigurator, from the most recent development code snapshot (0.8).
+
 
+
MartBuilder and MartConfigurator have already been installed on the VM image under ~/biomart_0_7.template and ~/biomart_0_8.template, but you are going to install them yourself in this tutorial.
+
 
+
== 2.1 Installing BioMart 0.7 - MartJ package ==
+
 
+
MartJ contains the applications necessary to create a mart: MartBuilder and MartRunner.
+
 
+
Download & extract tarball content with:
+
 
+
  $ cd
+
  $ mkdir biomart_0_7
+
  $ cd biomart_0_7
+
  $ wget <nowiki>ftp://anonymous@ftp.ebi.ac.uk/pub/software/biomart/martj_current/martj-bin.tgz</nowiki>
+
  $ tar zxvf martj-bin.tgz  # creates "martj-0.7" directory
+
  $ cd martj-0.7
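Before moving on, it can help to confirm that the tarball extracted completely. The sketch below checks for the two launcher scripts used later in this tutorial (<code>bin/martbuilder.sh</code> and <code>bin/martrunner.sh</code>); the function name <code>check_install</code> is an illustrative assumption, not part of MartJ.

```shell
# Hypothetical helper: report whether each expected file exists under a directory.
# Returns non-zero if at least one file is missing.
check_install() {
    dir=$1; shift
    rc=0
    for f in "$@"; do
        if [ -f "$dir/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            rc=1
        fi
    done
    return $rc
}

# Check the MartJ directory created by the steps above.
check_install "$HOME/biomart_0_7/martj-0.7" bin/martbuilder.sh bin/martrunner.sh \
    || echo "MartJ does not look fully extracted"
```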
+
 
+
== 2.2 Installing BioMart 0.8 ==
+
 
+
Check out a specific revision of the current development code from the SVN repository as below:
+
 
+
  $ cd
+
  $ mkdir biomart_0_8
+
  $ cd biomart_0_8
+
  $ svn co <nowiki>-r 10500 https://code.oicr.on.ca/svn/biomart/biomart-java/branches/oct_3_2011 martconfigurator</nowiki>
+
  $ cd martconfigurator
+
  $ ant # build project with ant; in the future, you may use: ant clean dist
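Both installations depend on a few command-line tools being present. The following is a small sketch for checking them up front; the tool list is an assumption based on the commands used in this section (<code>wget</code>, <code>tar</code>, <code>svn</code>, <code>ant</code>) plus <code>java</code>, which Ant needs, and the function name <code>missing_tools</code> is hypothetical.

```shell
# Hypothetical helper: print the name of every tool that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools wget tar svn ant java)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names are printed on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found"
fi
```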
+
 
+
= 3. Building mart =
+
 
+
This section will show you how to create a mart database from a normal relational database using MartBuilder. We use a simplified VEGA database as a starting point.
+
 
+
== 3.1 Creating mart schema using MartBuilder ==
+
 
+
=== 3.1.1 Start MartBuilder ===
+
 
+
 
+
  $ cd ~/biomart_0_7/martj-0.7
+
  $ bin/martbuilder.sh
+
 
+
MartBuilder should open (see screenshot below):
+
[[File:Mbuilder07.png]]
+
 
+
=== 3.1.2 Add a source schema ===
+
 
+
In the menu bar, choose ''Schema''->''Add'' to open the dialog to add a schema.
+
 
+
Fill in the fields with the following values, as shown in the screenshot below:
+
 
+
<div class="emphasisbox">
+
# Name: vega
+
# Database type: MySQL
+
# Using MyISAM: ''checked''
+
# Host: localhost
+
# Port: 3306
+
# Database: mini_hsap_vega
+
# Schema: mini_hsap_vega
+
# Username: biomart
+
# Password: biomart
+
''ignore the last 3 fields (used for partitioning which is not covered in this tutorial)''
+
 
+
[[File:Mb_add_schema2.png‎]]
+
</div>
+
 
+
Click the ''Test'' button to ensure we can connect to the database.
+
Click the ''Add'' button in order to proceed with the dataset description.
+
 
+
You should now see the source database shown in MartBuilder as below:
+
<div class="emphasisbox">
+
[[File:MbSourceSchema.png]]
+
</div>
+
 
+
=== 3.1.3 Select main table(s) and generate mart schema ===
+
 
+
We are going to create a dataset based on the tables "gene" and "transcript" (as ''main'' and ''submain'' tables respectively, as described in the presentation).
+
 
+
<div class="emphasisbox">
+
# Right-click on the "gene" table
+
# Click on ''Create dataset''
+
''the gene table should be highlighted already as we arrived on the current menu by clicking on it''
+
# Add the transcript table to the selection (standard Ctrl + click)
+
# Press the ''Create'' button
+
 
+
[[File:Mb__create_dataset.png]]
+
</div>
+
 
+
<div class="emphasisbox">
+
This shows the mart schema (reverse star) of the dataset that has just been created.
+
 
+
[[File:Mb__dataset.png]]
+
</div>
+
 
+
== 3.2 Materializing the mart ==
+
 
+
Your dataset does not actually exist yet. In order to create it, you need to generate the SQL for it then run that SQL against your database.
+
 
+
BioMart offers a tool ('''MartRunner''') that does that for you, using JDBC. The SQL used is as ANSI-compliant as possible, with some exceptions based on the RDBMS in use.
+
 
+
=== 3.2.1 Preparing database and start MartRunner ===
+
 
+
* We are now going to transform the source data into the target dataset, but first we have to create a target database:
+
 
+
  $ mysql -hlocalhost -P3306 -ubiomart -pbiomart -e "create database mini_hsap_vega_mart"
+
 
+
* Start MartRunner using port 9876 with:
+
 
+
  $ cd ~/biomart_0_7/martj-0.7/
+
  $ bin/martrunner.sh 9876
+
 
+
MartBuilder will send the materialization SQL to MartRunner through that port (in this example, 9876), and MartRunner will execute the transformation SQL.
+
 
+
=== 3.2.2 Starting schema transformation from MartBuilder ===
+
 
+
We go back to MartBuilder, and click on the ''Build Mart'' button to pop up the following dialog:
+
 
+
<div class="emphasisbox">
+
 
+
# Datasets: gene
+
# Schema partitions: ''ignore''
+
# Target database: mini_hsap_vega_mart (the database we just manually created)
+
# Target schema: mini_hsap_vega_mart (same as database for MySQL)
+
# Send SQL to: MartRunner
+
# MartRunner host name: localhost
+
# MartRunner port number: 9876 (the one we just '''arbitrarily''' chose because it was free)
+
# Database server name: localhost
+
# Database server port number: 3306
+
 
+
 
+
[[File:Mb__build_mart_dialog.png]]
+
 
+
Click on the ''Generate SQL'' button
+
 
+
</div>
+
 
+
 
+
<div class="emphasisbox">
+
 
+
[[File:Mb__sql_generation.png]]
+
 
+
Click on the "Start job" button
+
</div>
+
 
+
=== 3.2.3 Monitoring MartRunner transformation progress ===
+
 
+
<div class="emphasisbox">
+
 
+
[[File:Mb__job_finished.png]]
+
 
+
Ensure everything is successful, i.e., displayed in green font.
+
 
+
</div>
+
 
+
You now have a mart database created from a normalized (3NF) source database, and it is ready to be configured using MartConfigurator.
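To look at what MartRunner produced, you can list the tables in the new database and group them by name suffix. This is only a sketch: <code>gene__gene__main</code> is the main table named later in this tutorial, while the <code>__dm</code> suffix for dimension tables and the other sample table names are assumptions used for illustration, as is the function name <code>classify_tables</code>.

```shell
# Hypothetical helper: label each table name read from standard input
# according to its suffix.
classify_tables() {
    while read -r table; do
        case "$table" in
            *__main) printf 'main\t%s\n' "$table" ;;
            *__dm)   printf 'dimension\t%s\n' "$table" ;;
            *)       printf 'other\t%s\n' "$table" ;;
        esac
    done
}

# Made-up sample input; real table names come from the database itself.
printf '%s\n' gene__gene__main gene__transcript__main gene__exon__dm | classify_tables

# Against the real mart (-N suppresses the column header):
#   mysql -ubiomart -pbiomart -N -e "SHOW TABLES" mini_hsap_vega_mart | classify_tables
```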
+
 
+
= 4. Configuring a data portal to expose the mart using MartConfigurator =
+
 
+
<div class="emphasisbox">
+
 
+
This section will show you how to use MartConfigurator to configure a data portal web server that exposes the created VEGA mart to end users for querying.
+
 
+
</div>
+
 
+
== 4.1 Start MartConfigurator ==
+
 
+
Start MartConfigurator with the following command:
+
 
+
  $ cd ~/biomart_0_8/martconfigurator
+
  $ dist/scripts/martconfigurator.sh
+
 
+
<div class="emphasisbox">
+
The panel on the left corresponds to data sources; in our case we will add the mart that has just been created: ''mini_hsap_vega_mart''
+
 
+
The panel on the right corresponds to access points for those data sources that will be exposed to web users.
+
 
+
[[File:mc__empty.png]]
+
</div>
+
 
+
== 4.2 Add our own mart: mini_hsap_vega_mart ==
+
 
+
<div class="emphasisbox">
+
Click on the ''Add Mart'' button to add a datasource
+
 
+
[[File:Mc__add_mart__button.png]]
+
</div>
+
 
+
* Wizard step 1 of 4:
+
<div class="emphasisbox">
+
# source profile: vega (''anything will do'')
+
# source type: RDBMS Mart
+
 
+
[[File:Mc__add_source_wizard_1.png]]
+
</div>
+
 
+
* Wizard step 2 of 4:
+
<div class="emphasisbox">
+
# RDBMS: MySQL (keep MyISAM checked)
+
# Host: localhost
+
# Port: 3306
+
# User: biomart
+
# Password: biomart
+
# Database: ''can leave empty for now''
+
[[File:Mc__add_source_wizard_2.png]]
+
</div>
+
 
+
* Wizard step 3 of 4:
+
<div class="emphasisbox">
+
 
+
Select ''mini_hsap_vega_mart'', the mart that we just built using MartBuilder/MartRunner and based upon the ''mini_hsap_vega'' database
+
 
+
[[File:Mc__add_source_wizard_4.png]]
+
</div>
+
 
+
* Wizard step 4 of 4:
+
<div class="emphasisbox">
+
# Create naive configuration
+
# Choose the main table "gene__gene__main"
+
 
+
[[File:AddSourceWizard4.png]]
+
</div>
+
 
+
* Now we should have our own mart added
+
 
+
== 4.3 Creating Access Point ==
+
 
+
<div class="emphasisbox">
+
Simply drag-and-drop the mart from the left side (''Source'' panel) to anywhere on the right side (''Portal'' panel).
+
 
+
It will add an access point to the mart. The default name is ''mini_hsap_vega_mart_ap'', ''ap'' standing for ''Access Point'', but you can give it a name of your choice.
+
 
+
Note that you can also create an access point by clicking the ''Add Access Point'' button. You would then be given a list of the existing marts to choose from.
+
 
+
[[File:Mc__ready.png]]
+
</div>
+
 
+
== 4.4 Deploying the data portal, ie, the BioMart Server ==
+
 
+
To deploy the web-based data portal, click the '''Start Server''' button in the top right corner.
+
 
+
* When the '''Start Server''' button is clicked, if the current configuration (registry) has never been saved, you will be prompted with a dialog for saving the current ''registry''.
+
* Save the registry under /home/biomart/biomart_0_8/martconfigurator/registry/
+
* The data portal will be deployed on your local machine using port 9000 by default.
+
* Your web browser should open automatically and point to http://localhost:9000/ when the server is ready.
+
 
+
<div class="emphasisbox">
+
* Note the URL: localhost:9000/web (default)
+
[[File:Mc__deployed_server.png]]
+
</div>
+
 
+
 
+
* To stop the server use the ''Stop Server'' button in the upper right corner
+
 
+
There is also a command-line approach to starting/stopping the server using:
+
  $ dist/scripts/biomart-server.sh start
+
  $ dist/scripts/biomart-server.sh stop
+
but that will not be covered here.
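When the server is started from a script rather than from MartConfigurator, no browser window signals that it is ready. One possible approach, sketched here under the assumption that <code>curl</code> is installed, is to poll a service URL until it answers; the helper name <code>wait_for</code> is hypothetical.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the given number of attempts is used up.
wait_for() {
    attempts=$1; shift
    while [ "$attempts" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Trivial demonstration with a command that always succeeds.
wait_for 3 true && echo "ready"

# With the portal on its default port, something like:
#   wait_for 60 curl -sf http://localhost:9000/martservice/marts && echo "server is up"
```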
+
 
+
== 4.5 Exploring our first BioMart data portal  ==
+
 
+
You can now choose some attributes and click the "GO" button to retrieve results.
+
 
+
= 5. More MartConfigurator exercises =
+
 
+
MartConfigurator is a desktop application for configuring a data portal (i.e., a BioMart Server).
+
 
+
== 5.1 The MartConfigurator main window ==
+
 
+
As shown before, the main window is divided into left and right halves: the left for managing data sources and the right for managing access points. An access point is how a data source is presented to the end user for querying from the web GUI or an API client.
+
 
+
There are two main activities at the main window:
+
* adding data sources
+
* adding/configuring data access points
+
 
+
Configuring a source or an access point mainly involves creating Attributes and Filters, and organizing them into containers (and sub-containers)
+
 
+
Double-clicking a mart icon in the Source panel or an access point icon in the Portal panel will bring up the ConfigurationEditor window (see the next section for details).
+
 
+
== 5.2 ConfigurationEditor window ==
+
 
+
* '''Left''' panel: Source Config
+
* '''Right''' panel: Access Point
+
* Both panels are similarly divided into top and bottom sections
+
** top section showing the containers and their containing attributes or filters
+
** bottom section showing the properties and their values for the selected configuration item: container, attribute or filter
+
 
+
 
+
<div class="emphasisbox">
+
[[File:ConfigurationEditor.png]]
+
 
+
* Only properties whose name appears in blue font can be modified.
+
 
+
* Each view has a ''Find'' search box that lets you quickly find attributes/filters based on their names
+
 
+
* Also very convenient: the attributes and filters in each view offer a ''Show in the [opposite view]'' item in their context menu. It filters the opposite view to show the counterpart filter/attribute
+
 
+
* By drag-and-drop from the left to right, we can add new attributes/filters from a source to its access point
+
 
+
</div>
+
 
+
== 5.3 Rename '''default''' GUI tab to '''Form''' ==
+
# A GUI tab (or GUI container) is used to organize access points; it usually corresponds to a box on the home page of the data portal.
+
# Right-click on the tab --> ''Rename''
+
# enter new name: "Form"
+
 
+
== 5.4 Add new GUI tab ==
+
# Click on the "+" sign next to the latest GUI tab
+
# Enter name of the new tab: "Wizard"
+
# Right-click on the newly created tab --> ''Set GUI type'' --> ''MartWizard''
+
 
+
== 5.5 Add full VEGA mart: an external URL-based mart source ==
+
 
+
# Add a remote mart (URL Mart) using backward compatibility (for previous BioMart versions: <= 0.7)
+
# In the ''Source'' view: ''Add Mart''
+
# Select ''URL Mart'' --> ''Next''
+
# Input the following values:
+
#* Protocol: http
+
#* Host: www.biomart.org
+
#* Port: 80
+
#* Path: /biomart/martservice
+
# Choose source: "vega (url)" --> ''Next''
+
# Select all 3 datasets and click ''Next''
+
# Uncheck ''import each datasets to individual marts, one dataset per mart'' --> ''Finish''
+
#* Backward compatibility is run in the background in order to convert a mart configuration in 0.7 format to one in 0.8
+
# A data source called "gene_vega" should appear in the left-hand panel (the ''Source'' panel)
+
 
+
== 5.6 Add Pathway dataset from REACTOME mart: another external URL-based mart source ==
+
 
+
 
+
* Repeat the process from the previous section, except that in step 5 we choose "REACTOME (url)" and in step 6 we select '''pathway''' only
+
 
+
== 5.7 Add access points for the gene_vega URL mart ==
+
 
+
# Choose the "Form" GUI tab
+
# Drag-and-drop the "gene_vega" mart from the left to the portal panel
+
# Rename the new access point to "VEGA Genes in MartForm" by right-clicking the access point icon
+
# Choose the "Wizard" GUI tab
+
# Drag-and-drop the "gene_vega" mart from the left to the portal panel
+
# Rename the new access point to "VEGA Genes in MartWizard" by right-clicking the access point icon
+
 
+
== 5.8 Re-deploy server ==
+
 
+
* Now re-deploy the server using "Start Server" button. If the server is running already, stop it first.
+
* Check in the web browser how the different GUI types offer different query interfaces
+
 
+
== 5.9 Change attribute settings ==
+
 
+
<div class="emphasisbox">
+
* Understand and fix the linkouturl setting for VEGA gene ID:
+
[[File:Linkouturl.png]]
+
</div>
+
 
+
<div class="emphasisbox">
+
* To fix it, we need to change the pseudoattribute ''exturl'' value to: <nowiki>http://www.ensembl.org</nowiki>
+
[[File:EditPseudoAtt.png]]
+
</div>
+
 
+
== 5.10 Understanding filter types ==
+
 
+
# Double-click the 'gene_vega' mart icon to open ConfigurationEditor window
+
# Click the 'Show both' button on the top
+
# Find the following filters by typing their names in the Find text box
+
#* singleSelect type: "Chromosome"
+
#* multiSelect: "Gene type"
+
# To edit the dropdown content of the above filter:
+
#* right-click the filter and choose ''Dropdown options''
+
 
+
== 5.11 Importing attributes/filters from external mart ==
+
 
+
<div class="emphasisbox">
+
The task: generate a gene list by specifying the pathway name in which the genes are involved.
+
</div>
+
 
+
<div class="emphasisbox">
+
The solution: drag-and-drop 'pathway name' filter from 'pathway' to a 'gene_vega' access point
+
</div>
+
 
+
'''Here is how:'''
+
[[File:CreatePointerFilter.png]]
+
 
+
'''Before that you will need to create a link between the two related marts: pathway and gene_vega. Here they are linked through the common Ensembl Gene ID'''
+
[[File:LinkCreation.png]]
+
 
+
 
+
'''Now save the changes, click "Stop Server" then "Start Server" in MartConfigurator, go to the web browser and examine the query interface. You should see the "Miscellaneous - Pathway Name" filter under VEGA Gene. Try selecting a pathway name from the dropdown list and some attributes for the output genes; the query should return the genes involved in the chosen pathway.'''
+
 
+
= 6. Querying a BioMart server via REST API =
+
 
+
== 6.1 MetaData queries ==
+
 
+
http://localhost:9000/martservice/marts
+
 
+
http://localhost:9000/martservice/datasets?mart=gene_vega
+
 
+
http://localhost:9000/martservice/accesspoints?datasets=&mart=gene_vega
+
 
+
http://localhost:9000/martservice/attributes?dataset=hsapiens_gene_vega&mart=gene_vega
+
 
+
http://localhost:9000/martservice/filters?dataset=hsapiens_gene_vega&mart=gene_vega
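The five metadata URLs above differ only in the mart and dataset names, so they can be generated for any mart. A minimal sketch, with <code>metadata_urls</code> as a hypothetical helper name; fetching the URLs with <code>curl</code> is an assumption and requires the server to be running.

```shell
# Hypothetical helper: print the metadata URLs listed above for a given
# service base URL, mart name and dataset name.
metadata_urls() {
    base=$1 mart=$2 dataset=$3
    printf '%s/marts\n' "$base"
    printf '%s/datasets?mart=%s\n' "$base" "$mart"
    printf '%s/accesspoints?datasets=&mart=%s\n' "$base" "$mart"
    printf '%s/attributes?dataset=%s&mart=%s\n' "$base" "$dataset" "$mart"
    printf '%s/filters?dataset=%s&mart=%s\n' "$base" "$dataset" "$mart"
}

metadata_urls http://localhost:9000/martservice gene_vega hsapiens_gene_vega

# To fetch each one in turn:
#   metadata_urls http://localhost:9000/martservice gene_vega hsapiens_gene_vega |
#       while read -r url; do curl -s "$url"; echo; done
```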
+
 
+
== 6.2 Data query ==
+
 
+
<xml>
+
<Query processor="TSV" header="true" limit="-1" client="webbrowser">
+
<Dataset name="hsapiens_gene_vega" config="gene_vega_ap">
+
<Filter name="chromosomal_region" value="2:1000000:2000000:1,4:9000000:11000000:-1"/>
+
<Filter name="biotype" value="protein_coding"/>
+
<Attribute name="vega_gene_id"/>
+
<Attribute name="vega_transcript_id"/>
+
<Attribute name="vega_translation_id"/>
+
<Attribute name="chromosome_name"/>
+
<Attribute name="start_position"/>
+
<Attribute name="end_position"/>
+
<Attribute name="strand"/>
+
<Attribute name="band"/>
+
</Dataset>
+
</Query>
+
</xml>
+
 
+
Paste this XML into a web browser's address bar, in place of the placeholder in the following URL:
+
 
+
http://localhost:9000/martservice/results?query=paste_query_xml_string_here
+
 
+
or run the following Perl script:
+
 
+
<pre>
+
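#!/usr/bin/perl

# An example script demonstrating the use of the BioMart web service.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Headers;

my $usage = "\nUsage: perl webQuery.pl query.xml [output_file]\n\n";

# Read the whole query XML from the file given as the first argument.
my $query_file = $ARGV[0] or die $usage;
open(my $fh, '<', $query_file) or die "Cannot open $query_file: $!\n$usage";
my $xml = do { local $/; <$fh> };
close($fh);

# Write to the optional second argument, or to standard output.
my $outfile = $ARGV[1];
my $out;
if ($outfile) {
    open($out, '>', $outfile) or die "Cannot write to $outfile: $!\n";
}
else {
    $out = \*STDOUT;
}

my $path = "http://localhost:9000/martservice/results?";
my $request = HTTP::Request->new("POST", $path, HTTP::Headers->new(), 'query=' . $xml . "\n");
my $ua = LWP::UserAgent->new;

# Stream the response body to the output in chunks of roughly 1000 bytes.
my $response = $ua->request($request,
    sub {
        my ($data) = @_;
        print {$out} $data;
    }, 1000);

# Check the final status here, since a failed request may never reach the callback.
warn("Problems with the web server: " . $response->status_line)
    unless $response->is_success;

close($out) if $outfile;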
+
</pre>
+
 
+
on the command line as:
+
 
+
perl webQuery.pl query.xml
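For the browser route described earlier in this section, the query XML has to be percent-encoded before it can be placed after <code>query=</code>, because characters such as <code><</code>, <code>"</code> and spaces are not valid in a URL. The sketch below shows one way to do that in plain shell; the function names <code>urlencode</code> and <code>results_url</code> are made up for illustration, and the one-line XML fragment is a placeholder rather than a complete query.

```shell
# Hypothetical helper: percent-encode every byte of the argument except the
# RFC 3986 unreserved characters (letters, digits, "-", ".", "_", "~").
urlencode() {
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        [ -n "$hex" ] || continue
        case "$hex" in
            3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
                # Unreserved byte: print the character itself.
                printf "\\$(printf '%03o' "0x$hex")" ;;
            *)
                # Anything else becomes %XX with upper-case hex digits.
                printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
        esac
    done
}

# Hypothetical helper: build the results URL for a query string.
results_url() {
    printf 'http://localhost:9000/martservice/results?query=%s\n' "$(urlencode "$1")"
}

# Placeholder fragment, only to show the encoding.
results_url '<Query processor="TSV" limit="-1"/>'

# With the full query saved in query.xml:
#   results_url "$(cat query.xml)"
```

If <code>curl</code> is available, its <code>-G</code> and <code>--data-urlencode</code> options can perform the same encoding when sending the request.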
+
 
+
= 7. Further readings =
+
 
+
To read more about BioMart, refer to the recent articles describing the BioMart software and its applications.
+
 
+
1. Zhang J, Haider S, Baran J, Cros A, Guberman JM, Hsu J, Liang Y, Yao L, Kasprzyk A. BioMart: a data federation framework for large collaborative projects. Database (Oxford). 2011 Sep 19;2011:bar038.
+
 
+
2. Guberman JM, Ai J, Baran J, et al. BioMart Central Portal: An Open Database Network for the Biological Community. Database (Oxford). 2011 Sep 18;2011:bar041.
+
 
+
3. Zhang J, Baran J, Guberman JM, Haider S, Hsu J, Liang Y, Rivkin E, Wang J, Whitty B, Wong-Erasmus M, Yao L, Kasprzyk A. International Cancer Genome Consortium Data Portal - a One-stop Shop for Cancer Genomics Data. Database (Oxford). 2011 Sep 19;2011:bar026.
+
  
 
[[Category:BioMart]]
 
 
[[Category:Tutorials]]
 

Revision as of 15:11, 22 June 2012
