BioMart Tutorial 2011

From GMOD

Revision as of 02:00, 15 October 2011

BioMart Session

2011 GMOD Community Meeting &
Ontario Institute for Cancer Research
Toronto, Canada
14 October 2011
Junjun Zhang, Elena Rivkin and Anthony Cros



This tutorial walks you through installing and configuring a local instance of BioMart.



1. Setting up the virtual machine

1.1 Import and start the VM

  • We have created a virtual machine (VM) image using Oracle's VirtualBox software. The image file is in OVF/OVA format, so you should be able to set up the VM using either VirtualBox or VMware Player. You should already have one of them installed.
  • We will be passing around USB memory sticks that contain the image file. Please copy it to your laptop.
  • Import and start the VM using VirtualBox:
    • Start VirtualBox
    • Import image with: (from the menu bar) File --> Import Appliance
    • Navigate to the .ova file you just copied from the USB key and follow the on-screen instructions
    • Once the import has finished, start the VM by clicking the Start button

1.2 Accounts that you will need for this tutorial

  • Linux:
    • username: biomart
    • password: biomart
  • MySQL:
    • username: biomart
    • password: biomart

In order to free up some disk space you will need to drop some unneeded databases:

mysql -ubiomart -pbiomart -e "drop database vega_mart_58_template"
mysql -ubiomart -pbiomart -e "drop database homo_sapiens_vega_58_37c"
mysql -ubiomart -pbiomart -e "drop database vega_mart_63"
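
If one of these databases has already been dropped, the corresponding command above reports an error. A minimal sketch of a more forgiving variant follows. It assumes a POSIX shell, and build_drop_sql is a hypothetical helper name, not part of BioMart or MySQL. It only prints the SQL, so you can review it before piping it to the mysql client.

```shell
# Hypothetical helper: print one DROP statement per database name given.
# IF EXISTS turns the statement into a no-op when the database is already gone.
build_drop_sql() {
  for db in "$@"; do
    printf 'DROP DATABASE IF EXISTS `%s`;\n' "$db"
  done
}

# Preview the SQL for the three databases named in this section.
build_drop_sql vega_mart_58_template homo_sapiens_vega_58_37c vega_mart_63

# To execute it on the VM, pipe the output into the mysql client:
#   build_drop_sql vega_mart_58_template homo_sapiens_vega_58_37c vega_mart_63 \
#     | mysql -ubiomart -pbiomart
```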

2. Downloading & Installing BioMart

Two components are necessary for this tutorial: MartBuilder, from an older BioMart release (0.7), and MartConfigurator, from the most recent development code snapshot (0.8).

MartBuilder and MartConfigurator have already been installed on the VM image under ~/biomart_0_7.template and ~/biomart_0_8.template, but you are going to install them yourself in this tutorial.

2.1 Installing BioMart 0.7 - MartJ package

MartJ contains the applications necessary to create a mart: MartBuilder and MartRunner.

Download & extract tarball content with:

 $ cd
 $ mkdir biomart_0_7
 $ cd biomart_0_7
 $ wget ftp://anonymous@ftp.ebi.ac.uk/pub/software/biomart/martj_current/martj-bin.tgz
 $ tar zxvf martj-bin.tgz  # creates "martj-0.7" directory
 $ cd martj-0.7

2.2 Installing BioMart 0.8

Check out a specific revision of the current development code from the SVN repository as below:

 $ cd
 $ mkdir biomart_0_8
 $ cd biomart_0_8
 $ svn co -r 10500 https://code.oicr.on.ca/svn/biomart/biomart-java/branches/oct_3_2011 martconfigurator
 $ cd martconfigurator
 $ ant # build project with ant; in the future, you may use: ant clean dist
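
Both installations rely on a few command-line tools being present on the VM (wget, tar, svn and ant here, and mysql elsewhere in this tutorial). The following sketch reports any that are not on the PATH. missing_tools is a hypothetical helper written for this check, not part of BioMart.

```shell
# Hypothetical helper: print the name of every command in the argument list
# that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools used by the commands in this tutorial.
missing="$(missing_tools wget tar svn ant mysql)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```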

3. Building mart

This section will show you how to create a mart database from a normal relational database using MartBuilder. We use a simplified VEGA database as a starting point.

3.1 Creating mart schema using MartBuilder

3.1.1 Start MartBuilder

 $ cd ~/biomart_0_7/martj-0.7
 $ bin/martbuilder.sh

MartBuilder should open (see screenshot below): Mbuilder07.png

3.1.2 Add a source schema

In the menu bar, choose Schema->Add to open the dialog to add a schema.

Fill in the fields with the following values, as shown in the screenshot below:

  1. Name: vega
  2. Database type: MySQL
  3. Using MyISAM: checked
  4. Host: localhost
  5. Port: 3306
  6. Database: mini_hsap_vega
  7. Schema: mini_hsap_vega
  8. Username: biomart
  9. Password: biomart

Ignore the last 3 fields (they are used for partitioning, which is not covered in this tutorial).

Mb add schema2.png

Click the Test button to ensure we can connect to the database. Click the Add button in order to proceed with the dataset description.

You should now see the source database shown in MartBuilder as below:

MbSourceSchema.png

3.1.3 Select main table(s) and generate mart schema

We are going to create a dataset based on the tables "gene" and "transcript" (as main and submain tables respectively, as described in the presentation).

  1. Right-click on the "gene" table
  2. Click on Create dataset (the gene table should already be highlighted, since we opened this menu by clicking on it)
  3. Add the transcript table (standard Ctrl + click)
  4. Press the Create button

Mb create dataset.png

This shows the mart schema (reverse star) of the dataset that has just been created.

Mb dataset.png

3.2 Materializing the mart

Your dataset does not actually exist yet. In order to create it, you need to generate the SQL for it then run that SQL against your database.

BioMart offers a tool (MartRunner) that does that for you, using JDBC. The SQL used is as ANSI-compliant as possible, with some exceptions based on the RDBMS in use.

3.2.1 Preparing database and start MartRunner

  • We are now going to transform the source data into the target dataset, but before that we have to create a target database:
 $ mysql -hlocalhost -P3306 -ubiomart -pbiomart -e "create database mini_hsap_vega_mart"
  • Start MartRunner using port 9876 with:
 $ cd ~/biomart_0_7/martj-0.7/
 $ bin/martrunner.sh 9876

MartBuilder will send the materialization SQL to MartRunner through that port (in this example, 9876), and MartRunner will execute the transformation SQL.

3.2.2 Starting schema transformation from MartBuilder

Go back to MartBuilder and click on the Build Mart button to open the Build Mart dialog, then fill in the following values:

  1. Datasets: gene
  2. Schema partitions: ignore
  3. Target database: mini_hsap_vega_mart (the database we just manually created)
  4. Target schema: mini_hsap_vega_mart (same as database for MySQL)
  5. Send SQL to: MartRunner
  6. MartRunner host name: localhost
  7. MartRunner port number: 9876 (the one we just arbitrarily chose because it was free)
  8. Database server name: localhost
  9. Database server port number: 3306


Mb build mart dialog.png

Click on the Generate SQL button


Mb sql generation.png

Click on the "Start job" button

3.2.3 Monitoring MartRunner transformation progress

Mb job finished.png

Ensure everything is successful, i.e., displayed in green font!

You now have a mart database created from a 3NF normalized source database, and it's ready for configuring using MartConfigurator.
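
One quick sanity check after the job finishes is to list the main tables that were created. The sketch below assumes the usual BioMart naming convention, in which main tables end in __main and dimension tables end in __dm. main_tables is a hypothetical helper, and the sample table names are illustrative rather than taken from the VM.

```shell
# Hypothetical helper: read table names on standard input, one per line,
# and keep only those that end in "__main".
main_tables() {
  grep '__main$'
}

# Demonstration with illustrative table names.
printf '%s\n' gene__gene__main gene__transcript__main gene__exon__dm | main_tables

# On the VM, the table list can come straight from MySQL
# (-N suppresses the column header):
#   mysql -ubiomart -pbiomart -N -e "show tables" mini_hsap_vega_mart | main_tables
```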

4. Configuring a data portal to expose the mart using MartConfigurator

This section will show you how to use MartConfigurator to configure a data portal web server that exposes the created VEGA mart to end users for querying.

4.1 Start MartConfigurator

Start MartConfigurator with the following command:

 $ cd ~/biomart_0_8/martconfigurator
 $ dist/scripts/martconfigurator.sh

The panel on the left corresponds to data sources; in our case we will add the mart that has just been created: mini_hsap_vega_mart

The panel on the right corresponds to access points for those data sources that will be exposed to web users.

Mc empty.png

4.2 Add our own mart: mini_hsap_vega_mart

Click on the Add Mart button to add a datasource

Mc add mart button.png

  • Wizard step 1 of 4:
  1. source profile: vega (anything will do)
  2. source type: RDBMS Mart

Mc add source wizard 1.png

  • Wizard step 2 of 4:
  1. RDBMS: MySQL (keep MyISAM checked)
  2. Host: localhost
  3. Port: 3306
  4. User: biomart
  5. Password: biomart
  6. Database: can leave empty for now

Mc add source wizard 2.png

  • Wizard step 3 of 4:

Select mini_hsap_vega_mart, the mart that we just built using MartBuilder/MartRunner and based upon the mini_hsap_vega database

Mc add source wizard 4.png

  • Wizard step 4 of 4:
  1. Create naive configuration
  2. Choose the main table "gene__gene__main"

AddSourceWizard4.png

  • Now we should have our own mart added

4.3 Creating Access Point

Simply drag-and-drop the mart from the left side (Source panel) to anywhere on the right side (Portal panel).

It will add an access point to the mart. The default name is mini_hsap_vega_mart_ap, ap standing for Access Point, but you can give it a name of your choice.

Note that you can also create an access point by clicking the Add Access Point button. You would then be given a list of the existing marts to choose from.

Mc ready.png

4.4 Deploying the data portal, ie, the BioMart Server

To deploy the web-based data portal, click on the Start Server button in the top right corner.

  • When the Start Server button is clicked, if the current configuration (registry) has never been saved, you will be prompted with a dialog for saving the current registry.
  • Save the registry under /home/biomart/biomart_0_8/martconfigurator/registry/
  • The data portal will be deployed on your local machine using port 9000 by default.
  • Your web browser should open and point to http://localhost:9000/ automatically when the server is ready.
  • Note the URL: localhost:9000/web (default)

Mc deployed server.png


  • To stop the server use the Stop Server button in the upper right corner

There is also a command-line approach to starting/stopping the server using:

 $ dist/scripts/biomart-server.sh start
 $ dist/scripts/biomart-server.sh stop

but that will not be covered here.

4.5 Exploring our first BioMart data portal

You can choose some attributes and click the "GO" button to get some results.

5. More MartConfigurator exercises

MartConfigurator is a desktop application for configuring a data portal (ie, a BioMart Server).

5.1 The MartConfigurator main window

As shown earlier, the main window is divided into left and right halves: the left for managing data sources and the right for managing access points. An access point is how a data source is presented to the end user for querying from the web GUI or an API client.

There are two main activities at the main window:

  • adding data sources
  • adding/configuring data access points

Configuring a source or an access point mainly involves creating Attributes and Filters, and organizing them into containers (and sub-containers)

Double-clicking a mart icon in the Source panel or an access point icon in the Portal panel will bring up the ConfigurationEditor window (see the next section for details).

5.2 ConfigurationEditor window

  • Left panel: Source Config
  • Right panel: Access Point
  • Both panels are similarly divided into top and bottom sections
    • top section showing the containers and the attributes or filters they contain
    • bottom section showing the properties and their values for the selected configuration item: container, attribute or filter


ConfigurationEditor.png

  • Only properties whose name appears in blue font can be modified.
  • Each view has a Find search box that lets you quickly find attributes/filters based on their names
  • Also very convenient: the attributes and filters in each view offer a Show in the [opposite view] item in their context menu. It filters the opposite view to show the counterpart filter/attribute
  • By drag-and-drop from the left to right, we can add new attributes/filters from a source to its access point

5.3 Rename default GUI tab to Form

  1. A GUI tab (or GUI container) is used to organize access points; it usually corresponds to a box on the home page of the data portal.
  2. Right-click on the tab --> Rename
  3. enter new name: "Form"

5.4 Add new GUI tab

  1. Click on the "+" sign next to the latest GUI tab
  2. Enter name of the new tab: "Wizard"
  3. Right-click on the newly created tab --> Set GUI type --> MartWizard

5.5 Add full VEGA mart: an external URL-based mart source

  1. Add a remote mart (URL Mart) using backward compatibility (for previous BioMart versions: <= 0.7)
  2. In the Source view: Add Mart
  3. Select URL Mart --> Next
  4. Input the following values:
    • Protocol: http
    • Host: www.biomart.org
    • Port: 80
    • Path: /biomart/martservice
  5. Choose source: "vega (url)" --> Next
  6. Select all 3 datasets and click Next
  7. Uncheck "import each dataset to individual marts, one dataset per mart" --> Finish
    • Backward compatibility is run in the background in order to convert a mart configuration in 0.7 format to one in 0.8
  8. A data source called "gene_vega" should appear in the left-hand side panel (data source panel)

5.6 Add Pathway dataset from REACTOME mart: another external URL-based mart source

  • Repeat the process from the previous section, except that in step 5 we choose "REACTOME (url)" and in step 6 we select pathway only

5.7 Add access points for the gene_vega URL mart

  1. Choose the "Form" GUI tab
  2. Drag-and-drop the "gene_vega" mart from the left to the portal panel
  3. Rename the new access point to "VEGA Genes in MartForm" by right-clicking the access point icon
  4. Choose the "Wizard" GUI tab
  5. Drag-and-drop the "gene_vega" mart from the left to the portal panel
  6. Rename the new access point to "VEGA Genes in MartWizard" by right-clicking the access point icon

5.8 Re-deploy server

  • Now re-deploy the server using the "Start Server" button. If the server is already running, stop it first.
  • Check in a web browser how the different GUI types offer different query interfaces

5.9 Change settings to attribute

  • Understand and fix the linkouturl setting for VEGA gene ID:

Linkouturl.png

  • To fix it, we need to change the pseudoattribute exturl value to: http://www.ensembl.org

EditPseudoAtt.png

5.10 Understanding filter types

  1. Double-click the 'gene_vega' mart icon to open ConfigurationEditor window
  2. Click the 'Show both' button on the top
  3. Find the following filters by typing their names in the Find text box
    • singleSelect type: "Chromosome"
    • multiSelect: "Gene type"
  4. To edit the dropdown content of the above filter:
    • right-click the filter and choose Dropdown options

5.11 Importing attributes/filters from external mart

The task: generate a gene list by specifying the pathway name in which the genes are involved.

The solution: drag-and-drop 'pathway name' filter from 'pathway' to a 'gene_vega' access point

Here is how:

CreatePointerFilter.png

Before that, you will need to create a link between the two related marts, pathway and gene_vega. Here they are linked through the common Ensembl Gene ID:

LinkCreation.png

6. Querying a BioMart server via REST API

6.1 MetaData queries

http://localhost:9000/martservice/marts
http://localhost:9000/martservice/datasets?mart=gene_vega
http://localhost:9000/martservice/accesspoints?datasets=&mart=gene_vega
http://localhost:9000/martservice/attributes?dataset=hsapiens_gene_vega&mart=gene_vega
http://localhost:9000/martservice/filters?dataset=hsapiens_gene_vega&mart=gene_vega
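
The metadata URLs above all share the same shape: a resource name under /martservice, followed by optional key=value parameters. As a sketch, the hypothetical helper below (martservice_url is not part of BioMart) assembles such URLs so they can be passed to a command-line client such as curl. The parameter names are the ones shown above, and the helper assumes the values need no URL-encoding.

```shell
# Base URL of the locally deployed BioMart server (default port from section 4.4).
BASE_URL="http://localhost:9000/martservice"

# Hypothetical helper: martservice_url RESOURCE [KEY=VALUE ...]
# Joins the key=value pairs with "&" and appends them after a "?".
martservice_url() {
  resource="$1"
  shift
  query=""
  for pair in "$@"; do
    query="${query:+$query&}$pair"
  done
  printf '%s/%s%s\n' "$BASE_URL" "$resource" "${query:+?$query}"
}

# Print a few of the metadata URLs listed above.
martservice_url marts
martservice_url datasets mart=gene_vega
martservice_url attributes dataset=hsapiens_gene_vega mart=gene_vega

# With the server running, a URL can be fetched with, for example:
#   curl "$(martservice_url marts)"
```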

6.2 Data query

<Query processor="TSV" header="true" limit="-1" client="webbrowser">
  <Dataset name="hsapiens_gene_vega" config="gene_vega_ap">
    <Filter name="chromosomal_region" value="2:1000000:2000000:1,4:9000000:11000000:-1"/>
    <Filter name="biotype" value="protein_coding"/>
    <Attribute name="vega_gene_id"/>
    <Attribute name="vega_transcript_id"/>
    <Attribute name="vega_translation_id"/>
    <Attribute name="chromosome_name"/>
    <Attribute name="start_position"/>
    <Attribute name="end_position"/>
    <Attribute name="strand"/>
    <Attribute name="band"/>
  </Dataset>
</Query>

Paste this piece of XML into a web browser's address bar as:

http://localhost:9000/martservice/results?query=paste_query_xml_string_here
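
The XML contains characters (spaces, quotes, angle brackets) that are not valid in a URL as-is. A browser will usually take care of this, but a command-line client may not. The sketch below percent-encodes a string in a POSIX shell. urlencode is a hypothetical helper written for this example, and the short query it encodes is a placeholder, not a complete BioMart query.

```shell
# Hypothetical helper: percent-encode every byte of the first argument except
# the unreserved characters (letters, digits, "-", ".", "_", "~").
urlencode() {
  # od prints each input byte as two hex digits; tr puts one byte per line.
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
    case "$hex" in
      '') ;;  # blank line produced by the od layout, nothing to emit
      2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
        # Unreserved character: convert the hex code back to the character.
        printf "\\$(printf '%03o' "0x$hex")" ;;
      *)
        # Anything else becomes %XX with upper-case hex digits.
        printf '%%%s' "$hex" | tr 'a-f' 'A-F' ;;
    esac
  done
  echo
}

# Placeholder query, used only to show the encoding.
query='<Query limit="-1"/>'
printf 'http://localhost:9000/martservice/results?query=%s\n' "$(urlencode "$query")"

# curl can also do the encoding itself, reading the XML from a file:
#   curl -G --data-urlencode "query@query.xml" http://localhost:9000/martservice/results
```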

or run the following Perl code:

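#!/usr/bin/perl

# An example script demonstrating the use of the BioMart web service.
# Usage: perl webQuery.pl query.xml [outfile]
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Headers;

# Read the whole query XML file named by the first argument.
open(my $fh, '<', $ARGV[0] || '')
    or die("\nUsage: perl webQuery.pl query.xml [outfile]\n\n");
my $xml = do { local $/; <$fh> };
close($fh);

# Write results to the optional second argument, or to STDOUT if none is given.
my $outfile = $ARGV[1];
my $out;
if ($outfile) {
    open($out, '>', $outfile) or die("Cannot write to $outfile: $!\n");
}
else {
    $out = \*STDOUT;
}

# POST the query to the local BioMart server.
my $path = "http://localhost:9000/martservice/results?";
my $request = HTTP::Request->new("POST", $path, HTTP::Headers->new(), 'query=' . $xml . "\n");
my $ua = LWP::UserAgent->new;

# Handle the response in chunks (read-size hint: 1000 bytes) as they arrive.
$ua->request($request,
             sub {
                 my ($data, $response) = @_;
                 if ($response->is_success) {
                     print {$out} $data;
                 }
                 else {
                     warn("Problems with the web server: " . $response->status_line);
                 }
             }, 1000);

close($out) if ($outfile);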

on the command line as:

perl webQuery.pl query.xml
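
The query.xml file can be written by hand or generated. As a sketch, the hypothetical build_query helper below prints a query in the same layout as the example in section 6.2 for a given dataset, config and list of attributes. It is deliberately minimal: it emits no Filter elements and does not escape special characters in its arguments.

```shell
# Hypothetical helper: build_query DATASET CONFIG ATTRIBUTE...
# Prints a BioMart query with one Attribute element per remaining argument.
build_query() {
  dataset="$1"
  config="$2"
  shift 2
  printf '<Query processor="TSV" header="true" limit="-1" client="webbrowser">\n'
  printf '  <Dataset name="%s" config="%s">\n' "$dataset" "$config"
  for attr in "$@"; do
    printf '    <Attribute name="%s"/>\n' "$attr"
  done
  printf '  </Dataset>\n</Query>\n'
}

# Print a query using names from the example in section 6.2.
build_query hsapiens_gene_vega gene_vega_ap vega_gene_id chromosome_name start_position end_position

# To save it for use with the Perl script:
#   build_query hsapiens_gene_vega gene_vega_ap vega_gene_id chromosome_name > query.xml
```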

7. Further reading

To read more about BioMart, refer to the recent articles describing the BioMart software and its applications.

1. Zhang J, Haider S, Baran J, Cros A, Guberman JM, Hsu J, Liang Y, Yao L, Kasprzyk A. BioMart: a data federation framework for large collaborative projects. Database (Oxford). 2011 Sep 19;2011:bar038.

2. Guberman JM, Ai J, Baran J, et al. BioMart Central Portal: An Open Database Network for the Biological Community. Database (Oxford). 2011 Sep 18;2011:bar041.

3. Zhang J, Baran J, Guberman JM, Haider S, Hsu J, Liang Y, Rivkin E, Wang J, Whitty B, Wong-Erasmus M, Yao L, Kasprzyk A. International Cancer Genome Consortium Data Portal - a One-stop Shop for Cancer Genomics Data. Database (Oxford). 2011 Sep 19;2011:bar026.