XORT Dumper HOWTO

From GMOD
Revision as of 21:02, 2 June 2008 by Jogoodma

Introduction

This HOWTO describes a method for dumping data from Chado into Chado XML using XORT.

Authors

Prerequisites

The steps outlined in this HOWTO were done using Ubuntu 8.04 desktop edition. It is assumed that you have installed the following packages beforehand.

  • PostgreSQL 8.2
  • Perl

PostgreSQL 8.3 is the default in Ubuntu 8.04 and should work, but at the time of this writing there were some concerns about GMOD compatibility with 8.3.

System Setup

1. Install make, gcc, autoconf, automake, and binutils.

$ sudo apt-get install make gcc autoconf automake binutils

2. Install the Perl libraries.

$ sudo apt-get install libxml-perl libxml-dom-perl libxml-sax-perl
$ sudo apt-get install libdbi-perl libdbd-pg-perl

Chado Database Setup

1. Create a PostgreSQL database user

$ sudo su - postgres
$ createuser

Then follow the prompts to create a user that has permission to create databases. The username should match your local Unix login name.

2. Set the new database user's password. You still need to be su'd as the postgres user here. Be sure to substitute <username> and <password> with an actual username and password.

$ psql
postgres=# alter user <username> password '<password>';
postgres=# \q
$ exit

3. Fetch the YFGdb Chado dump

$ wget ftp://gen-ftp.princeton.edu/yfgdb/yfgdb_no-privs_20080211.sql.gz

4. Create and load the YFGdb database

$ createdb -E SQL_ASCII yfgdb
$ zcat yfgdb_no-privs_20080211.sql.gz | psql -d yfgdb -o yfgdb_load.log
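Note that psql's `-o` option only redirects query output; error messages still go to standard error, so they will not end up in yfgdb_load.log. One way to keep them for later review is to also redirect standard error to a file (yfgdb_load.err below is a name chosen for illustration) and count the lines mentioning `ERROR:`. The helper is a hypothetical sketch, not part of Chado or XORT.

```shell
# count_psql_errors: hypothetical helper (not part of Chado or XORT).
# Prints the number of lines containing "ERROR:" in a captured psql stderr file.
count_psql_errors() {
    # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the
    # function usable under "set -e" without changing the printed count
    grep -c 'ERROR:' "$1" || true
}

# Example, assuming the load command is rerun with stderr captured:
#   zcat yfgdb_no-privs_20080211.sql.gz | psql -d yfgdb -o yfgdb_load.log 2> yfgdb_load.err
#   count_psql_errors yfgdb_load.err
```

A non-zero count is a cue to read yfgdb_load.err before continuing; some messages may be harmless, so treat the count as a starting point rather than a verdict.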

XORT Installation

$ wget http://superb-west.dl.sourceforge.net/sourceforge/gmod/XML-XORT-0.008.tar.gz
$ tar zxf XML-XORT-0.008.tar.gz
$ cd XML-XORT-0.008
$ perl Makefile.PL
  What is the database name? yfgdb
  What is the database username? <username>
  What is the password for '<username>'? <password>
  What is the database host? localhost
  What is your database port? 5432
  Where will the tmp directory go? /tmp
  Where will the conf directory go? /usr/local/xort/conf
  Where is the DDL file? <enter> - accept default
  Where do you want to install XORT if other than default, press ENTER if default: <enter>
$ make
$ sudo make install

Dumping ChadoXML

Simple Genes example

If everything has gone well thus far, you should have a functioning Chado installation with yeast data and XORT ready to go. For the first step of this HOWTO we are going to dump a simple listing (in ChadoXML format) of all the genes in the database. First create an XML file called genes.xml in your home directory that looks like this:

 <?xml version="1.0" encoding="UTF-8"?>
 <chado dumpspec="genes.xml">
     <feature dump="select">
        <type_id test="yes">
           <cvterm>
              <name>gene</name>
           </cvterm>
        </type_id>
        <is_obsolete test="yes">false</is_obsolete>
        <is_analysis test="yes">false</is_analysis>
        <uniquename />
        <name />
        <seqlen />
     </feature>
 </chado>

This simple dumpspec selects all features of type 'gene' that have both is_obsolete and is_analysis set to false. It then dumps the uniquename, name, and seqlen fields from the feature table for these records. The dump="select" attribute on the feature tag tells XORT to dump only what is explicitly defined in the dumpspec. The test="yes" attribute tells XORT that the value(s) of this node, or node tree, are used to restrict the rows it returns. Here we use a simple exact-match test, but 'like' and regular expression comparison operators are also available. Please refer to the XORT documentation for a full list of supported search operators.
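Since the feature type is just the value inside the `type_id` test, the same dumpspec can be reused for other feature types by changing that cvterm name. The function below is a hypothetical convenience, not part of XORT, that writes the genes.xml dumpspec with a different type; it assumes the type name you pass (mRNA in the example) exists as a cvterm in your database, and it does not escape XML special characters in the name.

```shell
# write_type_dumpspec: hypothetical helper (not part of XORT).
# Usage: write_type_dumpspec TYPE FILE
# Writes a dumpspec identical to genes.xml except that type_id is tested
# against the cvterm TYPE; the dumpspec attribute is set to FILE's basename.
write_type_dumpspec() {
    type_name=$1
    spec_file=$2
    cat > "$spec_file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<chado dumpspec="$(basename "$spec_file")">
    <feature dump="select">
       <type_id test="yes">
          <cvterm>
             <name>$type_name</name>
          </cvterm>
       </type_id>
       <is_obsolete test="yes">false</is_obsolete>
       <is_analysis test="yes">false</is_analysis>
       <uniquename />
       <name />
       <seqlen />
    </feature>
</chado>
EOF
}
```

For example, `write_type_dumpspec mRNA mrnas.xml` would produce a file that can be passed to the dumper with `--dumpspec mrnas.xml` in place of genes.xml.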

To execute the genes.xml dumpspec, run the XORT dump script like this:

$ /usr/local/bin/xort_dump.pl --database chado --output chado_genes.xml --dumpspec genes.xml

Note that the --database option takes the first part of the name of a database properties file in the XORT conf directory; for example, --database chado refers to /usr/local/xort/conf/chado.properties.
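Because a wrong --database value simply names a properties file that does not exist, it can be worth checking for the file before starting a long dump. The two functions below are hypothetical helpers, not part of XORT: they only encode the naming convention described above, with the conf directory defaulting to /usr/local/xort/conf (the value entered during `perl Makefile.PL` in this HOWTO).

```shell
# xort_props_file: hypothetical helper (not part of XORT).
# Prints the properties file that "--database NAME" refers to, i.e.
# <conf dir>/<NAME>.properties. Pass a conf dir as the second argument
# to override the default used in this HOWTO.
xort_props_file() {
    printf '%s/%s.properties\n' "${2:-/usr/local/xort/conf}" "$1"
}

# check_xort_database: hypothetical helper that fails with a message when
# no properties file exists for the given --database name.
check_xort_database() {
    props=$(xort_props_file "$1" "$2")
    if [ ! -f "$props" ]; then
        echo "no XORT properties file: $props" >&2
        return 1
    fi
}
```

Usage with the command from this HOWTO: `check_xort_database chado && /usr/local/bin/xort_dump.pl --database chado --output chado_genes.xml --dumpspec genes.xml`.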

Synonyms

 <?xml version="1.0" encoding="UTF-8"?>
 <chado dumpspec="genes.xml">
     <feature dump="select">
        <type_id test="yes">
           <cvterm>
              <name>gene</name>
           </cvterm>
        </type_id>
        <is_obsolete test="yes">false</is_obsolete>
        <is_analysis test="yes">false</is_analysis>
        <uniquename />
        <name />
        <seqlen />
        <feature_synonym dump="select">
           <synonym_id>
              <synonym dump="select">
                 <name />
                 <type_id>
                    <cvterm dump="select">
                       <name />
                    </cvterm>
                 </type_id>
              </synonym>
           </synonym_id>
        </feature_synonym>
     </feature>
 </chado>

CDS with location data

More Information

See the XORT page.

Please send questions to the GMOD developers list:

gmod-devel@lists.sourceforge.net

Or contact the GMOD Help Desk.