XORT Dumper HOWTO
Introduction
This HOWTO describes a method for dumping data from Chado into Chado XML using XORT.
Authors
Prerequisites
The steps outlined in this HOWTO were done using Ubuntu 8.04 desktop edition. It is assumed that you have installed the following packages beforehand.
- PostgreSQL 8.2
- Perl
PostgreSQL 8.3 is the default in Ubuntu 8.04 and should work, but at the time of this writing there were some concerns about GMOD compatibility with 8.3.
System Setup
1. Install make, gcc, autoconf, automake, and binutils.
$ sudo apt-get install make gcc autoconf automake binutils
2. Install Perl libraries.
$ sudo apt-get install libxml-perl libxml-dom-perl libxml-sax-perl
$ sudo apt-get install libdbi-perl libdbd-pg-perl
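Before moving on, it can help to confirm that the build tools from step 1 are actually on the PATH. The following is a minimal sketch and not part of the original procedure: missing_tools is a hypothetical helper name, and ld is assumed here as a stand-in for the binutils package.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools from step 1; ld is assumed to represent binutils.
missing=$(missing_tools make gcc autoconf automake ld)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All build tools found."
fi
```

Any name the script reports can be installed with the apt-get command from step 1.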
Chado Database Setup
1. Create a PostgreSQL database user
$ sudo su - postgres
$ createuser
Then follow the prompts to create a user that has permission to create databases. The username used should match your local unix login name.
2. Set the new database user's password. You must still be running as the postgres user (from the su in step 1). Be sure to substitute an actual username and password for <username> and <password>.
$ psql
postgres=# alter user <username> password '<password>';
postgres=# \q
$ exit
3. Fetch the YFGdb Chado dump
$ wget ftp://gen-ftp.princeton.edu/yfgdb/yfgdb_no-privs_20080211.sql.gz
4. Create and load the YFGdb database
$ createdb -E SQL_ASCII yfgdb
$ zcat yfgdb_no-privs_20080211.sql.gz | psql -d yfgdb -o yfgdb_load.log
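A truncated download is an easy way for the load in step 4 to fail partway through. As a hedged sketch (the helper name check_dump is hypothetical, and this check is not part of the original steps), gzip -t can test the archive's integrity before it is piped into psql:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file exists and is a valid gzip archive.
check_dump() {
    [ -f "$1" ] && gzip -t "$1" 2>/dev/null
}

dump=yfgdb_no-privs_20080211.sql.gz
if check_dump "$dump"; then
    echo "$dump looks intact; safe to load."
else
    echo "$dump is missing or corrupt; fetch it again."
fi
```

This only verifies the compressed file itself; it says nothing about whether the SQL inside loads cleanly.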
XORT Installation
$ wget http://superb-west.dl.sourceforge.net/sourceforge/gmod/XML-XORT-0.008.tar.gz
$ tar zxf XML-XORT-0.008.tar.gz
$ cd XML-XORT-0.008
$ perl Makefile.PL
What is the database name? yfgdb
What is the database username? <username>
What is the password for '<username>'? <password>
What is the database host? localhost
What is your database port? 5432
Where will the tmp directory go? /tmp
Where will the conf directory go? /usr/local/xort/conf
Where is the DDL file? <enter> - accept default
Where do you want to install XORT if other than default, press ENTER if default: <enter>
$ make
$ sudo make install
Dumping ChadoXML
Simple Genes example
If everything has gone well thus far, you should have a functioning Chado installation with yeast data and XORT ready to go. For the first step of this HOWTO we are going to dump a simple listing (in ChadoXML format) of all the genes in the database. First, create an XML file called genes.xml in your home directory that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<chado dumpspec="genes.xml">
  <feature dump="select">
    <type_id test="yes">
      <cvterm>
        <name>gene</name>
      </cvterm>
    </type_id>
    <is_obsolete test="yes">false</is_obsolete>
    <is_analysis test="yes">false</is_analysis>
    <uniquename />
    <name />
    <seqlen />
  </feature>
</chado>
This simple dumpspec selects all features of type 'gene' that have both is_obsolete and is_analysis set to false. It then dumps the uniquename, name, and seqlen fields from the feature table for these records. The dump="select" attribute of the feature tag tells XORT that we only want to dump what we explicitly define in the dumpspec. The test attribute tells XORT that the value(s) of this node or node tree in the XML are to be used to restrict the rows returned. Here we are using a simple exact-value match, but you can also use 'like' or regular expression comparison operators. Please refer to the XORT documentation for a full list of supported search operators.
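To double-check which fields a dumpspec uses as filters, the tags carrying test="yes" can be listed with standard tools. This is only a sketch: list_test_fields is a hypothetical helper, and the sed pattern assumes each tag sits on its own line, as in a hand-formatted genes.xml.

```shell
#!/bin/sh
# Hypothetical helper: print the element name of every tag carrying test="yes".
# Assumes at most one such tag per line of the dumpspec.
list_test_fields() {
    sed -n 's/.*<\([A-Za-z_]*\)[^>]*test="yes".*/\1/p' "$1"
}

# Only run when the dumpspec is present in the current directory.
if [ -f genes.xml ]; then
    list_test_fields genes.xml
fi
```

For a genes.xml laid out one tag per line, the fields reported are the three filters described above: type_id, is_obsolete, and is_analysis. Everything else in the feature element is a column to be dumped.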
To execute the dumpspec, run xort_dump.pl like this:
$ /usr/local/bin/xort_dump.pl --database chado --output chado_genes.xml --dumpspec genes.xml
Note that the --database option takes the first part of the name of the database properties file in the XORT conf directory, rather than the name of the PostgreSQL database itself (yfgdb in this HOWTO). Here the file is /usr/local/xort/conf/chado.properties, so the value is chado.
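Because --database is keyed on the properties file name, listing the files in the conf directory shows which values are available. A minimal sketch, assuming the conf directory chosen during installation (/usr/local/xort/conf); list_db_keys is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the --database value for each properties file
# in the given conf directory (the file name without ".properties").
list_db_keys() {
    conf_dir=$1
    for props in "$conf_dir"/*.properties; do
        # Skip the unexpanded pattern when the directory has no matches.
        [ -e "$props" ] || continue
        basename "$props" .properties
    done
}

# Conf directory as answered during "perl Makefile.PL" in this HOWTO.
list_db_keys /usr/local/xort/conf
```

If the directory contains chado.properties, the script prints chado, which is the value passed to --database in the command above.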
Synonyms
<?xml version="1.0" encoding="UTF-8"?>
<chado dumpspec="genes.xml">
  <feature dump="select">
    <type_id test="yes">
      <cvterm>
        <name>gene</name>
      </cvterm>
    </type_id>
    <is_obsolete test="yes">false</is_obsolete>
    <is_analysis test="yes">false</is_analysis>
    <uniquename />
    <name />
    <seqlen />
    <feature_synonym dump="select">
      <synonym_id>
        <synonym dump="select">
          <name />
          <type_id>
            <cvterm dump="select">
              <name />
            </cvterm>
          </type_id>
        </synonym>
      </synonym_id>
    </feature_synonym>
  </feature>
</chado>
CDS with location data
More Information
See the XORT page.
Please send questions to the GMOD developers list:
gmod-devel@lists.sourceforge.net
Or contact the GMOD Help Desk