Debian Stable Installation Notes

Debian Stable (Etch 4.0) installation notes

Installing Chado/GMOD on an old office machine with a fresh Debian stable installation. Goal: test the Chado schema for annotation of bacterial genomes.

AMD Athlon(tm) XP 3200+
1 GB memory
500 GB SATA disk

I borrowed a lot of stuff from the Installing_Chado_on_Ubuntu_HOWTO page as Debian and Ubuntu are much alike.

Apache

Apache 2.2 is installed by default during the installation of Debian stable (web package).

location for GMOD

mkdir /usr/local/gmod

note: this is NOT the home of my gmod user.

download

Got the source from SourceForge and unpacked it in /usr/local/gmod (maybe I should use the CVS version?).

=== postgresql setup ===

Note: Debian stable has PostgreSQL 7.4 as the default package, but I'm going for 8.1.11, as this is also the version that runs on our Red Hat Enterprise 5 server (more on that later).

apt-get install the following packages (Synaptic is your friend):

postgresql-8.1
postgresql-8.1-client-common
postgresql-client-8.1
postgresql-contrib-8.1
postgresql-plperl-8.1

Not necessary, but I also installed phpPgAdmin from source, not the Debian package.

I followed the guidelines from INSTALL.Chado to add a database user and optimize some settings. Don't forget to also create a database user root for the CPAN installations below; otherwise some tests fail. I added user 'chado' for the Chado database installation.

The configuration files are in a different location than on our Red Hat installation:

/etc/postgresql/8.1/main

I opted to install the CPAN modules instead of the Debian packages as some of the Debian packages have older versions of the modules. The modules are installed in

/usr/local/share/perl/5.8.8/

This way nothing from the installed Debian packages is overwritten.

CPAN packages

CGI
GD
DBI
DBD::Pg
SQL::Translator
Digest::MD5
Text::Shellwords
Graph
Data::Stag
XML::Parser::PerlSAX
Module::Build
Class::DBI
Class::DBI::Pg
Class::DBI::Pager
DBIx::DBStag (Force install needed)
XML::Simple
LWP
Template
Log::Log4perl
GO::Parser
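
To see which of the modules listed above still need installing, a small shell loop can try to load each one. This is only a minimal sketch: missing_perl_modules is a hypothetical helper name, and it assumes the perl found on the PATH is the one Chado will run under.

```shell
# Hypothetical helper: print every module name given as an argument
# that the current perl cannot load.
missing_perl_modules() {
    for mod in "$@"; do
        # "perl -MName -e 1" exits non-zero when the module cannot be loaded
        perl -M"$mod" -e 1 >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

# Example call (commented out so that sourcing this file has no side effects):
# missing_perl_modules DBI DBD::Pg SQL::Translator GO::Parser
```

An empty result means every module named on the command line could be loaded.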

BioPerl

I installed BioPerl-live using subversion into /usr/local/bioperl/bioperl-live (as root)

svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

cd /usr/local/bioperl/bioperl-live
perl Build.PL
./Build
./Build test   #some tests fail as usual
sudo ./Build install

Environment Variables

Added this to my .bashrc so that all variables are set when I log in as user gmod:

# GMOD additions
export GMOD_ROOT=/usr/local/gmod
export GO_ROOT=/usr/local/share/perl/5.8.8/GO/
export PERL5LIB=$PERL5LIB:/usr/local/bioperl/bioperl-live
export CHADO_DB_NAME=dev_chado_01
export CHADO_DB_USERNAME=chado
export CHADO_DB_PASSWORD=******
#CHADO_DB_HOST=localhost
#CHADO_DB_PORT=5432
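
Before running the Chado build it can help to confirm that these variables really are set in the current shell. The following is a minimal sketch; check_gmod_env is a hypothetical helper name, and the list of names is simply the uncommented exports above (PERL5LIB is left out because it may legitimately be set elsewhere).

```shell
# Hypothetical helper: report which GMOD/Chado variables are unset or empty.
# Returns 0 when all are set and 1 otherwise.
check_gmod_env() {
    missing=0
    for name in GMOD_ROOT GO_ROOT CHADO_DB_NAME CHADO_DB_USERNAME CHADO_DB_PASSWORD; do
        # Indirect lookup via eval keeps this usable in plain POSIX sh
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (commented out so that sourcing this file has no side effects):
# check_gmod_env && echo "GMOD environment looks complete"
```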

=== installing Chado schema and tools ===

in /usr/local/gmod
perl Makefile.PL

make
sudo make install
make load_schema
make prepdb

A quick look at the database using phpPgAdmin: the database looks fine.

make ontologies
 ./Build ontologies
 Available ontologies:
 [1] Relationship Ontology
 [2] Sequence Ontology
 [3] Gene Ontology
 [4] Chado Feature Properties
 [5] Cell Ontology
 [6] Plant Ontology

 Which ontologies would you like to load (Comma delimited)? [0] 1,2,3,4

This takes a while. I'm also getting the notorious duplicate key violates unique constraint "cvterm_c1" error.

Data loading

NCBI gff3 files (does not work!)

We have a local mirror of [1] mounted at /data/genomes. I tried the NCBI-provided GFF file, following Load_GFF_Into_Chado.

Inserted an organism into the database (our pet organism, of course):

INSERT INTO organism (abbreviation, genus, species, common_name)
               VALUES ('L.plantarum', 'Lactobacillus', 'plantarum', 'Lactobacillus plantarum WCFS1');

Question: other pages seem to specify the taxonomy ID as the organism ID. In our case that would be 220668. Should we use this ID or just let PostgreSQL assign an ID for us?

> gmod_gff3_preprocessor.pl -gfffile /data/genomes/Lactobacillus_plantarum/NC_004567.gff -outfile NC_004567.gff

fails with

"my" variable %seen masks earlier declaration in same scope at /usr/local/share/perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 4199.
"my" variable %seen masks earlier declaration in same scope at /usr/local/share/perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 4223.
NOTICE:  CREATE TABLE will create implicit sequence "gff_sort_tmp_row_id_seq" for serial column "gff_sort_tmp.row_id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "gff_sort_tmp_pkey" for table "gff_sort_tmp"
couldn't open /data/genomes/Lactobacillus_plantarum/NC_004567.gff.sorted.fasta for writing: Read-only file system

Although I ran it from a writable directory, gmod_gff3_preprocessor.pl tries to write to the location of the input file. Workaround: copy the input file to a writable directory first.

> cp /data/genomes/Lactobacillus_plantarum/NC_004567.gff NC_004567.gff
> gmod_gff3_preprocessor.pl --gfffile NC_004567.gff --outfile NC_004567.gff.sorted

"my" variable %seen masks earlier declaration in same scope at /usr/local/share/perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 4199.
"my" variable %seen masks earlier declaration in same scope at /usr/local/share/perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 4223.
Sorting the contents of NC_004567.gff ...
Writing sorted contents to NC_004567.gff.sorted
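
The copy-first workaround can be wrapped in a small helper so that the read-only mirror is never touched. A minimal sketch; stage_gff and the working directory in the example call are hypothetical names, not part of GMOD.

```shell
# Hypothetical helper: copy INPUT into WORKDIR (created if needed) and print
# the path of the copy, so later tools write their side files next to the copy.
stage_gff() {
    input=$1
    workdir=$2
    mkdir -p "$workdir" || return 1
    cp "$input" "$workdir/" || return 1
    printf '%s\n' "$workdir/$(basename "$input")"
}

# Example calls (commented out; paths other than the mirror are assumptions):
# gff=$(stage_gff /data/genomes/Lactobacillus_plantarum/NC_004567.gff "$HOME/chado_work")
# gmod_gff3_preprocessor.pl --gfffile "$gff" --outfile "$gff.sorted"
```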

> gmod_bulk_load_gff3.pl -organism 'Lactobacillus plantarum WCFS1' --gfffile NC_004567.gff.sorted

results in lots of errors

(Re)creating the uniquename cache in the database... 
Creating table...
Populating table...
Creating indexes...Done.
Preparing data for inserting into the dev_chado_01 database
(This may take a while ...)

There is a CDS feature with no parent (ID:)  I think that is wrong!

This GFF file has CDS and/or UTR features that do not belong to a
'central dogma' gene (ie, gene/transcript/CDS).  The features of
this type are being stored in the database as is.

Unable to find srcfeature NC_004567.1 in the database.
Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 4026
Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x8bba494)', 'Bio::SeqFeature::Annotated=HASH(0x8c4be6c)') called
at /usr/local/bin/gmod_bulk_load_gff3.pl line 758
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().

Loading NCBI gff files directly is not going to work.

NCBI genbank RefSeq files (does work)

Note that the page at Load_RefSeq_Into_Chado now includes an organism_id, which is confusing.

taken from RefSeq howto --> does not load in the database

 bp_genbank2gff3.pl --summary --typesource chromosome -noCDS --filter exon -o . NC_004567.gbk

taken from GenBank howto --> does load in the database

bp_genbank2gff3.pl -noCDS -s -o . NC_004567.gbk

Gives some output

# Input: NC_004567.gbk
# working on region:NC_004567, Lactobacillus plantarum WCFS1, 03-DEC-2007, Lactobacillus plantarum WCFS1, complete genome.
# GFF3 saved to ./NC_004567.gbk.gff
# Summary:
# Feature       Count
# -------       -----
# mRNA  3007
# gene  3093
# region  1
# pseudogene  42
# CDS  3007
# RESIDUES(tr)  910927
# RESIDUES  3308274
# rRNA  16
# pseudogenic_region  42
# exon  3093
# tRNA  70
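
To cross-check a summary like this against the GFF3 file that was written, the feature types in column 3 can be counted directly. This is a rough sketch that assumes tab-separated GFF3 with nine columns; count_feature_types is a hypothetical helper name. It only counts feature lines, so the RESIDUES rows of the summary have no counterpart in its output.

```shell
# Hypothetical helper: print "type<TAB>count" for each feature type in a GFF3 file.
# Comment/directive lines and any non-feature lines (e.g. FASTA) are skipped.
count_feature_types() {
    awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ }
                 END { for (t in n) print t "\t" n[t] }' "$1" | LC_ALL=C sort
}

# Example call (commented out so that sourcing this file has no side effects):
# count_feature_types NC_004567.gbk.gff
```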

Load into the database with:

gmod_bulk_load_gff3.pl --organism 'Lactobacillus plantarum WCFS1'  --dbxref GeneID --gfffile NC_004567.gbk.gff --recreate_cache

filter out the exons

 cat NC_004567.gbk.gff | grep -v exon > NC_004567.gbk.gff.noexon
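
Note that grep -v exon drops every line that contains the string exon anywhere, so a gene whose annotation mentions, say, an exonuclease would be removed as well. A stricter variant, sketched here under the assumption of tab-separated GFF3, tests only the feature type in column 3; drop_exons is a hypothetical helper name.

```shell
# Hypothetical helper: read GFF3 on stdin and write every line except feature
# lines whose type (column 3) is exactly "exon". Lines starting with "#"
# (directives and comments) are always kept.
drop_exons() {
    awk -F '\t' '/^#/ || $3 != "exon"'
}

# Example call (commented out so that sourcing this file has no side effects):
# drop_exons < NC_004567.gbk.gff > NC_004567.gbk.gff.noexon
```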

load the file into the database:
