Artemis-Chado Integration Tutorial

Latest revision as of 23:33, 8 October 2012



This Artemis-Chado Integration tutorial was presented by Robin Houston, Tim Carver and Giles Velarde at the 2009 GMOD Summer School - Europe, August 2009. The most recent Artemis tutorial can be found at the Artemis Tutorial page.

This tutorial walks you through how to use the Artemis annotation editor with a Chado database.

VMware

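This tutorial was taught using a VMware system image as a starting point. If you want to start with that same system, download and install the Starting image. See the VMware page for what software you need to use a VMware system image, and for directions on how to get the image set up and running on your machine.

  • Starting Image: ftp://ftp.gmod.org/pub/gmod/Courses/2009/SummerSchoolEurope/GmodSumSch2009EU4.tar.gz
  • Ending Image: ftp://ftp.gmod.org/pub/gmod/Courses/2009/SummerSchoolEurope/GmodSumSch2009EU5.tar.gz
  • Username: gmod
  • Password: gmod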

Caveats

Important Note

This tutorial describes the world as it existed on the day the tutorial was given. Please be aware that things like CPAN modules, Java libraries, and Linux packages change over time, and that the instructions in the tutorial will slowly drift over time. Newer versions of tutorials will be posted as they become available.

Overview

In this tutorial we present how to install and configure Artemis and ACT for use with a Chado database. The first two sections cover installing Postgres and Chado; they are included for completeness only, and you should refer to the Chado session for more details.

Artemis is a DNA sequence browser which works with flat files (e.g. EMBL, GenBank, GFF) and more recently with Chado databases. ACT (Artemis Comparison Tool) is based on Artemis. ACT uses BLAST comparison files to highlight regions of interest between pairs of sequences. Artemis and ACT in database mode are increasingly being used in the Pathogen Genomics Group at the Sanger Institute.

Download and Install Postgres

./configure --prefix=/home/gmod/gmod_test/pgsl --with-pgport=5432 --with-includes=/Developer
make
make install
cd /home/gmod/gmod_test/pgsl
bin/initdb -D data/

Add the following line to data/postgresql.conf:

listen_addresses = 'localhost'

Create the database:

postmaster -D data &
createuser --createdb username
createlang plpgsql template1
createdb --port=5432 chado_pathogen

Download and Install Chado

export GMOD_ROOT=/usr/local/gmod CHADO_DB_NAME=chado_pathogen CHADO_DB_USERNAME=username CHADO_DB_PORT=5432
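Before running the build it can help to confirm that these variables are actually set in the current shell. The following is a minimal sketch, not part of the Chado distribution; the function name check_chado_env is hypothetical, and the variable names are taken from the export line above.

```shell
#!/bin/sh
# Report which of the Chado build variables are unset or empty.
# Returns 0 when all four are set, 1 otherwise.
check_chado_env() {
    missing=0
    for name in GMOD_ROOT CHADO_DB_NAME CHADO_DB_USERNAME CHADO_DB_PORT; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

One way to use it is as a guard, for example `check_chado_env && perl Makefile.PL`.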

Now compile Chado and install the standard components (schema and ontologies):

perl Makefile.PL
make
sudo make install
make load_schema
make prepdb
make ontologies

Examples of Loading Sequences into the Database

In this section we detail how to load 3 Plasmodium sequences into Chado for viewing in Artemis and ACT. Alternatively you can use your own sequences of interest.

The GenBank files are available from Entrez with the links below. Make sure you download each file with its sequence by clicking on the option 'Show sequence' and then 'Update View'. Then go to the Download menu and select GenBank(Full):

  • NC_004314 (Plasmodium falciparum 3D7 chromosome 10)
  • NC_011907 (Plasmodium knowlesi chromosome 6) and
  • NC_011909 (Plasmodium knowlesi chromosome 8).

These are usually downloaded to the Desktop directory (depending on the browser) and saved under a name like sequences.gbwithparts. Rename them to NC_004314.gbk, NC_011907.gbk and NC_011909.gbk. Pfalciparum and Pknowlesi will need to be added to your organism table in Chado.

 psql chado_pathogen
   INSERT INTO organism
     ( abbreviation, genus, species, common_name )
   VALUES
     ( 'Pfalciparum', 'Plasmodium', 'falciparum', 'Pfalciparum'),
     ( 'Pknowlesi', 'Plasmodium', 'knowlesi', 'Pknowlesi');

Use the Perl script bp_genbank2gff3.pl to convert the GenBank files to GFF3 format:

bp_genbank2gff3.pl -noCDS *.gbk

You need to modify the GFF files so that the correct SO term is used:

perl -pi~ -e s'|processed_transcript|mature_transcript|' *.gff
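To see how many features the substitution affects, you can count the feature types before and after running it. This is a small sketch using awk; the helper name count_type is hypothetical. It only looks at the type column (column 3) of the GFF file, whereas the Perl one-liner above replaces the term wherever it appears on a line.

```shell
#!/bin/sh
# Count the lines in a GFF file whose type column (column 3)
# is exactly the given SO term. Usage: count_type TERM FILE
count_type() {
    term=$1
    file=$2
    awk -F '\t' -v term="$term" '$3 == term { n++ } END { print n + 0 }' "$file"
}
```

For example, `count_type processed_transcript NC_004314.gbk.gff` should print 0 once the substitution has been applied.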

Then load the GFF3 files that have been created:

gmod_bulk_load_gff3.pl -organism Pfalciparum -dbname chado_pathogen \
    -dbuser gmod -dbport 5432 -dbpass dd -recreate_cache < NC_004314.gbk.gff
gmod_bulk_load_gff3.pl -organism Pknowlesi -dbname chado_pathogen \
    -dbuser gmod -dbport 5432 -dbpass dd -recreate_cache < NC_011907.gbk.gff
gmod_bulk_load_gff3.pl -organism Pknowlesi -dbname chado_pathogen \
    -dbuser gmod -dbport 5432 -dbpass dd -recreate_cache < NC_011909.gbk.gff
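The three loader commands differ only in the organism and the accession, so they can be generated from a short list. The sketch below only prints the commands, it does not run them; the function name loader_cmd is hypothetical, and all flags and values are copied from the commands above.

```shell
#!/bin/sh
# Print (not run) the loader command for one organism/accession pair.
loader_cmd() {
    organism=$1
    accession=$2
    printf '%s\n' "gmod_bulk_load_gff3.pl -organism $organism -dbname chado_pathogen -dbuser gmod -dbport 5432 -dbpass dd -recreate_cache < $accession.gbk.gff"
}

# Organism/accession pairs used in this tutorial.
while read -r organism accession; do
    loader_cmd "$organism" "$accession"
done <<EOF
Pfalciparum NC_004314
Pknowlesi NC_011907
Pknowlesi NC_011909
EOF
```

Review the printed commands first; piping the output to sh would then execute them in order.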

Download Artemis and ACT

You can download Artemis and ACT from their home pages at the Sanger Institute. For the most up-to-date developments download the software from the CVS server:

cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/pathsoft co artemis

Now compile the software:

cd artemis
make

Or download the development version from the Development section on the Artemis home page. Note that on the Artemis web site there is also a stable release available.

Running Artemis

Try running the art script in the download:

./art -Dchado="localhost:5432/chado_pathogen?gmod" -Dibatis

This opens the login window:


ArtemisLogin.gif


The Artemis Database Manager and File Manager will open once your login has been authenticated. The top part of this relates to the Chado database and the bottom comprises the file management:


DatabaseManager.gif


Select the sequence NC_004314 and double click on it to open it up in Artemis.


Artemis.gif


There are 3 main components to the Artemis window. The two top Feature Displays show the sequence at different levels of granularity and below these is a feature list:

  1. the top Feature Display is a zoomed out view of the sequence. The 3 forward and 3 reverse frames of translation are shown with stop codons marked as black vertical lines.
  2. the second Feature Display shows the sequence at the nucleotide level. The amino acid translations are seen in this view.
  3. the Feature List shows the feature types and location. Options for displaying user defined qualifiers (e.g. Dbxref) can be accessed by right clicking on this list and selecting "Show Selected Qualifiers".

These three components are connected, so that if you select a feature in one then that feature becomes selected in the others. Double clicking on the feature centers the feature in both feature displays. The scroll bars on the right hand side of the feature displays allow you to zoom in and out.

The alternative way to open your sequence is to provide the entry (e.g. Pfalciparum:NC_004314) you want to open as a command line argument:

 ./art -Dchado="localhost:5432/chado_pathogen?gmod" -Dibatis \
        Pfalciparum:NC_004314
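Judging from the examples in this tutorial, the value passed to -Dchado has the form host:port/database?user. A small helper can assemble it from its parts, which is convenient when switching between servers. This is a sketch under that assumption; the function name chado_url is hypothetical.

```shell
#!/bin/sh
# Assemble the -Dchado value from its parts: host:port/database?user
chado_url() {
    host=$1
    port=$2
    db=$3
    user=$4
    printf '%s:%s/%s?%s\n' "$host" "$port" "$db" "$user"
}

# Prints: localhost:5432/chado_pathogen?gmod
chado_url localhost 5432 chado_pathogen gmod
```

It could then be used as `./art -Dchado="$(chado_url localhost 5432 chado_pathogen gmod)" -Dibatis`.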

For any of the gene features in Artemis you can select them and press the short cut key 'E' (Edit → Selected Features in Editor). This opens up the Gene Builder. Within this the Gene Model can be edited and annotation added.


GeneBuilder


It is also possible to launch the Artemis Gene Builder in a standalone mode for a particular gene:

etc/gene_builder -Dchado="localhost:5432/chado_pathogen?gmod" -Dibatis -Dshow_log PF10_0003

or in read-only mode you can open a gene in GeneDB (at the Sanger Institute):

etc/gene_builder -Dchado="db.genedb.org:5432/snapshot?genedb_ro" -Dibatis  -Dshow_log -Dread_only PFA0010c

Note that using the JVM option 'show_log' will open the log window.

Configuration Options

Edit etc/options (to change settings globally) or create a file ~/.artemis_options in your home directory (for your own settings). There are various flags that can be used to configure Artemis and ACT with Chado.

chado_servers This allows you to provide a list of available servers for the user to select:

chado_servers = \
  Plasmodium localhost:5432/chado_pathogen?username \
  GeneDB db.genedb.org:5432/snapshot?genedb_ro

product_cv In the Pathogen Genomics Group the product qualifiers are stored as an ontology (as a cv in feature_cvterm). This can be changed so that they are stored as featureprops by setting the product_cv option:

product_cv=no

This will mean that the product will be shown in the "Core" section of the Artemis Gene Builder rather than the "Controlled Vocabulary" section.

synonym_cvname If synonym types are loaded into a CV, Artemis checks this ontology.

set_obsolete_on_delete This will set the default behaviour of Artemis when features are deleted. If set to:

set_obsolete_on_delete=yes

the features will be made obsolete. The user is still prompted with the option to permanently delete the feature. If this line is not in the option file the default is to permanently delete features.

Selecting an alternative gene model Artemis supports 2 types of gene model representations:

A) Pathogen Genomics Gene Model - implicit CDS + explicit UTRs

  gene
  |
  |- part_of mRNA
     |
     |---- part_of exon
     |
     |---- derives_from polypeptide
     |
     |---- part_of five_prime_UTR
     |
     |---- part_of three_prime_UTR

B) implicit CDS + UTRs

  gene
  |
  |- part_of mRNA
     |
     |---- part_of exon
     |
     |---- derives_from polypeptide

The Artemis default is model A. To use model B, set:

chado_infer_CDS_UTR=yes

sequence_update_features This lists the feature types for which Artemis will maintain the feature.residues column. This is generally useful for polypeptide and transcript features.
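Several of the options above can be collected into a single personal options file. The sketch below writes such a file to a path you choose; the function name write_artemis_options is hypothetical, the option values are the ones shown in this section, and the comment lines assume the options file accepts '#' comments in the way Java properties files do.

```shell
#!/bin/sh
# Write a sample Artemis options file to the path given as $1.
# The quoted 'EOF' keeps the backslash line continuations intact.
write_artemis_options() {
    cat > "$1" <<'EOF'
# Servers offered to the user at login
chado_servers = \
  Plasmodium localhost:5432/chado_pathogen?username \
  GeneDB db.genedb.org:5432/snapshot?genedb_ro

# Store products as featureprops rather than in feature_cvterm
product_cv=no

# Make deleted features obsolete by default
set_obsolete_on_delete=yes
EOF
}
```

Calling `write_artemis_options "$HOME/.artemis_options"` would overwrite any existing personal options file, so write to a scratch path first if you already have one.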

Artemis Database Manager

The database manager provides the list of organisms that have features with residues (currently Artemis searches for these on features of type: '*chromosome*', '*sequence*', 'supercontig', 'ultra_scaffold', 'golden_path_region', 'contig'). The database manager is cached between sessions (this is on by default and can be switched off with -Ddatabase_manager_cache_off). There is an option under the File menu to clear this cache.

Adding Controlled Vocabulary Qualifiers in the Artemis Gene Builder

These use evidence codes, which are stored as feature_cvtermprops with a type_id that corresponds to cvterm.name = 'evidence'. There is a useful SQL script (etc/chado_extra.sql) in the Artemis distribution for creating this term in Chado. Run this on the chado_pathogen instance of the database:

psql -d chado_pathogen -f etc/chado_extra.sql

(This will also create other terms that are used to store literature (PMID) qualifiers.)

GO terms can now be selected in the Controlled Vocabulary (CV) section of the Gene Builder and added to features. Additional custom CVs can also be used. For Artemis to recognise and display a custom CV, its name needs to be prefixed with 'CC_'. Such CVs then appear in a drop-down list when adding CV terms to a feature. Try adding a new CV:

 psql chado_pathogen
   INSERT INTO cv ( name, definition ) VALUES ( 'CC_test', 'test' );

and create a CvTerm in this CV:

   INSERT INTO dbxref
     ( db_id, accession )
   VALUES
     ( (SELECT db_id FROM db WHERE name = 'CCGEN'), 'test1' );
 
   INSERT INTO cvterm
     ( cv_id,  name, dbxref_id )
   VALUES
     ( (SELECT cv_id FROM cv WHERE name ='CC_test'), 'test1',
       (SELECT dbxref_id FROM dbxref WHERE accession='test1') );
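After the inserts it is worth confirming that the term is attached to the new CV and to its dbxref. The sketch below prints a query that joins the standard Chado cvterm, cv and dbxref tables; the function name cv_terms_sql is hypothetical, and the CV name is inserted without any escaping, so it is only suitable for simple names such as the one used here.

```shell
#!/bin/sh
# Print a query listing the terms in one CV with their dbxref accessions.
# Usage: cv_terms_sql CV_NAME
cv_terms_sql() {
    cv_name=$1
    cat <<EOF
SELECT cv.name AS cv_name, cvterm.name AS term_name, dbxref.accession
FROM cvterm
JOIN cv USING (cv_id)
JOIN dbxref USING (dbxref_id)
WHERE cv.name = '$cv_name';
EOF
}

cv_terms_sql CC_test
```

To run the query, pipe it to psql, for example `cv_terms_sql CC_test | psql chado_pathogen`. If the inserts succeeded, the result should include a row for the term test1.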

Now re-launch Artemis and open the Gene Builder at any feature and go to the 'Controlled Vocabulary' section and click the 'ADD' button. This CV (CC_test) will appear in the drop down menu:


AddCV.gif


Click on CC_test and hit the 'Next' button. This opens a keyword selection box. If you leave this blank all the terms are retrieved and displayed. If you keep clicking 'Next' this term is then added to the 'Controlled Vocabulary' section.


Transfer Annotation Tool (TAT)

This tool can be accessed from the Gene Builder - look for the TAT button. It allows you to transfer annotation between sequences. In database mode Artemis provides an editable list of genes constructed from ortholog/paralog links. These links can be added in the Gene Builder in the Match section (for example you can try creating the ortholog link between PF10_0165 in Pfalciparum and PKH_060110 in Pknowlesi).

Logging Information

Note that you can easily access the logging information Artemis produces. In the Artemis launch window, under the 'Options' menu, select 'Show Log Window' to view the logs. Logging is controlled by etc/log4j.properties. The logs can be useful for debugging and for monitoring activity if appended to a central file. See the log4j documentation for more information.

Running ACT

ACT can read sequences in from the database as well. However, it currently does not read the BLAST comparisons from Chado but reads this data from files. These comparisons are displayed as the matches between the sequences. To distinguish forward and reverse matches the forward matches are red and reverse matches are blue.

For convenience the comparison files have been pre-generated for this exercise and can be downloaded from:

wget ftp://ftp.sanger.ac.uk/pub/pathogens/workshops/GMOD2009/NC_004314_v_NC_011907_tblastx.gz
wget ftp://ftp.sanger.ac.uk/pub/pathogens/workshops/GMOD2009/NC_004314_v_NC_011909_tblastx.gz

Note that both Artemis and ACT automatically open gzipped files. For details on generating these files go to ACT Comparison Files.
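Because the comparison files are used in their gzipped form, an interrupted download may only show up later as a read error in ACT. A quick integrity check with gzip -t catches this early. This is a minimal sketch; the function name check_comparisons is hypothetical.

```shell
#!/bin/sh
# Test each gzipped file for integrity without unpacking it.
# Returns 0 if every file passes, 1 if any is missing or corrupt.
check_comparisons() {
    status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "ok: $f"
        else
            echo "bad or missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the files above this would be called as `check_comparisons NC_004314_v_NC_011907_tblastx.gz NC_004314_v_NC_011909_tblastx.gz`.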

To run ACT use the act script:

./act -Dchado="localhost:5432/chado_pathogen?gmod" -Dibatis

From the 'File' menu select the option 'Open Database and SSH File Manager' and log in. Drag and drop the Plasmodium entries from the Database Manager into the ACT selection window. Also drag and drop the comparison files into this window, so that it looks something like this (note that the featureId numbers may well be different, as these are the Chado feature_id values):


ActSelection2seqs.gif


Click on Apply to read these entries and open up ACT. You can use the right hand scroll bar to zoom in and out. If you zoom out you can identify the regions that match between these sequences.


Pf10 Pk6.gif


ACT can display multiple pairwise comparisons, so the two P. knowlesi sequences can both be compared to the P. falciparum sequence. From the ACT launch window go to the File menu and select 'Open Database and SSH File Manager'. Drag in the sequences and comparison files (clicking on 'more files' to add the additional sequence and comparison).


ActSelection.gif


Zooming out, you will see that Pfalciparum chromosome 10 matches regions in Pknowlesi chromosomes 6 and 8.


Pk6 Pf10 Pk8.gif


Writing Out Sequence Files

Artemis can write out EMBL and GFF files for an entry opened from the database. You can optionally flatten the gene model (i.e. gene, transcript, exon) to just a CDS feature. Also an option is given to ignore any obsolete features. For EMBL it uses mappings for conversion of the keys and qualifiers. These mappings are stored in the etc/key_mapping and etc/qualifier_mapping files.

A utility script (etc/writedb_entry) is also provided as a means of writing out multiple sequences from the database. The script takes the following options:

-h      show help
-f      [y|n] flatten the gene model, default is y
-i      [y|n] ignore obsolete features, default is y
-s      space separated list of sequences to read and write out
-o      [EMBL|GFF] output format, default is EMBL
-a      [y|n] for EMBL submission format change to n, default is y

Try running:

etc/writedb_entry -Dchado="localhost:5432/chado_pathogen?gmod" NC_004314

Mailing List

There is an Artemis mailing list, artemis-users: http://lists.sanger.ac.uk/mailman/listinfo/artemis-users

References