April 2004 GMOD Meeting

Generic Model Organism Database Construction Set
 
==Meeting 4==
  
 
GMOD Meeting April, 2004
 
==Presentations==
* [[Media:Cain_040526.ppt|Cain_040526.ppt]]
* [[Media:Crosby_040526.ppt|Crosby_040526.ppt]]
* [[Media:Emmert_040526.ppt|Emmert_040526.ppt]]
* [[Media:Gelbart_040528.ppt|Gelbart_040528.ppt]]
* [[Media:Gilbert_040526.ppt|Gilbert_040526.ppt]]
* [[Media:Harris_040527.ppt|Harris_040527.ppt]]
* [[Media:Kasprzyk_040526.ppt|Kasprzyk_040526.ppt]]
* [[Media:Kenny_040526.ppt|Kenny_040526.ppt]]
* [[Media:Kodira_040526.ppt|Kodira_040526.ppt]]
* [[Media:Matthews_040526.ppt|Matthews_040526.ppt]]
* [[Media:Sabo_040526.ppt|Sabo_040526.ppt]]
* [[Media:Schlueter_040526.ppt|Schlueter_040526.ppt]]
* [[Media:Terry_040526.ppt|Terry_040526.ppt]]
* [[Media:Worley_040526.ppt|Worley_040526.ppt]]
  
 
Agenda (including links to powerpoint presentations)
 
==Progress reports==
 +
<pre>
GMOD Project Progress Reports
April, 2004
-----------------------------
 +
 
 +
The past four months have seen the first two releases of gmod, which will
become the suite of model organism database software.  The first release,
version 0.001 (alpha), was released in January, 2004.  The main goal of that
release was to establish a release procedure.  The release consisted of a
database schema, referred to as chado, developed primarily by FlyBase
developers at Harvard and BDGP.  Additionally, there were a variety of
tools for installing and loading data into the database, developed
primarily by Allen Day at UCLA and Scott Cain at CSHL.  Finally, there was
a compatible version of the Generic Genome Browser with a chado database
adaptor developed to allow browsing of genome features directly from the
database.
 +
 
 +
The second release, also an alpha release, consisted of the same components
and was released in March, 2004.  In this release, the installation procedure
improved considerably, and a prerequisite that had caused testers difficulties
was removed.  During the GMOD meeting in April, this release was installed
by several attendees during a workshop.  Several suggestions were made that
will be implemented in the next release.
 +
 
 +
There are several items planned for addition or improvement in the next
two releases.  Tools for importing and exporting XML-formatted data to and
from chado will be included, which will allow the sequence annotation tool,
Apollo, to be used with chado.  Additionally, a template-based web front
end for chado called turnkey will be included in an upcoming release.  This
software is still early in the development process, but when it was
presented to developers at the GMOD meeting in April, there was
considerable interest in getting it included in a gmod release as soon as
possible.
 +
 
 +
Longer-term goals for gmod releases include adding pubsearch and pubfetch.
The process of porting these applications has begun and is expected to be
complete by the end of the year.  A tool for literature-based sequence
annotation, called JavaSEAN, is expected to be included in gmod in a similar
time frame.  Additionally, the Apollo developers plan to create a new
version of Apollo that will be able to read and write directly to the
database without using an XML intermediary, which will simplify the
process of sequence annotation considerably.
 +
 
 +
 
 +
 
 +
Apollo Progress Report (11/2003 - 4/2004)
 +
 
 +
Major improvements in release 1.3.6 (11/3/03):
 +
 
 +
Apollo now runs under JDK1.4, which works better on most platforms.
 +
 
 +
Can rubberband a region on the axis and the selected sequence will pop up
 +
in a Sequence window.
 +
 
 +
Results that represent hits against sequences that are new to their
 +
respective database (as indicated in tiers file) are shown with a box
 +
around them, so that the curator can immediately see which results are
 +
new and need to be looked at.
 +
 
 +
Search (Find) now allows full regexps.
 +
 
 +
Instead of having the config files in $HOME/.apollo be slightly modified
 +
copies of the ones in APOLLO_ROOT/conf, you can now put ONLY the stuff
 +
you want changed into your personal cfg files.  Apollo will first read
 +
the ones in APOLLO_ROOT/conf, and then read your personal cfgs and apply
 +
any modifications.
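
The layering described above can be sketched as follows.  This is only an
illustration, not Apollo's actual loader: it assumes a simplified
"key value" line format, and the setting names (DataAdapter, FrameWidth)
are hypothetical.

```python
def parse_cfg(text):
    """Parse 'key value' lines; blank lines and '#' comments are skipped.

    The line format is an assumption made for this sketch.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        settings[key] = value.strip()
    return settings


def layered_config(default_text, personal_text=""):
    """Read the defaults first, then apply only the personal overrides."""
    merged = parse_cfg(default_text)
    merged.update(parse_cfg(personal_text))
    return merged


# Hypothetical contents of APOLLO_ROOT/conf and $HOME/.apollo files.
defaults = "DataAdapter game\nFrameWidth 900\n"
personal = "FrameWidth 1200\n"
config = layered_config(defaults, personal)
```

Settings absent from the personal file keep their default values, so the
personal file only needs to list what differs.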
 +
 
 +
Synteny (see Synteny section at end)
 +
 
 +
 
 +
Major improvements in release 1.4.0 (internal release) (2/9/2004):
 +
 
 +
New game.tiers file format (easier to read and change).  If you have an
 +
old game.tiers, it will be autoconverted to the new format.
 +
 
 +
Better handling of non-gene annotation types.  New glyphs for showing
 +
them in main Apollo display.
 +
New annotations are automatically assigned the type (e.g. gene, tRNA,
 +
etc.) appropriate to the evidence that was used to create them.  (Type
 +
can then be changed in the annotation info editor, if desired.)
 +
 
 +
Structured transaction records are now added to the XML when you save.
 +
They include the type of object that changed (e.g. TRANSCRIPT;
 +
ANNOTATION; COMMENT), the operation (e.g. ADD, SPLIT, etc.), the relevant
 +
names and/or IDs before and after the transaction, and the user and
 +
time/date when the change was made.
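
As a rough picture of what such a record carries, the fields listed above
can be modeled as a small data structure.  This is a sketch only: the
field names, the example IDs and the user name are hypothetical and do not
reflect the actual XML element names Apollo writes.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TransactionRecord:
    object_type: str     # e.g. TRANSCRIPT, ANNOTATION, COMMENT
    operation: str       # e.g. ADD, SPLIT
    before: list         # names and/or IDs before the transaction
    after: list          # names and/or IDs after the transaction
    user: str            # who made the change
    timestamp: datetime  # when the change was made


# Hypothetical example: one transcript split into two.
record = TransactionRecord(
    object_type="TRANSCRIPT",
    operation="SPLIT",
    before=["example-RA"],
    after=["example-RA", "example-RB"],
    user="curator1",
    timestamp=datetime(2004, 2, 9, 14, 30),
)
```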
 +
 
 +
Support for translational exceptions, including frame shifts and one base
 +
pair genomic sequencing errors.
 +
 
 +
UTRs are now shown in a different (configurable) color from the rest of
 +
the gene.
 +
 
 +
Restriction enzyme mapper:
 +
- Cut sites show up in main window (near the axis)
 +
- Can now map multiple restriction enzymes at once
 +
- Table of restriction fragments; can be selected for viewing in
 +
Sequence window
 +
 
 +
Annotation info window:
 +
- Now has integrated annotation tree
 +
- Shows arbitrary properties for annotations and transcripts (including
 +
validation_flag)
 +
- Shows translational exceptions and genomic sequencing errors
 +
- Lets you edit annotation ID as well as name/symbol
 +
 
 +
Ability to tag results by selecting from a list of comments, which are
 +
specified (as ResultTags) in game.style.  Tagged results are crosshatched
 +
in pink in the display.
 +
 
 +
Fixed updating of peptide sequences.
 +
 
 +
 
 +
Improvements in releases 1.4.1 (3/12/04) and 1.4.2 (3/18/04):
 +
 
 +
Red/green markers at axis show where sequence/region ends.
 +
 
 +
To help you identify splice sites that are unconventional, colored
 +
triangles appear in the annotation glyph.
 +
 
 +
Can now load D. melanogaster data from r3.1 (gadfly) and r3.2 (chado)
 +
(both via cgi).
 +
 
 +
 
 +
1.4.3 (4/19/04):
 +
Let users get the sequence of the entire segment you're looking at, not
 +
just a rubberbanded section.  [File -> Save sequence]
 +
 
 +
 
 +
 
 +
Synteny progress, 11/03-4/04:
 +
 
 +
- Synteny now works with GAME. You can load one species and then use the
 +
blast or syntenic block results to another species (for now it's pseudo)
 +
to load another species. The other species is loaded with the same range
 +
around that feature. Links between the two species are automatically
 +
derived from the blast link features that are present in both datasets
 +
(no explicit link file needs to be specified).
 +
 
 +
- Database chooser was added to select the different species databases.
 +
 
 +
- Able to switch back and forth from synteny data adapter to regular data
 +
adapters without restarting Apollo.
 +
 
 +
- Can save and edit (edit could use some rigorous testing)
 +
 
 +
- Can home in on link from link popup menu. Zooms and shows the strands
 +
of homed in link, strands not in link are hidden.
 +
 
 +
- Species now zoom and scroll together by default. Can unlock zoom with
 +
shift key, and unlock scroll with menu item.
 +
 
 +
- You can now config links between 2 curation sets that contain links to
 +
each other. link_type, source and hit species are specified in the linked
 +
type in the tiers file. This works with game, in theory could be made to
 +
work with other adapters that have linked data embedded in the species
 +
data.
 +
 
 +
 
 +
 
 +
Textpresso: A progress report
 +
 
 +
Eimear Kenny, Hans-Michael Mueller and Paul Sternberg
 +
 
 +
Updates made to Textpresso since September 2003:
 +
 
 +
Textpresso for Yeast Literature
 +
(Toward a generic MOD information retrieval/extraction search engine)
 +
 
 +
SGD developers and curators met with Eimear Kenny for two weeks at
 +
the beginning of March at Stanford to build a Textpresso search engine for
 +
Yeast. During that period the Textpresso software was installed on a
 +
Solaris system and three builds with a test corpus of ~400 full text
 +
journal articles were completed. In addition, the Textpresso Ontology for
 +
worm literature was modified to a functional preliminary ontology for
 +
yeast literature. Plans to expand the corpus to 10,000 yeast papers and
 +
make improvements to the yeast ontology are underway at Stanford.
 +
 
 +
Integration of Textpresso into Literature Curation Pipeline
 +
 
 +
We have integrated Textpresso into the Wormbase curation pipeline
to expedite the extraction of genetic interaction information from the
 +
literature. A prototype curation interface has been developed to enable a
 +
curator to extract data from sentences returned by a Textpresso query for
 +
genetic interaction. We find that these Textpresso sentences are enriched
 +
3-fold for gene-gene interactions compared to sentences that mention two
 +
or more gene names and 39-fold compared to random sentences from the
 +
literature.
 +
 
 +
Textpresso MOD interface
 +
 
 +
We have generated a Wormbase-like interface for Textpresso to integrate
 +
the Textpresso information retrieval engine in the Wormbase web-site.
 +
http://www.textpresso.org/cgi-bin/wb/textpressoforwormbase.cgi?allabstracts=on&searchmode=sentence&searchtargets=Paper&searchtargets=Abstract
 +
 
 +
Textpresso Package
 +
 
 +
Hans-Michael Mueller is working on packaging Textpresso for release in
 +
the first half of this year.
 +
 
 +
Textpresso paper ... under review
 +
 
 +
A Textpresso publication is currently under revision.
 +
 
 +
 
 +
 
 +
PubFetch/PubTrack Progress Report (April 2004)
 +
 
 +
PubFetch
 +
PubFetch is a tool for accessing literature from various online resources.
 +
The goal is to provide a common interface and common format to downstream
 +
applications to allow them to query different literature repositories in
 +
a single, unified fashion.
 +
 
 +
PubFetch has been implemented in two forms:
 +
* Java servlet core + simple web interface to provide interactive access
 +
to PubFetch
 +
    * Provides access to PubMed and Agricola databases
 +
* BioMOBY wrapper around servlet core to provide webservice access to PubFetch
 +
 
 +
A variety of new features have been introduced:
 +
* Duplicate filtering - running the same search on multiple data sources
results in some duplication of articles.  The duplicate filter detects
these articles and returns a non-redundant set of data. Database Ids from
both sources are maintained in the non-redundant set.
 +
* The web interface version highlights keywords in the search results to
 +
aid in review of the returned articles.
 +
* Connection to full text - a hyperlink to the full text is returned (if
 +
available from PubMed)
 +
* Filtering of 'ahead of print' articles - Abstracts are appearing in
 +
PubMed and being assigned PubMed Ids prior to being published and are
 +
being reassigned PubMed Ids after publication. PubFetch allows filtering
 +
of these ahead of print articles to retrieve only published articles.
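
The duplicate filtering described in the list above might look roughly
like the sketch below.  The report does not say how PubFetch decides that
two records are the same article; matching on a normalized title plus
publication year is an assumption made here, and the record layout and
IDs are hypothetical.

```python
def normalize(title):
    """Lower-case a title and drop punctuation and whitespace."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def filter_duplicates(records):
    """Collapse records describing the same article, keeping every source ID.

    Assumption: two records match when normalized title and year agree.
    """
    merged = {}
    for rec in records:
        key = (normalize(rec["title"]), rec["year"])
        if key not in merged:
            merged[key] = {"title": rec["title"], "year": rec["year"], "ids": {}}
        merged[key]["ids"][rec["source"]] = rec["id"]
    return list(merged.values())


# Hypothetical search hits from two sources.
hits = [
    {"source": "PubMed", "id": "PM-1", "title": "Rice genome mapping.", "year": 2003},
    {"source": "Agricola", "id": "AG-1", "title": "Rice Genome Mapping", "year": 2003},
    {"source": "PubMed", "id": "PM-2", "title": "Yeast two-hybrid screens", "year": 2002},
]
unique = filter_duplicates(hits)
```

Each entry in the non-redundant set carries the IDs from every source in
which the article was found.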
 +
 
 +
The BioMOBY interface provides the following services:
 +
* SearchPubmed - Search PubMed for given query and get PMIDs
 +
* GetPubmed - Retrieve PubMed articles in MEDLINE display format for given
 +
PMIDs
 +
* FetchFull - Get FullText for given PMID
 +
* fetchAgID - Search Agricola for given query and get Agricola accession number
 +
* fetchAgDoc - Get Agricola document in MEDLINE like format for given
 +
Agricola accession number
 +
 
 +
Current work
 +
The integration of PubFetch and PubSearch is in progress; our goal is to
 +
have PubSearch using the PubFetch core module for literature retrieval by
 +
summer of 2004. We will be adapting the Rat Genome Database literature
 +
pipeline to use the PubFetch BioMOBY services to act as its source for
 +
literature data download.
 +
 
 +
The current version of PubFetch is available from the GMOD cvs:
 +
http://cvs.sourceforge.net/viewcvs.py/gmod/pubfetch/
 +
 
 +
Implementation of PubSearch at RGD
 +
Following a curator review of existing PubSearch functionality, a variety
 +
of new features were requested by the RGD curators to enable a more
 +
'article-centric' view of the PubSearch database. This has been
 +
implemented by the TAIR group and plans are underway to install this
 +
latest version of PubSearch at RGD, populate with RGD/Rat data and test
 +
in the RGD curation process.
 +
 
 +
PubTrack
 +
PubTrack is a monitoring tool that tracks objects as they move through
 +
a process or workflow. Existing workflow tools move data through a
 +
specified process, passing datasets to applications and retrieving
 +
results and passing them to the next step in the flow. PubTrack does
not aim to direct or control workflow, and it does not track the dataset
as a whole. Instead, it provides a higher resolution and tracks the data
objects within the dataset, enabling users to follow a particular object
as it moves through a process.
 +
 
 +
Progress to date:
 +
* Review of existing workflow tools and schemas has been completed.
 +
* The initial PubTrack schema has been developed and implemented in PostgreSQL
 +
* Initialization scripts have been written to populate the PubTrack
 +
database with initial object and process data. Perl scripts are used to
 +
parse and load initialization data in a standard XML format; a DTD is
 +
available and is used to confirm the data formatting.
 +
* An API is under development to allow 3rd party applications to
 +
communicate with PubTrack to initialize and update the tracking
 +
information for objects under observation. This is being developed and
 +
tested using data from a proteomics MS/MS analysis pipeline that is
 +
being built in my lab.
 +
* A basic web user interface is in development to provide end-users with
 +
the ability to view objects and their progress through their designated
 +
processes.
 +
* The concept of 'estimated time of completion' has been added to allow
 +
long term planning and project tracking. For example, the entire process
 +
of curating an article might typically take 3 days, so the estimated time
 +
of completion would be 3 days after the start of curation. This estimate
 +
can be displayed on a Gantt chart and updated as individual steps in the
 +
process are completed, allowing an increasingly refined view of the
 +
completion date. This is being used in our proteomics tracking: component
1 generates tissue samples from animals in a process that takes up to 3
weeks to complete. Tracking the progress and updating the completion
time estimate using PubTrack allows lab members in component 2 to plan
ahead. They are able to see which samples will be ready and on what date,
and this is updated as the process progresses.
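
The "estimated time of completion" idea can be illustrated with a small
calculation.  This is a sketch under stated assumptions, not the PubTrack
API: the function name, the three one-day steps and the dates are all
hypothetical.

```python
from datetime import datetime, timedelta


def estimated_completion(now, remaining_steps):
    """Estimate completion as 'now' plus the typical durations of the
    steps that have not been completed yet."""
    return now + sum(remaining_steps, timedelta())


# Hypothetical curation process: three steps of about one day each.
steps = [timedelta(days=1), timedelta(days=1), timedelta(days=1)]
start = datetime(2004, 4, 1)

# At the start, the estimate is 3 days after the start of curation.
first_estimate = estimated_completion(start, steps)

# The first step finished half a day late, so the estimate is refined
# from the actual finish time plus the two remaining steps.
revised_estimate = estimated_completion(datetime(2004, 4, 2, 12), steps[1:])
```

Recomputing the estimate whenever a step completes is what gives the
increasingly refined completion date described above.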
 +
 
 +
Current Work
 +
When the API is stabilized we will deploy PubTrack in the existing RGD
 +
literature curation pipeline and ultimately in combination with PubSearch
 +
at RGD. This will create an entire system allowing tracking of literature
 +
across a heterogeneous system as it is downloaded from PubMed, into
 +
PubSearch, screened, moved to RGD's Oracle db, curated and ultimately
 +
filed. A more comprehensive user interface will be developed based on the
 +
experiences from the proteomics pipeline and the RGD curation pipeline.
 +
The goal is to provide generic tracking views and a way to allow specific
 +
users to customize the displays, charts and reports if needed.
 +
 
 +
PubTrack documents including schema, loading scripts, etc. can be found on
 +
the GMOD CVS.
 +
http://cvs.sourceforge.net/viewcvs.py/gmod/pubtrack/
 +
 
 +
 
 +
 
 +
PubSearch update
 +
 
 +
We've migrated our database schema over to one that should be more
 +
compatible with a Chado schema --- all of our table names are now prefixed
 +
with a 'pub_' prefix, and we've done some column renaming so that we use
 +
consistent names throughout the system.
 +
 
 +
Our production server has also been upgraded from MySQL3 to MySQL4, and
 +
we've rewritten some parts of Pubsearch to take advantage of the
 +
transaction support that the new MySQL provides.  We've also added
 +
referential integrity constraints to the foreign keys in our tables.
 +
 
 +
We've adopted another tool called JCoverage to help us identify areas of
 +
our code that are not being touched by our unit tests, and have started to
 +
tighten up our test cases so that our major classes are being exercised.
 +
 
 +
We've worked toward removing dependencies on external resources.  Hit
 +
generation now works directly from the Java codebase, rather than from an
 +
external Python script.  We've continued work on a keyword term browser to
 +
replace the highly munged version of AmiGO that we are running locally.
 +
 
 +
 
 +
 
 +
GBrowse Project
 +
 
 +
Coordinator: Lincoln Stein
 +
Major Developers: Scott Cain
                  Aaron Mackey
                  Toshiaki Katayama
                  Vsevolod Ilyushchenko
                  Marc Logghe
                  Sheldon McKay
                  Mark Wilkinson
 +
 
 +
DESCRIPTION:
 +
 
 +
GBrowse is a web-based browser for genome annotations.  It is intended to
 +
complement Apollo by providing a search, browse and drill-down display for
 +
sequence-based features without the need for prior software installation. 
 +
GBrowse uses a database adaptor system to connect to a single primary data
 +
source, and a temporary flat-file system to layer an arbitrary number of
 +
third-party annotations on top of the primary data.  A plugin system is used
 +
to add new functionality to gbrowse, such as more advanced searches, and
 +
dynamically-computed features such as ab initio gene predictions.  An
 +
internationalization layer allows GBrowse to display button labels, menus and
 +
help text in a variety of common world languages.
 +
 
 +
The following gbrowse database adaptors currently exist:
 +
 
 +
      Bio::DB::GFF (oracle, postgresql & mysql)
 +
      Well-tested and in production.
 +
 
 +
      Bio::DB::Das::Chado (postgresql)
 +
      Well-tested and in early production.
 +
 
 +
      GenBank proxy
 +
      Well-tested and in production.  Does not handle
 +
      full-genbank keyword searches properly.
 +
 
 +
      Bio::DB::Das::BioSQL
 +
      Adaptor for the BioSQL schema.  In beta test.
 +
 
 +
      Bio::Das
 +
      Adaptor for DAS sources. Released, but probably best
 +
      considered in beta test.
 +
 
 +
GBrowse has been downloaded from SourceForge 1,830 times, but this is
 +
a poor way to count the number of GBrowse users.  A more conservative
 +
estimate of users comes from tallying bug reports, which ensures that
 +
the user has at least tried to install the software.  However, it
 +
represents an undercount.  In any case, we can confirm that at least
 +
100 laboratories have installed GBrowse.  As the list attached to the
 +
bottom of this report shows, GBrowse can be found in academic,
 +
governmental and commercial organizations in North America, South
 +
America, Europe, Asia, Africa and Australia.
 +
 
 +
RECENT PROGRESS:
 +
 
 +
Since the last status report, we have added the following features to
 +
GBrowse:
 +
 
 +
1) SVG output
 +
 
 +
Users can now click on a link labeled "Publication Quality Image" and
 +
download a Scalable Vector Graphics version of the current view.  SVG
 +
is an editable format that can be manipulated with popular graphics
 +
programs such as Adobe Illustrator, and can be reprinted by journals
 +
without the pixelation that plagues bitmapped images.
 +
 
 +
2) Security
 +
 
 +
Tracks can now be protected by username & password, restricted to
 +
certain hosts, or limited to hosts presenting certain classes of RSA
 +
(digital) certificates.  A restricted track does not appear on the
 +
screen of unauthorized users, allowing system administrators to
 +
present a mix of proprietary and public data.
 +
 
 +
3) DAS support
 +
 
 +
GBrowse can now run on top of distributed annotation system sources.
 +
DAS is supported in three ways:
 +
    a) As an external annotation source
 +
      Users can layer remote DAS tracks on top of the current view.
 +
      The remote DAS tracks will remain active from session to
 +
      session.  The GBrowse administrator can preconfigure a set
 +
      of "recommended" DAS sources, which will then appear in a
 +
      user-selectable menu.
 +
 
 +
    b) As a primary database
 +
      GBrowse can now be configured to use a local or remote DAS
 +
      database as its primary data source.  This means that one
 +
      can point GBrowse at the UCSC or ENSEMBL databases and
 +
      immediately begin browsing them using the GBrowse user
 +
      interface.
 +
 
 +
    c) As a DAS source
 +
      GBrowse will act as a DAS server.  At the administrator's
 +
      discretion, all or selected tracks can be made exportable
 +
      via DAS, allowing sequence features to be shared between
 +
      GBrowse instances or between GBrowse and other DAS clients.
 +
 
 +
4) Feature filtering and highlighting
 +
 
 +
A new filtering and highlighting API allows plugins to hide features
 +
based on a set of user-supplied criteria or to highlight them in
 +
various colors.
 +
 
 +
5) New adaptors
 +
 
 +
In addition to the DAS adaptor, we have added an experimental BioSQL
 +
adaptor to GBrowse.  BioSQL is a flexible database schema designed by
 +
the BioPerl & BioJava projects for the purposes of holding
 +
GenBank/EMBL records in a relational format.
 +
 
 +
6) Support for GFF3 loading & dumping
 +
 
 +
GBrowse now can load and dump sequence annotations in GFF3 format
 +
(http://song.sourceforge.net), a preliminary specification that
 +
improves on the current GFF sequence feature format.  The advantage of
 +
this format is that it uses the Sequence Ontology, a controlled
 +
vocabulary of sequence feature types.
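
To make the format concrete, here is a minimal sketch of reading one GFF3
feature line.  It is not GBrowse's loader (which is part of BioPerl); it
assumes the nine tab-separated columns and "key=value;key=value"
attribute column of the GFF3 specification as later published, and the
example feature is illustrative.

```python
from urllib.parse import unquote

# The first eight GFF3 columns; the ninth holds the attributes.
COLUMNS = ["seqid", "source", "type", "start", "end", "score", "strand", "phase"]


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict of columns and attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns")
    feature = dict(zip(COLUMNS, fields[:8]))
    feature["start"] = int(feature["start"])
    feature["end"] = int(feature["end"])
    attributes = {}
    for pair in fields[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            # Reserved characters are URL-escaped in GFF3.
            attributes[unquote(key)] = unquote(value)
    feature["attributes"] = attributes
    return feature


# The "type" column ("gene" here) is a Sequence Ontology term.
feature = parse_gff3_line(
    "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
)
```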
 +
 
 +
7) Integrated MOBY support
 +
 
 +
The BioMOBY system (www.biomoby.org) is a web services system that
 +
allows users to quickly locate and invoke bioinformatics services.
 +
GBrowse now has an interface which allows it to find services that
 +
will operate on selected sequence features.  For example, GBrowse can
 +
present users with a list of current services that will operate on
 +
Drosophila gene names.
 +
 
 +
8) Support for writeback
 +
 
 +
A writeback layer has been added to GBrowse to allow external editors
 +
to update the underlying database.  This has been tested successfully
 +
with the Artemis editor in the context of a USDA pathogens database
 +
project.  Testing with Apollo is still underway.  Currently it is
 +
recommended to edit sequence databases via the shared Chado schema and
 +
the Apollo->Chado->GBrowse route, rather than to use Apollo->GBrowse
 +
directly.
 +
 
 +
9) New glyphs
 +
 
 +
We have recently added a number of new glyphs for use with the
 +
International HapMap Project.  New glyphs include a "weighted allele"
 +
glyph that indicates the major and minor alleles of a single
 +
nucleotide polymorphism, and a set of glyphs for visualizing haplotype
 +
blocks.
 +
 
 +
10) Bug fixes
 +
 
 +
Performance has been improved when uploading large third-party annotation
 +
files.  Nucleotide-level alignments have been fixed when the display
 +
is "flipped."  The feature name search methods have been cleaned up to
 +
provide more consistent behavior.
 +
 
 +
PLANS FOR THE FUTURE:
 +
 
 +
Performance is a concern when viewing large numbers of uploaded
 +
third-party features. We plan to fix this by implementing an indexed
 +
flat file cache for uploaded features.
 +
 
 +
The user interface needs to be improved in some respects.  One useful
 +
idea is to place an icon to the left of each track to indicate whether
 +
it is in an expanded or collapsed state.
 +
 
 +
The ability to use a different DAS source for each track, which is a
 +
feature of ISB GBrowse, will be ported over.
 +
 
 +
As always, we are looking for volunteers fluent in non-English
 +
languages to create and update the internationalization files.
 +
 
 +
Contact: Lincoln Stein <lstein@cshl.org>
 +
 
 +
APPENDIX. Confirmed users of GBrowse:
 +
 
 +
Agricultural Biotechnology Center, Hungary
 +
BAWI, S. Korea
 +
Baylor College of Medicine
 +
Biocrates GmbH, Innsbruck
 +
Brandeis University
 +
Bristol-Myers Squibb
British Columbia Centre for Disease Control
 +
CIRAD, France
 +
CSIRO, Australia
 +
Cambridge University (multiple labs)
 +
Center for Genomics & Bioinformatics, Stockholm
 +
Centre de Genetique Moleculaire, CNRS
 +
Cold Spring Harbor Laboratory (multiple labs)
 +
Compugen
 +
Concordia University, Canada
 +
Cornell Medical School
 +
Cornell University
 +
DNA Landmarks, Inc.
 +
Donald Danforth Plant Sciences Center
 +
Duke University (multiple labs)
 +
EMBL, Heidelberg
 +
EuGenes (hacked copy)
 +
Faculdade de Medicina de Ribeirão Preto, São Paulo
 +
FlyBase
 +
Foundation for Research and Technology, Crete
 +
Fundação Hemocentro, São Paulo
 +
Genoscope, France
 +
GrainGenes
 +
Harvard University
 +
Hospital for Sick Kids, Toronto
 +
Illinois Institute of Technology
 +
Incyte Corporation
 +
Inpharmatica, Ltd.
 +
Institute for Systems Biology, Seattle
 +
Institute of Molecular and Cell Biology, Singapore
 +
International Rice Research Institute, Philippines
 +
John Innes Centre
 +
KEGG
 +
Kansas State University
 +
Karolinska Institute
 +
Kennedy Krieger Institute
 +
Lawrence Berkeley Laboratories
 +
Marine Biological Laboratories, Woods Hole
 +
Massachusetts Institute of Technology (multiple labs)
 +
Mayo Institute
 +
McGill University
 +
Meat Animal Research Center, University of Nebraska
 +
Medical University of South Carolina
 +
Michigan State University  
 +
NHGRI, NIH
 +
National Cancer Institute, Frederick Cancer Center
 +
New York University (multiple labs)
 +
North Carolina State University
 +
Northern Illinois University
 +
Northwestern University
 +
Oklahoma State University
 +
Open Informatics Consulting Corp.
 +
Oxagen Corp.
 +
Pasteur Institute, Paris
 +
Pioneer Corporation
 +
QIAGEN Operon Corp.
 +
RIKEN (multiple labs)
 +
RatDB
 +
Regulome, Inc.
 +
Rhobio (Bayer CropScience SA & Biogemma joint venture)
 +
Rigshospitalet, Copenhagen
 +
Rockefeller University
 +
Roslin Institute, Edinburgh
 +
Russian Academy Medical Sciences
 +
Serono International Corp, Geneva
 +
Simon Fraser University
 +
South Africa National Bioinformatics Institute
 +
Southern Illinois University
 +
St. Jude Children's Research Hospital, Memphis
 +
Stowers Institute for Medical Research
 +
Texas A&M (multiple labs)
 +
The Institute for Genomic Research
 +
Tulane University
 +
University of California, Davis
 +
University of Arizona (multiple labs)
 +
University of British Columbia
 +
University of California Santa Barbara
 +
University of Georgia (multiple labs)
 +
University of Minnesota
 +
University of Muenster
 +
University of New South Wales, Australia
 +
University of Oklahoma (multiple labs)
 +
University of Pennsylvania (multiple labs)
 +
University of Southern California
 +
University of Texas
 +
University of Toronto
 +
University of Virginia
 +
University of Washington
 +
Universität Giessen
Université de Liège, Belgium
 +
Wageningen Universiteit & Researchcentrum, Netherlands
 +
Washington University at St. Louis (multiple labs)
 +
WormBase
 +
deVGen, Belgium
 +
 
 +
 
 +
CMAP
 +
Main developer: Ken Clark
 +
 
 +
Recent improvements include:
 +
 
 +
*  Now CGI-based (no more mod_perl dependencies), making installation
 +
    much easier (and much more like Gbrowse)
 +
*  Added SVG output
 +
*  Added multiple aliases for features
 +
*  Added support for arbitrary attributes for db objects
 +
*  New cross-reference scheme allows for unlimited xrefs on most db objects
 +
*  Experimental XML export/import of data added
 +
*  User tutorial added
 +
*  Faster, fewer bugs, etc.
 +
 
 +
CMAP is known to be in use by:
 +
 
 +
Barry Marler (Andy Paterson), Alex Feltus, Pratt: UGA
 +
Rex Nelson, Chet Langin, Xiaokang Pan: Iowa State
 +
Michelle Bobo: Oregon Health & Science University
 +
Victor Ulat, Richard Bruskiewich: IRRI
 +
Matthew Hobbs: University of Sydney (Australia)
 +
 
 +
 
 +
 
 +
                          Pathway Tools Status Report
 +
                                  Peter Karp
 +
                                February 5, 2004
 +
 
 +
Please note that the full history of updates to Pathway Tools can be
 +
found at URL
 +
http://bioinformatics.ai.sri.com/ptools/release-notes.html
 +
 
 +
Significant updates funded under this grant since the last report are
 +
as follows.
 +
 
 +
o We have implemented the proposed Napster-like peer-to-peer sharing
 +
of Pathway/Genome Databases via a central network registry server.
 +
Pathway Tools users will be able to use the software to register new
 +
PGDBs that they create to this central registry server at SRI, and
 +
they will be able to use the software to browse the registry and
 +
to retrieve and install PGDBs listed there for local analysis.
 +
 
 +
o Pathway Tools has been extended to support annotation of protein
 +
domains, sites, and chemical modifications.  We have created an
 +
ontology of domain, site, and modification types.  The Pathway/Genome
 +
Editor tools have been extended to allow users to interactively
 +
annotate these features on protein sequences, and the Pathway/Genome
 +
Navigator has been extended to display these annotated features.
 +
 
 +
o We have added a batch-processing mode to the portion of Pathway Tools
 +
that creates new Pathway/Genome Databases to allow large-scale automated
 +
processing of multiple genomes without manual intervention.  We have
 +
undertaken a collaboration with the European Bioinformatics Institute,
 +
who are interested in applying Pathway Tools to generate Pathway/Genome
 +
Databases for a large number of genomes.
 +
 
 +
o We have integrated an algorithm for pathway hole filling into
 +
Pathway Tools.  A pathway hole is a reaction step in a metabolic
 +
pathway for which no enzyme has been identified in the genome of
 +
an organism.  The pathway hole filler uses a combination of techniques
 +
to predict which genes in the genome code for these missing enzymes.
 +
[This algorithm was developed under separate funding.]
 +
 
 +
o We have completely re-designed the menus of the desktop version
 +
of Pathway/Genome Navigator to be more consistent with other
 +
graphical interfaces, more intuitive to the user, and to provide
 +
more screen area to display visualizations.
 +
 
 +
o We have integrated an SBML (Systems Biology Markup Language) output
 +
tool written in the Church lab at Harvard into Pathway Tools, allowing
 +
the reaction network within a Pathway/Genome Database to be exported
 +
to SBML format, from which it can be imported into a number of
 +
simulation and analysis software packages.
 +
 
 +
o We have reworked the display of information about protein complexes
 +
within Pathway Tools to increase the clarity of this information.
 +
 
 +
o The preceding capabilities will be present in the February release
 +
of Pathway Tools.
 +
 
 +
o We have received many emails from users reporting bugs, and asking for
 +
information.
 +
 
 +
o 80 groups have licensed Pathway Tools to date.
 +
 
 +
o Pathway/Genome Databases available through the web include:
 +
 
 +
  o Saccharomyces cerevisiae, Stanford University
 +
    http://pathway.yeastgenome.org/biocyc/
 +
 
 +
  o Plasmodium falciparum, Stanford University
 +
    plasmocyc.stanford.edu
 +
 
 +
  o Mycobacterium tuberculosis, Stanford University
 +
    BioCyc.org
 +
 
 +
  o Arabidopsis thaliana and Synechocystis, Carnegie Institution of Washington
 +
    Arabidopsis.org:1555
 +
 
 +
  o Methanococcus jannaschii, EBI
 +
    Maine.ebi.ac.uk:1555  (availability intermittent)
 +
 
 +
 
 +
                          Pathway Tools Status Report
 +
                                  Peter Karp
 +
                                April 20, 2004
 +
 
 +
Please note that the full history of updates to Pathway Tools can be
 +
found at URL
 +
http://bioinformatics.ai.sri.com/ptools/release-notes.html
 +
 
 +
Significant updates funded under this grant since the last report in
 +
February 2004 are as follows.
 +
 
 +
o Version 8.0 of Pathway Tools was released on March 12, 2004.
 +
SRI continues to hold to our planned schedule of two releases of
 +
Pathway Tools per year.
 +
 
 +
o 275 groups have licensed Pathway Tools to date.  The large jump
 +
in this number since the last report reflects the fact that these
 +
numbers also include groups who use Pathway Tools to query
 +
existing Pathway/Genome Databases (not reported earlier), in addition
 +
to groups who use it to create new databases.
 +
 
 +
o We have made very significant progress on development of an
 +
algorithm to automatically lay out the one-page metabolic overview
 +
diagram that shows the full metabolic network of an organism -- the
 +
algorithm is now working.  We are also in the process of adding new
 +
components of the cellular machinery to this diagram.
 +
 
 +
o SRI has hosted two 4-day training sessions for Pathway Tools.
 +
The dates and 26 attendees are listed below.  Most attendees have
 +
brought genomes with them to the training sessions, and have left
 +
with draft Pathway/Genome Databases.
 +
 
 +
Tutorial on March 15-18, 2004
 +
 
 +
1. John Burke   Biotique Inc.
 +
2. Guillaume Meurice   Pasteur Institute
 +
3. David Simon   Pasteur Institute
 +
4. Gregory P. Fournier   MIT
 +
5. Alex Picone   Biatech
 +
6. John Bashkin   SRI
 +
7. Tit Yee Wong   University of Memphis
 +
8. Ken Kaufman   UC Berkeley
 +
9. Jeremy Glasner   University of Wisconsin
 +
10. Lisa Herron-Olson   University of Minnesota
 +
11. Devaki Bhaya   Carnegie Institution
 +
 
 +
 
 +
Tutorial on April 19-22, 2004
 +
 
 +
1 Dr. Matthew Berriman The Wellcome Trust Sanger Institute
 +
T. brucei & L. major
 +
2 Herbert Chiang Washington University
 +
Bacteroides thetaiotaomicron
 +
3 Clinton Fernandez University of British Columbia
 +
Rhodococcus sp. RHA1 (~10MB)
 +
4 Lisa Koski University of Montreal, Canada
 +
5 Rebecca Krupp UCLA
 +
Methanosarcina acetivorans
 +
6 Joanne Luciano BioPathways Consortium
 +
Prochlorococcus marinus MED4
 +
7 Jasintha Maniraja Universite Libre de Bruxelles
 +
Mus musculus
 +
8 Linyong Mao Pacific Northwest National Laboratory
 +
Shewanella oneidensis
 +
9 Michael P. McLeod University of British Columbia
 +
Rhodococcus sp. RHA1 (~10MB)
 +
10 Dylan Morris CalTech
 +
Mycoplasma genitalium
 +
11 Gavin Murphy CalTech
 +
Bdellovibrio
 +
12 Joo-Heon Park University of Tex-Houston Med School
 +
Treponema pallidum
 +
13 Liviu Popescu Cornell University, Computer Science
 +
Saccharomyces cerevisiae
 +
14 Christopher Reigstad Washington University
 +
unpublished uropathogenic E. coli
 +
15 Haluk Resat Pacific Northwest National Laboratory
 +
16 Jian Song Los Alamos National Laboratory
 +
Pseudomonas aeruginosa
 +
 
 +
 
 +
 
 +
GMOD Project Status    April 2004        D. Gilbert (gilbertd@indiana.edu)
 +
 
 +
Project members:  Don Gilbert, Josh Goodman, Paul Poole,
 +
Vasanth Singan (student), at Indiana University.
 +
 
 +
Projects in development for GMOD:
 +
 
 +
(1) LuceGene, document/object search/retrieval for genome data
 +
www.gmod.org/lucegene/  eugenes.org:8081/gmod/lucegene/
 +
version 1.2 (alpha), released for public use April 2004.
 +
In use at FlyBase.net, euGenes.org, wFleaBase. LuceGene is similar in
 +
concept to the bioinformatic databank access tool SRS, and web search
 +
systems such as Google. Based on Lucene, this Java program is fast and
 +
flexible at search and retrieval of complex data objects.  It
 +
outperforms Chado Postgres database by 10x or more at gene object
 +
retrieval.
 +
 
 +
(2) Genome Directory System, data mining access to genome data 
 +
www.gmod.org/gds/
 +
In development, web services for SOAP access to genome data and bio
 +
sequence databanks.  Plan to provide production data mining services
 +
through this including FlyBase, euGenes genomes and Bio-Mirror/IUBio
 +
biosequence databanks. Will add to ARGOS package for genome databases.
 +
Includes plan to test FlyBase data analyses over TeraGrid, Fall 2004.
 +
 
 +
(3) ARGOS, a replicable genome information system
 +
www.gmod.org/argos/  flybase.net/argos/  eugenes.org/argos/
 +
Version 0.7 (alpha, March 2004).
 +
ARGOS is used now for replicating public web-genome databases. Contains
 +
all of FlyBase, euGenes, wFleaBase, and some other services.
 +
Contents include 10 GB multi-genome data (euGenes), 8 GB of Drosophila
 +
(FlyBase), and 500 MB of common software (servers, binaries).
 +
 
 +
Miscellany:
 +
gmod/schema/XMLTools/ChadoSax/ reader  for chado.xml provides
 +
  flybase annotation data access.
 +
gmod/schema/GMODTools/  Perl modules using GMOD 0.001 release for
 +
  managing miscellaneous sequences (EST, GSS, etc) in Chado database
 +
  Used now in Daphnia / wFleaBase genome database (eugenes.org/daphnia)
 +
Apollo data search/retrieval system used at
 +
  flybase.net/apollo/
 +
  a web CGI using Chado Postgres + LuceGene
 +
  for retrieval of GAME XML annotations by
 +
  lookup of gene name, genome location, other attributes.
 +
Tested, aided development, and used GMOD release 0.001, Postgres Chado,
 +
XORT, Chado::DBI, GBrowse, etc. tools for FlyBase and wFleaBase, where
 +
they now form the basis of data management.
 +
 
 +
 
 +
 
 +
GMOD Update from the Saccharomyces Genome Database (SGD)
 +
 
 +
    Before the last GMOD meeting at Berkeley, SGD released several GMOD
 +
software packages (Blast Graphic Viewer, Restriction Graphic Viewer and
 +
GO Graphic Viewer). Since then, we have been working on incorporating
 +
existing GMOD products into new tools and resources at SGD. Here is a
 +
list of projects that are currently under development or already in
 +
production.
 +
 
 +
1. New Fungal BLAST using BLAST Graphic Viewer.
 +
    SGD has created a new Fungal BLAST interface using the BLAST Graphic
 +
Viewer. This new tool can be used to do BLASTN or TBLASTN searches using
 +
any sequence of choice against any combination of fungal sequence datasets,
 +
including genome sequences of fungal model organisms and pathogens, ESTs,
 +
and other fungal sequence sets in GenBank. The fungal BLAST search at SGD
 +
can be accessed from this URL.
 +
 
 +
    http://seq.yeastgenome.org/cgi-bin/SGD/nph-blast-fungal.pl
 +
 
 +
 
 +
2. GBrowse at SGD
 +
    GBrowse has been set up at SGD. SGD is still testing the software
 +
before making a general announcement about the availability of the
 +
software.  This software is running on top of a MySQL database whose
 +
tables are populated from a flat file in GFF3 format (refer to the third
 +
topic for detail). GBrowse at SGD can be accessed from this URL.
 +
 
 +
    http://www.yeastgenome.org/cgi-bin/SGD/gbrowse/gbrowse/yeast
 +
 
 +
3. GFF3 file format
 +
    SGD has started to provide the sequence features of S. cerevisiae
 +
genome in a flat file, which is fully compatible with GFF3 format.
 +
This file is used as the data input to load the MySQL database for
 +
GBrowse and the PostgreSQL database running Chado schema for SGD Lite
 +
at Princeton. This file is updated every week on SGD's ftp site. This
 +
file is available for download from this URL.
 +
 
 +
    ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/SGDGFF3.gff
 +
 
 +
 
 +
4. SGD Lite and CHADO
 +
    The SGD colony at Princeton has been working on installing GMOD
 +
release 0.002.  Both versions of the Chado schema in these releases
 +
(.001 and .002) have been successfully installed and loaded (via a
 +
modified GFF3 file) on a desktop running Mac OS 10.3.2 using the
 +
included installation scripts.  We are currently working on installing
 +
0.002, including GBrowse, on an Apple X server running 10.3.2.  We plan
 +
to assemble installation notes/documentation and distribute them during
 +
the meeting.
 +
 
 +
5. Textpresso Beta testing
 +
    SGD has a wealth of literature information. We want to provide
 +
expanded text searching to our users, since we have an abstract and/or
 +
full text for most of our references. Textpresso is an information
 +
retrieval system developed by Wormbase at Caltech. Eimear Kenny spent
 +
two weeks at SGD to help set up a test version of Textpresso. The SGD
 +
Textpresso can be accessed from this URL.
 +
 
 +
    http://www.yeastgenome.org/textpresso/
 +
 
 +
Currently, we are working on improving Textpresso's software
 +
performance, as well as developing a yeast version of the Textpresso
 +
ontology. We improved the performance of the markup script (text2xml.pl)
 +
by 50%. We are also considering a few options to improve the indexing
 +
mechanism. With regard to the ontology, we have modified the 'Gene'
 +
and 'Localization in Time and Space' categories.  We are also currently
 +
working on a few other categories, such as Allele, Transgene and
 +
Phenotype, in order to best reflect the biology in S. cerevisiae.
 +
</pre>
  
 
[[Category:Meetings]]
 

Revision as of 14:22, 6 April 2007

Generic Model Organism Database Construction Set

Meeting 4

GMOD Meeting April, 2004

Presentations


Agenda (including links to powerpoint presentations)

Progress reports

GMOD Project Progress Reports
April, 2004
-----------------------------

The past four months have seen the first two releases of gmod, which will
become the suite of model organism database software.  The first release, 
version 0.001 (alpha), was released in January, 2004.  The main goal of that
release was to establish a release procedure.  The release consisted of a
database schema, referred to as chado, which is the database schema
developed primarily by FlyBase developers at Harvard and BDGP.  Additionally,
there were a variety of tools for installing and loading data into the
database which were developed primarily by Allen Day at UCLA and Scott
Cain at CSHL.  Finally, there was a compatible version of the Generic
Genome Browser with a chado database adaptor developed to allow browsing
of genome features directly from the database.

The second release, also an alpha release, consisted of the same components,
and was released in March, 2004. In this release, the installation procedure
improved considerably, and a prerequisite that had caused testers difficulties
was removed.  During the GMOD meeting in April, this release was installed 
by several attendees during a workshop.  Several suggestions were made that
will be implemented in the next release.

There are several items planned for addition or improvement in the next
two releases.  Tools to allow importing and exporting XML formatted data
from chado will be included, which will allow the sequence annotation tool,
Apollo, to be used with chado. Additionally, a template-based web front end for
chado called turnkey will be included in an upcoming release.  This software is 
still early in the development process, but when it was presented to
developers at the GMOD meeting in April, there was considerable interest
in getting it included in a gmod release as soon as possible.

Longer-term goals for gmod releases include adding pubsearch and pubfetch.
The process of porting these applications has begun and is expected to be
complete by the end of the year.  A tool for literature-based sequence
annotation, called JavaSEAN, is expected to be included in gmod in a similar
time frame.  Additionally, there are plans from the Apollo developers to
create a new version of Apollo that will be able to read and write directly
to the database without using an XML intermediary, which will simplify the
process of sequence annotation considerably.



Apollo Progress Report (11/2003 - 4/2004)

Major improvements in release 1.3.6 (11/3/03):

Apollo now runs under JDK1.4, which works better on most platforms.

Can rubberband a region on the axis and the selected sequence will pop up
in a Sequence window.

Results that represent hits against sequences that are new to their
respective database (as indicated in tiers file) are shown with a box
around them, so that the curator can immediately see which results are
new and need to be looked at.

Search (Find) now allows full regexps.

Instead of having the config files in $HOME/.apollo be slightly modified
copies of the ones in APOLLO_ROOT/conf, you can now put ONLY the stuff
you want changed into your personal cfg files.  Apollo will first read
the ones in APOLLO_ROOT/conf, and then read your personal cfgs and apply
any modifications.
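
The layered lookup described above can be sketched as follows. This is a
minimal illustration, not Apollo's code: the "key = value" file syntax and
the setting names are assumptions made for the example, and only the
read-defaults-then-apply-overrides order comes from the text.

```python
def parse_cfg(text):
    """Parse simple 'key = value' lines (hypothetical syntax, not Apollo's)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def load_layered(default_text, personal_text):
    """Read the shared defaults first, then let personal settings override them."""
    merged = parse_cfg(default_text)
    merged.update(parse_cfg(personal_text))
    return merged


# Hypothetical setting names, used only for illustration.
defaults = "FrameOrientation = vertical\nDraw3D = false\n"
personal = "# only the setting I want changed\nDraw3D = true\n"
merged = load_layered(defaults, personal)
```

The personal file carries a single line, yet the merged result still
contains every default that was not overridden.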

Synteny (see Synteny section at end)


Major improvements in release 1.4.0 (internal release) (2/9/2004):

New game.tiers file format (easier to read and change).  If you have an
old game.tiers, it will be autoconverted to the new format.

Better handling of non-gene annotation types.  New glyphs for showing
them in main Apollo display.
New annotations are automatically assigned the type (e.g. gene, tRNA,
etc.) appropriate to the evidence that was used to create them.  (Type
can then be changed in the annotation info editor, if desired.)

Structured transaction records are now added to the XML when you save.
They include the type of object that changed (e.g. TRANSCRIPT;
ANNOTATION; COMMENT), the operation (e.g. ADD, SPLIT, etc.), the relevant
names and/or IDs before and after the transaction, and the user and
time/date when the change was made.
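
As a rough picture of what such a record holds, the sketch below models one
transaction and serializes it to XML. The element names, the example IDs and
the user name are hypothetical and are not taken from the GAME XML format;
only the listed fields (object type, operation, before/after names, user,
time) come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime
import xml.etree.ElementTree as ET


@dataclass
class Transaction:
    object_type: str     # e.g. TRANSCRIPT, ANNOTATION, COMMENT
    operation: str       # e.g. ADD, SPLIT
    before_ids: list     # names/IDs before the change
    after_ids: list      # names/IDs after the change
    user: str
    timestamp: datetime

    def to_xml(self):
        """Build an XML element using illustrative (not GAME) tag names."""
        elem = ET.Element("transaction")
        ET.SubElement(elem, "object_type").text = self.object_type
        ET.SubElement(elem, "operation").text = self.operation
        for name in self.before_ids:
            ET.SubElement(elem, "before").text = name
        for name in self.after_ids:
            ET.SubElement(elem, "after").text = name
        ET.SubElement(elem, "user").text = self.user
        ET.SubElement(elem, "date").text = self.timestamp.isoformat()
        return elem


# Hypothetical example: one transcript split into two.
record = Transaction("TRANSCRIPT", "SPLIT", ["tx-1"], ["tx-1", "tx-2"],
                     "curator1", datetime(2004, 2, 9, 14, 30))
xml_text = ET.tostring(record.to_xml(), encoding="unicode")
```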

Support for translational exceptions, including frame shifts and one base
pair genomic sequencing errors.

UTRs are now shown in a different (configurable) color from the rest of
the gene.

Restriction enzyme mapper:
- Cut sites show up in main window (near the axis)
- Can now map multiple restriction enzymes at once
- Table of restriction fragments; can be selected for viewing in
Sequence window

Annotation info window:
- Now has integrated annotation tree
- Shows arbitrary properties for annotations and transcripts (including
validation_flag)
- Shows translational exceptions and genomic sequencing errors
- Lets you edit annotation ID as well as name/symbol

Ability to tag results by selecting from a list of comments, which are
specified (as ResultTags) in game.style.  Tagged results are crosshatched
in pink in the display.

Fixed updating of peptide sequences.


Improvements in releases 1.4.1 (3/12/04) and 1.4.2 (3/18/04):

Red/green markers at axis show where sequence/region ends.

To help you identify splice sites that are unconventional, colored
triangles appear in the annotation glyph.

Can now load D. melanogaster data from r3.1 (gadfly) and r3.2 (chado)
(both via cgi).


1.4.3 (4/19/04):
Lets users get the sequence of the entire segment they are viewing, not
just a rubberbanded section.  [File -> Save sequence]



Synteny progress, 11/03-4/04:

- Synteny now works with GAME. You can load one species and then use the
blast or syntenic block results to another species (for now it's pseudo)
to load another species. The other species is loaded with the same range
around that feature. Links between the two species are automatically
derived from the blast link features that are present in both datasets
(no explicit link file needs to be specified).

- Database chooser was added to select the different species databases.

- Able to switch back and forth from synteny data adapter to regular data
adapters without restarting Apollo.

- Can save and edit (edit could use some rigorous testing)

- Can home in on link from link popup menu. Zooms and shows the strands
of homed in link, strands not in link are hidden.

- Species now zoom and scroll together by default. Can unlock zoom with
shift key, and unlock scroll with menu item.

- You can now configure links between 2 curation sets that contain links to
each other. link_type, source and hit species are specified in the linked
type in the tiers file. This works with GAME and in theory could be made to
work with other adapters that have linked data embedded in the species
data.



Textpresso: A progress report

Eimear Kenny, Hans-Michael Mueller and Paul Sternberg

Updates made to Textpresso since September 2003:

Textpresso for Yeast Literature
(Toward a generic MOD information retrieval/extraction search engine)

SGD developers and curators met with Eimear Kenny for two weeks at
the beginning of March at Stanford to build a Textpresso search engine for
Yeast. During that period the Textpresso software was installed on a
Solaris system and three builds with a test corpus of ~400 full text
journal articles were completed. In addition, the Textpresso Ontology for
worm literature was modified to a functional preliminary ontology for
yeast literature. Plans to expand the corpus to 10,000 yeast papers and
make improvements to the yeast ontology are underway at Stanford.

Integration of Textpresso into Literature Curation Pipeline

We have integrated Textpresso into the Wormbase curation pipeline
to expedite the extraction of genetic interaction information from the
literature. A prototype curation interface has been developed to enable a
curator to extract data from sentences returned by a Textpresso query for
genetic interaction. We find that these Textpresso sentences are enriched
3-fold for gene-gene interactions compared to sentences that mention two
or more gene names and 39-fold compared to random sentences from the
literature.

Textpresso MOD interface

We have generated a Wormbase-like interface for Textpresso to integrate
the Textpresso information retrieval engine in the Wormbase web-site.
http://www.textpresso.org/cgi-bin/wb/textpressoforwormbase.cgi?allabstracts=on&searchmode=sentence&searchtargets=Paper&searchtargets=Abstract

Textpresso Package

Hans-Michael Mueller is working on packaging Textpresso for release in
the first half of this year.

Textpresso paper ... under review

A Textpresso publication is currently under revision.



PubFetch/PubTrack Progress Report (April 2004)

PubFetch
PubFetch is a tool for accessing literature from various online resources.
The goal is to provide a common interface and common format to downstream
applications to allow them to query different literature repositories in
a single, unified fashion.

PubFetch has been implemented in two forms:
* Java servlet core + simple web interface to provide interactive access
to PubFetch
    * Provides access to PubMed and Agricola databases
* BioMOBY wrapper around servlet core to provide webservice access to PubFetch

A variety of new features have been introduced:
* Duplicate filtering - running the same search on multiple data sources
results in some duplication of articles; the duplicate filter detects
these articles and returns a non-redundant set of data. Database Ids from
both sources are maintained in the non-redundant set.
* The web interface version highlights keywords in the search results to
aid in review of the returned articles.
* Connection to full text - a hyperlink to the full text is returned (if
available from PubMed)
* Filtering of 'ahead of print' articles - Abstracts are appearing in
PubMed and being assigned PubMed Ids prior to being published and are
being reassigned PubMed Ids after publication. PubFetch allows filtering
of these ahead of print articles to retrieve only published articles.
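
The duplicate-filtering feature listed above can be pictured with a short
sketch. How PubFetch actually recognizes duplicates is not stated in this
report; matching on a normalized title plus publication year is an
assumption made for illustration, and the records and identifiers are
invented.

```python
import re


def _key(article):
    """Assumed matching rule: case- and punctuation-insensitive title plus year."""
    title = re.sub(r"[^a-z0-9]+", " ", article["title"].lower()).strip()
    return (title, article["year"])


def merge_results(*result_sets):
    """Return a non-redundant list that keeps the database Ids from every source."""
    merged = {}
    for results in result_sets:
        for article in results:
            entry = merged.setdefault(
                _key(article),
                {"title": article["title"], "year": article["year"], "ids": {}},
            )
            entry["ids"].update(article["ids"])
    return list(merged.values())


# Invented records standing in for PubMed and Agricola search results.
pubmed = [{"title": "A Gene Atlas of Rice.", "year": 2003,
           "ids": {"pubmed": "111"}}]
agricola = [{"title": "A gene atlas of rice", "year": 2003,
             "ids": {"agricola": "IND222"}},
            {"title": "Soil fungi survey", "year": 2002,
             "ids": {"agricola": "IND333"}}]
combined = merge_results(pubmed, agricola)
```

The article found in both sources appears once in the result and carries
both identifiers.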

The BioMOBY interface provides the following services:
* SearchPubmed - Search PubMed for given query and get PMIDs 
* GetPubmed - Retrieve PubMed articles in MEDLINE display format for given
PMIDs
* FetchFull - Get FullText for given PMID 
* fetchAgID - Search Agricola for given query and get Agricola accession number
* fetchAgDoc - Get Agricola document in MEDLINE like format for given
Agricola accession number

Current work
The integration of PubFetch and PubSearch is in progress; our goal is to
have PubSearch using the PubFetch core module for literature retrieval by
summer of 2004. We will be adapting the Rat Genome Database literature
pipeline to use the PubFetch BioMOBY services to act as its source for
literature data download.

The current version of PubFetch is available from the GMOD cvs:
http://cvs.sourceforge.net/viewcvs.py/gmod/pubfetch/

Implementation of PubSearch at RGD
Following a curator review of existing PubSearch functionality, a variety
of new features were requested by the RGD curators to enable a more
'article-centric' view of the PubSearch database. This has been
implemented by the TAIR group and plans are underway to install this
latest version of PubSearch at RGD, populate with RGD/Rat data and test
in the RGD curation process.

PubTrack
PubTrack is a monitoring tool that tracks objects as they move through
a process or workflow. Existing workflow tools move data through a
specified process, passing datasets to applications and retrieving
results and passing them to the next step in the flow. PubTrack does
not aim to direct or control workflow and it does not track the dataset
as a whole; it provides a higher resolution and tracks the data objects
within the dataset, enabling users to follow a particular object as it
moves through a process.

Progress to date:
* Review of existing workflow tools and schemas has been completed.
* The initial PubTrack schema has been developed and implemented in PostgreSQL
* Initialization scripts have been written to populate the PubTrack
database with initial object and process data. Perl scripts are used to
parse and load initialization data in a standard XML format; a DTD is
available and is used to confirm the data formatting.
* An API is under development to allow 3rd party applications to
communicate with PubTrack to initialize and update the tracking
information for objects under observation. This is being developed and
tested using data from a proteomics MS/MS analysis pipeline that is
being built in my lab.
* A basic web user interface is in development to provide end-users with
the ability to view objects and their progress through their designated
processes.
* The concept of 'estimated time of completion' has been added to allow
long term planning and project tracking. For example, the entire process
of curating an article might typically take 3 days, so the estimated time
of completion would be 3 days after the start of curation. This estimate
can be displayed on a Gantt chart and updated as individual steps in the
process are completed, allowing an increasingly refined view of the
completion date. This is being used in our proteomics tracking - component
1 generates tissue samples from animals in a process that takes up to 3
weeks to complete. By tracking the progress and updating the completion
time estimate using PubTrack it allows lab members in component 2 to plan
ahead. They are able to see what samples will be ready and on what date
they will be ready and this is updated as the process progresses.
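
The 'estimated time of completion' idea can be sketched as below. This is
not PubTrack code: the function, the step names and the one-day durations
are hypothetical, and the sketch assumes steps run strictly in sequence. It
only shows how an estimate can be refined as steps report their actual
finish times.

```python
from datetime import datetime, timedelta


def estimate_completion(start, steps, finished):
    """Estimate when the last step ends.

    steps:    ordered list of (name, typical duration) pairs
    finished: dict mapping completed step names to actual finish times
    """
    clock = start
    for name, typical in steps:
        if name in finished:
            clock = finished[name]   # use the real finish time
        else:
            clock = clock + typical  # fall back to the typical duration
    return clock


# Hypothetical three-step, three-day curation process.
start = datetime(2004, 4, 1, 9, 0)
steps = [("download", timedelta(days=1)),
         ("screen", timedelta(days=1)),
         ("curate", timedelta(days=1))]

initial = estimate_completion(start, steps, {})
updated = estimate_completion(start, steps,
                              {"download": datetime(2004, 4, 1, 15, 0)})
```

Here the first step finishes early, so the updated estimate moves forward
from three days after the start to the afternoon of the third day.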

Current Work
When the API is stabilized we will deploy PubTrack in the existing RGD
literature curation pipeline and ultimately in combination with PubSearch
at RGD. This will create an entire system allowing tracking of literature
across a heterogeneous system as it is downloaded from PubMed, into
PubSearch, screened, moved to RGD's Oracle db, curated and ultimately
filed. A more comprehensive user interface will be developed based on the
experiences from the proteomics pipeline and the RGD curation pipeline.
The goal is to provide generic tracking views and a way to allow specific
users to customize the displays, charts and reports if needed.

PubTrack documents including schema, loading scripts, etc. can be found on
the GMOD CVS.
http://cvs.sourceforge.net/viewcvs.py/gmod/pubtrack/



PubSearch update

We've migrated our database schema over to one that should be more
compatible with a Chado schema --- all of our table names are now prefixed
with a 'pub_' prefix, and we've done some column renaming so that we use
consistent names throughout the system.

Our production server has also been upgraded from MySQL3 to MySQL4, and
we've rewritten some parts of Pubsearch to take advantage of the
transaction support that the new MySQL provides.  We've also added
referential integrity constraints to the foreign keys in our tables.

We've adopted another tool called JCoverage to help us identify areas of
our code that are not being touched by our unit tests, and have started to
tighten up our test cases so that our major classes are being exercised.

We've worked toward removing dependencies on external resources.  Hit
generation now works directly from the Java codebase, rather than from an
external Python script.  We've continued work on a keyword term browser to
replace the highly munged version of AmiGO that we are running locally.



GBrowse Project

Coordinator: Lincoln Stein
Major Developers: Scott Cain
	          Aaron Mackey
		  Toshiaki Katayama
		  Vsevolod Ilyushchenko
		  Marc Logghe
		  Sheldon McKay
		  Mark Wilkinson

DESCRIPTION: 

GBrowse is a web-based browser for genome annotations.  It is intended to 
complement Apollo by providing a search, browse and drill-down display for 
sequence-based features without the need for prior software installation.  
GBrowse uses a database adaptor system to connect to a single primary data 
source, and a temporary flat-file system to layer an arbitrary number of 
third-party annotations on top of the primary data.  A plugin system is used 
to add new functionality to gbrowse, such as more advanced searches, and 
dynamically-computed features such as ab initio gene predictions.  An 
internationalization layer allows GBrowse to display button labels, menus and 
help text in a variety of common world languages.

The following gbrowse database adaptors currently exist:

      Bio::DB::GFF (oracle, postgresql & mysql)
       Well-tested and in production.

      Bio::DB::Das::Chado (postgresql)
       Well-tested and in early production.

      GenBank proxy
       Well-tested and in production.  Does not handle
       full-genbank keyword searches properly.

      Bio::DB::Das::BioSQL
       Adaptor for the BioSQL schema.  In beta test. 

      Bio::Das 
       Adaptor for DAS sources. Released, but probably best
       considered in beta test.

GBrowse has been downloaded from SourceForge 1,830 times, but this is
a poor way to count the number of GBrowse users.  A more conservative
estimate of users comes from tallying bug reports, which ensures that
the user has at least tried to install the software.  However, it
represents an undercount.  In any case, we can confirm that at least
100 laboratories have installed GBrowse.  As the list attached to the
bottom of this report shows, GBrowse can be found in academic,
governmental and commercial organizations in North America, South
America, Europe, Asia, Africa and Australia.

RECENT PROGRESS:

Since the last status report, we have added the following features to
GBrowse:

1) SVG output

Users can now click on a link labeled "Publication Quality Image" and
download a Scalable Vector Graphics version of the current view.  SVG 
is an editable format that can be manipulated with popular graphics
programs such as Adobe Illustrator, and can be reprinted by journals
without the pixelation that plagues bitmapped images.

2) Security

Tracks can now be protected by username & password, restricted to
certain hosts, or limited to hosts presenting certain classes of RSA
(digital) certificates.  A restricted track does not appear on the
screen of unauthorized users, allowing system administrators to
present a mix of proprietary and public data.

3) DAS support

GBrowse can now run on top of distributed annotation system sources.
DAS is supported in three ways:
    a) As an external annotation source
       Users can layer remote DAS tracks on top of the current view.
       The remote DAS tracks will remain active from session to
       session.  The GBrowse administrator can preconfigure a set
       of "recommended" DAS sources, which will then appear in a
       user-selectable menu.

    b) As a primary database
       GBrowse can now be configured to use a local or remote DAS
       database as its primary data source.  This means that one
       can point GBrowse at the UCSC or ENSEMBL databases and 
       immediately begin browsing them using the GBrowse user
       interface.

    c) As a DAS source
       GBrowse will act as a DAS server.  At the administrator's
       discretion, all or selected tracks can be made exportable
       via DAS, allowing sequence features to be shared between
       GBrowse instances or between GBrowse and other DAS clients.

4) Feature filtering and highlighting

A new filtering and highlighting API allows plugins to hide features
based on a set of user-supplied criteria or to highlight them in
various colors.
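
The general shape of such a filter-and-highlight hook is sketched below.
GBrowse plugins are written in Perl and this is not the GBrowse API; the
function name, the feature fields and the color are hypothetical, chosen
only to show one callback deciding visibility and another choosing a
highlight color.

```python
def render_plan(features, keep=None, highlight=None):
    """Apply an optional filter and an optional highlighter to each feature.

    keep(feature)      -> True to show the feature, False to hide it
    highlight(feature) -> a color name, or None for the default color
    """
    plan = []
    for feature in features:
        if keep is not None and not keep(feature):
            continue  # hidden by the user-supplied criteria
        color = highlight(feature) if highlight is not None else None
        plan.append((feature["name"], color))
    return plan


# Invented features for illustration.
features = [
    {"name": "geneA", "type": "gene", "score": 0.9},
    {"name": "est1", "type": "EST_match", "score": 0.2},
    {"name": "geneB", "type": "gene", "score": 0.4},
]
plan = render_plan(
    features,
    keep=lambda f: f["type"] == "gene",
    highlight=lambda f: "yellow" if f["score"] >= 0.5 else None,
)
```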

5) New adaptors

In addition to the DAS adaptor, we have added an experimental BioSQL
adaptor to GBrowse.  BioSQL is a flexible database schema designed by
the BioPerl & BioJava projects for the purposes of holding
GenBank/EMBL records in a relational format.

6) Support for GFF3 loading & dumping

GBrowse now can load and dump sequence annotations in GFF3 format
(http://song.sourceforge.net), a preliminary specification that
improves on the current GFF sequence feature format.  The advantage of 
this format is that it uses the Sequence Ontology, a controlled
vocabulary of sequence feature types.
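
For readers unfamiliar with the format, the sketch below parses one GFF3
feature line. It assumes the commonly described layout of nine tab-separated
columns with a final column of URL-escaped key=value pairs separated by
semicolons; the example line is illustrative rather than copied from a real
annotation file, and the parser ignores comments, directives and embedded
sequence.

```python
from urllib.parse import unquote

COLUMNS = ("seqid", "source", "type", "start", "end",
           "score", "strand", "phase")


def parse_gff3_line(line):
    """Split one feature line into its eight fixed columns plus attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns")
    feature = dict(zip(COLUMNS, fields[:8]))
    feature["start"] = int(feature["start"])
    feature["end"] = int(feature["end"])
    attributes = {}
    for pair in fields[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            # A value may hold several comma-separated entries.
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    feature["attributes"] = attributes
    return feature


# Illustrative line; the third column holds a Sequence Ontology term.
line = "chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=YAL069W;Name=YAL069W"
gene = parse_gff3_line(line)
```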

7) Integrated MOBY support

The BioMOBY system (www.biomoby.org) is a web services system that
allows users to quickly locate and invoke bioinformatics services.
GBrowse now has an interface which allows it to find services that
will operate on selected sequence features.  For example, GBrowse can
present users with a list of current services that will operate on
Drosophila gene names.

8) Support for writeback

A writeback layer has been added to GBrowse to allow external editors
to update the underlying database.  This has been tested successfully
with the Artemis editor in the context of a USDA pathogens database
project.  Testing with Apollo is still underway.  Currently it is
recommended to edit sequence databases via the shared Chado schema and
the Apollo->Chado->GBrowse route, rather than to use Apollo->GBrowse
directly.

9) New glyphs

We have recently added a number of new glyphs for use with the
International HapMap Project.  New glyphs include a "weighted allele"
glyph that indicates the major and minor alleles of a single
nucleotide polymorphism, and a set of glyphs for visualizing haplotype
blocks.

10) Bug fixes

Performance has been improved when uploading large third-party annotation
files.  Nucleotide-level alignments have been fixed when the display
is "flipped."  The feature name search methods have been cleaned up to 
provide more consistent behavior.

PLANS FOR THE FUTURE:

Performance is a concern when viewing large numbers of uploaded
third-party features. We plan to fix this by implementing an indexed
flat file cache for uploaded features.

The user interface needs to be improved in some respects.  One useful
idea is to place an icon to the left of each track to indicate whether
it is in an expanded or collapsed state.

The ability to use a different DAS source for each track, which is a
feature of ISB GBrowse, will be ported over.

As always, we are looking for volunteers fluent in non-English
languages to create and update the internationalization files.

Contact: Lincoln Stein <lstein@cshl.org>

APPENDIX. Confirmed users of GBrowse:

		Agricultural Biotechnology Center, Hungary
		BAWI, S. Korea
		Baylor College of Medicine
		Biocrates GmbH, Innsbruck
		Brandeis University
		Bristol-Myers Squibb
		British Columbia Centre for Disease Control
		CIRAD, France
		CSIRO, Australia
		Cambridge University (multiple labs)
		Center for Genomics & Bioinformatics, Stockholm
		Centre de Genetique Moleculaire, CNRS
		Cold Spring Harbor Laboratory (multiple labs)
		Compugen
		Concordia University, Canada
		Cornell Medical School
		Cornell University
		DNA Landmarks, Inc.
		Donald Danforth Plant Sciences Center
		Duke University (multiple labs)
		EMBL, Heidelberg
		EuGenes (hacked copy)
		Faculdade de Medicina de Ribeirão Preto, São Paulo
		FlyBase
		Foundation for Research and Technology, Crete
		Fundação Hemocentro, São Paulo
		Genoscope, France
		GrainGenes
		Harvard University
		Hospital for Sick Kids, Toronto
		Illinois Institute of Technology
		Incyte Corporation
		Inpharmatica, Ltd.
		Institute for Systems Biology, Seattle
		Institute of Molecular and Cell Biology, Singapore
		International Rice Research Institute, Philippines
		John Innes Centre
		KEGG
		Kansas State University
		Karolinska Institute
		Kennedy Krieger Institute
		Lawrence Berkeley Laboratories
		Marine Biological Laboratories, Woods Hole
		Massachusetts Institute of Technology (multiple labs)
		Mayo Institute
		McGill University
		Meat Animal Research Center, University of Nebraska
		Medical University of South Carolina
		Michigan State University	  
		NHGRI, NIH
		National Cancer Institute, Frederick Cancer Center
		New York University (multiple labs)
		North Carolina State University
		Northern Illinois University
		Northwestern University
		Oklahoma State University
		Open Informatics Consulting Corp.
		Oxagen Corp.
		Pasteur Institute, Paris
		Pioneer Corporation
		QIAGEN Operon Corp.
		RIKEN (multiple labs)
		RatDB
		Regulome, Inc.
		Rhobio (Bayer CropScience SA & Biogemma joint venture)
		Rigshospitalet, Copenhagen
		Rockefeller University
		Roslin Institute, Edinburgh
		Russian Academy Medical Sciences
		Serono International Corp, Geneva
		Simon Fraser University
		South Africa National Bioinformatics Institute
		Southern Illinois University
		St. Jude Children's Research Hospital, Memphis
		Stowers Institute for Medical Research
		Texas A&M (multiple labs)
		The Institute for Genomic Research
		Tulane University
		University of California Davis
		University of Arizona (multiple labs)
		University of British Columbia
		University of California Santa Barbara
		University of Georgia (multiple labs)
		University of Minnesota
		University of Muenster
		University of New South Wales, Australia
		University of Oklahoma (multiple labs)
		University of Pennsylvania (multiple labs)
		University of Southern California
		University of Texas
		University of Toronto
		University of Virginia
		University of Washington
		Universität Giessen
		Université de Liège, Belgium
		Wageningen Universiteit & Researchcentrum, Netherlands
		Washington University at St. Louis (multiple labs)
		WormBase
		deVGen, Belgium


CMAP
Main developer:		Ken Clark

Recent improvements include:

*   Now CGI-based (no more mod_perl dependencies), making installation
    much easier (and much more like GBrowse)
*   Added SVG output
*   Added multiple aliases for features
*   Added support for arbitrary attributes for db objects
*   New cross-reference scheme allows for unlimited xrefs on most db objects
*   Experimental XML export/import of data added
*   User tutorial added
*   Faster, fewer bugs, etc.

CMAP is known to be in use by:

Barry Marler (Andy Paterson), Alex Feltus, Pratt: UGA
Rex Nelson, Chet Langin, Xiaokang Pan: Iowa State
Michelle Bobo: Oregon Health & Science University
Victor Ulat, Richard Bruskiewich: IRRI
Matthew Hobbs: University of Sydney (Australia)



                          Pathway Tools Status Report
                                  Peter Karp
                                February 5, 2004

Please note that the full history of updates to Pathway Tools can be
found at URL
http://bioinformatics.ai.sri.com/ptools/release-notes.html

Significant updates funded under this grant since the last report are
as follows.

o We have implemented the proposed Napster-like peer-to-peer sharing
of Pathway/Genome Databases via a central network registry server.
Pathway Tools users will be able to use the software to register new
PGDBs that they create to this central registry server at SRI, and
they will be able to use the software to browse the registry and
to retrieve and install PGDBs listed there for local analysis.

o Pathway Tools has been extended to support annotation of protein
domains, sites, and chemical modifications.  We have created an
ontology of domain, site, and modification types.  The Pathway/Genome
Editor tools have been extended to allow users to interactively
annotate these features on protein sequences, and the Pathway/Genome
Navigator has been extended to display these annotated features.

o We have added a batch-processing mode to the portion of Pathway Tools
that creates new Pathway/Genome Databases to allow large-scale automated
processing of multiple genomes without manual intervention.  We have
undertaken a collaboration with the European Bioinformatics Institute,
who are interested in applying Pathway Tools to generate Pathway/Genome
Databases for a large number of genomes.

o We have integrated an algorithm for pathway hole filling into
Pathway Tools.  A pathway hole is a reaction step in a metabolic
pathway for which no enzyme has been identified in the genome of
an organism.  The pathway hole filler uses a combination of techniques
to predict which genes in the genome code for these missing enzymes.
[This algorithm was developed under separate funding.]

o We have completely re-designed the menus of the desktop version
of Pathway/Genome Navigator to be more consistent with other
graphical interfaces, more intuitive to the user, and to provide
more screen area for the display of visualizations.

o We have integrated an SBML (Systems Biology Markup Language) output
tool written in the Church lab at Harvard into Pathway Tools, allowing
the reaction network within a Pathway/Genome Database to be exported
to SBML format, from which it can be imported into a number of
simulation and analysis software packages.

o We have reworked the display of information about protein complexes
within Pathway Tools to increase the clarity of this information.

o The preceding capabilities will be present in the February release
of Pathway Tools.

o We have received many emails from users reporting bugs and asking for
information.

o 80 groups have licensed Pathway Tools to date.

o Pathway/Genome Databases available through the web include:

   o Saccharomyces cerevisiae, Stanford University
     http://pathway.yeastgenome.org/biocyc/

   o Plasmodium falciparum, Stanford University
     plasmocyc.stanford.edu

   o Mycobacterium tuberculosis, Stanford University
     BioCyc.org

   o Arabidopsis thaliana and Synechocystis, Carnegie Institution of Washington
     Arabidopsis.org:1555

   o Methanococcus jannaschii, EBI
     Maine.ebi.ac.uk:1555   (availability intermittent)


                          Pathway Tools Status Report
                                  Peter Karp
                                April 20, 2004

Please note that the full history of updates to Pathway Tools can be
found at URL
http://bioinformatics.ai.sri.com/ptools/release-notes.html

Significant updates funded under this grant since the last report in
February 2004 are as follows.

o Version 8.0 of Pathway Tools was released on March 12, 2004.
SRI continues to hold to our planned schedule of two releases of
Pathway Tools per year.

o 275 groups have licensed Pathway Tools to date.  The large jump
in this number since the last report reflects the fact that these
numbers also include groups who use Pathway Tools to query
existing Pathway/Genome Databases (not reported earlier), in addition
to groups who use it to create new databases.

o We have made very significant progress on development of an
algorithm to automatically lay out the one-page metabolic overview
diagram that shows the full metabolic network of an organism -- the
algorithm is now working.  We are also in the process of adding new
components of the cellular machinery to this diagram.

o SRI has hosted two 4-day training sessions for Pathway Tools.
The dates and 26 attendees are listed below.  Most attendees have
brought genomes with them to the training sessions, and have left
with draft Pathway/Genome Databases.

Tutorial on March 15-18, 2004

1. John Burke		  Biotique Inc.
2. Guillaume Meurice	  Pasteur Institute
3. David Simon		  Pasteur Institute
4. Gregory P. Fournier	  MIT
5. Alex Picone		  Biatech
6. John Bashkin		  SRI
7. Tit Yee Wong		  University of Memphis
8. Ken Kaufman		  UC Berkeley
9. Jeremy Glasner	  University of Wisconsin
10. Lisa Herron-Olson	  University of Minnesota
11. Devaki Bhaya	  Carnegie Institution


Tutorial on April 19-22, 2004

1	Dr. Matthew Berriman	The Wellcome Trust Sanger Institute
	T. brucei & L. major
2	Herbert Chiang		Washington University
	Bacteroides thetaiotaomicron
3	Clinton Fernandez	University of British Columbia
	Rhodococcus sp. RHA1 (~10MB)
4	Lisa Koski		University of Montreal, Canada	
5	Rebecca Krupp		UCLA			
	Methanosarcina acetivorans
6	Joanne Luciano		BioPathways Consortium
	Prochlorococcus marinus MED4
7	Jasintha Maniraja	Universite Libre de Bruxelles
	Mus musculus
8	Linyong Mao		Pacific Northwest National Laboratory
	Shewanella oneidensis
9	Michael P. McLeod	University of British Columbia
	Rhodococcus sp. RHA1 (~10MB)
10	Dylan Morris		CalTech	
	Mycoplasma genitalium
11	Gavin Murphy		CalTech	
	Bdellovibrio
12	Joo-Heon Park		University of Texas-Houston Medical School
	Treponema pallidum
13	Liviu Popescu		Cornell University, Computer Science
	Saccharomyces cerevisiae
14	Christopher Reigstad	Washington University
	unpublished uropathogenic E. coli
15	Haluk Resat		Pacific Northwest National Laboratory
16	Jian Song		Los Alamos National Laboratory	
	Pseudomonas aeruginosa



GMOD Project Status     April 2004        D. Gilbert (gilbertd@indiana.edu)

Project members:  Don Gilbert, Josh Goodman, Paul Poole,
Vasanth Singan (student), at Indiana University.

Projects in development for GMOD:

(1) LuceGene, document/object search/retrieval for genome data
www.gmod.org/lucegene/   eugenes.org:8081/gmod/lucegene/ 
version 1.2 (alpha), released for public use April 2004.
In use at FlyBase.net, euGenes.org, wFleaBase. LuceGene is similar in
concept to the bioinformatic databank access tool SRS, and web search
systems such as Google. Based on Lucene, this Java program is fast and
flexible at search and retrieval of complex data objects.  It
outperforms the Chado Postgres database by 10x or more at gene object
retrieval.
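
The kind of term-based lookup that makes this style of search fast can
be sketched with a toy inverted index.  This is not LuceGene or Lucene
code (those are Java); it is a minimal Python illustration, and the
class name, the tokenizing rule, and the record ids used with it are
all assumptions.  Each term maps to the set of records containing it,
so a query is answered by intersecting small sets rather than scanning
every record.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: term -> set of record ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.documents = {}

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        """Store a record and index every term it contains."""
        self.documents[doc_id] = text
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the sorted ids of records containing all query terms."""
        terms = self.tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)
```

A record added as idx.add("gene1", "alcohol dehydrogenase") would then
be found by idx.search("dehydrogenase").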

(2) Genome Directory System, data mining access to genome data  
www.gmod.org/gds/
In development, web services for SOAP access to genome data and bio
sequence databanks.  Plan to provide production data mining services
through this including FlyBase, euGenes genomes and Bio-Mirror/IUBio
biosequence databanks. Will add to ARGOS package for genome databases.
Includes plan to test FlyBase data analyses over TeraGrid, Fall 2004.

(3) ARGOS, a replicable genome information system
www.gmod.org/argos/  flybase.net/argos/  eugenes.org/argos/
Version 0.7 (alpha, March 2004).
ARGOS is used now for replicating public web-genome databases. Contains
all of FlyBase, euGenes, wFleaBase, and some other services.
Contents include 10 GB of multi-genome data (euGenes), 8 GB of Drosophila
data (FlyBase), and 500 MB of common software (servers, binaries).

Miscellany:
gmod/schema/XMLTools/ChadoSax/ reader  for chado.xml provides
  flybase annotation data access.
gmod/schema/GMODTools/  Perl modules using GMOD 0.001 release for
   managing miscellaneous sequences (EST, GSS, etc.) in Chado database
   Used now in Daphnia / wFleaBase genome database (eugenes.org/daphnia)
Apollo data search/retrieval system used at 
   flybase.net/apollo/
   a web CGI using Chado Postgres + LuceGene
   for retrieval of GAME XML annotations by
   lookup of gene name, genome location, other attributes.
Tested, aided development, and used GMOD release 0.001, Postgres Chado,
XORT, Chado::DBI, GBrowse, etc. tools for FlyBase and wFleaBase, where
they now form the basis of data management.



GMOD Update from the Saccharomyces Genome Database (SGD)

    Before the last GMOD meeting at Berkeley, SGD released several GMOD
software packages (Blast Graphic Viewer, Restriction Graphic Viewer and
GO Graphic Viewer). Since then, we have been working on incorporating
existing GMOD products into new tools and resources at SGD. Here is a
list of projects that are currently under development or already in
production.

1. New Fungal BLAST using BLAST Graphic Viewer.
    SGD has created a new Fungal BLAST interface using the BLAST Graphic
Viewer. This new tool can be used to do BLASTN or TBLASTN searches using
any sequence of choice against any combination of fungal sequence datasets,
including genome sequences of fungal model organisms and pathogens, ESTs,
and other fungal sequence sets in GenBank. The fungal BLAST search at SGD
can be accessed from this URL.

    http://seq.yeastgenome.org/cgi-bin/SGD/nph-blast-fungal.pl


2. GBrowse at SGD
    GBrowse has been set up at SGD. SGD is still testing the software
before making a general announcement about the availability of the
software.  This software is running on top of a MySQL database whose
tables are populated from a flat file in GFF3 format (refer to the third
topic for detail). GBrowse at SGD can be accessed from this URL.

    http://www.yeastgenome.org/cgi-bin/SGD/gbrowse/gbrowse/yeast

3. GFF3 file format
    SGD has started to provide the sequence features of S. cerevisiae
genome in a flat file, which is fully compatible with GFF3 format.
This file is used as the data input to load the MySQL database for
GBrowse and the PostgreSQL database running Chado schema for SGD Lite
at Princeton. This file is updated every week on SGD's ftp site. This
file is available for download from this URL. 

    ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/SGDGFF3.gff


4. SGD Lite and CHADO
    The SGD colony at Princeton has been working on installing GMOD
release 0.002.  Both versions of the Chado schema in these releases
(.001 and .002) have been successfully installed and loaded (via a
modified GFF3 file) on a desktop running Mac OS 10.3.2 using the
included installation scripts.  We are currently working on installing
0.002, including GBrowse, on an Apple Xserve running 10.3.2.  We plan
to assemble installation notes/documentation and distribute them during
the meeting.

5. Textpresso Beta testing
    SGD has a wealth of literature information. We want to provide
expanded text searching to our users, since we have an abstract and/or
full text for most of our references. Textpresso is an information
retrieval system developed by WormBase at Caltech. Eimear Kenny spent
two weeks at SGD to help set up a test version of Textpresso. The SGD
Textpresso can be accessed from this URL.

    http://www.yeastgenome.org/textpresso/

Currently, we are working on improving Textpresso's software
performance, as well as developing a yeast version of the Textpresso
ontology. We improved the performance of the markup script (text2xml.pl)
by 50%. We are also considering a few options to improve the indexing
mechanism. With regard to the ontology, we have modified the 'Gene'
and 'Localization in Time and Space' categories.  We are also currently
working on a few other categories, such as Allele, Transgene and
Phenotype, in order to best reflect the biology in S. cerevisiae.