Common Gene Page

From GMOD

Revision as of 17:30, 14 July 2008

Common Gene Page Rationale

Model organism/genome databases (MODs) produce gene pages with similar gene data, and may benefit from unifying these into a common structure, labelling, etc.

A list of common gene page attributes

* Names, symbols/IDs, synonyms
* Map locations
* Sequences
* Reagents
* Gene ontology
* Similar Genes
* Database cross-refs, External links
* Alleles, Transcripts
* Proteins, Structure and Domains
* Expression and Mutant Phenotypes
* Gene Interactions
* Literature references
* Summary Text
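
As a rough illustration of how a subset of these attributes might be held and emitted as a data file, here is a minimal Python sketch. The class, field, and XML element names (GeneRecord, gene, symbol, synonym, go_term, and so on) are hypothetical choices for this example, not the UGP-XML notation or any agreed MOD schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """A hypothetical subset of the common gene page attributes."""
    gene_id: str
    symbol: str
    synonyms: list[str] = field(default_factory=list)
    map_location: str = ""
    go_terms: list[str] = field(default_factory=list)
    dbxrefs: list[str] = field(default_factory=list)
    summary: str = ""


def to_xml(gene: GeneRecord) -> str:
    """Serialize a GeneRecord using illustrative (not UGP-XML) element names."""
    root = ET.Element("gene", id=gene.gene_id)
    ET.SubElement(root, "symbol").text = gene.symbol
    for syn in gene.synonyms:
        ET.SubElement(root, "synonym").text = syn
    # Optional single-valued fields are only written when present.
    if gene.map_location:
        ET.SubElement(root, "map_location").text = gene.map_location
    for term in gene.go_terms:
        ET.SubElement(root, "go_term").text = term
    for xref in gene.dbxrefs:
        ET.SubElement(root, "dbxref").text = xref
    if gene.summary:
        ET.SubElement(root, "summary").text = gene.summary
    return ET.tostring(root, encoding="unicode")


record = GeneRecord("GENE:0001", "abc1", synonyms=["ABC-1"], go_terms=["GO:0005515"])
print(to_xml(record))
# prints: <gene id="GENE:0001"><symbol>abc1</symbol><synonym>ABC-1</synonym><go_term>GO:0005515</go_term></gene>
```

The point of the sketch is that once MODs agree on the field list, the XML element names follow almost mechanically from it.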

Notes for Discussion 2008

From Dongilbert 13:15, 14 July 2008 (EDT):

In the hope that there will be a lively discussion on this topic at the July 2008 GMOD meeting, here are some thoughts. I would like to attend, but will instead be at the ISMB 2008 meeting in Toronto later in the week, and hope to hear some outcomes of this.

It seems to me the only real issue in moving forward with a common gene page is how to convince MOD projects to adopt the same gene summary format for our many shared customers. I'd like to see an agreement among two or more genome data providers to actually produce and deploy a common gene report (or such data files) within the coming year.

There is a history in genome informatics of everyone doing their own thing across projects with common genome data and common customer needs. Some efforts do achieve common usage and consensus: the GFF(3) format, GBrowse, the Chado schema/database, and the Apollo annotator, among others.

The common gene report concept, to date, is to provide consumers of genome data with a common format across projects, both for web display and for simple computing. It is aimed at simple summaries of gene data, structured in a common way for many organisms, suitable for bioscientists and students to read and use as web pages and data files (XML), and to do simple computing on if desired. One can see it as an alternative to a MOD project's full, project-specific documents. It isn't aimed at full, complex data exchange among databases; other formats and methods exist for that.
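
The "simple computing" use can be sketched in a few lines of Python. This assumes hypothetical report files in which each gene is a <gene> element with <go_term> children; those element names are illustrative, not the actual UGP-XML schema. The sketch counts how many gene reports mention each GO term.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def go_term_counts(reports: list[str]) -> Counter:
    """Count how many gene reports mention each GO term.

    Assumes (hypothetically) that each report is a <gene> element
    with zero or more <go_term> children.
    """
    counts = Counter()
    for text in reports:
        root = ET.fromstring(text)
        # A set, so a term repeated within one report counts once per gene.
        terms = {el.text for el in root.iter("go_term") if el.text}
        counts.update(terms)
    return counts


reports = [
    '<gene id="G1"><go_term>GO:0005515</go_term></gene>',
    '<gene id="G2"><go_term>GO:0005515</go_term><go_term>GO:0003677</go_term></gene>',
]
print(go_term_counts(reports).most_common())
```

With a shared labelling across MODs, the same few lines would work unchanged on gene reports from any participating project.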

Although any project would have engineering details to work out, implementing this isn't likely to be a large effort. We were able to use simple web-page scraping software to convert existing MOD gene reports into a common format (see http://eugenes.org/gmod/gene-report-examples/).
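
The kind of scraping involved can be sketched with Python's standard html.parser module. This is not the software used for the eugenes.org examples; it assumes a hypothetical page layout in which each field is a table row of the form <tr><th>label</th><td>value</td></tr>. Real MOD pages differ, and each would need its own rules.

```python
from html.parser import HTMLParser


class LabelValueScraper(HTMLParser):
    """Collect (label, value) pairs from <th>label</th><td>value</td> rows.

    The two-column table layout is an assumption for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._cell = None   # "th" or "td" while inside a cell, else None
        self._text = []
        self._label = None

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. links) is kept as part of the cell.
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        text = " ".join("".join(self._text).split())  # collapse whitespace
        if tag == "th":
            self._label = text
        elif self._label is not None:
            self.pairs.append((self._label, text))
            self._label = None
        self._cell = None


scraper = LabelValueScraper()
scraper.feed("<table><tr><th>Symbol</th><td>abc1</td></tr></table>")
print(scraper.pairs)
```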

User-interface and web page design aspects can be tuned to each MOD's desires. The main thrust of a common gene page is having common data labelled in a similar way. Agreement on an XML notation should follow in a straightforward way from common data fields. I (dgg) will be happy to work on this with any group of MODs who agree to deploy a common gene report. Prior software and example UGP-XML cases can be adapted to help with this.
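
"Common data labelled in a similar way" amounts to a mapping from each project's own labels to agreed field names. A minimal Python sketch follows; the alias lists and common field names are hypothetical examples, since the real labels each MOD uses would have to be collected and agreed on. Unmapped labels are returned separately so they can be reviewed rather than silently dropped.

```python
# Hypothetical alias table: agreed field name -> labels seen on MOD pages.
COMMON_LABELS = {
    "symbol": {"symbol", "gene symbol"},
    "synonym": {"synonym", "synonyms", "also known as"},
    "map_location": {"map location", "cytogenetic map", "chromosome location"},
    "summary": {"summary", "description", "gene summary"},
}

# Invert once: lower-cased alias -> common field name.
ALIAS_TO_COMMON = {
    alias: common for common, aliases in COMMON_LABELS.items() for alias in aliases
}


def normalize(pairs):
    """Split (label, value) pairs into common-labelled and unmapped lists."""
    common, unmapped = [], []
    for label, value in pairs:
        key = ALIAS_TO_COMMON.get(label.strip().lower())
        if key is None:
            unmapped.append((label, value))
        else:
            common.append((key, value))
    return common, unmapped


print(normalize([("Gene Symbol", "abc1"), ("Curator", "someone")]))
```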

Example uses

Early documents and samples

See this folder for some discussion, documents and examples for MOD gene pages from 2004: http://eugenes.org/gmod/gene-report-examples/

More discussion and samples

See this blog entry on a 2005 meeting discussion: http://blog.gmod.org/common_gene_pages

Daphnia genome database use case

There is an implementation of this at Daphnia-base, where the gene reports are structured XML with a style sheet for display. For example, see this gene page: http://wfleabase.org/lucegene/lookup?id=NCBI_GNO_292134 (view the page source to see the structured gene page XML). Or see these screenshots: daphnia gene page and gene page xml.

A simple Perl tool, bin/gff2ugpxml.pl, turns annotated GFF data into this gene page XML, suitable for search and display. It is available in GMOD genepages in CVS or at http://eugenes.org/gmod/gene-report-examples/.
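
The general GFF-to-XML step can be sketched in Python. This is not a port of gff2ugpxml.pl and does not produce UGP-XML; it assumes GFF3 input, keeps only features of type "gene", and writes hypothetical <gene>, <location>, <symbol>, <synonym> and <dbxref> elements from the standard GFF3 ID, Name, Alias and Dbxref attributes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def parse_attributes(col9):
    """Parse GFF3 column 9 (tag=value;tag=v1,v2) into a dict of value lists."""
    attrs = {}
    for item in col9.strip().split(";"):
        if "=" not in item:
            continue
        tag, _, value = item.partition("=")
        # GFF3 percent-escapes reserved characters such as "," and ";".
        attrs[tag] = [unquote(v) for v in value.split(",")]
    return attrs


def gff_genes_to_xml(lines):
    """Turn GFF3 'gene' features into <gene> elements (illustrative names)."""
    root = ET.Element("genes")
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # skip non-gene features and non-feature lines
        seqid, _source, _type, start, end, _score, strand, _phase, col9 = cols
        attrs = parse_attributes(col9)
        gene = ET.SubElement(root, "gene", id=attrs.get("ID", ["?"])[0])
        ET.SubElement(gene, "location").text = f"{seqid}:{start}..{end}({strand})"
        for name in attrs.get("Name", []):
            ET.SubElement(gene, "symbol").text = name
        for alias in attrs.get("Alias", []):
            ET.SubElement(gene, "synonym").text = alias
        for xref in attrs.get("Dbxref", []):
            ET.SubElement(gene, "dbxref").text = xref
    return ET.tostring(root, encoding="unicode")


gff = [
    "##gff-version 3",
    "scaf1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=abc1;Alias=ABC-1",
    "scaf1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
]
print(gff_genes_to_xml(gff))
```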

Search and display are then provided by the GMOD LuceGene tool, detailed at LuceGene_for_Daphnia_genome.