January 2009 GMOD Meeting
January 15-16, 2009
Following PAG 2009
San Diego, California, USA
- 1 Agenda
- 2 Themes and Discussions
- 3 Presentations
- 3.1 The State of GMOD
- 3.2 GMOD Help Desk
- 3.3 A RESTful Interface for MODs?
- 3.4 Data Representation in Chado: Best Practices
- 3.5 Generic Gene Page XML
- 3.6 BioMart
- 3.7 JBrowse
- 3.8 GMODWeb and Package Management
- 3.9 EcoliWiki and TableEdit
- 3.10 WebGBrowse: GBrowse Configuration Management
- 3.11 Drupal and MarineGenomics.org
- 3.12 Artemis and Chado at GeneDB
- 3.13 Chado and GUS at SBRI
- 3.14 modENCODE: extending Chado, BIR-TAB, & GBrowse for automating data validation & display
- 3.15 BeeSpace
- 3.16 Metadata Input and Submission Tool and GIS linked metagenomic database
- 3.17 Bovine Genome Database
- 3.18 GNPAnnot
- 4 Registration
- 5 Agenda Proposals
- 6 Meeting Participants
- 7 Feedback
- 8 Next Meeting: August 2009 at Oxford
Thursday, January 15
|10:30 AM||Introductions||Scott Cain|
|11:00 AM||The State of GMOD||Scott Cain||PPT, Summary|
|11:30 AM||A variety of GMOD Help Desk stuff||Dave Clements||PDF, Summary|
|12:00 PM||Lunch||one hour 30 minutes|
|1:30 PM||Drupal and MarineGenomics.org||Stephen Ficklin||PDF, Summary|
|2:00 PM||Artemis and Chado at GeneDB||Robin Houston||PPT, PDF, Summary|
|2:30 PM||modENCODE: extending Chado, BIR-TAB, & GBrowse for automating data validation & display||Nicole Washington||PDF, Summary|
|3:30 PM||A RESTful interface for MODs?||Josh Goodman||PPT, PDF, Summary, Discussion|
|4:00 PM||Metadata Input and Submission Tool and GIS linked metagenomic database||Iddo Friedberg and Christopher Condit||PDF, Summary|
|4:30 PM||Data Representation in Chado: Best Practices||Joshua Orvis and/or Scott Cain||Summary, Discussion|
|5:00 PM||Dinner (on your own)|
Friday, January 16
|9:00 AM||Chado and GUS at SBRI||Dhileep Sivam and Isabelle Phan||PPT, PDF, Summary, Discussion|
|9:30 AM||BioMart||Arek Kasprzyk||PDF, Summary|
|10:00 AM||BeeSpace||Barry Sanders, Dave Arcoleo||PPT, Summary|
|11:00 AM||WebGBrowse: GBrowse configuration management||Ram Podicheti||PPT, PDF, Summary|
|11:30 AM||JBrowse (aka GBrowse 3.0)||Mitch Skinner||ODP, PDF, Summary|
|12:00 PM||Lunch||one hour 30 minutes|
|1:30 PM||EcoliWiki and TableEdit||Daniel Renfro||Summary (slides not uploaded: the upload attempts failed, and the PDF exported from PowerPoint is 10 MB, which is too big to upload; contact the presenter for a copy)|
|2:00 PM||Generic Gene Page XML||Scott Cain||PPT, PDF, Summary, Discussion|
|2:30 PM||GMODWeb and package management||Brian O'Connor||PPT, PDF, Summary|
|3:00 - 5:00 PM||MIGS and MIMS||Iddo Friedberg||Summary|
|Bovine Genome Database||Justin Reese and Chris Childers||PPT, PDF, Summary|
|GNPAnnot||Pierre Larmande||ChadoControler, Summary|
Themes and Discussions
Several themes ran throughout the meeting.
Several presentations touched on data sharing:
- A RESTful Interface for MODs?
- Data Representation in Chado: Best Practices
- Generic Gene Page XML
- Chado and GUS at SBRI
A common question during these talks was how much should we do? Should we implement a comprehensive data sharing protocol or start with very modest goals, or should we aim for a sweet spot in the middle? Should we emphasize robustness or ease of implementation? Should GMOD support semantic web efforts?
The semantic web in general and SSWAP in particular were discussed. Rex Nelson of SoyBase pointed out that SoyBase's map data is now available through SSWAP. RDF, the Bio2RDF project, and the Swoogle semantic web search engine were also mentioned.
Josh Goodman, Rob Buells, Rex Nelson, and Kevin Clancy formed the Web services working group and will continue and expand this discussion with the GMOD community.
Joshua Orvis's "Data Representation in Chado: Best Practices" session dealt with the same issue, this time in the context of representing biology within Chado in the same way across organizations. In this session we proposed converging on common representations by having organizations post their current Chado practices to the wiki, discussing them on the wiki or on the GMOD-Schema mailing list, and then converging on a common set of Chado best practices. Common practices would enable both data sharing and common tools. Joshua got the ball rolling by describing IGS's Chado practices on the IGS Data Representation page.
RDF got additional discussion during Dhileep Sivam and Isabelle Phan's session on Chado and GUS at SBRI. UniProt uses RDF to represent their data. XML gives you a tree representation, while RDF gives you a graph. Graphs often better reflect what is being described. SPARQL is the standard query language for accessing RDF.
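To make the tree-versus-graph point concrete, here is a minimal Python sketch that holds a few RDF-style triples in memory and answers a simple pattern query, which is roughly what a single SPARQL triple pattern does. The identifiers and predicate names are invented for illustration and do not come from UniProt or any real vocabulary.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Each statement is a (subject, predicate, object) triple.
# All identifiers below are made up for illustration.
TRIPLES: List[Triple] = [
    ("gene:abc1", "encodes", "protein:P1"),
    ("gene:abc2", "encodes", "protein:P1"),  # two genes point at the same node
    ("protein:P1", "located_in", "go:membrane"),
    ("gene:abc1", "part_of", "organism:demo"),
]

def match(subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in TRIPLES
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# A node with several incoming edges is natural in a graph,
# but cannot be expressed in a strict tree without duplication.
genes_for_p1 = [s for (s, _, _) in match(predicate="encodes", obj="protein:P1")]
```

A real deployment would use a triple store and SPARQL rather than a Python list; the sketch only shows why shared nodes fit a graph model.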
The presentations are listed here in a very approximate order:
- GMOD Project Presentations
- GMOD Components
- GMOD User Experiences
The State of GMOD
- GBrowse 1.69 released
- Apollo 1.9.? released
- CMap 1.01 released
- Generic Gene Page released
- GMOD 1.1 (Chado)
- Bio::Graphics split from BioPerl
- Both for releases and source code repository.
- As soon as BioPerl 1.6 is released, installing Bio::Graphics from CPAN should work.
Howard Hughes Medical Institute Science Education Alliance
A program set up by the Science Education Alliance (SEA) staff of the Howard Hughes Medical Institute (HHMI), in close collaboration with Ed Lee of Lawrence Berkeley National Laboratory, uses GMOD components to teach sequencing from sample collection to annotation to submission to GenBank. College freshmen at 12 schools isolate and then sequence Mycobacterium smegmatis phages, manage the sequence with Chado, annotate it with Apollo, and display it with GBrowse. At the end of the course, each school submits a newly sequenced and annotated phage genome to GenBank. Matt Conte, the SEA bioinformatics specialist at the time the workflow was developed and implemented, attended the 2008 GMOD Summer School.
- 1.69 released in August. Lots of new stuff
- Popups (from Sheldon McKay)
- Vertical dragging of tracks
- Rubber banding (also Sheldon)
- Quantitative data (Wiggle tracks)
- Conservation data
- Track sharing
- Galaxy integration
- Parallelizable track data sources and rendering.
- User interface changes also make it feel quicker, even when running on a single CPU
- Tracks rendered as soon as they are done.
- What was GBrowse 3.0 is now JBrowse - see below.
GMOD Help Desk
The 2008 GMOD Community Survey was conducted in October. Dave summarized the results of the survey.
2009 will be the year that natural diversity support becomes an integral part of GMOD:
- Dave will polish the natural diversity module and fold it into production Chado. This was originally written for HeliconiusDB and is based on GDPDM, a data model used by Gramene and MaizeGenetics. This work may involve tweaking existing modules like Phenotype.
- Ben Faga is working on adding geolocation visualization to GBrowse
- Dave is involved in a NESCent working group with the goal of extending GMOD to better support evolutionary and ecological research.
- Dave is hoping to organize a hackathon later this year to specifically address natural diversity needs in GMOD.
Some upcoming meetings and courses:
- The 2009 North American GMOD Summer School at NESCent. The date isn't set for this yet.
- 2009 European GMOD Week
See below for more.
Documentation and Web Site:
- A GBrowse user tutorial will be released by OpenHelix later this month. This is a Flash based tutorial that goes into great detail on how to use GBrowse's basic and advanced user features.
- GMOD for High-throughput sequencing
- Dave will be putting time into how GMOD supports this and outreach to the community that is doing this.
- Web site upgrades
- Went to MediaWiki 1.12 in August 2008. Got better search ability and uniform URLs.
- Planning to upgrade to 1.13 or 1.14 this spring, and maybe give the site a new look.
And some delayed tasks that Dave talked about at the July 2008 GMOD Meeting:
- Chado Documentation Reorganization - not yet
- User Directory - not yet
- GBrowse doc - getting there
- Community Annotation System doc - not yet
- GMOD Logo Service - still trying
Finally, Dave solicited feedback on what he should be doing. Some highlights:
- Document what parts of the GFF3 standard GBrowse can't currently deal with.
- Document what variables are available in GBrowse callbacks.
A RESTful Interface for MODs?
Josh Goodman of the FlyBase project proposed that GMOD support a RESTful interface for biological data. Josh started by listing many of the APIs that already exist in GMOD or at GMOD user sites, and then asked the question "Why have another one?" Here's why:
- None of the existing APIs are compatible with each other.
- Use different data models
- Some assume the Chado data model
- and not everyone uses Chado, nor will they
- Some assume a particular language like Java or Perl.
Why go with REST?
- If adoption costs are high, people won't use a technology
- REST tends to have a low cost of adoption
- It's language neutral
What should a GMOD REST implementation have?
- Simple and lightweight (low cost of adoption)
- URL based
- Versioned URLs for stability
- Versions refer to format/API, not to data.
- Data model neutral (that is, don't assume Chado)
- Result lists in XML or JSON
- Gene records in Generic Gene Page XML
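The wish list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host name, the path layout (`/v1/organism/.../genes.json`), the function names, and the JSON field names are all hypothetical, since the meeting did not settle on a concrete URL scheme.

```python
import json
from typing import List
from urllib.parse import quote

API_VERSION = "v1"  # versions the format/API, not the data

def gene_list_url(base: str, organism: str, fmt: str = "json") -> str:
    """Build a versioned, URL-based request for a gene list (hypothetical layout)."""
    return f"{base.rstrip('/')}/{API_VERSION}/organism/{quote(organism)}/genes.{fmt}"

def render_result_list(gene_ids: List[str]) -> str:
    """Serialize a result list as JSON without exposing any storage schema."""
    return json.dumps({
        "api_version": API_VERSION,
        "count": len(gene_ids),
        "results": gene_ids,
    })

url = gene_list_url("http://mod.example.org/api/", "Drosophila melanogaster")
body = render_result_list(["gene0001", "gene0002"])
```

Because the client only sees URLs and JSON, the server behind it could be backed by Chado or by any other data model, and the client could be written in any language.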
Data Representation in Chado: Best Practices
The same data can be stored in Chado in wildly different ways. There are few commonly agreed on or documented best practices. Common practices are vital for both data and tool sharing.
An example undocumented/unestablished best practice would describe how to version features in Chado. Should you use naming conventions? Should the Chado Audit Module be used for this? Either approach (or others) may be viable, but without a standard any one group's implementation is likely to be incompatible with every other group's implementation.
This led to two questions:
Q: How do changes to the Chado schema happen?
Q: How do we establish Chado best/common practices?
The general response here was:
- Ask organizations to describe what they currently do, and how they think things should be represented.
- Joshua created a page, IGS Data Representation, that describes how IGS uses Chado. This page is an excellent template for other organizations to use as a starting point for describing their practices.
- Use these documents as discussion points for converging on a common set of best practices.
Dave pointed out that once we have standards, a Chado validator could be written to report any data in a Chado database that does not conform to best practices.
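A validator of this kind could be little more than a list of rules, each paired with a query that returns the rows that break it. The Python sketch below uses an in-memory SQLite table as a simplified stand-in for Chado's feature table (the real table identifies types through type_id and cvterm, and Chado normally runs on PostgreSQL). The rule shown is a hypothetical example, not an agreed GMOD practice.

```python
import sqlite3

# Simplified stand-in for Chado's feature table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature (uniquename TEXT, name TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?)",
    [("g1", "abc1", "gene"), ("g2", None, "gene"), ("e1", None, "exon")],
)

# Each rule pairs a description with a query returning the offending rows.
# This rule is hypothetical; real rules would come from the agreed practices.
RULES = [
    ("gene features must have a name",
     "SELECT uniquename FROM feature "
     "WHERE type = 'gene' AND (name IS NULL OR name = '')"),
]

def validate(connection):
    """Return (rule description, offending uniquename) pairs."""
    report = []
    for description, query in RULES:
        for (uniquename,) in connection.execute(query):
            report.append((description, uniquename))
    return report

violations = validate(conn)
```

Keeping the rules as data means each organization's documented practice could be turned into a rule set and run against any Chado instance.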
Generic Gene Page XML
There are now at least three servers that generate GMOD Generic Gene Page XML:
The Perl implementation has 11 or so abstract classes that need to be implemented, and the new() method also needs to be overridden.
The XML generated by this package does not conform to Chado XML. It also does not share tags with NCBI's Gene XML. It is very close to UniProt XML. Each column 9 attribute in GFF becomes a comment whose type is the attribute tag name (e.g., "note").
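The mapping from GFF column 9 to typed comments can be sketched as follows. This is a Python illustration of the idea only, not the Perl package's code: the function name is hypothetical, the output is a plain list of (type, text) pairs rather than XML, and no tag is special-cased.

```python
from urllib.parse import unquote

def attributes_to_comments(column9: str):
    """Turn GFF3 column 9 (tag=value;tag=value) into (type, text) pairs."""
    comments = []
    for field in column9.strip().split(";"):
        if not field:
            continue
        tag, _, value = field.partition("=")
        # GFF3 allows comma-separated multiple values and URL-style escapes.
        for item in value.split(","):
            comments.append((tag, unquote(item)))
    return comments

comments = attributes_to_comments(
    "ID=gene0001;Note=kinase%2C putative;Alias=abc1,abc-1"
)
```

Each resulting pair carries the attribute tag as the comment type, so a `Note` attribute would surface as a comment of type `Note`.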
We don't currently have an XML-Schema or documentation on this format. The Perl package is what currently defines it. We also don't have any code that consumes this XML. BioPerl code could be written that would eat the XML and produce a Bio::SeqFeature object.
We discussed adding sequence to the XML, but did not reach a conclusion on this.
See Data Sharing above for more on this topic.
- BioMart, Arek Kasprzyk
BioMart can handle very large datasets and can be configured as a centralized data warehouse or as a federated one. BioMart also enables you to hide your own mart while federating with others. Arek expects a lot of future demand for BioMart as an in-house bioinformatics portal.
BioMart is being used by the International Cancer Genome Consortium, which expects to do 50,000 human genomes. With this amount of data you have to use the federated data model. They are using OpenID for authentication and have support for versioning.
Three key concepts in BioMart are datasets, filters, and actions. BioMart makes its data available through a web interface, a Perl API, and web services. The three core concepts are used in all of those interfaces.
In the past, the only link between the two projects has been Chado acting as a data source to feed BioMart (and a problematic one - see below). However, BioMart's interfaces could also serve as a data source for many GMOD components such as GBrowse, CMap, and Galaxy. BioMart is interested in pursuing this and Arek floated the idea of a hackathon to help achieve this.
Arek described the current Chado to BioMart mapping as challenging because of the extensive cross-referencing in the schema. This makes Chado very flexible, but it makes it difficult for BioMart to tease out relationships in the data.
Arek was asked to summarize differences between BioMart and InterMine, another similar GMOD component. There are many differences but two key ones are how optimization is done and the data model that each uses.
JBrowse can load data from a number of data sources including GFF files and Bio::DasI implementations like Bio::DB::GFF (GFF2) and Bio::DB::SeqFeature::Store (GFF3) databases. It then translates them into nested containment lists in JSON format.
This strategy has a number of benefits:
- Using pre-generated JSON means no CGI is needed for browsing
- Thus, installation is easier, and
- It scales to large genomes and large numbers of users.
- JSON is also cached by web browsers, making it fast.
The JBrowse configuration file also uses JSON syntax.
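The nested containment list idea mentioned above can be sketched in Python. This is a simplified illustration, not JBrowse's actual code or file layout: intervals are plain (start, end) pairs, each node is a `[start, end, sublist]` list so that it serializes directly to JSON, and the function names are hypothetical.

```python
import json
from bisect import bisect_right

def build_nclist(intervals):
    """Nest each interval inside the closest preceding interval that contains it."""
    root = []
    stack = []  # chain of currently open containers, each [start, end, sublist]
    # Sort by start ascending, then end descending, so containers come first.
    for start, end in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        while stack and end > stack[-1][1]:
            stack.pop()  # top of stack does not contain this interval
        node = [start, end, []]
        (stack[-1][2] if stack else root).append(node)
        stack.append(node)
    return root

def query(nclist, qstart, qend):
    """Return all intervals overlapping the half-open range [qstart, qend)."""
    hits = []
    # Within one list no interval contains another, so ends ascend with starts.
    ends = [node[1] for node in nclist]  # rebuilt per call for clarity only
    i = bisect_right(ends, qstart)       # first interval with end > qstart
    while i < len(nclist) and nclist[i][0] < qend:
        start, end, sub = nclist[i]
        hits.append((start, end))
        hits.extend(query(sub, qstart, qend))
        i += 1
    return hits

features = [(0, 100), (10, 20), (15, 18), (30, 40), (150, 200)]
nclist = build_nclist(features)
as_json = json.dumps(nclist)  # pre-generated once, then served as a static file
```

Because the structure is built ahead of time and written out as JSON, the browser can answer "what overlaps this region" without any server-side program running per request.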
Several new features have been added since the last report at the July 2008 GMOD Meeting:
- Name/ID searching
- Names and IDs are stored in JSON using a Trie structure (a string prefix tree) and subtrees are loaded lazily.
- Quantitative tracks
- Subfeature support
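The name lookup structure described in the list above, a prefix tree whose subtrees are loaded lazily, might look something like this Python sketch. The chunking scheme (one serialized subtree per first letter), the `_ids` key, and the class and function names are assumptions made for illustration; JBrowse's real layout may differ.

```python
import json

def build_trie(names):
    """Build a nested-dict prefix tree; '_ids' holds the IDs at a complete name."""
    root = {}
    for name, feature_id in names:
        node = root
        for ch in name.lower():
            node = node.setdefault(ch, {})
        # '_ids' is longer than one character, so it cannot clash with a letter key.
        node.setdefault("_ids", []).append(feature_id)
    return root

def split_into_chunks(trie):
    """Serialize each first-letter subtree separately, like one JSON file per letter."""
    return {ch: json.dumps(sub) for ch, sub in trie.items() if ch != "_ids"}

class LazyTrie:
    """Parses a first-letter subtree only when a search first needs it."""

    def __init__(self, chunks):
        self.chunks = chunks  # stands in for JSON files fetched on demand
        self.loaded = {}

    def search(self, prefix):
        prefix = prefix.lower()
        if not prefix or prefix[0] not in self.chunks:
            return []
        if prefix[0] not in self.loaded:
            self.loaded[prefix[0]] = json.loads(self.chunks[prefix[0]])
        node = self.loaded[prefix[0]]
        for ch in prefix[1:]:
            if ch not in node:
                return []
            node = node[ch]
        return sorted(self._collect(node))

    def _collect(self, node):
        ids = list(node.get("_ids", []))
        for key, child in node.items():
            if key != "_ids":
                ids.extend(self._collect(child))
        return ids

index = LazyTrie(split_into_chunks(build_trie(
    [("abc1", "f1"), ("abd2", "f2"), ("xyz", "f3")]
)))
matches = index.search("ab")
```

After the search for "ab", only the subtree for names starting with "a" has been parsed; the rest of the index stays untouched until someone searches for it.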
A development version of JBrowse is available for download.
GMODWeb and Package Management
- GMODWeb and Package Management, Brian O'Connor
GMODWeb is a GMOD component for generating web sites that are driven by a Chado database. It is based on Turnkey. Turnkey takes a schema and turns it into a website. GMODWeb adds several GMOD specific widgets that add things like support for GBrowse. Brian divided GMODWeb into two distinct phases. First, generating the website. Second, load it into Apache and mod_perl and bring that web site up. Users (see Stephen Ficklin's talk below for an example) generally have little trouble with the first step. However, in the second step users often find themselves in Perl dependency hell. GMODWeb has over 100 Perl dependencies and user success at getting them all lined up correctly depends on a tenacious GMOD Systems Administrator.
Can we do anything about this? Perhaps if we had package management support ...
The second part of Brian's talk was about BioPackages.net, a repository of biology software RPMs for the CentOS (based on Red Hat Enterprise Linux) and Fedora (also part of the Red Hat suite) Linux distributions. RPMs are software packages that clearly specify what other RPMs they depend on. They greatly simplify software management.
BioPackages.net was created by and is currently managed by the Nelson Lab at UCLA. It has a build farm backing it that is used to generate the packages. They would like to see the BioPackages build farm be replicated somewhere else. Build farms have different virtual servers for each version of Linux that is supported. BioPackages currently has a rich, but somewhat dated, library of CentOS4 RPM packages. They are now transitioning to CentOS5, and are currently packaging Chado 1.0 releases, as well as a DAS/2 reference server.
RPMs, while good on paper, have a number of implementation challenges. It is rare that you can find all your needed software in RPM format. Sometimes you have to install from CPAN, or from source, and as soon as that happens the RPM infrastructure starts to lose its integrity. One way around this is to use virtual machines. They are currently working on a prototype CentOS4 machine with Chado 1.0, recent BioPerl, and Turnkey/GMODWeb 1.4 installed. After this meeting Dave Clements is going to UCLA for two days to learn about RPM generation and the BioPackages.net infrastructure.
Brian would like people to contact him ...
- Turnkey/GMODWeb: looking to expand Java producer to eliminate Perl dependency problem
- BioPackages: looking for RPM developers (or Debian package builders for Ubuntu)
- Virtual Machines: looking to create CentOS5 machines
- Pre-configured GMOD demo/dev kit
- Pre-configured Biopackages dev kit
- Anyone who is using GMOD tools for Next Gen Sequencing (Dave C would also like to know if you are doing this)
EcoliWiki and TableEdit
EcoliWiki is a wiki for the E. coli community. One goal of wikis is to enable any interested user to make small updates and corrections to the web site. Wikis enable users to easily enter plain text and they support simplified markup languages to enable users to do some basic formatting like bolding or italicization without having to learn the more complex (and none too intuitive) HTML or CSS markup. Wikis enable making the unit of submission very small.
However, even simplified wiki markup for tables is at best tedious and at worst impenetrable. TableEdit is an extension to the MediaWiki wiki package that protects users from dealing with MediaWiki table markup. In addition, TableEdit supports templates, so the same table format can be reused in multiple places in a wiki (e.g., on every gene page as on EcoliWiki), and integration with backend databases, such as Chado.
Recent work on TableEdit includes:
- Added a button for inserting TableEdit text.
- Templates can now pull data from a table on another page.
- Better documentation.
- Support for really big tables.
- Conflict detection (simultaneous updates)
- help links 100% editable.
- Bulk loader
- Uses IFALT format.
Work continues on TableEdit and version 2 is expected to support Chado round-tripping. That is, data from Chado can be displayed in TableEdit tables in a wiki, and you can use the TableEdit wiki interface to update data in a Chado database.
Iddo asked about uploading Excel spreadsheets. This function is not currently planned.
WebGBrowse: GBrowse Configuration Management
- WebGBrowse GBrowse configuration management, Ram Podicheti
Ram Podicheti spoke on WebGBrowse, a new web interface for configuring and hosting GBrowse instances that was developed at Indiana University.
WebGBrowse was developed to ease GBrowse configuration. One reason for the popularity of GBrowse is the wide range of glyphs it supports, and the configurability of those glyphs. However, most glyphs are not well documented and some are not documented at all. When documentation does exist, it is aimed at Perl programmers, and to learn all the options supported by each glyph you need to look at the Perl code. GBrowse is now being used in smaller organizations that may not have personnel with relevant experience.
The aim of WebGBrowse is to make GBrowse available to biologists without the installation or support costs. WebGBrowse allows users to upload their own GFF3 datasets, and to use a web interface to configure GBrowse. It currently supports configuration options for 42 glyphs. The information about and options for each glyph is stored in YAML.
WebGBrowse is available both at IU and as downloadable software.
The next release of WebGBrowse will add more functionality:
- Support uploading of GFF3 as tar balls
- Expand the glyph library
- Allow loading of pre-existing configuration files as a starting point.
- Support GENERAL section configuration
- Balloons, plugins, etc
- Allow group feature configuration
- Categorize the glyphs
- Perl callbacks.
In a discussion afterward on how best to document glyphs, Rob Buells (I think) suggested having the glyphs be self-documenting. That is, Bio::Graphics::Glyph would be extended to make it possible to ask a glyph what it can do and what options it supports. This facility could be used by WebGBrowse to learn about glyphs, or to automatically produce wiki documentation for each glyph, or by any other program that cares.
Rob will look into this.
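As a rough illustration of what self-documenting glyphs could mean, here is a Python analogue (Bio::Graphics itself is Perl, and nothing below is its actual API). Each class declares its options as data, and a `describe()` call merges the declarations up the class hierarchy. The class names, option names, and descriptions are all illustrative assumptions.

```python
class Glyph:
    """Base class: each glyph declares its options once, as data."""

    OPTIONS = {
        "height": ("integer", "Height of the glyph in pixels."),
        "bgcolor": ("color", "Fill color."),
    }

    @classmethod
    def describe(cls):
        """Merge option declarations from the whole class hierarchy."""
        merged = {}
        for klass in reversed(cls.__mro__):
            merged.update(getattr(klass, "OPTIONS", {}))
        return merged

class ArrowGlyph(Glyph):
    # Only the options this glyph adds; inherited ones come from Glyph.
    OPTIONS = {"tick": ("integer", "Whether and how to draw tick marks.")}

def wiki_doc(glyph_cls):
    """Render the declared options as simple bullet lines for a wiki page."""
    lines = [f"Options for {glyph_cls.__name__}:"]
    for name, (kind, text) in sorted(glyph_cls.describe().items()):
        lines.append(f"- {name} ({kind}): {text}")
    return "\n".join(lines)

arrow_doc = wiki_doc(ArrowGlyph)
```

The same `describe()` output could feed a configuration form, a documentation generator, or any other program that needs to know what a glyph accepts.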
Drupal and MarineGenomics.org
- MarineGenomics.org - Drupal & Chado
- Fagaceae Genomics Web - GMODWeb & Chado
- CoralMicroges.org - Drupal & Chado
- Genome Database for Rosaceae (GDR) - converting to Chado
CUGI chose Drupal because:
- Quicker development
- Easy user contribution
- well documented.
- large user community
- easy to customize look and feel
- social networking abilities.
- Menus, node, and blocks for organization
- PHP, CSS, JQuery and AJAX for user interface.
- Not using Drupal's Content Construction Kit (CCK). Wrote their own.
- Using Drupal's search capability.
- Drupal's taxonomy categorization.
- Embed GBrowse in an IFrame in Drupal
Drupal and Chado
The Chado and Drupal schemas are kept separate, but Drupal needs to know what is in the Chado database. To handle this, they implemented a Chado_feature node in Drupal, which correlates a Drupal feature node with a Chado feature ID, and they provide forms for updating Chado.
Need to synchronize Drupal and Chado databases. Some data (GFF) is added to Chado first and then copied to Drupal, while other data (EST pipeline results) are added to Drupal first and then copied to Chado.
They have large putative data sets and their BLAST results are not stored in Chado - they are just too big. Instead they keep XML-formatted results for each feature, which Drupal then indexes. The result files are stored in the filesystem based on DB ID and feature ID.
Artemis and Chado at GeneDB
- Artemis and Chado at GeneDB, Robin Houston
GeneDB is a core part of the Sanger Institute Pathogen Sequencing Unit. GeneDB currently has data on 50+ pathogens and expects that number to grow by orders of magnitude in the coming years. GeneDB curators use Artemis to do manual genome annotation. (Artemis serves the same purpose as Apollo, and like Apollo is also implemented in Java.)
GeneDB is currently in the process of moving their data to Chado. GeneDB is also upgrading its web site to pull data from Chado indirectly. Data will be read from Chado and cached as serialized Java objects in a BerkeleyDB database. This approach results in a very responsive (as in darn near instantaneous) web site. The new web site will launch in the first half of 2009.
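The caching approach described here, reading from the database once and then serving serialized objects from a key-value store, can be sketched in Python. This is an analogy only: GeneDB's implementation is Java with BerkeleyDB, whereas the sketch uses `pickle` and a plain dict as the store, and the class and loader names are hypothetical.

```python
import pickle

class ObjectCache:
    """Serve serialized objects from a key-value store, loading on first miss."""

    def __init__(self, loader, store=None):
        self.loader = loader  # slow path, e.g. a query against Chado
        self.store = {} if store is None else store  # stand-in for BerkeleyDB

    def get(self, key):
        if key not in self.store:
            self.store[key] = pickle.dumps(self.loader(key))
        return pickle.loads(self.store[key])

calls = []

def load_gene(gene_id):
    """Hypothetical loader standing in for a database read."""
    calls.append(gene_id)
    return {"id": gene_id, "product": "hypothetical protein"}

cache = ObjectCache(load_gene)
first = cache.get("gene0001")
second = cache.get("gene0001")  # served from the store, loader not called again
```

The web tier only ever touches the key-value store, which is why page loads can feel close to instantaneous regardless of how complex the underlying Chado queries are.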
GeneDB developed a Hibernate mapping for Chado. The feature hierarchy is represented using single table inheritance.
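Single table inheritance means that rows from one table are mapped to different classes according to a discriminator column. The Python sketch below shows the idea with plain dictionaries as rows. It is not GeneDB's Hibernate mapping: the class names, the `type` key, and the row layout are assumptions for illustration (in Chado the feature type is identified through type_id and cvterm).

```python
class Feature:
    """Base class for every row of the single feature table."""

    def __init__(self, feature_id, uniquename):
        self.feature_id = feature_id
        self.uniquename = uniquename

class Gene(Feature):
    pass

class Exon(Feature):
    pass

# Discriminator value -> class. Unknown types fall back to the base class.
TYPE_TO_CLASS = {"gene": Gene, "exon": Exon}

def load(row):
    """Instantiate the subclass selected by the row's discriminator value."""
    cls = TYPE_TO_CLASS.get(row["type"], Feature)
    return cls(row["feature_id"], row["uniquename"])

rows = [
    {"feature_id": 1, "uniquename": "g1", "type": "gene"},
    {"feature_id": 2, "uniquename": "e1", "type": "exon"},
    {"feature_id": 3, "uniquename": "m1", "type": "match"},
]
objects = [load(row) for row in rows]
```

All feature types share one table and one set of columns, while application code still gets a class hierarchy to work with.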
Chado and GUS at SBRI
Chado is used in the SSGCID project to store Nimblegen microarray data. Challenges in working with this data include normalization, scaling, feature level aggregations, remapping and visualization. SBRI has a pipeline for discovering protein structure that looks at 60 different resources. Dhileep spends a lot of time writing scripts to parse Nimblegen data, tools for BLAST searches, and scripts to export data from Chado.
This work has led SBRI to use and extend Chado in several new ways:
- Complexity of querying BLAST searches and microarray data.
- Use materialized views for both
- Grouping of genes
- Use DBXREFs
- Gene Models
- Use the simplest possible model.
SBRI uses Chado and GUS for different purposes. Chado is used (in collaboration with IGS) to store annotation from Apollo and Manatee, and the results of Ergatis workflows. It is used to manage internal data production. GUS (in collaboration with UPenn) powers the web front end and is used for external data access. SBRI uses Chado because of its data model, and GUS because of its strong software engineering and flexibility.
SBRI, like many others, would like to have standardized object-relational mappings (ORMs) for mapping biological data to Chado. They want RDBMS-free data mining. Use BioMart and Galaxy to do this, rather than Chado and GUS. (Isabelle commented that it took 5 minutes to install Galaxy.) Could also use RDF in combination with a triple store (RDF represents everything with triples) plus Lucene. Want the ability to take what you need from Chado (as little as 6 tables) and map it to ORMs. (See the Data Sharing discussion above for more on this.)
modENCODE: extending Chado, BIR-TAB, & GBrowse for automating data validation & display
- modENCODE: extending Chado, BIR-TAB, & GBrowse for automating data validation & display, Nicole Washington
The modENCODE project is working to identify all functional elements and find evidence for every gene prediction in worm and fly. They were originally using ChIP-chip, but have now switched to ChIP-Seq technology.
Nicole works at the modENCODE Data Coordination Center (DCC). The DCC is a central collection and validation point for modENCODE's many data providers. They also provide project statistics. The DCC uses Chado, GBrowse, and InterMine. In addition they are also developing many new tools.
They do extensive data and metadata validation before loading both into Chado. They link between metadata and the resulting features and have added methods to add and drop data easily. Protocols have been formalized in BIR-TAB, which is based on MAGE. Protocol inputs and outputs are typed, but internals of the protocol are a black box. This was added to Chado with a custom protocol extension.
modENCODE uses GBrowse 2.0 for visualization. The backing database is very large and they have added methods to easily add and drop datasets. They would like to be able to use PostgreSQL for storing Bio::SeqFeature::Store (GFF3) databases.
modENCODE also does track finding. Chado is scanned looking for features that belong together and should be shown in the same track. It uses heuristics to group things together. This can produce GFF3 or Wiggle or both. The track finding code is written in Ruby.
Finally, they have a submission and publishing pipeline that is written with Ruby on Rails. It also uses the GoogleGraph API.
Many of these tools are available at the modENCODE BIR_TAB svn repository:
- BeeSpace, Barry Sanders, Dave Arcoleo
BeeSpace now supports interactive, iterative collection building and automatic collection version management. Currently collection sharing is BeeSpace's only social feature, but they plan to add more in the future.
Gene Summarizer is a part of the BeeSpace software suite that does automated curation of papers. It attempts to mimic what human annotators do. In BeeSpace it currently analyzes only abstracts. Gene Summarizer can analyze full text, but currently does not because of licensing issues, not technical issues.
Gene Summarizer accepts text as input and produces complex output in JSON format. It is available both as a command line program and as a web application.
In the future BeeSpace plans to add gene ontology search tools and provide analysis and clustering of their data. They will also add set operation support to their collections.
- Metadata Input and Submission Tool and GIS linked metagenomic database, Iddo Friedberg and Christopher Condit
Iddo and Christopher presented their work on the CAMERA project. CAMERA is a metagenomics project, and Iddo and Chris's aim was to have the GMOD community consider how GMOD can help metagenomics projects with their data, and how metagenomics projects might expand what GMOD can do.
Metagenomics involves sequencing whole communities of organisms, usually taken from a spatial sample like a cube of clay, a liter of water, or an area of the human gut. In metagenomics you frequently don't know what organism a DNA fragment came from. The data is huge, noisy, and partial. Metadata is also key: microbes are enormously affected by their habitat and you need to store as much environmental data as possible.
CAMERA uses a number of data standards:
- MIGS/MIMS (Minimum Information about a (Meta)Genomic Sequence), both sponsored by the Genomic Standards Consortium (GSC)
- Data is coded in the Genomic Contextual Data Markup Language (GCDML), also from the GSC.
- Environment Ontology (EnvO), PATO
- Common Access to Biological Resources and Information (CABRI)
Things to think about:
- How do we look at "disembodied" sequence data?
- "Fragment recruitment" track
- Visualization of sequence data <--> metadata associations
- Database: association of metadata and sequence data; queries by metadata
Iddo closed by emphasizing that the new high-throughput technologies change everything and that we need to start thinking about the challenges that come with that right now.
Christopher Condit showed some ways to visualize metadata associated with each sample, and general climatic data for context. Chris used the NASA MODIS data to show both global averages and data for a specific day, down to 4km square resolution.
Bovine Genome Database
Bovine Genome Database, Justin Reese and Chris Childers
Justin and Chris presented the architecture and workflow of the Bovine Genome Database.
- Use two Chado databases, one "main Chado" to hold semi-stable data (SNPs, ESTs, protein alignments, gene calls) and one "incoming Chado" to hold incoming annotations
- One GBrowse MySQL database (to serve GBrowse)
- Flat files to serve BLAST
- Annotation system
- ~400 or so annotators, ~4,000 or so annotations so far
- annotators pull directly from Chado
- no Apollo writebacks to Chado yet; users save annotations in Apollo as Chado XML and upload them to the project's servers via user management/upload CGI scripts written by the team
- curators curate incoming annotations, resolve conflicts, submit to NCBI
- will periodically synchronize annotations between incoming Chado and main Chado on an ongoing basis
- Glean is used to generate a non-redundant set of gene calls from a handful of different automated gene calls (e.g. Fgenesh, GeneMark, etc). This simplifies community annotation by providing an easy starting point for community annotators (many times they can just correct minor errors in a given Glean gene model and promote it to a manual annotation)
ChadoControler, Pierre Larmande
GNPAnnot is a project on green genomics which intends to develop a system of structural and functional annotation supported by comparative genomics and dedicated to plant and bio-aggressor genomes allowing both automatic predictions and manual curations of genomic objects. Four community annotation systems are released on three sites: monocots (CIRAD / Bioversity at Montpellier), insects (INRA at Rennes), fungi (BIOGER at Versailles) and wheat / grapevine (URGI at Versailles).
They are evaluating the GMOD synteny viewers.
However, the off the shelf Chado is missing several features that they need:
- Access privileges
- Revision History (would the Chado Audit Module work?)
- Coordination of concurrent access
- Client compatibility checks
- Network security
Finally, they are developing an MVC architecture for their website.
If you have something you want to be on the agenda at this meeting please add it below.
- Natural Diversity Module in Chado -- Clements 14:23, 13 November 2008 (UTC)
- Using the 2008 GMOD Community Survey for guidance -- Clements 21:46, 20 November 2008 (UTC)
- Common data representation (best practices) --Jorvis 23:29, 20 November 2008 (UTC)
- Including high throughput sequencing data --Scott 18:07, 8 December 2008 (UTC)
- 2009 GMOD Summer School, in Durham North Carolina, and possibly in Oxford, UK. Clements 19:50, 24 November 2008 (UTC)
- Beespace Navigator 4.0 and the Gene Summarizer automatic gene curation engine. -- Barry Sanders, 03 December 2008
- Common Gene Page XML format. Status update and discussion of future directions. --Jogoodma 18:38, 11 December 2008 (UTC)
|Saravanaraj Ayyampalayam||University of Georgia|
|Hugo Berube||National Research Council Canada|
|Ramesh Buyyarapu||Alabama A&M University|
|Scott Cain||GMOD, Ontario Institute for Cancer Research (OICR)|
|Chris Childers||Georgetown University|
|Kevin Clancy||Life Technologies|
|Dave Clements||NESCent / GMOD|
|Christopher Condit||University of California San Diego|
|Stephen Ficklin||Clemson University Genomics Institute / MarineGenomics.org, Fagaceae.org|
|Iddo Friedberg||University of California San Diego|
|Dong He||CalTech, SpBase|
|Christopher Hemmerich||Indiana University - Center for Genomics and Bioinformatics|
|Ian Holmes||UC Berkeley|
|Robin Houston||Pathogen Informatics, Sanger Institute|
|Jim Hu||Texas A&M University/EcoliWiki and GONUTS|
|Ying Huang||University of California, San Diego|
|Andrei Kouranov||Protein Data Bank|
|Daniel Lang||University of Freiburg, cosmoss.org|
|Pierre Larmande||Joint Research Unit Plant Development and Genetic Improvement|
|Dorrie Main||Genome Database for Rosaceae|
|Weidong Mao||Virginia State University|
|Sheldon McKay||Cold Spring Harbor Laboratory iPlant/GMOD|
|Joshua Orvis||Institute for Genome Sciences|
|Georgios Pappas, Jr||EMBRAPA/Brazil|
|Ram Podicheti||Center for Genomics and Bioinformatics|
|Gowthaman Ramasamy||Seattle Biomedical Research Institute (SBRI), Seattle, WA|
|Robert Reed||U.C. Irvine|
|Justin Reese||Georgetown University, BeeBase and Bovine Genome Database|
|Peter Rose||UCSD - Protein Data Bank|
|Dhileep Sivam||University of Washington & Seattle Biomedical Research Institute (SBRI)|
|Mitch Skinner||UC Berkeley, JBrowse project|
|Weijia Su||Tyler Applied Systems, Inc.|
|Randall Svancara||Genome Database for Rosaceae|
|Adrian Tivey||Pathogen Informatics, Sanger Institute|
|Nicole Washington||LBNL, modENCODE, GBrowse, Phenote|
|John Westbrook||PSIKB / PDB|
|Geoff Winsor||Pseudomonas Genome Database (Simon Fraser University)|
|Andreas Zimmer||University of Freiburg/cosmoss.org|
Attendees were asked to fill out a one page evaluation of the meeting.
Q: Please rate the meeting(s) using the following scale: 1 (not at all) to 3 (reasonably) to 5 (exceptionally).
|Question||1||2||3||4||5|
|How useful was the meeting?||0%||0%||0%||25%||75%|
|Was the meeting well run and organized?||0%||0%||0%||25%||75%|
Q: Was the meeting what you expected?
|Yes||Much better/more than expected.|
Q: Would you recommend GMOD meetings to others?
Q: Do you have suggestions for improving GMOD meetings in the future?
- I think it is important to keep it at the same time as the PAG (or the ISMB) as it might be difficult for some to attend if it's not related to some other big conference.
- Overall the meeting was pretty good and more useful for me than I expected. I do not think people would mind paying a small registration fee in order to help with the food costs (and maybe have some croissants and such in the morning).
- I think speakers shouldn't be disturbed during their speech, it is better to question them after their presentation is done.
- Keep running them like this. Having the catered lunch on site was a huge plus for networking with other attendees, which is why we attend. Planned social events for the express purpose of mixing/networking would be a plus.
- I'd like to see more pure bench biologists there. It's unclear to me how we could accomplish this, but I think GMOD's success or failure will ultimately depend on the degree to which we are able to reach the rank and file (non-informatics) scientist.
- I think at this stage GMOD meetings should focus on a wide variety of subject matters and deal with people with widely different levels of experience.
- Include a tutorial for first time users.
- More presentations on GBrowse (since it's the most popular?)
- Better networking would have been nice.
Additional feedback, suggestions, criticism, and praise.
- It was a goldmine.
- Thanks for helping to organize the meeting, we really got a lot out of it.
- The organizers really know how to keep things casual, and approachable.
- We were glad to see such large attendance. The seating arrangement worked out well for the presentation format. Proximity to PAG was a benefit. Looking forward to next year!
- I liked the fact that it was a mix between presentation and open discussion. This way presentations usually led to interesting related discussions.
- Thanks again for a great meeting!
- I am not a GMOD user (yet) and I came to this meeting to learn about GMOD. Besides clear and good presentations exploring many facets of GMOD, the GMOD users and developers were very accommodating in explaining the more basic points of GMOD.
- I was really looking forward to it, and I ended up enjoying it immensely.
- I think one of the more important tasks that GMOD has to face is finding user interface tools that will allow biologists to comfortably interact with data stored in a Chado database.
- The meeting is more helpful for those who have used GMOD before than for first-time users.
- I like the addition of the lunch. It made things smoother.