Difference between revisions of "GMOD Evo Hackathon Proposal"

Revision as of 08:55, 17 March 2010

NESCent Hackathon on GMOD Tools for Evolutionary Biology

Synopsis

The GMOD Evo Hackathon aims to bring together experts in evolutionary biology, software, and bioinformatics to design and implement enhancements for GMOD tools, improving their support for evolutionary biology.

Motivation

The GMOD project is a confederation of open-source projects developing software tools for storing, managing, curating, and publishing biological data. GMOD tools are used by many large and small biological databases, and increasingly by individual research labs, for the dissemination of the results of experimental research and curated knowledge. While these software tools provide a powerful and feature-rich basis for working with biological data, many GMOD tools still lack features needed to effectively support evolutionary biology.

The volume and diversity of public genome and transcriptome data are rapidly increasing, creating tempting opportunities for evolutionary and comparative analysis. GMOD provides two important tools for working with this data: GBrowse for visualization, and Chado for storage and indexing and as a backend for analyses. The Sequence Alignment/Map (SAM) format has become the de facto standard for representing short-read genome alignments, but it still has only limited support in these tools. One objective of the proposed hackathon is to design and implement improvements that give GMOD tools excellent support for SAM data, particularly for cross-species alignments and views. TODO: add more about metadata support? We expect the improved analysis and storage tools resulting from this work to make cross-species comparative analysis of large-scale datasets much more accessible.

Phenotypic diversity data is also very useful for evolutionary studies. In-depth analysis of this data requires proper representation, handling, and storage: specific phenotypes, environmental conditions, population details, and other experimental metadata must all be tracked and, more importantly, cross-referenced with known genomic and genetic information. Developers at this hackathon will work to add support for this kind of data. Ontologies are among the best conceptual tools for representing this type of information in machine-readable form, and GMOD's open-source Chado database schema is the most mature, flexible, and feature-rich option for storing ontology-based data. However, it lacks specific support for evolutionary phenotype data or natural diversity data. Earlier this year, a working group was formed to design a new Natural Diversity module for Chado, and one of the objectives for this hackathon will be to finalize the group's work and integrate it into the larger Chado schema. TODO: need to connect to evolutionary phenotype data, does the natural diversity module address that? RobertBuels 07:00, 17 March 2010 (UTC)

We are seeking NESCent's support and hosting for this event because NESCent offers good facilities in a pleasant setting and has experience hosting events of this nature.

Specific objectives

Organizers and participants have identified the following broad objectives for guiding work at the event. During the event, participants will refine and distill these into concrete implementation objectives.

  • Alignment metadata
  • Population diversity support for Chado and associated application connectivity (à la GDPDM)
  • Evolutionary phenotype data in Chado
  • GBrowse_syn compatibility with SAM data
  • Chado change management, release process, and schema versioning
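
For readers unfamiliar with the SAM data mentioned in these objectives, here is a minimal sketch of what one alignment record contains. It splits a single tab-separated line into the eleven mandatory fields defined by the SAM specification and collects any optional TAG:TYPE:VALUE fields. The example record and the names `SamRecord` and `parse_sam_line` are made up for illustration; this is not code from GBrowse, Chado, or any other GMOD tool.

```python
from typing import Dict, NamedTuple, Union


class SamRecord(NamedTuple):
    """The eleven mandatory SAM alignment fields, plus optional tags."""
    qname: str   # query (read) name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII, Phred+33)
    tags: Dict[str, Union[int, str]]


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (hypothetical helper; header lines
    starting with '@' are not handled here)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    tags: Dict[str, Union[int, str]] = {}
    for item in fields[11:]:
        tag, value_type, value = item.split(":", 2)
        # Only integer tags are converted; other types stay as strings.
        tags[tag] = int(value) if value_type == "i" else value
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]),
        int(fields[4]), fields[5], fields[6], int(fields[7]),
        int(fields[8]), fields[9], fields[10], tags,
    )


# A made-up record: an 8-base read aligned to chr1 at position 100.
example = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar, record.tags)
```

Real pipelines would normally rely on an established SAM/BAM library rather than hand-rolled parsing; the sketch is only meant to show which pieces of information a viewer or database would need to represent.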

The hackathon concentrates on writing code. All code and documentation will be made available immediately and freely to the community under an OSI-approved open source license.

Subgroups

Participants will split into subgroups at the event. The composition and tasks of the subgroups will be guided by the overall objectives, but will otherwise be determined by the participants themselves, both before and during the event.

Participants

Participation will be arranged by invitation and by self-nomination followed by review. If you are interested in participating, please contact one of the organizers.

Participants List

Organization

Organizing Committee: Nicole Washington, Hilmar Lapp, Sheldon McKay, Scott Cain, Robert Buels

Time & Venue: The hackathon is tentatively scheduled to take place June 7-11, 2010 at NESCent in Durham, North Carolina.

Agenda: The agenda of the event will be posted here once developed by the participants.

Suggestions

  • add suggestions here as bullet points