Chado for Prokaryotes
This page is a stub for developing a set of best practices for using Chado with prokaryotic genomes.
Problems in adapting Chado for prokaryotes
The major challenges in adapting Chado for prokaryotes are:
- Representation of features as Sequence Ontology types
- Circular genomes
- "Pan genomes"
Representation of features
Chado features are instances of ontology nodes. Genome features should be instances of types in the Sequence Ontology (SO). However, SO currently fits prokaryotes poorly in several ways:
- SO uses a eukaryotic definition of gene that is not well suited to bacterial genomes
- Feature relationships in SO are based on eukaryotic models
- Some feature types needed for prokaryotes are not in SO
The ideal solution is to work with SO to revise the ontology so that it serves both prokaryotes and eukaryotes. This is an ongoing activity that prokaryotic groups can participate in, and SO has requested help from experts in bacterial genetics and genomics. In the meantime, model organism databases (MODs) need to decide how to deploy Chado.
One solution has been to ignore "gene" as a feature type.
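As a rough illustration of that approach, the sketch below stores CDS features directly on a chromosome feature, with no "gene" feature at all. It is not the real Chado schema: it uses an in-memory SQLite database and heavily reduced versions of the cvterm, feature and featureloc tables (real Chado runs on PostgreSQL and these tables have more columns and constraints). The feature names and coordinates are made up for the example.

```python
import sqlite3

# Reduced stand-ins for Chado's cvterm, feature and featureloc tables.
# Assumption: only the columns needed for this sketch are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin          INTEGER NOT NULL,
    fmax          INTEGER NOT NULL,
    strand        INTEGER NOT NULL
);
""")

# Only the SO types actually used are loaded; "gene" is deliberately absent.
conn.executemany("INSERT INTO cvterm (name) VALUES (?)",
                 [("chromosome",), ("CDS",)])
type_id = dict(conn.execute("SELECT name, cvterm_id FROM cvterm"))


def add_feature(uniquename, so_type):
    """Insert a feature typed by an SO term name and return its feature_id."""
    cur = conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, type_id[so_type]),
    )
    return cur.lastrowid


# Hypothetical data: one chromosome with two CDS features located on it.
chrom = add_feature("example_chromosome", "chromosome")
for name, fmin, fmax, strand in [("example_cds_1", 100, 400, 1),
                                 ("example_cds_2", 450, 900, -1)]:
    cds = add_feature(name, "CDS")
    conn.execute(
        "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax, strand)"
        " VALUES (?, ?, ?, ?, ?)",
        (cds, chrom, fmin, fmax, strand),
    )


def type_counts(connection):
    """Return a {SO type name: number of features} mapping."""
    rows = connection.execute(
        "SELECT c.name, COUNT(*) FROM feature f"
        " JOIN cvterm c ON c.cvterm_id = f.type_id"
        " GROUP BY c.name"
    ).fetchall()
    return dict(rows)


print(type_counts(conn))
```

Here each CDS is located on the chromosome through featureloc, so no gene-to-CDS relationship is required. Whether dropping "gene" is acceptable depends on the tools that will read the database, since some may expect gene features to exist.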
"Pan genomes"
Sequencing multiple isolates of the same bacterial species leads to the identification of new genes in each isolate, with the rate of new gene discovery declining as a power law. The "pan genome" is a phrase used by Tettelin et al. (2005) to describe the set of genes present in the union of all genomes of a bacterial species. How should this be represented in Chado?
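To make the definition concrete, here is a minimal Python sketch that treats the pan genome as the union of per-isolate gene sets and counts how many previously unseen genes each additional isolate contributes. It assumes the genes have already been grouped into families shared across isolates; the isolate names and family identifiers are hypothetical. It does not address how such data would be stored in Chado.

```python
from typing import Dict, List, Set, Tuple


def pan_genome(isolates: Dict[str, Set[str]]) -> Set[str]:
    """Return the union of the gene sets of all isolates."""
    pan: Set[str] = set()
    for genes in isolates.values():
        pan |= genes
    return pan


def new_genes_per_isolate(isolates: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Count genes not seen in any earlier isolate, in insertion order."""
    seen: Set[str] = set()
    counts: List[Tuple[str, int]] = []
    for name, genes in isolates.items():
        counts.append((name, len(genes - seen)))
        seen |= genes
    return counts


# Hypothetical example data: gene family identifiers per isolate.
example_isolates = {
    "isolate_A": {"fam1", "fam2", "fam3"},
    "isolate_B": {"fam1", "fam2", "fam4"},
    "isolate_C": {"fam1", "fam4", "fam5"},
}

print(sorted(pan_genome(example_isolates)))
print(new_genes_per_isolate(example_isolates))
```

The counts depend on the order in which isolates are added, so a real analysis would normally average over many orderings before fitting a curve to the rate of new gene discovery.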