- 1 Introduction
A Feature is a Sequence
Chado does not distinguish between a sequence and a sequence feature, on the theory that a feature is a piece of a sequence, and a piece of a sequence is a sequence. Both are represented as a row in the feature table.
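To make this concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a heavily simplified stand-in for the feature table: the real table has more columns, and the plain-text `type` column used here is a hypothetical simplification. The row names and residues are invented for illustration.

```python
import sqlite3

# Simplified stand-in for the feature table (assumption: only a few
# illustrative columns; the real table has more).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        uniquename TEXT NOT NULL,
        type       TEXT NOT NULL,  -- hypothetical: plain text type name
        residues   TEXT
    )
""")

# A whole chromosome and a gene located on it are stored the same way:
# each is one row in the same table.
rows = [
    ("chr_demo", "chromosome", "ATGGCCTTTAAGTGA"),
    ("gene_demo", "gene", "ATGGCCTTT"),
]
conn.executemany(
    "INSERT INTO feature (uniquename, type, residues) VALUES (?, ?, ?)", rows
)

feature_rows = conn.execute(
    "SELECT uniquename, type, length(residues) FROM feature ORDER BY feature_id"
).fetchall()
for row in feature_rows:
    print(row)
```

Nothing in the table marks one row as a "sequence" and the other as a "feature"; only the type differs.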
Feature types are taken from the Sequence Ontology (SO) controlled vocabulary (see also the Controlled Vocabulary section in this document). A selection of Chado-relevant types from SO is shown below:
We organised the tables into distinct modular components with tightly defined dependencies. This is recognised as good software engineering practice: it allows different software components to focus on the specific data compartments they require. It allows for extensibility and schema evolution within specific modules without disrupting the rest of the schema. Finally, it allows for a mix-and-match approach: it is the authors' hope that the schema modules will be adopted by other model organism and bioinformatics groups, and these groups may want to swap in their own table variants within specific modules, or add modules of their own.
- Audit - for database audits
- Companalysis - for data from computational analysis
- Contact - for people, groups, and organizations
- Controlled Vocabulary (cv) - for controlled vocabularies and ontologies
- Expression - for RNA and protein expression
- General - for identifiers
- Genetic - for genetic data and genotypes
- Library - for descriptions of molecular libraries
- Map - for maps without sequence
- Organism - for taxonomic data
- Phenotype - for phenotypic data
- Phylogeny - for organisms and phylogenetic trees
- Publication (pub) - for publications and references
- Sequence - for sequences and sequence features
- Stock - for specimens and biological collections
- WWW -
Module dependencies (each module is followed by the modules it depends on):
general: no dependencies
organism: general
pub: general
cv: general pub
sequence: cv general pub
genetic: sequence cv general pub
expression: sequence cv general pub
map: sequence cv general pub
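One practical use of the dependency list above is to work out an order in which modules can be loaded so that every module comes after the modules it depends on. The following is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The `DEPENDS_ON` dictionary and the `load_order` helper are illustrative names, not part of Chado, and the `genetic` entry assumes the list intends `genetic` to depend on `sequence`, `cv`, `general`, and `pub`.

```python
from graphlib import TopologicalSorter

# Module -> modules it depends on, transcribed from the list above.
DEPENDS_ON = {
    "general": [],
    "organism": ["general"],
    "pub": ["general"],
    "cv": ["general", "pub"],
    "sequence": ["cv", "general", "pub"],
    "genetic": ["sequence", "cv", "general", "pub"],
    "expression": ["sequence", "cv", "general", "pub"],
    "map": ["sequence", "cv", "general", "pub"],
}


def load_order(depends_on):
    """Return module names ordered so each module follows its dependencies."""
    # TopologicalSorter treats each dict value as the node's predecessors,
    # so static_order() yields dependencies before their dependents.
    return list(TopologicalSorter(depends_on).static_order())


order = load_order(DEPENDS_ON)
print(order)
```

Several valid orders exist; the only guarantee is that a module never appears before something it depends on. A cyclic dependency would raise `graphlib.CycleError`, which is a cheap way to check that the module graph stays acyclic.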
Inter-module Linking Tables
These can be thought of as floating outside of the respective modules they bridge, although they are generally bundled with one or the other module.
1.1.1 Module System
Views can be thought of as virtual tables. They provide a powerful abstraction layer over the database. All views should be portable across all DBMSs.
Views in Chado are defined on a per-module basis. View definitions are maintained in the chado/modules/MODULE-NAME/views directory.
Included in the view directory are report views. These can usually be found in a file called chado/modules/MODULE-NAME/views/MODULE-NAME-report.sql
Collections of view definitions are bundled into packages. Each package is a .sql file.
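As an illustration of a view acting as a virtual table, the sketch below defines a small report-style view with Python's built-in `sqlite3` module. The table is a simplified stand-in, and the view name `feature_type_report`, its columns, and the sample rows are hypothetical; they are not taken from any Chado view package.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in table (assumption: illustrative columns only).
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        uniquename TEXT NOT NULL,
        type       TEXT NOT NULL,
        seqlen     INTEGER
    );
    INSERT INTO feature (uniquename, type, seqlen) VALUES
        ('gene_a', 'gene', 1200),
        ('gene_b', 'gene', 800),
        ('exon_a1', 'exon', 150);

    -- Hypothetical report view: one summary row per feature type.
    CREATE VIEW feature_type_report AS
        SELECT type, COUNT(*) AS n_features, SUM(seqlen) AS total_len
        FROM feature
        GROUP BY type;
""")

# The view is queried exactly like a table, but stores no data of its own.
report = conn.execute(
    "SELECT type, n_features, total_len FROM feature_type_report ORDER BY type"
).fetchall()
for row in report:
    print(row)
```

The view uses only plain `SELECT`, `COUNT`, `SUM`, and `GROUP BY`, which is the kind of restraint that keeps a view definition portable across DBMSs.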
DBMS Functions in Chado are entirely optional.
Functions in Chado are defined on a per-module basis. Function definitions are maintained in the chado/modules/MODULE-NAME/functions directory.
Collections of function definitions are bundled into packages. Each package comes with an interface description and one or more implementations.
Function Interface Definitions
The interface descriptions are stored in a *.sqlapi file. The syntax used is a variant of SQL and is intended primarily as a consistent way of providing information for humans, although it should be parseable by software.
Here is an example, taken from the top of the chado/modules/sequence/functions/subsequence.sqlapi package. This package provides basic subsequencing functions. It has dependencies on two other function packages, declared at the top of the file. The package declares multiple functions, only the first of which is shown here: a function for extracting subsequences from the sequence of a feature.
<sql>
IMPORT reverse_complement(TEXT) FROM 'sequtil';
IMPORT get_feature_relationship_type_id(TEXT) FROM 'sequence-cv-helper';

-- basic subsequencing functions --

DECLARE FUNCTION subsequence(
  srcfeature_id INT REFERENCES feature(feature_id),
  fmin INT,
  fmax INT,
  strand INT
)

COMMENT ON FUNCTION subsequence(INT,INT,INT,INT) IS 'extracts a subsequence from a feature referenced by srcfeature_id, within the interbase boundaries determined by fmin and fmax, reverse complementing if strand = -1. The sequence can be DNA or AA. Strand must always be >0 for AA sequences';
</sql>
The goal is to provide implementations for different dialects of procedural SQL. Currently only the PostgreSQL dialect is supported. The PostgreSQL implementations are stored in *.plpgsql files.
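The behaviour described in the function comment can be sketched outside the database. The Python below is not the PL/pgSQL implementation; it is a minimal illustration of the interbase semantics, and it assumes the residues are passed in directly rather than looked up through srcfeature_id. The `reverse_complement` helper here is a local stand-in that handles only A, C, G, and T and leaves any other character unchanged.

```python
# Translation table for the four unambiguous DNA bases (assumption:
# ambiguity codes such as N are left unchanged by str.translate).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


def subsequence(residues: str, fmin: int, fmax: int, strand: int) -> str:
    """Extract residues between interbase positions fmin and fmax.

    Interbase coordinates count the boundaries between residues, so the
    result has fmax - fmin residues, which matches a Python slice.
    """
    if not 0 <= fmin <= fmax <= len(residues):
        raise ValueError("fmin and fmax must satisfy 0 <= fmin <= fmax <= length")
    piece = residues[fmin:fmax]
    # Reverse complement only when the feature is on the minus strand.
    return reverse_complement(piece) if strand == -1 else piece


forward = subsequence("ATGGCCTTT", 0, 3, 1)
reverse = subsequence("ATGGCCTTT", 0, 3, -1)
print(forward, reverse)
```

Because interbase coordinates name the gaps between residues, a zero-length span (fmin equal to fmax) is valid and yields an empty string, which is convenient for representing insertion sites.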