Chado Companalysis Module

From GMOD

Revision as of 16:10, 5 March 2007

Introduction


Tables

Table: analysis

An analysis is a particular computational analysis. It may be a blast of one sequence against another, an all-by-all blast, or a different kind of analysis altogether. It is a single unit of computation.

analysis Structure
F-Key Name Type Description
analysis_id serial PRIMARY KEY
name character varying(255)

A way of grouping analyses. This should be a handy short identifier that helps people find the analysis they want, for instance "tRNAscan", "cDNA", "FlyPep" or "SwissProt". It should not be assumed to be unique; for example, there may be many separate analyses done against a cDNA database.
description text
program character varying(255) UNIQUE#1 NOT NULL

Program name, e.g. blastx, blastp, sim4, genscan.
programversion character varying(255) UNIQUE#1 NOT NULL

Version description, e.g. TBLASTX 2.0MP-WashU [09-Nov-2000].
algorithm character varying(255)

Algorithm name, e.g. blast.
sourcename character varying(255) UNIQUE#1

Source name, e.g. cDNA, SwissProt.
sourceversion character varying(255)
sourceuri text

This is an optional, permanent URL or URI for the source of the analysis. The idea is that someone could recreate the analysis directly by going to this URI and fetching the source data (e.g. the blast database, or the training model).
timeexecuted timestamp without time zone NOT NULL DEFAULT ('now'::text)::timestamp(6) with time zone

Tables referencing this one via Foreign Key Constraints (within this module): analysisfeature, analysisprop
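The constraints above can be tried out with a minimal sketch using Python's built-in sqlite3 module. It is not the official Chado DDL: PostgreSQL types are simplified to SQLite equivalents, and the UNIQUE#1 markers are read (an assumption) as one composite unique constraint over (program, programversion, sourcename). The analysis names, version strings and source names are hypothetical.

```python
import sqlite3

# Simplified SQLite rendering of the analysis table; NOT the official Chado DDL.
# serial -> INTEGER PRIMARY KEY, and the UNIQUE#1 markers are read (assumption)
# as one composite constraint over (program, programversion, sourcename).
DDL = """
CREATE TABLE analysis (
    analysis_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    name           VARCHAR(255),
    description    TEXT,
    program        VARCHAR(255) NOT NULL,
    programversion VARCHAR(255) NOT NULL,
    algorithm      VARCHAR(255),
    sourcename     VARCHAR(255),
    sourceversion  VARCHAR(255),
    sourceuri      TEXT,
    timeexecuted   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (program, programversion, sourcename)
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database containing only the analysis table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def add_analysis(conn, name, program, programversion, sourcename=None) -> int:
    """Insert one analysis row and return its generated analysis_id."""
    cur = conn.execute(
        "INSERT INTO analysis (name, program, programversion, sourcename) "
        "VALUES (?, ?, ?, ?)",
        (name, program, programversion, sourcename),
    )
    return cur.lastrowid


conn = make_db()
# name is only a grouping label: two analyses may share it ("cDNA") as long
# as the (program, programversion, sourcename) triple differs.
add_analysis(conn, "cDNA", "sim4", "1.0", "cDNA-set-A")
add_analysis(conn, "cDNA", "sim4", "1.0", "cDNA-set-B")
try:
    add_analysis(conn, "cDNA rerun", "sim4", "1.0", "cDNA-set-A")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

Because sourcename is nullable and SQL treats NULLs as distinct in unique constraints (by default in both SQLite and PostgreSQL), two rows with the same program and programversion but no sourcename are both accepted. A loader that relies on this constraint for deduplication may therefore want to always fill in sourcename.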



Table: analysisfeature

Computational analyses generate features (e.g. Genscan generates transcripts and exons; sim4 alignments generate similarity/match features). analysisfeatures are stored using the feature table from the sequence module. The analysisfeature table is used to decorate these features with analysis-specific attributes. A feature is an analysisfeature if and only if there is a corresponding entry in the analysisfeature table. analysisfeatures will have two or more featureloc entries, with rank indicating query/subject.
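The "if and only if" rule means that membership is decided purely by the presence of a row, so it maps naturally onto an EXISTS query. The minimal SQLite sketch below uses heavily trimmed, hypothetical stand-ins for the feature, analysis and analysisfeature tables (the real feature table has many more columns, and featureloc is omitted entirely). The composite (feature_id, analysis_id) uniqueness follows the UNIQUE#1 markers in the structure listing.

```python
import sqlite3

# Trimmed, hypothetical stand-ins for the Chado tables; only the columns
# needed to show the "analysisfeature iff a row exists" rule are kept.
DDL = """
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, program TEXT NOT NULL);
CREATE TABLE feature  (feature_id  INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_id   INTEGER NOT NULL REFERENCES feature (feature_id),
    analysis_id  INTEGER NOT NULL REFERENCES analysis (analysis_id),
    rawscore     REAL,
    normscore    REAL,
    significance REAL,
    identity     REAL,
    UNIQUE (feature_id, analysis_id)
);
"""


def analysis_features(conn: sqlite3.Connection) -> list[str]:
    """Return uniquenames of features that have at least one analysisfeature row."""
    rows = conn.execute(
        """
        SELECT f.uniquename FROM feature AS f
        WHERE EXISTS (SELECT 1 FROM analysisfeature AS af
                      WHERE af.feature_id = f.feature_id)
        ORDER BY f.uniquename
        """
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO analysis VALUES (1, 'genscan')")
conn.executemany("INSERT INTO feature VALUES (?, ?)",
                 [(1, "predicted_exon_1"), (2, "curated_gene_1")])
# Only the genscan prediction is decorated with analysis results.
conn.execute("INSERT INTO analysisfeature (feature_id, analysis_id, rawscore) "
             "VALUES (1, 1, 42.0)")
print(analysis_features(conn))  # ['predicted_exon_1']
```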

analysisfeature Structure
F-Key Name Type Description
analysisfeature_id serial PRIMARY KEY
feature_id integer UNIQUE#1 NOT NULL (F-Key: feature)
analysis_id integer UNIQUE#1 NOT NULL (F-Key: analysis)
rawscore double precision

This is the native score generated by the program; for example, the bitscore generated by blast, or the scores produced by sim4 or genscan. One should not assume that high is necessarily better than low.
normscore double precision

This is the rawscore, semi-normalized. Complete normalization to allow comparison of features generated by different programs would be nice but is too difficult. Instead, the normalization should strive to enforce the following semantics:
* normscores are floating-point numbers >= 0;
* high normscores are better than low ones.
For most programs, it would be sufficient to make the normscore the same as the rawscore, provided these semantics are satisfied.
significance double precision

This is some kind of expectation or probability metric, representing the probability that the result would appear by chance given the model. As such, any program or person querying this table can assume the following semantics:
* 0 <= significance <= n, where n is a positive number, theoretically unbounded but unlikely to be more than 10;
* low numbers are better than high numbers.
identity double precision

Percent identity between the locations compared. Note that these four metrics do not cover the full range of possible scores; listing every possible score would be undesirable, as the schema should be kept extensible. Instead, for non-standard scores, use the analysisprop table.
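The documented semantics can be enforced by a loader before inserting scores, and they decide how a query should pick the "best" result. The sketch below is a hypothetical helper, not part of Chado: it rejects rows that break the normscore and significance rules, and ranks by normscore (higher is better) or significance (lower is better), never by rawscore, whose direction is program-specific. The E-value-to-normscore conversion (-log10, floored at 0) is just one possible normalization chosen for illustration, for a program whose raw output is lower-is-better; the schema does not prescribe it.

```python
import math
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class Score:
    """Hypothetical in-memory view of one analysisfeature row's scores."""
    feature: str
    rawscore: Optional[float] = None
    normscore: Optional[float] = None
    significance: Optional[float] = None
    identity: Optional[float] = None  # percent identity, 0-100


def check(s: Score) -> None:
    """Raise ValueError if the row breaks the documented semantics."""
    if s.normscore is not None and s.normscore < 0:
        raise ValueError(f"{s.feature}: normscore must be >= 0")
    if s.significance is not None and s.significance < 0:
        raise ValueError(f"{s.feature}: significance must be >= 0")
    if s.identity is not None and not 0 <= s.identity <= 100:
        raise ValueError(f"{s.feature}: identity must be a percentage")


def normscore_from_evalue(evalue: float) -> float:
    """One possible normalization (an assumption): -log10(E), floored at 0.

    Smaller E-values map to larger normscores, so 'high is better' holds.
    An E-value of 0 is clamped to the smallest positive double first.
    """
    evalue = max(evalue, sys.float_info.min)
    return max(0.0, -math.log10(evalue))


def best_by_normscore(rows: list[Score]) -> Optional[Score]:
    """Highest normscore wins; rows without a normscore are ignored."""
    scored = [r for r in rows if r.normscore is not None]
    return max(scored, key=lambda r: r.normscore) if scored else None


def best_by_significance(rows: list[Score]) -> Optional[Score]:
    """Lowest significance wins; rows without a significance are ignored."""
    scored = [r for r in rows if r.significance is not None]
    return min(scored, key=lambda r: r.significance) if scored else None


rows = [
    Score("hit_a", rawscore=250.0, significance=1e-40),
    Score("hit_b", rawscore=90.0, significance=1e-5),
]
for r in rows:
    r.normscore = normscore_from_evalue(r.significance)
    check(r)
print(best_by_normscore(rows).feature)     # hit_a
print(best_by_significance(rows).feature)  # hit_a
```

Both rankings agree here because the normscore was derived from the significance; for scores coming from different programs they need not agree, which is exactly why the schema stops short of full normalization.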


Table: analysisprop

analysisprop Structure
F-Key Name Type Description
analysisprop_id serial PRIMARY KEY
analysis_id integer UNIQUE#1 NOT NULL (F-Key: analysis)
type_id integer UNIQUE#1 NOT NULL (F-Key: cvterm)
value text UNIQUE#1
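analysisprop follows the usual Chado property pattern: a free-text value typed by a controlled-vocabulary term. The minimal SQLite sketch below assumes the UNIQUE#1 markers mean one composite constraint on (analysis_id, type_id, value). The trimmed analysis and cvterm tables and the term names "evalue_cutoff" and "matrix" are hypothetical placeholders, not real ontology terms.

```python
import sqlite3

# Trimmed, hypothetical stand-ins; the real cvterm table has more columns.
DDL = """
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, program TEXT NOT NULL);
CREATE TABLE cvterm   (cvterm_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE analysisprop (
    analysisprop_id INTEGER PRIMARY KEY,
    analysis_id INTEGER NOT NULL REFERENCES analysis (analysis_id),
    type_id     INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value       TEXT,
    UNIQUE (analysis_id, type_id, value)
);
"""


def set_prop(conn: sqlite3.Connection, analysis_id: int, term: str, value: str) -> None:
    """Attach a typed property to an analysis, looking up the cvterm by name."""
    row = conn.execute("SELECT cvterm_id FROM cvterm WHERE name = ?", (term,)).fetchone()
    if row is None:
        raise KeyError(f"unknown cvterm: {term}")
    conn.execute(
        "INSERT INTO analysisprop (analysis_id, type_id, value) VALUES (?, ?, ?)",
        (analysis_id, row[0], value),
    )


def get_props(conn: sqlite3.Connection, analysis_id: int) -> list[tuple[str, str]]:
    """Return (term name, value) pairs for one analysis, sorted by name then value."""
    rows = conn.execute(
        """SELECT c.name, p.value FROM analysisprop AS p
           JOIN cvterm AS c ON c.cvterm_id = p.type_id
           WHERE p.analysis_id = ? ORDER BY c.name, p.value""",
        (analysis_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO analysis VALUES (1, 'blastx')")
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(10, "evalue_cutoff"), (11, "matrix")])
set_prop(conn, 1, "evalue_cutoff", "1e-5")
set_prop(conn, 1, "matrix", "BLOSUM62")
print(get_props(conn, 1))  # [('evalue_cutoff', '1e-5'), ('matrix', 'BLOSUM62')]
```

Because value is part of the unique key, the same term can be attached more than once with different values, but an identical (analysis, term, value) triple is rejected.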