Chado Update via GFF
There has frequently been interest in updating a Chado database using a GFF file, and I've finally gotten around to trying to implement it. My initial efforts were centered around converting GFF to Chado XML using Bio::SeqIO::chadoxml, but I was never completely satisfied with the result, and I was unable to load it with XORT or DBIx::DBStag. So, I've decided to work on the GFF3 bulk loader gmod_bulk_load_gff3.pl to have it do updates and deletes as well. Accordingly, I've identified these cases that should be addressed:
Perhaps the simplest case is updating feature properties (for purposes of this discussion, 'feature properties' encompasses items in the featureprop, feature_cvterm and feature_dbxref tables). Nevertheless, it poses some possible hang-ups. For instance:
- What should happen to the properties already there? Would they be uniformly deleted (bad), marked 'not current' (only partially possible) or just left there? Currently, the feature_dbxref table has an is_current column, but featureprop and feature_cvterm do not.
- This question applies to all updates and deletes: how do we decide that a feature in the GFF file is the same as one already in the database? Is the Name enough? What about Name and type? Name, type and srcfeature/seq_id?
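To make the identity question concrete, here is a minimal sketch in Python (not the loader's Perl, and not an existing implementation) that compares the three candidate matching rules against a small in-memory set of features. The dictionaries, the `IDENTITY_LEVELS` table and the function names are all hypothetical, invented for illustration; a real loader would query the feature and featureloc tables instead.

```python
# Hypothetical identity levels, from loosest to strictest.
# Each level lists the fields that must agree for two features
# to be treated as "the same" feature.
IDENTITY_LEVELS = {
    "name": ("name",),
    "name_type": ("name", "type"),
    "name_type_seqid": ("name", "type", "seq_id"),
}


def identity_key(feature, level):
    """Build the comparison key for a feature at the given identity level."""
    return tuple(feature[field] for field in IDENTITY_LEVELS[level])


def find_matches(existing, incoming, level):
    """Return the existing features that match the incoming one."""
    key = identity_key(incoming, level)
    return [f for f in existing if identity_key(f, level) == key]


# Stand-in for rows already in the database (made-up data).
existing = [
    {"name": "MyGene1", "type": "gene", "seq_id": "ChrX"},
    {"name": "MyGene1", "type": "mRNA", "seq_id": "ChrX"},
    {"name": "MyGene1", "type": "gene", "seq_id": "Chr2"},
]
incoming = {"name": "MyGene1", "type": "gene", "seq_id": "ChrX"}

for level in IDENTITY_LEVELS:
    print(level, len(find_matches(existing, incoming, level)))
```

With this made-up data, matching on Name alone is ambiguous (three candidates), Name plus type still leaves two, and only Name, type and seq_id together single out one feature. An update or delete would presumably have to refuse to proceed whenever more than one candidate matches.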
Updating feature locations
If name, type and srcfeature are the same, allow featureloc updates?
Updating complete gene models
If updating child features, what happens to the old features? Remove their featureloc entries and create completely new children? Only allow this for features of type 'gene'?
Deleting features raises the same question again: if name, type and srcfeature are the same, allow the delete?
- I'd say the most useful cases for many folks would be (a) adding annotations/properties to main gene features, and (b) deleting then reloading existing gene features (with new primary data: locations, sequence, etc.). These two abilities would handle many uses for annotating new genomes: adding more dbxrefs, properties, etc. to existing gene features, and updating selected features by drop/replace. For the second case, if one can Delete via a GFF entry, it should be easy to also Update the complete gene model.
- For GFF input to handle these, I'd say lines like this should be able to trigger updates to an existing feature, where CRUDop is your database Create/Replace/Update/Drop operation.
RefChr  Source  Type  (st)  (en)  (sc)  (st)  (ph)  Attributes
ChrX    MyDB    gene  .     .     .     .     .     ID=MyGene1;CRUDop=DROP
ChrX    MyDB    gene  .     .     .     .     .     ID=MyGene2;CRUDop=UPDATE;Dbxref=SW:U1234
ChrX    MyDB    gene  1     2     9     -     .     ID=MyGene3;CRUDop=REPLACE;Dbxref=SW:U1234;..more..
Dongilbert 16:48, 30 March 2007 (EDT)
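As a rough illustration of the CRUDop proposal above, the following Python sketch pulls the operation out of the attributes column of a tab-separated GFF3 line. It is only a sketch under stated assumptions: CRUDop is a proposed attribute, not part of GFF3 or of gmod_bulk_load_gff3.pl; the function names are hypothetical; treating a missing CRUDop as CREATE (an ordinary load) is an assumption; and URL-decoding of attribute values is omitted for brevity.

```python
# Operations named in the proposal (Create/Replace/Update/Drop).
VALID_OPS = {"CREATE", "REPLACE", "UPDATE", "DROP"}


def parse_attributes(column9):
    """Split a GFF3 attributes column into a dict of tag -> list of values."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        tag, value = pair.split("=", 1)
        attrs[tag] = value.split(",")
    return attrs


def parse_crud_line(line):
    """Parse one tab-separated GFF3 line and extract the proposed CRUDop."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    attrs = parse_attributes(cols[8])
    # Assumption: a line without CRUDop is an ordinary load (CREATE).
    op = attrs.pop("CRUDop", ["CREATE"])[0].upper()
    if op not in VALID_OPS:
        raise ValueError("unknown CRUDop: %s" % op)
    return {
        "seq_id": cols[0],
        "source": cols[1],
        "type": cols[2],
        "id": attrs.get("ID", [None])[0],
        "op": op,
        "attributes": attrs,  # remaining attributes, CRUDop removed
    }


line = "ChrX\tMyDB\tgene\t.\t.\t.\t.\t.\tID=MyGene2;CRUDop=UPDATE;Dbxref=SW:U1234"
record = parse_crud_line(line)
print(record["op"], record["id"], record["attributes"]["Dbxref"])
```

Removing CRUDop from the attribute dict before anything is loaded keeps the operation flag from being stored as a feature property by accident; whether that is the desired behavior is a design choice the loader would have to make.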