PubFetch

From GMOD
Revision as of 18:43, 25 January 2007 by 165.124.152.78



Description

PubFetch is part of the web-based literature curation toolset. It acts as the interface between the literature curation tools and online literature databases such as PubMed. The aim of PubFetch is to provide a generic way of searching and retrieving data from online literature sources, so that downstream applications don't have to deal with the idiosyncrasies of the individual literature databases. Initially, PubFetch will act as the interface between PubSearch and the PubMed and Agricola databases used by RGD and TAIR. A standard API and data format will be created for submitting database queries and returning results. Popular existing formats and protocols will be used or supported wherever possible.


Figure 1 - Overview diagram of PubFetch. The diagram shows how the PubFetch module will provide a generic literature access interface to PubMed and Agricola, which could be expanded to other literature sources as desired.
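The generic-interface idea in Figure 1 can be sketched as an abstract adapter class that every literature source implements. This is a minimal sketch, not PubFetch's actual API. It is written in Python for brevity, although the planned code is Perl. The names `Reference`, `LitDb`, `InMemoryLitDb` and `PubFetch`, and their methods, are all hypothetical. The in-memory backend stands in for real PubMed or Agricola adapters.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Reference:
    """Hypothetical source-independent record returned by every backend."""
    accession: str   # e.g. a PubMed ID (PMID) or an Agricola accession
    source: str      # name of the literature database the record came from
    title: str
    authors: List[str] = field(default_factory=list)
    year: str = ""


class LitDb(ABC):
    """Hypothetical interface that each literature-database adapter implements."""

    name: str = "abstract"

    @abstractmethod
    def search(self, query: str) -> List[str]:
        """Action 1: return accession numbers of references matching the query."""

    @abstractmethod
    def fetch(self, accession: str) -> Reference:
        """Action 2: return the record for a single accession number."""


class InMemoryLitDb(LitDb):
    """Toy backend standing in for PubMed or Agricola (illustration only)."""

    def __init__(self, name: str, records: Dict[str, Reference]):
        self.name = name
        self._records = records

    def search(self, query: str) -> List[str]:
        # Naive case-insensitive title match; a real adapter would call the LitDb.
        q = query.lower()
        return [acc for acc, ref in self._records.items() if q in ref.title.lower()]

    def fetch(self, accession: str) -> Reference:
        return self._records[accession]


class PubFetch:
    """Dispatches each request to the backend registered under the given name."""

    def __init__(self) -> None:
        self._backends: Dict[str, LitDb] = {}

    def register(self, backend: LitDb) -> None:
        self._backends[backend.name] = backend

    def search(self, source: str, query: str) -> List[str]:
        return self._backends[source].search(query)

    def fetch(self, source: str, accession: str) -> Reference:
        return self._backends[source].fetch(accession)
```

Downstream code only ever talks to `PubFetch.search` and `PubFetch.fetch`. Adding another literature source therefore means writing one more `LitDb` subclass and registering it, without touching the callers.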


Plan of Action
The codebase will initially be developed in Perl by adapting existing RGD Perl modules that retrieve data from PubMed in a standard XML format. This code will be reviewed and adapted to create the main PubFetch module and the appropriate database interface modules. Figure 2 is a schematic diagram of the existing RGD literature download modules.

Figure 2 - Current RGD literature download process, showing the Perl modules used to interact with PubMed, create XML data and load it into RGD.
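The core of such a download module is pulling a few fields out of PubMed's XML. The sketch below does this in Python with the standard library, as an illustration rather than a port of the RGD Perl code. It assumes the element layout of a PubMed efetch XML response (`PubmedArticleSet/PubmedArticle/MedlineCitation`). The sample document is trimmed and made up, and `parse_pubmed_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Trimmed, made-up sample shaped like a PubMed efetch (retmode=xml) response.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <Article>
        <Journal>
          <JournalIssue><PubDate><Year>2006</Year></PubDate></JournalIssue>
          <Title>Example Journal</Title>
        </Journal>
        <ArticleTitle>An example article title.</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><Initials>J</Initials></Author>
          <Author><LastName>Roe</LastName><Initials>R</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_pubmed_xml(xml_text: str) -> List[Dict[str, object]]:
    """Hypothetical helper: extract core fields from each PubmedArticle."""
    root = ET.fromstring(xml_text)
    records = []
    for art in root.iter("PubmedArticle"):
        cit = art.find("MedlineCitation")
        article = cit.find("Article")
        authors = []
        for a in article.findall("AuthorList/Author"):
            # Collective (group) authors have no LastName; this sketch skips them.
            last = a.findtext("LastName")
            if last:
                authors.append(f"{last} {a.findtext('Initials', '')}".strip())
        records.append({
            "pmid": cit.findtext("PMID"),
            "title": article.findtext("ArticleTitle"),
            "journal": article.findtext("Journal/Title"),
            "year": article.findtext("Journal/JournalIssue/PubDate/Year"),
            "authors": authors,
        })
    return records
```

Real records vary. Some give the publication date as `MedlineDate` instead of `Year`, for example. A production parser would need fallbacks for such cases.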

The fundamental actions required of PubFetch are as follows:
  1. Search a literature database (LitDb) for articles matching certain query criteria (e.g. keywords, date, author). This will most likely involve passing the search criteria to PubFetch and retrieving a set of accession numbers (e.g. PubMed IDs, PMIDs) for the matching references.
  2. Retrieve the text information from the LitDb corresponding to a supplied accession number (e.g. "bring me the PubMed entry for PMID 12345").
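For PubMed, the two actions map naturally onto NCBI's E-utilities. ESearch returns the PMIDs that match a query, and EFetch returns the full records for a list of PMIDs. The Python sketch below only builds the request URLs and parses an ESearch reply. It does not contact NCBI, and real use is subject to NCBI's usage guidelines. The base URL, `db`/`term`/`retmax`/`id`/`retmode` parameters and the `eSearchResult/IdList/Id` layout are those of the current E-utilities service. The function names and the sample reply are assumptions made for illustration, and nothing here covers Agricola.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Action 1: URL for searching PubMed; the reply lists matching PMIDs."""
    return EUTILS + "esearch.fcgi?" + urlencode(
        {"db": "pubmed", "term": term, "retmax": retmax})


def efetch_url(pmids: List[str]) -> str:
    """Action 2: URL for retrieving the records for the given PMIDs as XML."""
    return EUTILS + "efetch.fcgi?" + urlencode(
        {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"})


def parse_esearch(xml_text: str) -> List[str]:
    """Extract the PMIDs from an ESearch reply (<eSearchResult><IdList><Id>...)."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("IdList/Id")]


# Made-up ESearch reply used only to exercise parse_esearch.
SAMPLE_REPLY = ("<eSearchResult><Count>2</Count><RetMax>2</RetMax>"
                "<RetStart>0</RetStart><IdList><Id>12345</Id><Id>67890</Id>"
                "</IdList></eSearchResult>")
```

In this design, the PMIDs returned by action 1 are fed straight into `efetch_url` for action 2. The resulting XML can then be parsed into records.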


PubFetch as a BioMOBY webservice
To provide generic access to PubFetch, we intend to make the core functionality available as a webservice that follows the BioMOBY service model. The two actions described above will be implemented as two classes of webservice. The first takes keywords and returns PubMed IDs (or other LitDb accessions). The second takes LitDb accessions and returns the text information in a simple, standardized XML format. That simple XML format will not depend on other codebases. We will also endeavour to provide the data in existing formats (raw data from the LitDb, a BioPerl-compatible format, etc.).
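To give a feel for what a "simple XML format that is not dependent on other codebases" might look like, here is a minimal Python serializer. The element and attribute names (`pubfetchResult`, `reference`, `accession`, `title`, `authors`, `author`, `year`) are hypothetical and do not represent an agreed PubFetch or BioMOBY schema. The function name `to_simple_xml` is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_simple_xml(records: List[Dict[str, object]], source: str) -> str:
    """Serialize records into a hypothetical minimal PubFetch XML document."""
    root = ET.Element("pubfetchResult", {"source": source})
    for rec in records:
        ref = ET.SubElement(root, "reference", {"accession": str(rec["accession"])})
        ET.SubElement(ref, "title").text = str(rec.get("title", ""))
        authors = ET.SubElement(ref, "authors")
        for name in rec.get("authors", []):
            ET.SubElement(authors, "author").text = str(name)
        ET.SubElement(ref, "year").text = str(rec.get("year", ""))
    # ElementTree escapes characters such as "&" and "<" in text automatically.
    return ET.tostring(root, encoding="unicode")
```

Building the document with an XML library rather than by string concatenation keeps the output well-formed even when titles contain markup characters. The same records could also be emitted in the other formats mentioned above.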
Downloads
These will ultimately be hosted on SourceForge. Perl code, use case diagrams, etc. will be available shortly.