Note: This document was generated from a POD formated document checked in at sourceforge. Editing will not result in long term changes.
Integrating the CMap Assembly Editor (CMAE) with In-House Systems
$Revision: 1.1 $
This document is intended to give a clear idea of what it will take to integrate the CMap Assembly Editor (CMAE) into an organizations in-house data system.
CMAE is built upon the CMap code base. The machines that run the program will need to have the CMap Perl modules installed (at a minimum).
CMAE uses CMap configuration files and reads data from the CMap database. These can be on the local machine or on a web server that the program can access.
For information on Installing CMAE see the "Installing CMAE" section of this document.
Since, CMAE uses the CMap database, the data will have to be loaded. The "Importing Data" section discusses what kind of data is needed and how to use the CMap API to do data imports.
CMAE can make modifications to the data. In order for these to become permanent outside of the CMAE environment, plug-ins need to be written to launch external scripts to change the in-house data. For information on this topic, see the "Modifying Data" section.
Since CMAE uses much of the CMap code base, the CMap modules need to be installed on each machine using it (even if the data and config files are being served off another machine.
Download CMap from the SourceForge CVS repository
$ cvs -d:pserver:firstname.lastname@example.org:/cvsroot/gmod login $ cvs -z3 -d:pserver:email@example.com:/cvsroot/gmod co -P cmap
CMAE (and CMap) requires a number of modules to be installed prior to installation.
Running "perl Build.PL" will provide a list of missing modules, which can be downloaded from CPAN.
A bundle can be used to install most of these at once. To use this bundle, run:
$ sudo perl -MCPAN -e "install Bundle::CMap"
The GD module requires the use of the libgd library which can be found at http://www.libgd.org/ .
In addition to the CMap requirements, CMAE requires:
- Perl/Tk (http://www.perltk.org/)
Perl/Tk can be downloaded from CPAN, http://search.cpan.org/~ni-s/Tk-804.027/ .
- Tkzinc (http://www.tkzinc.org/)
Zinc can render images using openGL. It can be downloaded from http://www.tkzinc.org/tkzinc/pmwiki.php?n=Main.Download .
The install process will install CMap on the machine as well as CMAE. It will ask you about the location of various web related directories. On a linux system those should be easily answered.
The install process is simply:
$ perl Build.PL $ ./Build $ sudo ./Build install
Create the Database
If you will be serving the data from a web page, the database only needs to be created on the web server.
Create the CMap database schema by reading the schema file into the database. There are schema files provided for MySQL, Oracle, PostgreSQL, Sybase and SQLite. Each is named cmap.create.dbname (e.g. cmap.create.mysql). They are in the sql directory in the distribution.
Create the configuration files
If you will be serving the data from a web page, the config files only need to be created on the web server.
The configuration files are important to CMap (and hence CMAE). They define which database is to be used and provide information about the different types of maps, features and correspondence evidences in the database.
For more information about the configuration files, see the ADMINISTRATION.pod document in the docs/ directory.
The simplest way to import data is with a Perl script using the CMap API.
Using the API, the following data types will need to be created:
- - Species
- Each species that maps in the data set belong to must be entered into the database.
- - Map Sets
- A map set is a collection of maps. The maps are of the same type (sequence, FPC, etc) and are usually from the same analysis set. For instance, the contigs from a particular assembly run would be in a set.
- - Maps
- Maps can represent many different data types, sequence, physical, genetic, etc. Simply put, a map is any type of data that can be represented as a line with features on it.
- - Features
- Features can be placed on maps. They provide the anchor points for correspondences such as a read is one anchor for a line between read pairs. There are also other types of features that can be used to create banding patterns or heat maps.
- - Map_to_Features
- In order to place a map underneath another map, CMAE requires a link between the child map and a feature on the parent. That feature represents the exact placement of the child.
- - Correspondences
- Correspondences are links between features.
- - Attributes and External References (xrefs)
- CMap also allows for assigning attributes and external references to it's objects (features, maps, etc). These can be useful for adding descriptions or providing data for an external script to work on an object (such as location of a contig's ACE file).
After modifying data in CMAE, the user can save the changes to the CMap database. However, in order to modify the underlying data, a plug-in system has been created.
There are several hooks in the code (and more to be added) where a plug-in can be attached. For instance, there is a plug-in hook attached to the right click menu. A plug-in can then be written to add a button that gets the selected maps, figures out where their ACE files are and passes them to Consed for viewing.
A hook will be added to the "Save changes" method, so any modifications in CMAE can be appropriately handled for the underlying data.
After modifying the underlying data, the plug-in could then modify the data in the CMap database to be viewed in CMAE.
Hopefully, the barrier of entry for using CMAE isn't too great. Please let me know if you see any improvements can be made. Questions and comments can be emailed to the CMAE mailing list, firstname.lastname@example.org.
Ben Faga, email@example.com
Copyright (c) 2007 Cold Spring Harbor Laboratory