Difference between revisions of "Overview"

Revision as of 01:35, 25 October 2007

Introduction

With the amount of technical documentation available for GMOD, the casual observer would be forgiven for concluding that GMOD is a project about software. It's not: GMOD was created for biologists, and in the real world it's used by biologists. However, the creators of GMOD are mostly not practicing biologists, and the look and feel of most GMOD documentation reflects this. What we will attempt to do is discuss GMOD from the researcher's perspective. This does not simply mean describing what the software does. If you look, for example, at a typical GBrowse page like this GBrowse view of human chromosome 7 you'll understand immediately what GBrowse is built to do, and a few more minutes of clicking and scrolling will reveal all sorts of useful ways to display and query the data. A modern biologist knows a great deal about bioinformatics functionality already. What we're more concerned with here are the practical details: given the data I have, what database should I use? Do I even need a database? How hard is this going to be?

In our experience we find that most biologists want to focus on the science. They may have little knowledge of programming languages or databases, and only passing interest in the IT minutiae. They have deep knowledge of their own data, needless to say, and know how data like their own can be viewed and analyzed. What they want to know is how to create their own useful set of tools for their own data in as efficient a way as possible. And when this tool set is created they want to rest assured that their platform can be easily maintained in an environment where resources may be limited. We will attempt to address these sorts of questions.

By the way, the word "we" as used here refers to the GMOD Help Desk. The Help Desk is a good resource for biologists who want to learn more about GMOD, for whatever reason. Feel free to email us at wg-emod@nescent.org.

What is a GMOD?

GMOD is a collection of interconnected applications and databases that biologists use as repositories and as tools. That connectivity is really the key here. Bioinformatic applications and databases are produced at a steady rate and this output is described each month in a number of different journals. There's no lack of tools, but many of these tools will be little used since the typical prospective user may not have the resources or expertise required to install the tool and connect it, in some way, to the data in hand. What is generally lacking is a concerted effort to produce tools and databases that will work together.

GMOD also describes a community. Many of the pieces of GMOD, or components, are mature software with many human-years of software development behind them. This amount of effort focussed on design, development, and testing has not occurred simply because someone wanted to code. The demand for software like this has been strong since genome sequence started to appear and many of the first genome databases used GMOD components. So GMOD describes this diverse group of software developers, scientists, and laboratories that use or improve these software components every day.

GMOD is also that specific thing that's installed on your computer. It may be the private viewer to your latest data that a student set up over the weekend. It may be a terabyte-size database and suite of public Web applications developed over many years at a central laboratory. It may be a database of experimental data that's accessible by script, or it may be the annotation tool that you use to describe your favorite genome. Now, by describing this variety are we assuring you that whatever you want to do is possible within GMOD? No. The biologists lead and the software developers follow, not the other way around. So you may find that your predicament is not addressed, or is only partially addressed, by what's available in GMOD. You have an option here, which is to do something about this. First, contact the GMOD Help Desk or one of the main mailing lists like gmod-schema to make sure that your understanding of the available GMOD resources is correct. When you're in touch with some knowledgeable person, try to get a sense of what the solution might be, or its degree of difficulty. Your solution may entail something simple, or a project may have to be created, complete with partnerships and grants. Be assured that the GMOD participants are very interested in seeing GMOD take off in new directions.

Is It Just for Model Organisms?

At first GMOD stood for Generic Model Organism Database. This was back in the days when there were only a handful of model organisms and obtaining the genomic sequence of an organism appeared to be a prohibitively expensive proposition, taking months or years to accomplish. Now there are hundreds of such sequences, with thousands easily conceivable. However, few of the scientists studying organisms with sequence consider their organism a model, in this early sense of the word.

This is a problem for the acronym since any organism with any kind of sequence associated with it is a good candidate as a subject for a GMOD database. So, for example, there are GMOD databases with just protein sequence in them like the S. cerevisiae Proteome Browser. There are GMOD databases with EST sequence only, such as the Cattle EST Gene Family Database. There are GMOD databases that are concerned primarily with gene expression, such as the Emiliania huxleyi Serial Analysis of Gene Expression database. We even find GMOD databases dedicated to collections of RNA sequence like the Leishmania tarentolae RNA Editing database. We have also heard of GMOD databases for things like oligonucleotides and plasmids. See GMOD Users for a list of other examples. That list of GMOD databases demonstrates that GMOD is widely used, with all sorts of organisms represented, and that these databases can hold sequences of any kind.

Some clever scientists have proposed that we just drop Model from the name and "re-brand" ourselves. GMOD thought about it and decided it may cause more problems than it solves. Instead, think of the M as standing for My or Many or Myriad.

Technologies

Most GMOD installations have a general architecture in common. There is some source of data and this is going to be called a database. However, it does not have to be a relational database; it could be a file or a set of files, with or without some kind of index. There's a lot of flexibility at the data level. Choosing this database and loading it will be tasks you'll give a lot of thought to.

What the user will see is one or more applications. These may be a set of Web pages or a Java application. The choice of applications is dictated by the nature of your data. Sometimes the choice of application is easy or clear for a given kind of data. For some other data types you'll have to take a careful look at a few different applications and consider whether you want to invest more resources in order to create complex data representations or whether you want to expend less effort and offer something simpler.

There will also be software mediating the flow of information between the application and the database. Typically this is going to be a Web server: the Web server receives requests from the application and translates them to database queries, then receives data, formats it, and sends it back to the application. This piece of software can be thought of as performing routine or mechanical tasks. It's important, but you'll install it and typically pay little attention to it.

The Components of GMOD

GMOD is made up of databases, applications, and adaptor software that connects these components together. Some of the most popular packages are discussed below.

What is GBrowse?

GBrowse is short for Genome Browser, or Generic Genome Browser. GBrowse is probably GMOD's most popular component and almost all of the databases listed in GMOD Users use GBrowse. It is fairly easy to install: only basic command-line familiarity is required. Do not be misled by the simplicity of the installation though: the reason that GBrowse is popular is that it is a supremely capable browser. The picture below is a partial screenshot of a GBrowse page taken from the Human chromosome 7 database at TCAG. A bit of jargon: the rows, each depicting one sort of data, are called tracks, and tracks are populated by one or more images called glyphs, plus text.

As of this writing GBrowse comes with some 75 different glyphs, including pie charts, dot plots, histograms, and X-Y plots suitable for quantitative data, as well as the expected array of glyphs that describe sequences and sequence annotation. It is also highly configurable, meaning you can do quite a bit of customization of the glyphs, you can link glyphs to URLs of your choice, you can internationalize the application to display different languages, you can connect and retrieve data from any database, and more. This sort of work generally requires either modifying GBrowse's configuration files or adding your own code in Perl, the language that GBrowse is written in. Any customization requiring work in Perl should be considered routine coding, not difficult, because GBrowse is built to be customized.

Relational Databases

Relational databases are today's tool of choice when faced with the problem of storing complex or multifaceted data, assuming that the data is, or can be, broken down into ever smaller bits of data. Each atomized bit of data ends up in its own field, analogous to the way that data can be organized as columns in a spreadsheet. Fields describing one sort of thing are organized together into tables (but database designers do not talk about things, rather entities). For example, a relational database may have a table called gene with gene.name and gene.geneid fields and a protein table with protein.name, protein.proteinid and protein.sequence fields.

The picture above shows these two tables, and explains the term relational. The relation between the tables is the shared geneid - we add the geneid field to the protein table to indicate that the CFTR_1 protein record relates back to a specific gene in the gene table. This geneid field in protein, which originates in gene and whose values are stored in gene, is an example of a special sort of field called a foreign key - think of it as a shared field or value.
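The two tables and their foreign key can be sketched in a few lines of Python using the sqlite3 module from the standard library. This is only an illustration of the idea: the gene name CFTR, the id values, and the short protein sequence are placeholder assumptions, and neither Chado nor BioSQL defines its tables this way.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce foreign keys

conn.executescript("""
CREATE TABLE gene (
    geneid INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE protein (
    proteinid INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    sequence  TEXT,
    -- the foreign key: each protein row points back to one gene row
    geneid    INTEGER NOT NULL REFERENCES gene(geneid)
);
""")

# Placeholder values, chosen only for illustration.
conn.execute("INSERT INTO gene (geneid, name) VALUES (1, 'CFTR')")
conn.execute(
    "INSERT INTO protein (proteinid, name, sequence, geneid) "
    "VALUES (10, 'CFTR_1', 'MQRSPLEKAS', 1)"
)

# Follow the shared geneid from protein back to gene.
rows = conn.execute("""
    SELECT protein.name, gene.name
    FROM protein
    JOIN gene ON protein.geneid = gene.geneid
""").fetchall()

print(rows)
```

The JOIN is where the relation is used: the database matches protein.geneid against gene.geneid and returns the protein together with the gene it belongs to.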

For a given collection of data, genomic sequence and annotation for example, there will be more than one way to represent the data relationally. A given relational design, essentially tables and fields, is called a schema (think of the schema as a blueprint, empty, and the schema populated with data as the database). Both Chado and BioSQL can store genomic data, for example, but they do it differently. The details of how one designs relational schemas are not relevant here, but one can say that the designer may think about some of these general concerns:

The degree of data abstraction, which is related to a database concept called normalization and to the flexibility of the schema
The legibility of the schema, which has to do with the ease of using it
The breadth of the schema, in terms of the data types it could contain

From the scientific perspective one can ask related questions: how flexible is a given schema? Can it handle my data now and in the future? Will a given schema be easier or harder to use than some other schema?

This last question relates mostly to the degree of abstraction of the schema, not to the actual programming languages used. All of today's relational databases are created and loaded and queried using one language, SQL. Essentially what the programmers do is use their chosen language (Perl, Java, Python, etc.) to execute SQL and they all do this equally well.

Chado and BioSQL

So when you choose to use a relational schema it will all really come down to you and your data, not technical details. Chado is one of the relational database schemas used in GMOD, the other being BioSQL. The differences are clear. BioSQL is quite focussed; it's concerned with:

Sequence
Sequence annotation
Phylogeny
Publications

It is also a thoroughly modern schema in that it uses OBO-style ontologies such as GO, the Gene Ontology. This is a requirement nowadays given the ubiquity of ontologies and their ability to describe and organize our data.

Chado's focus is broader. Its tables are broken down into groups called modules and the modules are the following:

Audit - for database audit trails
Companalysis - for data from computational analysis
Contact - for people, groups, and organizations
Controlled Vocabulary (cv) - for controlled vocabularies and ontologies
Expression - for summaries of RNA and protein expression
General - for identifiers
Genetic - for genetic data and genotypes
Library - for descriptions of molecular libraries
Mage - for microarray data
Map - for maps without sequence
Natural Diversity (ND) - for multiple experiments, such as phenotyping and genotyping
Organism - for taxonomic data
Phenotype - for phenotypic data
Phylogeny - for organisms and phylogenetic trees
Publication (pub) - for publications and references
Sequence - for sequences and sequence features
Stock - for specimens and biological collections
WWW -

It is also possible to add modules to Chado. For instance, in early 2007 a module called mage was added to address microarray data. Other possibilities that are being discussed are modules for ecological data and additional work for phenotypic data, extending the existing phenotype module. The real point is that Chado has been designed to allow extensibility: you can either formally propose that Chado acquire some new functionality as a module or add tables to Chado in the privacy of your own server.

Chado is also ontology-aware. One could state this even more forcefully: Chado depends on ontologies. For example, in Chado's Sequence module it's expected that all stored sequences are identified by one or more terms from the Sequence Ontology. A quick scan of the tables in Chado, more than 100, shows that about half of the tables contain a foreign key to cvterm, that is, a field referring to an ontology term. The ontology used as source for a term could be one of many, but people in the field tend to rely on OBO ontologies. So the ontology could be a common and general one like GO, the Gene Ontology, or something highly specific to a group of organisms like the Drosophila Anatomy ontology or the Mammalian Phenotype ontology. What these ontologies do in conjunction with Chado is give you a database that is extremely flexible, and as your ontologies expand so does the expressive capability of the system.
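To make the dependence on ontology terms concrete, here is a heavily simplified sketch in Python with sqlite3. The table and key names (cv, cvterm, feature, type_id) follow Chado's naming, but the real tables carry many more columns and constraints, Chado itself is normally installed on Postgres, and the row values below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced versions of three Chado tables; most columns are omitted.
CREATE TABLE cv (
    cv_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL
);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id     INTEGER NOT NULL REFERENCES cv(cv_id),
    name      TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    -- what kind of feature this is, given as an ontology term
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
""")

# Hypothetical rows: one vocabulary, two of its terms, three features.
conn.execute("INSERT INTO cv VALUES (1, 'sequence')")
conn.executemany("INSERT INTO cvterm VALUES (?, ?, ?)",
                 [(1, 1, "gene"), (2, 1, "mRNA")])
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                 [(1, "example_gene_1", 1),
                  (2, "example_transcript_1", 2),
                  (3, "example_gene_2", 1)])

def features_of_type(term_name):
    """Return the uniquenames of all features typed with the given term."""
    rows = conn.execute(
        "SELECT f.uniquename FROM feature f "
        "JOIN cvterm t ON f.type_id = t.cvterm_id "
        "WHERE t.name = ? ORDER BY f.feature_id",
        (term_name,),
    )
    return [row[0] for row in rows]

gene_names = features_of_type("gene")
print(gene_names)
```

Because the feature type is a row in cvterm rather than a hard-coded column or table, storing a new kind of feature only requires loading a new term, not changing the schema.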

Now there is a cost to this flexibility and breadth: Chado is complex and one must devote a certain amount of study to it. It's unlikely that someone unfamiliar with Chado can install it and then immediately set about loading it with biological data of different sorts. Fortunately there are mailing lists you can contact, as well as the GMOD Help Desk, and a number of pages on this Wiki discussing Chado (see Getting Started with Chado and the Chado Manual).

GFF Databases

In addition to relational database schemas like Chado and BioSQL you will also encounter what are called GFF databases in the GMOD world. GFF, described in the Glossary below, is a compact format for describing sequence and sequence annotations. GMOD installations like the Human chromosome 7 database described above are concerned solely with sequence and annotation, and the entire contents of such a database can be represented as GFF. For small installations the entire database can be just a set of GFF text files (in fact, you can install GBrowse on your personal computer and then browse Saccharomyces and Volvox genomic sequence, reading directly from GFF files installed along with GBrowse - try it!). But when the amount of GFF gets too large to be read into memory all at once you have to store the GFF in some form that's indexed for fast retrieval. The solution is to load the GFF into Mysql or some other sort of database management system, which assures good performance even if you have very large amounts of data in GFF format. This is accomplished by using the Bio::DB::GFF or Bio::DB::SeqFeature modules of Bioperl.
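The idea of an indexed GFF store can be sketched as follows in Python with sqlite3. This is not how Bio::DB::GFF or Bio::DB::SeqFeature organize their tables; the table layout, the index, the function names, and the three sample lines are all assumptions made for illustration.

```python
import sqlite3

# Three made-up, tab-delimited GFF lines standing in for a large file.
GFF_LINES = [
    "chrI\texample\tgene\t100\t900\t.\t+\t.\tGene g1",
    "chrI\texample\tgene\t5000\t6200\t.\t-\t.\tGene g2",
    "chrII\texample\tgene\t300\t1200\t.\t+\t.\tGene g3",
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE feature (
        seqid TEXT, source TEXT, type TEXT,
        start_pos INTEGER, end_pos INTEGER,
        score TEXT, strand TEXT, phase TEXT, attributes TEXT
    )
""")
# The index lets the database find features in a region without
# scanning every row.
conn.execute(
    "CREATE INDEX feature_region ON feature (seqid, start_pos, end_pos)"
)

def load_gff(lines):
    """Insert each non-comment GFF line as one row."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        rows.append((f[0], f[1], f[2], int(f[3]), int(f[4]),
                     f[5], f[6], f[7], f[8]))
    conn.executemany("INSERT INTO feature VALUES (?,?,?,?,?,?,?,?,?)", rows)

def features_in_region(seqid, start, end):
    """Return features on seqid that overlap the interval [start, end]."""
    cur = conn.execute(
        "SELECT type, start_pos, end_pos, attributes FROM feature "
        "WHERE seqid = ? AND start_pos <= ? AND end_pos >= ? "
        "ORDER BY start_pos",
        (seqid, end, start),
    )
    return cur.fetchall()

load_gff(GFF_LINES)
hits = features_in_region("chrI", 1, 1000)
print(hits)
```

A browser asking for one region at a time only ever touches the rows for that region, which is why this approach keeps working when the GFF no longer fits in memory.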

What is GMODWeb?

GMODWeb is a tool that will create a Web site on top of a Chado database. An example of a GMOD Web site created this way is ParameciumDB. The problem for many researchers is a lack of IT or bioinformatics resources, and GMODWeb is part of a solution. Instead of manually creating Web pages, one by one, and writing code to connect these pages to a database, GMODWeb performs these steps automatically. There are also ways to customize the resulting Web site so the pages make sense to you. You will need some IT expertise to do this, since this is an expert's tool, but the job will take much less time and effort and can be undertaken by a student, for example. You can also use this tool iteratively, building and re-building your site until it looks right. You will also need the Chado database.

For example, you could use GBrowse to create the genome-centered portion of a Web site and GMODWeb to display the data that is not genome-centric. Contact the GMOD Help Desk for more information.

What is CMap?

CMap is a popular comparative map viewer. It was initially designed for use at Gramene but was re-designed to be used for any organism or set of organisms. It can display genetic maps or physical maps and draw the relations between the two. It will also show synteny. It is written in Perl and requires an underlying RDBMS such as Mysql. If you need to display maps or syntenic relationships then you may need more than GBrowse.

And Synview? or Synbrowse? or Sybil?

Yes, there are other map viewers. The alternatives to CMap are SynBrowse, SynView, and Sybil. Sybil stores its data in Chado and accommodates quite a variety of different analyses; you should go to the Sybil Web site if you want to learn more. Both SynBrowse and SynView are part of GBrowse, and they can be considered a bit simpler than Sybil and CMap. You should take a good look at the respective Web sites and determine which is most suitable for you.

What is Apollo?

Unlike GBrowse, a browser, Apollo is for both viewing and manually annotating genomes. It also differs from GBrowse in that it's a Java application, so there are features built into it that make annotating a bit more efficient than through a Web page. It can connect to some of the same databases as GBrowse, like Chado, so one can imagine using Apollo as a tool for expert curation and GBrowse as a viewer on the same data set, for example.

What is Modware?

Modware is a middleware package used in GMOD, written in Perl. Middleware is software that mediates the exchange of information between applications, e.g. between Web pages and databases. If you want a serious discussion of the technical details please see the GMOD Middleware page. The purpose of introducing Modware here is to say that the GMOD developers have evaluated a number of Perl middleware packages and decided that Modware is the one that developers should use if they prefer to write in Perl. Like Bioperl, Modware is a package that you may need to install but won't need to understand in any detail.

What is Bioperl?

Bioperl is a popular bioinformatics toolkit written in Perl. The reason we mention it here is because many of the GMOD Components use parts of it. You will not have to learn Bioperl in order to use GMOD but you may have to install it.

On the other hand Bioperl does offer some attractive ways to store genomic data, not requiring any sort of relational database. We discussed Chado and BioSQL above. These two relational schemas require the prior installation of some free, open source RDBMS like Mysql or Postgres. Now installing these pieces, schema plus RDBMS, is not necessarily difficult but if all you have is sequence and sequence annotation it turns out that you can set up a sequence or genome browser using just Bioperl and GBrowse (and Apache, your Web server). To be precise, you can use either the Bio::DB::GFF module from Bioperl or the Bio::DB::SeqFeature module. See A Simple Sequence Browser below.

And What Else is in GMOD?

A number of other software packages are listed below, classified by general function. One might be tempted to think of this as a shopping list, choosing one of each. But it may also be useful to think of what is absolutely essential first and consider these other components as add-ons. We also have to add that some of these components are only loosely coupled to some of the more core components described above. In other words, an application might use its own methods to store data and not use Chado. Or, a component may be written in Java and not Perl, so it would not be able to communicate with a Perl application. For something to be considered a GMOD component it does not, at this time, have to connect to some other component.

Community Annotation

Wiki TableEdit

Comparative Genome Visualization

Database schema

Database tools

Gene Expression Visualization

Caryoscope

GeneXplorer

Pathway Tools

Genome Annotation

Genome Visualization & Editing

Literature and Curation Tools

BioDIG

Canto

Textpresso

Molecular Pathway Visualization

Pathway Tools

Ontology Visualization

Go Graphic Viewer

Workflow Management

Middleware

Tool Integration

Sequence Alignment

Blast Graphic

Website front end for Chado DB

Tripal

Case Studies


What we are attempting to do here is anticipate some of the basic requirements of the scientist. The classic situation is that he or she has data of some type, or of many different types, and needs to set up both a data repository and a viewer on this data. We are assuming that the scientist is not a programmer or an IT expert but that he or she is willing to learn the necessary skills or has a student available to do the required work.

A Simple Sequence Browser

The data: sequence (genomic DNA or ESTs or proteins or cDNAs or some combination of these or…)
The goal: create a browser to query and view sequence and sequence annotations
The core software: GBrowse, Apache Web server, and Bioperl
The hardware: a server running Unix (Linux or Mac) or Windows

1. Figure out what the annotations should be (gene coordinates or motif matches or oligonucleotide matches or hand-made annotations or some combination of these or…)
2. Install core software
3. Create or gather the annotations (BLAST results or HMMER results or GenBank files or…)
4. Transform all the annotations into a format suitable for loading (GFF format)
5. Load GFF into the GFF database
6. Configure GBrowse

Possible challenge: Step 4, converting all the annotations to GFF (scripts may be available to perform all the conversions, or you may have to write some of the conversion code yourselves)
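To give a sense of what such conversion code looks like, here is a minimal sketch in Python. It assumes the input is BLAST's standard 12-column tabular output (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and writes one GFF line per hit, with the attribute column modelled on the GFF example in the Glossary. The function name, the source and feature type values, and the sample hit are our own assumptions; check what your loader expects before relying on any of them.

```python
def blast_tab_to_gff(line, source="BLAST", feature_type="similarity"):
    """Convert one line of 12-column BLAST tabular output to a GFF line.

    The hit is placed on the subject sequence (e.g. the genome) and the
    query is recorded in the attribute column.
    """
    cols = line.rstrip("\n").split("\t")
    query, subject = cols[0], cols[1]
    qstart, qend = int(cols[6]), int(cols[7])
    sstart, send = int(cols[8]), int(cols[9])
    bitscore = cols[11]

    # For nucleotide hits, a subject start greater than the subject end
    # means the match is on the minus strand; GFF wants start <= end.
    strand = "+" if sstart <= send else "-"
    start, end = min(sstart, send), max(sstart, send)

    attributes = f'Target "Sequence:{query}" {qstart} {qend}'
    return "\t".join([subject, source, feature_type, str(start), str(end),
                      bitscore, strand, ".", attributes])

# A made-up hit of an EST against chromosome 7, on the minus strand.
blast_line = "est1\tchr7\t98.5\t200\t3\t0\t1\t200\t5200\t5001\t1e-100\t350"
gff_line = blast_tab_to_gff(blast_line)
print(gff_line)
```

In practice you would loop over every line of the BLAST output file and write the converted lines to a single GFF file for loading.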

Skills needed: basic command-line competence, perhaps basic Perl competence if you have to write any custom conversion code
Resources available: documentation at www.gmod.org, the GMOD Help Desk, the GMOD mailing lists

Recommendation

Highly recommended. Setting this up will give you a good sense of how the software pieces interoperate. Not only that, but GBrowse is fun and it comes with sample databases so once it's installed you have actual genome sequence to play with. You can even get GBrowse running nicely on a laptop.

A Simple Sequence Browser plus a Sequence Annotator

The data: sequence (genomic DNA or ESTs or cDNAs or some combination of these or…)
The goal: create a browser to query and view sequence and sequence annotations along with an editor to manually annotate the sequences
The core software: GBrowse, Apollo, Chado (plus relational database), Apache Web server, and Bioperl
The hardware: a server running Unix (Linux or Mac) or Windows

1. Figure out what the annotations should be (gene coordinates or motif matches or oligonucleotide matches or hand-made annotations or some combination of these or…)
2. Install core software
3. Create or gather the annotations (BLAST results or HMMER results or GenBank files or…)
4. Transform all the annotations into a format suitable for loading (GFF format)
5. Load GFF into the Chado database
6. Install and configure GBrowse
7. Install and configure Apollo

A challenge: Step 2, installing core software (with more components you have a more complex system and more potential pitfalls, and Chado and its relational database is a fairly detailed install)
Possible challenge: Step 4, converting all the annotations to GFF (scripts may be available to perform all the conversions, or you may have to write some of the conversion code yourselves)

Skills needed: basic command-line competence, perhaps basic Perl competence if you have to write any custom conversion code. Some understanding of relational databases for the Chado installation. Basic Java competence for the Apollo installation.
Resources available: documentation at www.gmod.org, the GMOD Help Desk, the GMOD mailing lists

Recommendation

If you’re a GMOD novice then install GBrowse by itself first (A Simple Sequence Browser), then consider this system.

A Browser for a Stock Collection

The data: the stock collection data in some structured form (Excel or Word or…)
The goal: create a browser to query and view your laboratory’s stock collection
The core software: Chado (and its relational database), Apache Web server, and Turnkey
The hardware: a server running Unix (Linux or Mac) or Windows

1. Install core software
2. Load stock collection data into the Chado database
3. Run Turnkey and create a GMODWeb Web site

Possible challenge: Step 1, installing core software (Chado and its relational database is a fairly detailed install)
A challenge: Step 2, loading the stock collection data into Chado (scripts will not be available to perform this loading, so you will have to create the code yourselves)
Possible challenge: Step 2, loading the data. The Chado schema may not be properly configured for your data and may need to be modified.
Possible challenge: Step 3, running Turnkey to automatically create your browser. Turnkey is a new tool. It has been used successfully in testing and at ParameciumDB but not all possibilities have been tested.

Skills needed: General IT expertise (Turnkey automates the creation of Web sites but it is an expert’s tool). Basic programming competence to write the custom conversion code.
Resources available: documentation at www.gmod.org, the GMOD Help Desk, the GMOD mailing lists

Recommendation

Consider whether you want to explore uncharted territory or not. Could be fairly straightforward for the expert, or could be challenging.

A Browser for Microarray Data

The data: microarray data in Affymetrix format
The goal: create a browser to query and view your laboratory’s microarray data
The core software: Chado, Apache Web server, and ….
The hardware: a server running Unix (Linux or Mac) or Windows

Challenge: Chado can hold the microarray data using its Mage module and applications exist to view raw microarray data (e.g. Caryoscope, GeneXplorer) but these applications don’t connect to Chado.

Resources available: documentation at www.gmod.org, the GMOD Help Desk, the GMOD mailing lists

Recommendation

Either wait for the connectors to be built to some application or form a partnership with GMOD scientists and developers to see that the connectors are built.

A Browser for Map Data

The data: map data (genetic map data or physical map data or visual map data or some combination of these)
The goal: create a browser to query and view your maps, within a species or across species
The core software: GBrowse, Apache Web server, and CMap or SynView or Sybil.
The hardware: a server running Unix (Linux or Mac) or Windows

1. Choose the right map software, based on your map data and resources.
2. Install core software.
3. Load map data.

Possible challenge: Step 2, the installation. This may be tricky if you choose one of the more fully featured packages (CMap or Sybil).
Possible challenge: Step 3, the loading. It is likely that some custom coding would be required since map data comes in all sorts of different forms.

Skills needed: Basic command-line competence. Some understanding of relational databases for CMap or Sybil. Basic programming competence to write the custom loading code.
Resources available: documentation at www.gmod.org, the GMOD Help Desk, the GMOD mailing lists

Recommendation

Choose one. GMOD offers good choices here; it comes down to your data and your resources. SynView is the easiest, and it comes with GBrowse.

More Information

Resources

Hardware

This is the easy part. Any recently made desktop-style computer is going to be good enough to be used as your server initially. The assumption is that you are not setting up a server that will receive thousands of queries per day but some more modest number. Naturally you will want a computer that's reasonably well-equipped:

1 GB RAM, or more
100 GB hard drive, or more
1 CPU running at 2 GHz, or more
DVD drive

Some advice painfully learnt: once you've set up something that works for you, make sure to make backups of your software and database. The DVD drive on your computer is one way to facilitate this; the other computers on your network are another.

Operating System

The intention here is not to start a debate on what rules or what stinks, rather to advise you on the choice of OS that will make your life easiest. That said, if there is any way that you can run some variety of Unix (Linux or Mac OS X) then that is what you should do. The reason for this is that these are the operating systems that most of the software is developed on and that people have the most experience with. Yes, you can run every single piece of GMOD on a Windows machine but it is going to be a bit harder. It is also the case that all the documentation is based on the Unix approach, so you'll have to do the least amount of translation.

You will also find installation, in general, easier on Unix systems. This is partially due to the existence of installer programs like yum (Linux) and fink (Mac OS X) that install complex pieces of software for you. For example, to install the Mysql relational database on Mac OS X the command is:

>fink install mysql

That's it. Unfortunately you will not be able to install every piece of GMOD using yum or fink but they will make life substantially easier. For more information on these programs see:

http://biopackages.net - yum
http://finkproject.org/ - fink

Software

GMOD is software that relies on other software in order to function. This section lists some other key open source packages that you may need.

Mysql

A popular open source database. It's generally considered to be the easiest of all relational database platforms to install (Unix or Windows). It's not always viewed as favorably by database experts as it lacks certain features that they may consider useful, but one cannot deny that Mysql has proved its utility in both the open source and commercial worlds. An excellent first database for those who have no prior experience with relational databases.

Postgres

Another popular open source database (Unix or Windows). The trade-off between Mysql and Postgres is, allowing some oversimplification, simplicity on one hand versus more features on the other hand. With either platform you will get good performance, excellent documentation, and well-supported software. Postgres will be a bit harder for the novice but it is only required if you want to install Chado.

Perl

The programming language most used in the bioinformatics realm. Also the language most used by GMOD developers. It is well-suited to text and data processing and is also characterized by an extensive open source library, so it's highly functional. Many GMOD components use Bioperl, a bioinformatics toolkit written in Perl.

Some pieces of GMOD, like GBrowse, can be extended or customized using Perl, but beginners' skills in Perl would be sufficient for this work. Just installing and using GBrowse in a conventional way does not require knowledge of Perl or Bioperl.

Java

Java is arguably the world's most popular programming language but it is not as popular for command-line work on Unix as Perl. It's encountered in GMOD primarily as a language to construct user interfaces (e.g. Apollo).

Apache, the Web Server

Anytime you want to set up a system that displays Web pages you will need a Web server. If someone else hasn't already installed this for you then you will want to use the Apache Web server (also known as the Apache HTTP Server). Free of course, and secure and fast. It also turns out to be reasonably simple to install, on Unix or Windows.

Glossary

Selected common terms that you'll encounter in GMOD.

GFF

If you get into the more technical side of GMOD, loading databases in particular, you will come across this term. It refers to a tab-delimited file format for storing sequence annotations (curiously, the acronym has different definitions, Gene Finder Format, or General Feature Format). Here is an example:

 test.fa      RepeatMasker    similarity      238     289     15.4    +       .       Target "Motif:(TA)n" 2 53

The line above describes a match to a sequence motif, (TA)n, on a sequence contained in the file "test.fa", where the match goes from position 238 to position 289 on the "+" strand.

One encounters GFF files frequently in the GMOD world. GFF is used as an interchange format, so a script or an application may create GFF as output and some other script or application may load this GFF into a database. Or it may be the database itself. There are ways to create databases directly from GFF files, though it turns out that these work well only with smaller sets of data. See GFF for more information.
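Since GFF is tab-delimited, reading it takes very little code. The following Python sketch splits the example line into named fields; the GffRecord type and the parse_gff_line function are names we made up for illustration, and the field names are simply the conventional labels for the GFF columns.

```python
from typing import NamedTuple

class GffRecord(NamedTuple):
    """One line of GFF, split into its columns."""
    seqid: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: str

def parse_gff_line(line):
    """Split a tab-delimited GFF line into a GffRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 tab-separated columns, got {len(fields)}")
    # The ninth column (attributes) is treated as optional here.
    attributes = fields[8] if len(fields) > 8 else ""
    return GffRecord(fields[0], fields[1], fields[2],
                     int(fields[3]), int(fields[4]),
                     fields[5], fields[6], fields[7], attributes)

# The example line from above, written with explicit tabs.
line = 'test.fa\tRepeatMasker\tsimilarity\t238\t289\t15.4\t+\t.\tTarget "Motif:(TA)n" 2 53'
record = parse_gff_line(line)
print(record.seqid, record.start, record.end, record.strand)
```

Note that the start and end positions are both inclusive, so this match covers 289 - 238 + 1 = 52 bases.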

Licenses

This issue is a simple one: there are no restrictions on using any of the software described here.

Case Studies

@@ Line 115: / Line 115: @@
 So when you choose to use a relational schema it will all really come down to you and your data, not technical
-details. Chado is one of the [http://www.databasejournal.com/sqletc/article.php/1469521 relational databases] that are used in GMOD, the other being [http://biosql.org BioSQL].
+details. [[Chado]] is one of the [http://www.databasejournal.com/sqletc/article.php/1469521 relational databases] that are used in GMOD, the other being [http://biosql.org BioSQL].
 The differences are clear. [http://biosql.org BioSQL] is quite focussed, it's concerned with:
