Databases and GMOD

From GMOD
Revision as of 18:20, 11 December 2007 by Clements


This page introduces the broad topic of databases in GMOD. It defines some database terminology for those who are new to databases, describes how databases are implemented and used in GMOD, and outlines the database management system choices that are available.


Database Terminology

The term database appears in all sorts of contexts in GMOD. It is even part of the project's name. Despite its central role in GMOD, the term is often used to mean four different things. This section distinguishes those meanings and introduces more precise terms that should be (but probably aren't) used throughout GMOD.


Database Management System

Database management systems (DBMSs) are software systems that manage data. Oracle, MySQL, PostgreSQL, and Sybase are all examples of DBMSs. A DBMS is a container for databases: it is the system that manages databases, and it is distinct from the data held in them.


Database Schema

A database schema is the design of a particular database, independent of its contents. Chado is an example of a database schema. Designs (like Chado) can be reused across multiple databases.


Database Web Site

Web sites that feature a lot of database-driven content, such as FlyBase or ParameciumDB, are often referred to as databases. This is somewhat accurate, as there are databases backing the web sites, but it is also misleading: these web sites may show information that doesn't come from their databases, and they may not show everything that is in their databases.


Database

A database, in the strictest sense of the word, is an implementation of a database schema in a particular database management system, populated with data on a particular subject. In other words, a database is a populated instance of a schema.

For example, the database behind the FlyBase web site contains data on drosophilids, and uses the Chado schema and the PostgreSQL database management system.
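The distinction between a DBMS, a schema, and a database can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite (through Python's standard `sqlite3` module) stands in for the DBMS, and the `organism` table, the `create_database` and `count_organisms` helpers, and the sample rows are all hypothetical. The toy schema is not Chado.

```python
import sqlite3

# A schema is a design, independent of any contents.
# Hypothetical toy schema for illustration only; this is NOT Chado.
TOY_SCHEMA = """
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    genus       TEXT NOT NULL,
    species     TEXT NOT NULL
);
"""


def create_database(rows):
    """Implement the toy schema in a DBMS (SQLite here) and populate it."""
    conn = sqlite3.connect(":memory:")  # the DBMS manages this database
    conn.executescript(TOY_SCHEMA)      # implement the schema
    conn.executemany(
        "INSERT INTO organism (genus, species) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def count_organisms(conn):
    """Return the number of rows in the organism table."""
    return conn.execute("SELECT COUNT(*) FROM organism").fetchone()[0]


# One schema reused for two databases with different contents.
fly_db = create_database(
    [("Drosophila", "melanogaster"), ("Drosophila", "simulans")]
)
paramecium_db = create_database([("Paramecium", "tetraurelia")])
```

Here `TOY_SCHEMA` plays the role of the schema, SQLite plays the role of the DBMS, and `fly_db` and `paramecium_db` are two separate databases that share one design.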



GMOD Database Components

There are two main GMOD components that are fundamentally about databases, and several more that help you manage databases or that use (or can use) databases to accomplish their purpose.

In decreasing order of database-centricity, GMOD's database-related components are:


Chado

Chado is the modular database schema of GMOD. Chado is about organizing your data in a database so that you can manage it and can connect other GMOD components to it (either directly or via data exports). When someone speaks of the GMOD Schema they are speaking about Chado.


BioMart

BioMart is a data warehouse package tailored for biological data. It takes existing databases (for example, the FlyBase Chado database), transforms them into a data warehouse, and then provides a web interface that supports arbitrary queries against the data.
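The general idea of turning a source database into a query-friendly warehouse table can be sketched as follows. This is a generic illustration of flattening joined tables, not BioMart's actual transformation or API. SQLite is used only for convenience, and the `organism`, `gene`, and `gene_mart` tables and their sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalized source database: genes refer to organisms by key.
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    symbol      TEXT,
    organism_id INTEGER REFERENCES organism(organism_id)
);
INSERT INTO organism VALUES (1, 'Drosophila melanogaster');
INSERT INTO gene VALUES (1, 'white', 1), (2, 'yellow', 1);

-- Flatten the join into a single wide table that is easy to query.
CREATE TABLE gene_mart AS
    SELECT g.symbol AS gene_symbol, o.name AS organism_name
    FROM gene g
    JOIN organism o ON o.organism_id = g.organism_id;
""")

# Queries against the flattened table need no joins.
rows = conn.execute(
    "SELECT gene_symbol, organism_name FROM gene_mart ORDER BY gene_symbol"
).fetchall()
```

The trade-off in a sketch like this is that the flattened table duplicates the organism name on every row in exchange for simpler queries.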


Database Tools

Links to DB tools.


GMOD Components that Require a DBMS

GMOD DBMS Choices

Several GMOD components rely on databases to store their data. All such components have a default DBMS that the developers had in mind when they created the component. The default DBMS is most often PostgreSQL or MySQL. PostgreSQL, commonly known as Postgres, and MySQL are both open-source DBMSs with large and active user communities. It is possible to use a DBMS other than the default, but doing so involves more work, sometimes a lot more work.

See the component descriptions to find out if they need an underlying database and what their default DBMS is.


Can I Use Something Besides the Default DBMS?

The answer is yes, but it will mean extra work.

You may want to do this if you are already using a DBMS that you understand. DBMS administration is non-trivial and adding one or two more DBMSs to the list you have to support may or may not be more effort than porting the component to use your DBMS of choice. However, do keep in mind that one of the reasons why MySQL and Postgres are often picked as default DBMSs is that they are comparatively easy to administer.
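One reason porting takes extra work is that DBMSs accept different SQL dialects. As a small sketch, the syntax for an auto-incrementing primary key differs between PostgreSQL (`SERIAL`) and MySQL (`AUTO_INCREMENT`). The `AUTO_ID` mapping and the `create_table_sql` helper below are hypothetical names introduced for illustration and are not part of any GMOD component.

```python
# Auto-incrementing primary key syntax differs between DBMSs.
# Only the two dialects named here are covered; anything else needs porting.
AUTO_ID = {
    "postgresql": "SERIAL PRIMARY KEY",
    "mysql": "INT AUTO_INCREMENT PRIMARY KEY",
}


def create_table_sql(dbms, table, id_column):
    """Build a CREATE TABLE statement in the dialect of the given DBMS."""
    try:
        id_type = AUTO_ID[dbms]
    except KeyError:
        raise ValueError(
            f"no DDL mapping for DBMS {dbms!r}; porting work needed"
        )
    return f"CREATE TABLE {table} ({id_column} {id_type}, name VARCHAR(255))"


postgres_ddl = create_table_sql("postgresql", "demo", "demo_id")
mysql_ddl = create_table_sql("mysql", "demo", "demo_id")
```

Every such difference (data types, sequences, functions, procedural code) has to be found and translated when moving a component to a non-default DBMS.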

DBMSs in Use in the GMOD Community

Postgres and MySQL are the most popular DBMSs, but several others are in use in the GMOD community.


PostgreSQL

Postgres is the default DBMS for Chado, GMOD's modular database schema.

See the PostgreSQL page for more information on Postgres.

MySQL

MySQL adapters exist for GBrowse and possibly other components.


DB2

Xenbase uses DB2 for its Chado installation. DB2 is a high-end DBMS from IBM that has a free version and also free academic licenses. DB2 is one of the big players in the commercial database market.

Oracle

ApiDB uses Oracle for its database needs. Oracle is a high-end database from Oracle Corporation. It is the most popular commercial database in the world.


Sybase

Due to its heritage at JCVI, the default database of the Ergatis workflow management component is Sybase. Work is being done to also support the PostgreSQL and Oracle DBMSs.