This glossary explains terms that
- are specific to the GMOD project, or
- are computing terms that are used in the GMOD project.
This glossary does not define biology terms.
API stands for Application Programming Interface. An API is a well-defined programmatic interface to some resource. That is, it is an interface meant to be used by other programs to access that resource. It is distinct from, and sometimes complementary to, a Graphical User Interface (GUI), which is a direct user interface to a resource.
BAM is a binary version of the Sequence Alignment/Map (SAM) format. BAM and SAM are both part of SAMtools. BAM is a compressed, binary, indexed format for Next Generation Sequencing data. GBrowse 2 has an adaptor that can read BAM data.
CPAN is the Comprehensive Perl Archive Network, a repository of Perl modules that bring additional functionality to the Perl language.
- CPAN web site.
Cascading Style Sheets (CSS) are a way to control the appearance of web pages. CSS is used to separate style (colors, fonts, layout, etc.) from content (the actual information on a page), allowing styles to be defined in a single place and then referred to from many pages.
- CSS Home Page @ W3C
- MediaWiki:Common.css - Extensions to MediaWiki's default CSS that we have made on this web site.
CVS is a source code control system that was formerly used by most of GMOD. Source code control systems, also known as revision control or version control systems, are used to record changes to computer files. GMOD now uses SVN.
A directed acyclic graph (DAG) is a set of nodes and connections between the nodes where every connection has a direction, and there are no loops in the connections. That is, if you start at any node, and follow connections out of that node, you will never return to it.
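The "never return to the starting node" property can be checked mechanically. The following is a minimal Python sketch, not GMOD code: the function name `is_acyclic` and the small example graphs are illustrative assumptions. It walks the graph depth-first and reports a loop as soon as a connection leads back to a node that is still on the path being explored.

```python
def is_acyclic(graph):
    """Return True if a directed graph has no loops.

    `graph` is a dict mapping each node to a list of the nodes it points to.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    state = {}

    def visit(node):
        state[node] = GRAY
        for nxt in graph.get(node, []):
            nxt_state = state.get(nxt, WHITE)
            if nxt_state == GRAY:
                # The connection leads back onto the current path: a loop.
                return False
            if nxt_state == WHITE and not visit(nxt):
                return False
        state[node] = BLACK
        return True

    return all(visit(n) for n in graph if state.get(n, WHITE) == WHITE)


# Hypothetical example graphs (node names are made up for illustration).
dag = {"gene": ["mRNA"], "mRNA": ["exon", "CDS"], "exon": [], "CDS": []}
looped = {"a": ["b"], "b": ["c"], "c": ["a"]}

print(is_acyclic(dag))
print(is_acyclic(looped))
```

Note that a DAG may still have several paths into the same node; only paths that return to their own starting point are forbidden.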
A database can be any set of organized data that is readable by a computer. It can be anywhere from an implementation of a database schema in a particular database management system to regular files that have a defined format.
Database Management System
Database management systems (DBMSs) are software systems that can manage data. PostgreSQL, MySQL, Oracle and Sybase are all examples of DBMSs. DBMSs are containers of databases. That is, they are the systems that manage databases, which is distinct from the data that they manage.
Most DBMSs are relational, which is a particular way of representing data. All DBMSs that GMOD is concerned with are relational, so GMOD uses the terms database management system and relational database management system (RDBMS) interchangeably.
A database schema is the design of a particular database, independent of its contents. Chado is an example of a database schema. Designs (like Chado) can be reused across multiple databases.
FASTA is a widely used text-based data format for representing nucleic acid and peptide sequence data. FASTA entries start with a header line, followed by the sequence on the immediately following lines. The header line starts with the sequence identifier. It can also contain additional information, which is often pipe ("|") separated.
A basic example, showing "ctg123", a DNA sequence that is 338 nucleotides long:
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagt
FASTA entries can be included at the end of GFF3 files.
- FASTA format at Wikipedia.
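The header-then-sequence layout described in the FASTA entry is simple enough to read with a few lines of code. The sketch below is a minimal illustration in Python, not part of any GMOD tool: the function names `parse_fasta` and `_finish` and the sample text are assumptions made for this example. It treats the first whitespace-delimited word of the header as the sequence identifier and joins the following lines into one sequence string.

```python
def _finish(header, chunks):
    """Split a header into (identifier, description) and join the sequence lines."""
    parts = header.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description, "".join(chunks)


def parse_fasta(text):
    """Return a list of (identifier, description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append(_finish(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_finish(header, chunks))
    return records


# Made-up sample data, shortened for readability.
sample = ">ctg123 example contig\ncttctgggcg\ntacccgattc\n>seq2\nACGT\n"
for identifier, description, sequence in parse_fasta(sample):
    print(identifier, len(sequence))
```

For real work, an established parser such as the one in BioPerl is a safer choice than a hand-written reader.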
In a database, related tables are linked together by taking the primary key from one table and placing it in the related table. In the related table, that primary key value is called a foreign key.
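As an illustration of linking tables through a foreign key, here is a small Python sketch using the standard library's `sqlite3` module with an in-memory database. The `organism` and `gene` tables and their rows are hypothetical and deliberately simplified; they are not the Chado schema. The `gene.organism_id` column holds the primary key of a row in `organism`, and the database rejects a gene that points to an organism that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE gene (
        gene_id     INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        organism_id INTEGER NOT NULL REFERENCES organism (organism_id)
    )
    """
)

conn.execute(
    "INSERT INTO organism (organism_id, name) VALUES (1, 'Drosophila melanogaster')"
)
conn.execute("INSERT INTO gene (name, organism_id) VALUES ('white', 1)")

# Follow the foreign key from gene back to organism.
rows = conn.execute(
    "SELECT gene.name, organism.name FROM gene "
    "JOIN organism ON gene.organism_id = organism.organism_id"
).fetchall()
print(rows)

# A foreign key value with no matching primary key is refused.
try:
    conn.execute("INSERT INTO gene (name, organism_id) VALUES ('orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```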
Gene Finder Format
A former name for GFF.
Generic Feature Format
General Feature Format
A former name for GFF.
GFF is a standard file format for storing genomic features in a text file. GFF stands for Generic Feature Format. GFF files are plain-text, nine-column, tab-delimited files. GFF databases also exist; they use a schema custom-built to represent GFF data. GFF is frequently used in GMOD for data exchange and representation of genomic data.
- GFF - all things GFF and GFF3
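To make the nine-column layout concrete, the following Python sketch splits one GFF line into named fields. It is an illustration only: `GffRecord` and `parse_gff_line` are names invented for this example, and the column names follow the GFF3 convention (seqid, source, type, start, end, score, strand, phase, attributes). The attributes column is kept as a raw string because its syntax differs between GFF2 and GFF3.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]  # None when the file gives "." (no score)
    strand: str
    phase: str
    attributes: str  # left unparsed; format depends on the GFF version


def parse_gff_line(line):
    """Parse one feature line; return None for blank lines and '#' comments."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attributes = fields
    return GffRecord(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand, phase, attributes,
    )


# Illustrative feature line.
line = "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
record = parse_gff_line(line)
print(record.type, record.start, record.end, record.strand)
```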
GFF2 is a supported GFF format in GMOD, but it is now deprecated, and if you have a choice you should use GFF3. Unfortunately, data is sometimes only available in GFF2 format. GFF2 has a number of shortcomings compared to GFF3.
Git is a version control system, like Subversion (SVN), that is used to track and coordinate updates to files, usually software and/or documentation. Git is a distributed version control system, in that it does not require use of a central server. However, in practice, most projects use a central server, either hosted themselves or on a public host such as GitHub.
GTF is a genomic annotation file format that is very similar to GFF2 and is sometimes referred to as GFF2.5. GTF is not a supported format in GMOD, so if you have a GTF file you will need to convert it to GFF3.
GUI is an acronym for Graphical User Interface. GUIs are interfaces to computer programs that use graphics, mice, pull down menus, check boxes, and other interactive elements. GUIs contrast with command line interfaces, where you interact with the program using only the keyboard.
Java is arguably the world's most popular programming language, but it is not as popular as Perl for command-line work on Unix. In GMOD it is encountered primarily as a language for constructing user interfaces (e.g. Apollo).
- Category:Java - GMOD pages tagged as related to Java.
Java programs run in a virtual machine known as a Java Runtime Environment or JRE.
Middleware is software that connects other software components so they can talk together. You can think of it as project plumbing. Like plumbing, it is hard to do well, and people take it for granted until it does not work.
- Category:Middleware - List of GMOD pages tagged as related to middleware.
Objects and relations are two different ways to represent information in computing. Objects tend to be used by programming languages such as Java, while relations are widely used in databases, particularly relational databases. Object-relational mapping (ORM) converts information from one model to the other, usually at the point of interaction between object-oriented languages and relational databases.
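The conversion between the two models can be shown with a hand-written mapping. This Python sketch is an assumption-laden illustration, not how any particular ORM library works: the `Feature` class, the `feature` table, and the `save` and `load` helpers are all hypothetical names. Real ORM libraries generate this kind of code automatically from class or schema definitions.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Feature:
    """Object-side representation of one row in the feature table."""
    feature_id: int
    name: str
    length: int


def save(conn, feature):
    """Object -> relation: write the object's attributes as one row."""
    conn.execute(
        "INSERT INTO feature (feature_id, name, length) VALUES (?, ?, ?)",
        (feature.feature_id, feature.name, feature.length),
    )


def load(conn, feature_id):
    """Relation -> object: rebuild a Feature from a row, or None if absent."""
    row = conn.execute(
        "SELECT feature_id, name, length FROM feature WHERE feature_id = ?",
        (feature_id,),
    ).fetchone()
    return Feature(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT, length INTEGER)"
)
save(conn, Feature(1, "ctg123", 1000))
loaded = load(conn, 1)
print(loaded)
```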
See Operating System.
Perl is the programming language most used in the bioinformatics realm, and it is the language most used by GMOD developers. It is well-suited to text and data processing and is also characterized by an extensive open source library, so it's highly functional. Many GMOD components use BioPerl, a bioinformatics toolkit written in Perl.
Some parts of GMOD, like GBrowse, can be extended or customized using Perl, and beginner-level Perl skills are sufficient for this work.
- Perl Home Page
- Perl's open source library repository.
- Category:Perl - GMOD pages tagged as related to Perl.
Most Database Management Systems (DBMSs) are relational, which is a particular way of representing data. All DBMSs that GMOD is concerned with are relational, so GMOD uses the terms database management system and relational database management system (RDBMS) interchangeably.
Relational Database Management System
See Database Schema
SQL is a standard query language used with relational database management systems (DBMSs). It is used to update and retrieve data that is in a database.
SQL is generally similar for different DBMSs but varies in many details from one DBMS to another.
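A short sketch of the two uses of SQL named above, updating and retrieving data, is shown below. It runs SQL statements through Python's built-in `sqlite3` module against an in-memory database; the `gene` table and its rows are made up for illustration. The statements are written for SQLite, so details such as column types and the `?` placeholder style may need adjusting for another DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT)"
)
conn.executemany(
    "INSERT INTO gene (name, chromosome) VALUES (?, ?)",
    [("geneA", "chr1"), ("geneB", "chr2"), ("geneC", "chr1")],
)

# Update: move geneB to a different chromosome.
conn.execute("UPDATE gene SET chromosome = ? WHERE name = ?", ("chr3", "geneB"))

# Retrieve: list the names of all genes on chr1.
chr1_genes = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM gene WHERE chromosome = ? ORDER BY name", ("chr1",)
    )
]
print(chr1_genes)
```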
SVN, short for Subversion, is a source code control system that is used by most of GMOD. Source code control systems, also known as revision control or version control systems, are used to record changes to computer files. GMOD converted from CVS to SVN on 2009/09/15.
XML is an acronym for eXtensible Markup Language, a data format used primarily for sharing data. It looks similar to HTML, but has a much tighter syntax than does HTML.
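The "tighter syntax" can be seen by feeding an XML parser markup that a web browser would typically tolerate in HTML. The sketch below uses Python's standard `xml.etree.ElementTree` module; the element names and values are invented for this example. The first document is well-formed and parses; the second has an unquoted attribute value and an unclosed tag, so the parser rejects it.

```python
import xml.etree.ElementTree as ET

# Well-formed: attribute values are quoted and every tag is closed.
well_formed = "<gene id='g1'><name>EDEN</name></gene>"
root = ET.fromstring(well_formed)
gene_name = root.find("name").text
print(root.get("id"), gene_name)

# Not well-formed: unquoted attribute value and a missing </name> tag.
sloppy = "<gene id=g1><name>EDEN</gene>"
try:
    ET.fromstring(sloppy)
    accepted = True
except ET.ParseError:
    accepted = False
print(accepted)
```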