Glossary

From GMOD

This glossary explains terms that

  • are specific to the GMOD project, or
  • are computing terms that are used in the GMOD project.

This glossary does not define biology terms.


AJAX

AJAX is a web user interface technology used in some GMOD Components. It provides a richer user experience than was typically available during the first 10 years of the web. AJAX stands for Asynchronous JavaScript and XML.

API

API stands for Application Programming Interface. An API is a well-defined programmatic interface to some resource. That is, it is an interface meant to be used by other programs to access that resource. It is distinct from, and sometimes complementary to, a Graphical User Interface (GUI), which is a direct user interface to a resource.

BAM

BAM is a binary version of the Sequence Alignment/Map (SAM) format. BAM and SAM are both part of SAMtools. BAM is a compressed, binary, indexed format for Next Generation Sequencing data. GBrowse 2 has an adaptor that can read BAM data.

CPAN

CPAN is the Comprehensive Perl Archive Network, a repository of Perl modules that bring additional functionality to the Perl language.

CSS

Cascading Style Sheets (CSS) are a way to control the appearance of web pages. CSS is used to separate style (colors, fonts, layout, etc.) from content (the actual information on a page), allowing styles to be defined in a single place and then referred to from many pages.

CVS

CVS is a source code control system that was formerly used by most of GMOD. Source code control systems, also known as revision control or version control systems, are used to record changes to computer files. GMOD now uses SVN.

DAG

A directed acyclic graph (DAG) is a set of nodes and connections between the nodes where every connection has a direction, and there are no loops in the connections. That is, if you start at any node, and follow connections out of that node, you will never return to it.
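
The "no loops" condition can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a list of (source, target) pairs; the function name and the node names are illustrative only. It uses Kahn's algorithm, which repeatedly removes nodes that have no incoming connections.

```python
from collections import deque

def is_dag(edges):
    """Return True if the directed graph given as (source, target) pairs has no loops."""
    # Build adjacency lists and count incoming connections for every node.
    successors = {}
    indegree = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        successors.setdefault(dst, [])
        indegree[dst] = indegree.get(dst, 0) + 1
        indegree.setdefault(src, 0)

    # Kahn's algorithm: repeatedly remove nodes that have no incoming connections.
    ready = deque(node for node, count in indegree.items() if count == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Nodes on a loop never reach an incoming count of 0, so they are never removed.
    return removed == len(indegree)

# Hypothetical graphs: the second adds a connection that leads back to "a".
acyclic_edges = [("a", "b"), ("b", "c"), ("a", "d")]
looped_edges = acyclic_edges + [("c", "a")]

print(is_dag(acyclic_edges))
print(is_dag(looped_edges))
```

In the second graph, following the connections a, b, c leads back to a, so it is directed but not acyclic.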

DAS

See Distributed Annotation System.

Database

A database can be any set of organized data that is readable by a computer. It can be anything from an implementation of a database schema in a particular database management system to regular files that have a defined format.

For example, the database behind the FlyBase web site contains data on drosophilids, and uses the Chado schema and the PostgreSQL database management system.

Database Management System

Database management systems (DBMSs) are software systems that can manage data. PostgreSQL, MySQL, Oracle and Sybase are all examples of DBMSs. DBMSs are containers of databases. That is, they are the systems that manage databases, which is distinct from the data that they manage.

Most DBMSs are relational, which is a particular way of representing data. All DBMSs that GMOD is concerned with are relational, so GMOD uses the terms database management system and relational database management system (RDBMS) interchangeably.

Database Schema

A database schema is the design of a particular database, independent of its contents. Chado is an example of a database schema. Designs (like Chado) can be reused across multiple databases.

DBMS

See Database Management System.

DBMS-Database

The topmost hierarchical element in a DBMS's collection of data. By definition, data stored within different databases cannot be related by the DBMS, by query or otherwise.

DBMS-Schema

The layer below the topmost in a DBMS's collection of data. It is an organizing concept somewhat similar to that of a folder or directory. Unlike data stored within different DBMS-Databases, data stored within different schemas of the same DBMS-Database can be related and otherwise mutually manipulated within the DBMS.

FASTA

FASTA is a widely used text-based data format for representing nucleic acid and peptide sequence data. FASTA entries start with a header line, followed by the sequence on the immediately following lines. The header line starts with the sequence identifier. It can also contain additional information, which is often pipe ("|") separated.

A basic example, showing "ctg123", a DNA sequence that is 288 nucleotides long:

>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagt

FASTA entries can be included at the end of GFF3 files.
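
Reading this format takes only a few lines of code. The following is a minimal sketch, not a replacement for an established parser such as the one in BioPerl. It assumes that the sequence identifier is everything on the header line up to the first whitespace or pipe character, which does not hold for every header convention. The function name and the two short records are hypothetical.

```python
import re

def parse_fasta(text):
    """Return a dict mapping each sequence identifier to its full sequence."""
    sequences = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the identifier ends at the first whitespace or "|".
            current_id = re.split(r"[|\s]", line[1:], maxsplit=1)[0]
            sequences[current_id] = []
        elif current_id is not None:
            # Sequence lines are concatenated until the next header line.
            sequences[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in sequences.items()}

# Hypothetical two-record FASTA text.
EXAMPLE = """>seq1|illustrative description
ACGTAC
GTTA
>seq2
TTGA
"""

records = parse_fasta(EXAMPLE)
print(records)
```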

Foreign Key

In a database, related tables are linked together by taking the primary key from one table and placing it in the related table. In the related table, that column is called a foreign key.
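
As an illustration, the sketch below links two hypothetical tables, organism and gene, through a foreign key. It uses Python's built-in sqlite3 module as a stand-in for a full DBMS; the table and column names are invented for this example and are not taken from Chado.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    -- The primary key of organism reappears here as a foreign key.
    organism_id INTEGER NOT NULL REFERENCES organism (organism_id)
);
""")

conn.execute("INSERT INTO organism (organism_id, name) VALUES (1, 'Drosophila melanogaster')")
conn.execute("INSERT INTO gene (name, organism_id) VALUES ('white', 1)")

# The foreign key lets a query follow the link between the two tables.
joined = conn.execute(
    "SELECT gene.name, organism.name FROM gene "
    "JOIN organism ON gene.organism_id = organism.organism_id"
).fetchall()

# A gene that points at a nonexistent organism is rejected.
try:
    conn.execute("INSERT INTO gene (name, organism_id) VALUES ('orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```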

Gene Finder Format

A former name for GFF.

Generic Feature Format

See GFF.

General Feature Format

A former name for GFF.

GFF

GFF is a standard file format for storing genomic features in a text file. GFF stands for Generic Feature Format. GFF files are plain-text, nine-column, tab-delimited files. GFF databases also exist; they use a schema custom-built to represent GFF data. GFF is frequently used in GMOD for data exchange and representation of genomic data.

There are two versions of GFF supported in GMOD: GFF3 and GFF2. GFF2 is now deprecated.
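
To make the nine-column layout concrete, here is a minimal sketch that splits one GFF3 feature line into named fields. The column names follow the GFF3 specification; the function name is hypothetical, and the sketch ignores comment lines, directives, and URL-escaped characters.

```python
from collections import namedtuple

# The nine GFF3 columns, in order.
GffRecord = namedtuple(
    "GffRecord", "seqid source type start end score strand phase attributes"
)

def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine tab-delimited columns."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 columns, found {len(columns)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = columns
    # Column 9 holds semicolon-separated tag=value pairs; "." means no attributes.
    if attrs == ".":
        attributes = {}
    else:
        attributes = dict(pair.split("=", 1) for pair in attrs.split(";") if pair)
    return GffRecord(seqid, source, ftype, int(start), int(end),
                     score, strand, phase, attributes)

example = "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
record = parse_gff3_line(example)
print(record.type, record.start, record.end, record.attributes["Name"])
```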

See also:

  • GFF - all things GFF and GFF3

GFF2

GFF2 is a supported GFF format in GMOD, but it is now deprecated. If you have a choice, you should use GFF3. Unfortunately, data is sometimes only available in GFF2 format. GFF2 has a number of shortcomings compared to GFF3.

GFF3

GFF3 is the most recent version of the GFF format. It has many advantages over the now deprecated GFF2 and should be used in favor of GFF2 whenever possible.

Git

Git is a version control system, like Subversion (SVN), that is used to track and coordinate updates to files, usually software and/or documentation. Git is a distributed version control system, in that it does not require use of a central server. However, in practice, most projects use a central server, either hosted themselves or on a public host such as GitHub.

GTF

GTF is a genomic annotation file format that is very similar to GFF2 and is sometimes referred to as GFF2.5. GTF is not a supported format in GMOD, so if you have a GTF file you'll need to convert it to GFF3.

GUI

GUI is an acronym for Graphical User Interface. GUIs are interfaces to computer programs that use graphics, mice, pull down menus, check boxes, and other interactive elements. GUIs contrast with command line interfaces, where you interact with the program using only the keyboard.

Java

Java is one of the world's most widely used programming languages, but it is less popular than Perl for command-line work on Unix. In GMOD it is encountered primarily as a language for building user interfaces (e.g. Apollo).

JRE

Java programs run in a virtual machine known as a Java Runtime Environment or JRE.

JSON

JSON is an acronym for JavaScript Object Notation, a lightweight data-interchange format. It is used in GMOD in Galaxy and JBrowse.
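
The sketch below shows what JSON text looks like and how a program converts it to and from its own data structures, using Python's standard json module. The track description is hypothetical; its keys are illustrative and do not reproduce an actual Galaxy or JBrowse configuration.

```python
import json

# Hypothetical data; the keys are illustrative only.
track = {"label": "genes", "type": "feature", "visible": True, "colors": ["blue", "green"]}

# Serialize to JSON text, e.g. for sending from a server to a browser.
text = json.dumps(track, sort_keys=True)
print(text)

# Parse the text back into native data structures.
restored = json.loads(text)
```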

Linux

Linux is an open source operating system that is based on the Unix operating system. Linux is the default operating system for GMOD.

Middleware

Middleware is software that connects other software components so they can talk together. You can think of it as project plumbing. Like plumbing, it is hard to do well, and people take it for granted until it does not work.

Object-Relational Mapping

Objects and relations are two different ways to represent information in computing. Objects tend to be used by programming languages such as Java, while relations are widely used in databases, particularly relational databases. Object-relational mapping (ORM) converts information from one model to the other, usually at the point of interaction between object-oriented languages and relational databases.
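
The idea can be sketched by hand in a few lines. The example below is a deliberately minimal illustration, not how any particular ORM library works: it maps a hypothetical Gene object to a row in a gene table and back. It uses Python and its built-in sqlite3 module rather than Java and a full DBMS; all names are invented for the example.

```python
import sqlite3
from dataclasses import dataclass, astuple

@dataclass
class Gene:
    """Object-side representation of one row in the hypothetical gene table."""
    gene_id: int
    name: str
    length: int

def save(conn, gene):
    # Object -> relation: the object's fields become the column values of a row.
    conn.execute("INSERT INTO gene (gene_id, name, length) VALUES (?, ?, ?)", astuple(gene))

def load(conn, gene_id):
    # Relation -> object: a row's column values become the fields of a new object.
    row = conn.execute(
        "SELECT gene_id, name, length FROM gene WHERE gene_id = ?", (gene_id,)
    ).fetchone()
    return Gene(*row) if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, length INTEGER)")
save(conn, Gene(1, "example_gene", 5000))
loaded = load(conn, 1)
```

Real ORM layers generate this kind of mapping code automatically and also handle relationships between tables.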

Operating System

An operating system (OS) is the software that controls a computer and manages the sharing of resources on that computer. Example operating systems are Microsoft Windows and Linux.

ORM

See Object-Relational Mapping.

OS

See Operating System.

Perl

Perl is a programming language widely used in bioinformatics, and it is the language most used by GMOD developers. It is well suited to text and data processing and has an extensive open source library of modules (see CPAN), so a great deal of functionality is available without writing it from scratch. Many GMOD components use BioPerl, a bioinformatics toolkit written in Perl.

Some parts of GMOD, like GBrowse, can be extended or customized using Perl, but beginner-level Perl skills are sufficient for this work.

RDBMS

See Database Management System.

Relational

Most Database Management Systems (DBMSs) are relational, which is a particular way of representing data. All DBMSs that GMOD is concerned with are relational, so GMOD uses the terms database management system and relational database management system (RDBMS) interchangeably.

Relational Database Management System

See Relational and Database Management System.

SAM

SAM stands for Sequence Alignment/Map format. SAM is a text format for Next Generation Sequencing data. It is a part of SAMtools. GBrowse 2 has an adaptor that can read SAM data.

SAMtools

SAMtools is a set of formats and programs for storing, manipulating, and accessing Next Generation Sequencing data.

Schema

See Database Schema.

SQL

SQL is a standard query language used with relational database management systems (RDBMSs). It is used to update and retrieve data that is in a database.

SQL is generally similar for different DBMSs but varies in many details from one DBMS to another.
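
The sketch below shows the basic SQL statements for adding, updating, and retrieving data, run through Python's built-in sqlite3 module. SQLite is used here only because it needs no setup; the feature table and its contents are hypothetical, and details such as data types may differ in PostgreSQL or MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT, length INTEGER)")

# INSERT adds rows to a table.
conn.executemany(
    "INSERT INTO feature (name, length) VALUES (?, ?)",
    [("contig_a", 300), ("contig_b", 120)],
)

# UPDATE changes existing rows that match the WHERE condition.
conn.execute("UPDATE feature SET length = ? WHERE name = ?", (150, "contig_b"))

# SELECT retrieves the rows that match the WHERE condition.
long_features = conn.execute(
    "SELECT name, length FROM feature WHERE length >= ? ORDER BY name", (150,)
).fetchall()
print(long_features)
```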

SVN

SVN, short for Subversion, is a source code control system that is used by most of GMOD. Source code control systems, also known as revision control or version control systems, are used to record changes to computer files. GMOD converted from CVS to SVN on 2009/09/15.

GMOD's main source code repository is at SourceForge. The Subversion page explains how to both download and update the main GMOD repository at SourceForge.

Unix

Unix is a group of operating systems that are descended from the original Unix operating system developed in the 1970s. This includes Solaris, HP-UX, Linux, Mac OS X, and many others.

XML

XML is an acronym for eXtensible Markup Language, a data format used primarily for sharing data. It looks similar to HTML, but has a much stricter syntax than HTML.
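
The difference in strictness is easy to demonstrate. In the minimal sketch below, Python's standard xml.etree.ElementTree module reads a well-formed document but refuses one with an unclosed tag, which many HTML parsers would tolerate. The element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

well_formed = "<gene id='g1'><name>example_gene</name></gene>"
sloppy = "<gene id='g1'><name>example_gene</gene>"  # <name> is never closed

# A well-formed document parses into a tree of elements.
root = ET.fromstring(well_formed)
gene_id = root.get("id")
gene_name = root.findtext("name")

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(gene_id, gene_name)
print(is_well_formed(sloppy))
```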
