GMOD Malaysia 2014/Intermine Tutorial

Revision as of 23:43, 27 February 2014 by Alexkalderimis


The Web Interface

This section focusses on the features of the graphical web interface.

All of these tasks use FlyMine.

Search

The most basic search interface, the "quick search", allows unstructured access to the data in the system. The results indicate what type of data each hit is, which helps you narrow the search.

Tasks:

  • How many hits are returned for "eve"? And for "alzheimer"?
  • Which fly genes match "alzheimer"?
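To make the idea of typed hits concrete, here is a minimal Python sketch of a free-text search that counts hits per type so the results can be narrowed. It runs over a small, hypothetical in-memory collection; the records and the `quick_search` helper are invented for illustration and are not FlyMine's implementation or data.

```python
from collections import Counter

# Hypothetical toy records: (type, identifier, description). Not real FlyMine data.
RECORDS = [
    ("Gene", "gene-1", "homeobox protein, segmentation"),
    ("Gene", "gene-2", "kinase involved in neurodegeneration"),
    ("Publication", "pub-1", "A screen for neurodegeneration modifiers"),
    ("Disease", "dis-1", "neurodegeneration phenotype"),
    ("Protein", "prot-1", "homeobox domain protein"),
]

def quick_search(term, records=RECORDS, type_filter=None):
    """Case-insensitive substring match on identifier and description,
    optionally restricted to a single type."""
    term = term.lower()
    hits = [r for r in records
            if any(term in field.lower() for field in r[1:])
            and (type_filter is None or r[0] == type_filter)]
    facets = Counter(r[0] for r in hits)  # number of hits per type, used to narrow down
    return hits, facets

hits, facets = quick_search("neurodegeneration")
print(len(hits), dict(facets))

# Narrowing to one type, as you would by picking a category in the results.
genes, _ = quick_search("neurodegeneration", type_filter="Gene")
print([g[1] for g in genes])
```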

ID Resolution

A more specialised search interface is the ID resolution mechanism that lies at the heart of list uploads. It is essential for taking the messy identifier lists people often have (mixing identifiers, symbols and synonyms) and turning them into clean sets of items.

You may wish to use the following file of gene identifiers: File:Example-gene-ids.txt

Tasks:

  • Upload a set of gene identifiers, picking the items you want.
  • Change the organism parameter. What difference does that make?
  • Save this list of items on the server, giving it a custom name.
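The core of ID resolution can be sketched as: split the pasted text into tokens, look each one up against primary identifiers, symbols and synonyms, and sort the outcomes. The Python below is a minimal sketch under that assumption, using a hypothetical gene table; its three outcome buckets only roughly correspond to what the upload page reports. The `organism` argument shows why changing the organism parameter can turn an ambiguous symbol into a unique match.

```python
import re
from collections import defaultdict

# Hypothetical toy gene table: (organism, primary identifier, symbol, synonyms).
GENES = [
    ("D. melanogaster", "FLY:0001", "abc", ["abc-1"]),
    ("D. melanogaster", "FLY:0002", "xyz", ["abc-1"]),  # shares a synonym, so "abc-1" is ambiguous
    ("D. melanogaster", "FLY:0003", "pqr", []),
    ("H. sapiens",      "HUM:0001", "ABC", []),
]

def build_index(genes, organism=None):
    """Map every lower-cased identifier, symbol and synonym to the primary IDs it names."""
    index = defaultdict(set)
    for org, primary, symbol, synonyms in genes:
        if organism is not None and org != organism:
            continue  # the organism parameter restricts the candidate objects
        for name in [primary, symbol, *synonyms]:
            index[name.lower()].add(primary)
    return index

def resolve(raw_text, genes=GENES, organism=None):
    """Split a messy list and sort each token into matched / ambiguous / not_found."""
    index = build_index(genes, organism)
    tokens = [t for t in re.split(r"[\s,;]+", raw_text) if t]
    result = {"matched": {}, "ambiguous": {}, "not_found": []}
    for token in tokens:
        hits = index.get(token.lower(), set())
        if len(hits) == 1:
            result["matched"][token] = next(iter(hits))
        elif hits:
            result["ambiguous"][token] = sorted(hits)  # the user has to pick one
        else:
            result["not_found"].append(token)
    return result

messy = "abc,  pqr;abc-1\nnope"
everything = resolve(messy)
fly_only = resolve(messy, organism="D. melanogaster")
print(everything)
print(fly_only)
```

Without an organism, "abc" is ambiguous between a fly and a human gene; restricting to the fly resolves it, while the shared synonym "abc-1" stays ambiguous either way.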

Templates

Templates provide a more targeted search, which can be more powerful once you know what you are looking for.

Tasks:

  • Find the templates section.
  • How many templates are good matches for "protein domain"?
  • Can you find the protein domains for the gene "mad"?
  • What genes are associated with "Homeo-" domains?
  • Can you find the protein domains for genes expressed in the fly ovary? How many are there?
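A template can be thought of as a saved query with a fixed set of output columns and one or more constraints you are allowed to edit. The sketch below models that idea in Python over hypothetical gene/domain rows, together with a keyword filter over template titles. The `Template` class, its titles and the gene/domain associations are illustrative assumptions, not InterMine's template format or real data.

```python
from dataclasses import dataclass

# Hypothetical toy rows linking genes to protein domains. Not real FlyMine data.
GENE_DOMAINS = [
    {"gene": "gene-1", "domain": "Homeobox"},
    {"gene": "gene-1", "domain": "Homeo-prospero"},
    {"gene": "gene-2", "domain": "SMAD MH1"},
    {"gene": "gene-3", "domain": "Homeodomain-like"},
]

@dataclass
class Template:
    """A saved query: fixed output columns plus one editable constraint."""
    title: str
    view: tuple           # columns shown in the results
    editable_field: str   # the field the user fills in
    operator: str = "="   # "=" for exact match, "CONTAINS" for substring match

    def run(self, value, rows=GENE_DOMAINS):
        def ok(row):
            field = row[self.editable_field]
            if self.operator == "CONTAINS":
                return value.lower() in field.lower()
            return field == value
        return [tuple(row[c] for c in self.view) for row in rows if ok(row)]

TEMPLATES = [
    Template("Gene --> Protein domains", ("gene", "domain"), "gene"),
    Template("Protein domain --> Genes", ("domain", "gene"), "domain", "CONTAINS"),
]

def find_templates(keywords, templates=TEMPLATES):
    """Keep templates whose title contains every keyword (case-insensitive)."""
    words = keywords.lower().split()
    return [t for t in templates if all(w in t.title.lower() for w in words)]

print(len(find_templates("protein domain")))
print(TEMPLATES[0].run("gene-1"))  # domains for one gene
print(TEMPLATES[1].run("Homeo"))   # genes with "Homeo..." domains
```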

Templates are frequently used to query against lists of items. Choose one of the available public lists and try the following:

  • Which genes have a particularly large number of interactions with your set of genes? What if you are only interested in "suppression" interactions?
  • Which orthologue data set has the most orthologues for genes in your list? What if you only look at the orthologues in mosquito?
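Running a template against a list amounts to constraining one field to the list's members and then aggregating the results. This hypothetical Python sketch counts, for each interaction partner, the interactions it has with genes in a list, with an optional filter on interaction type. The data and the `interaction_partners` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical toy interaction rows: (gene, interacting gene, interaction type).
INTERACTIONS = [
    ("gene-1", "gene-9", "suppression"),
    ("gene-2", "gene-9", "enhancement"),
    ("gene-3", "gene-9", "physical"),
    ("gene-1", "gene-8", "suppression"),
    ("gene-2", "gene-8", "suppression"),
    ("gene-5", "gene-7", "suppression"),  # gene-5 is not in the list below
]

def interaction_partners(gene_list, rows=INTERACTIONS, interaction_type=None):
    """Count interaction rows linking each partner to a gene in the list,
    most frequent partner first."""
    members = set(gene_list)
    counts = Counter()
    for gene, partner, kind in rows:
        if gene in members and (interaction_type is None or kind == interaction_type):
            counts[partner] += 1
    return counts.most_common()

my_list = ["gene-1", "gene-2", "gene-3"]
print(interaction_partners(my_list))
print(interaction_partners(my_list, interaction_type="suppression"))
```

Restricting to one interaction type can reorder the partners completely, which is the point of the second question above.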

Using the Results Tables

The results tables let you explore the data once you have started querying it. As well as displaying the results of a query, they let you work with them: adding and removing elements, downloading the data, or sending it elsewhere.

We suggest performing these tasks on the results table for the orthologues of cdc2 (see Templates above).

Tasks:

  • Use the pagination controls to view the data. Which orthologue is in the first row of the second page?
  • Use the column controls to hide irrelevant columns (such as the evidence code name).
  • Completely remove a column. Does this change the number of rows? Undo this action.
  • Use the column summaries to get an overview of a column (e.g. which data set is most heavily represented?).
  • Use a column summary to add a filter (e.g. restrict the table to only the Panther data set).
  • Change the filter you added so that it excludes "Panther" orthologues instead.
  • Add a new column (e.g. the protein information for the orthologue). How does it change the table?
  • Have a look at the code representation of the query. Which language is the most succinct?
  • Download the data as TSV.
  • Download just one or two columns of data.
  • Send the data to Galaxy, but just the orthologues (extra marks for sending them as GFF3).
  • Save the orthologues as a new list.
  • Add just one or two genes to one of your lists.
  • Download a column summary.
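Several of the table operations above (paging, column summaries, include/exclude filters and exporting selected columns) are easy to picture as operations on a list of rows. The Python below is a minimal sketch over hypothetical orthologue rows; the helper names and the data are assumptions for illustration, not the results table's actual code or FlyMine's results.

```python
import csv
import io
from collections import Counter

# Hypothetical toy orthologue rows; not real FlyMine results.
COLUMNS = ["gene", "orthologue", "organism", "data_set"]
ROWS = [
    ["cdc2", "orth-1", "H. sapiens", "Panther"],
    ["cdc2", "orth-2", "M. musculus", "Panther"],
    ["cdc2", "orth-3", "A. gambiae", "TreeFam"],
    ["cdc2", "orth-4", "C. elegans", "Panther"],
    ["cdc2", "orth-5", "D. rerio", "Homologene"],
]

def page(rows, number, size):
    """Return page `number` (1-based) containing `size` rows."""
    start = (number - 1) * size
    return rows[start:start + size]

def column_summary(rows, column):
    """Count the distinct values in one column, most frequent first."""
    i = COLUMNS.index(column)
    return Counter(r[i] for r in rows).most_common()

def filter_rows(rows, column, value, exclude=False):
    """Keep rows whose column equals `value` (or, with exclude=True, differs from it)."""
    i = COLUMNS.index(column)
    return [r for r in rows if (r[i] == value) != exclude]

def to_tsv(rows, columns):
    """Serialise only the chosen columns as tab-separated text with a header row."""
    idx = [COLUMNS.index(c) for c in columns]
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows([r[i] for i in idx] for r in rows)
    return buf.getvalue()

print(page(ROWS, 2, 2)[0])
print(column_summary(ROWS, "data_set"))
print(len(filter_rows(ROWS, "data_set", "Panther", exclude=True)))
print(to_tsv(ROWS[:2], ["orthologue", "organism"]))
```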

Using the Report Pages

Report pages allow you to get an overview of the diverse data available for individual items. Detailed report pages are typically provided for genes and proteins.

Tasks:

  • Visit the report page for a gene (e.g. "Mad"). How many ways are there to get there?
  • How many sections are there?
  • Which diseases is it associated with?
  • How many publications does it appear in?
  • Which tissues is it up-regulated in?
  • Can you download the interaction network?

Using Lists

Lists allow you to store and re-use sets of items. Many components allow you to create lists, and they can be used in various places, including queries, analysis, and combination.

Tasks:

Creating a List:

  • Create a list from ID resolution ("upload").
  • Create a list from a result set, using a results table (e.g. create a list of the protein domains associated with a gene such as "Mad").
  • Create a list from a region search (using a set of genomic intervals).
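A region search comes down to an interval-overlap test between your intervals and the annotated features. This Python sketch assumes intervals written as `chrom:start..end` with 1-based inclusive coordinates, uses hypothetical features, and adds an illustrative `flank` parameter that widens each interval on both sides; the input format, the helper names and `flank` are assumptions, not taken from FlyMine itself.

```python
import re

# Hypothetical toy features: (identifier, chromosome, start, end), 1-based inclusive.
FEATURES = [
    ("gene-1", "2L", 100, 500),
    ("gene-2", "2L", 450, 900),
    ("gene-3", "2L", 2000, 2500),
    ("gene-4", "3R", 100, 500),
]

def parse_region(text):
    """Parse an interval written as chrom:start..end (an assumed input format)."""
    m = re.fullmatch(r"(\w+):(\d+)\.\.(\d+)", text.strip())
    if m is None:
        raise ValueError(f"cannot parse region: {text!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    return chrom, min(start, end), max(start, end)

def overlapping(regions, features=FEATURES, flank=0):
    """Return IDs of features overlapping any region, each region widened by `flank` bp."""
    found = []
    for chrom, start, end in map(parse_region, regions):
        lo, hi = start - flank, end + flank
        for fid, fchrom, fstart, fend in features:
            # Two closed intervals overlap when each one starts before the other ends.
            if fchrom == chrom and fstart <= hi and lo <= fend and fid not in found:
                found.append(fid)
    return found

print(overlapping(["2L:480..600"]))
print(overlapping(["2L:1000..1900"]))
print(overlapping(["2L:1000..1900"], flank=150))
```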

Using a list:

  • Find the genes that share the same protein domains as those in a list (hint: use a template)
  • Which GO terms characterise your list best?
  • Taking two of your lists, which genes are in the second list but not in the first? (There are two ways to do this!)
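In set terms, some of the list questions above can be sketched in Python as below, using hypothetical gene identifiers and domain annotations. Subtraction, and "symmetric difference, then keep what is in the second list", are two equivalent ways to get the genes found only in the second list; whether these are the two ways the exercise has in mind is an assumption.

```python
# Hypothetical toy lists of gene identifiers (not real FlyMine lists).
first = {"gene-1", "gene-2", "gene-3"}
second = {"gene-2", "gene-3", "gene-4", "gene-5"}

# Way 1: subtract the first list from the second.
only_in_second = second - first

# Way 2: take the symmetric difference (genes in exactly one list),
# then keep only those that belong to the second list.
only_in_second_alt = (first ^ second) & second

print(sorted(only_in_second))
print(only_in_second == only_in_second_alt)
print(sorted(first | second), sorted(first & second))  # union and intersection

# Genes outside the first list that share at least one protein domain with it.
# Hypothetical gene -> domain annotations.
DOMAINS = {"gene-1": {"D1"}, "gene-2": {"D2"}, "gene-6": {"D1", "D3"}, "gene-7": {"D4"}}
list_domains = set().union(*(DOMAINS.get(g, set()) for g in first))
sharing = sorted(g for g, ds in DOMAINS.items() if ds & list_domains and g not in first)
print(sharing)
```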

Maintaining a list:

  • Edit a list's properties (name, description, tags, etc.).
  • Delete one of your lists.
  • Share a list with another user.

Using List Analysis Tools

List analysis tools give a useful overview of a set of items; they are generally accessed from the list analysis page.

Tasks, for a given list (such as the putative Fly TFs):

  • Which GO terms/publications/protein domains are most characteristic?
  • How does changing the reference population make a difference?
  • How does changing the error correction method make a difference?
  • Can anything in general be said about expression?
  • Make a list of just those genes expressed in stage 4-5.
  • View the genes annotated with the "transcription" GO term.
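Enrichment statistics of this kind are commonly computed with a one-sided hypergeometric test followed by a multiple-testing correction. The Python below is a sketch under that assumption; it is not InterMine's code. `hypergeom_sf` gives the probability of seeing at least k annotated genes in a list of n genes drawn from a reference population of N genes containing K annotated ones, so changing the reference population changes N and K, and hence the p-value. The three correction functions are the standard Bonferroni, Holm-Bonferroni and Benjamini-Hochberg adjustments. The list sizes and p-values in the usage lines are made up.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)

def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm-Bonferroni step-down adjustment, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running = [0.0] * m, 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted, running = [0.0] * m, 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order of p-value
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted

# Hypothetical: a 20-gene list in which 6 genes carry some GO term.
p_large_ref = hypergeom_sf(6, 1000, 30, 20)  # reference: 1000 genes, 30 annotated
p_small_ref = hypergeom_sf(6, 200, 30, 20)   # reference: 200 genes, same 30 annotated
print(p_large_ref, p_small_ref)

pvals = [0.001, 0.01, 0.02, 0.04, 0.3]  # made-up p-values for five terms
for name, correct in [("Bonferroni", bonferroni),
                      ("Holm-Bonferroni", holm),
                      ("Benjamini-Hochberg", benjamini_hochberg)]:
    print(name, [round(p, 3) for p in correct(pvals)])
```

With the same annotated genes, a smaller reference population makes the observed overlap less surprising, so the p-value rises; and on the same raw p-values, Benjamini-Hochberg is never stricter than Holm-Bonferroni, which is never stricter than plain Bonferroni.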