Difference between revisions of "GMOD Malaysia 2014/Intermine Tutorial"

Latest revision as of 00:09, 28 February 2014

The Web Interface

This section focusses on the features of the graphical web interface.

All these tasks will look at FlyMine.

Search

The most basic search interface, the "quick-search", allows unstructured access to data in the system. Information is provided about what type of data is returned, to help narrow the search.

Tasks:

  • How many hits are returned for "eve", "alzheimer"?
  • Which fly genes match "alzheimer"?

ID Resolution

A more specialised search interface is the ID resolution mechanism that lies at the heart of list uploads. This is critically important for dealing with the kinds of messy identifier lists people often have, and turning them into clean sets of items.

You may wish to use the following file with gene ids: File:Example-gene-ids.txt

Tasks:

  • Upload a set of gene identifiers, picking the items you want.
  • Change the organism parameter. What difference does that make?
  • Save this list of items on the server, giving it a custom name.
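
Before uploading, it can help to tidy a messy identifier list locally. The following is a minimal sketch of that pre-cleaning step only; it is not the server's ID resolution mechanism, which is what actually matches identifiers to items. The function name `clean_identifiers`, the choice of separators, and the case-insensitive de-duplication are all assumptions made for illustration.

```python
import re


def clean_identifiers(raw_text):
    """Split a messy block of text into unique identifiers, keeping first-seen order."""
    # Split on commas, semicolons, and any whitespace (spaces, tabs, newlines).
    tokens = re.split(r"[,;\s]+", raw_text)
    seen = set()
    cleaned = []
    for token in tokens:
        # Strip surrounding quotes left over from spreadsheet exports.
        token = token.strip("\"'")
        if not token:
            continue
        # Assumption: treat "EVE" and "eve" as the same identifier.
        key = token.lower()
        if key not in seen:
            seen.add(key)
            cleaned.append(token)
    return cleaned


# Hypothetical pasted input mixing separators, quotes, and duplicates.
messy = 'eve, "zen"\tbib\nEVE;;  CG1046 , zen'
print(clean_identifiers(messy))
```

The cleaned identifiers can then be pasted into the upload form, one per line, where the organism parameter and the resolver decide which items they refer to.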

Templates

Templates provide a more targeted search, which can be more powerful once you know what you are looking for.

Tasks:

  • Find the templates section
  • How many templates are good matches for "protein domain"?
  • Can you find the protein domains for the gene "mad"?
  • What genes are associated with "Homeo-" domains?
  • Can you find the protein domains for genes expressed in the Fly ovary? How many are there?

Templates are frequently used to query against lists of items - choose one of the available public lists:

  • Which genes have a particularly large number of interactions with your set of genes? What if you are only interested in “suppression” interactions?
  • Which orthologue data set has the most orthologues for genes in your list? What if you only look at the orthologues in mosquito?

Using the Results Tables

The results tables allow you to start exploring once you have begun to query the data. As well as viewing the results of a query, they allow you to start using it, by adding and removing elements from it, downloading it or sending it elsewhere.

We suggest performing these tasks using the results table for the orthologues of cdc2 (see Templates).

Tasks:

  • Use the pagination controls to view the data. Which orthologue is in the first row on the second page?
  • Use the column controls to hide irrelevant columns (such as the evidence code name).
  • Completely remove a column - does it change the number of rows? Undo this action.
  • Use the column summaries to get an overview of a column. (e.g. Which data set is most heavily represented?)
  • Use a column summary to add a filter. (e.g. Restrict the table to only the Panther data set)
  • Change the filter we added - change it to exclude "Panther" orthologues.
  • Add a new column (e.g. The protein information for the orthologue) How does it change the table?
  • Have a look at the code representation for the query (which language is the most succinct?)
  • Download the data as tsv.
  • Download just one or two columns of data.
  • Send the data to Galaxy - but just the orthologues (extra marks for sending them as gff3)
  • Save the orthologues as a new list.
  • Add just one or two genes to one of your lists.
  • Download a column summary.

Using the Report Pages

Report pages allow you to get an overview of the diverse data available for individual items. Detailed report pages are typically provided for genes and proteins.

Tasks:

  • Visit the report page for a gene (e.g. "Mad"). How many ways are there to get there?
  • How many sections are there?
  • Which diseases is it associated with?
  • How many publications does it appear in?
  • Which tissues is it up-regulated in?
  • Can you download the interaction network?

Using Lists

Lists allow you to store and re-use sets of items. Many components allow you to create lists, and they can be used in various places, including queries, analysis, and combination.

Tasks:

Creating a List:

  • Create a list from ID resolution ("upload").
  • Create a list from a result set (using a result table - e.g. Create a list of the protein domains associated with a gene (such as "Mad"))
  • Create a list from a region search (using a set of genomic intervals).

Using a list:

  • Find the genes that share the same protein domains as those in a list (hint: use a template)
  • Which GO terms characterise your list best?
  • Which genes in the second list are not in the first one? (there are two ways to do this!)
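
Combining lists is easiest to reason about as set arithmetic. The sketch below uses plain Python sets to show what subtracting, intersecting, and merging two lists mean; the gene symbols are hypothetical list contents, and this runs locally rather than against a mine.

```python
# Hypothetical contents of two saved gene lists.
first_list = {"eve", "zen", "bib", "Mad"}
second_list = {"Mad", "dpp", "eve", "tkv"}

# Items in the second list that are not in the first one (subtraction).
only_in_second = second_list - first_list

# Items shared by both lists (intersection).
in_both = first_list & second_list

# Every item that appears in at least one list (union).
in_either = first_list | second_list

print(sorted(only_in_second))
print(sorted(in_both))
print(len(in_either))
```

Note that subtraction is not symmetric: `first_list - second_list` gives a different answer from `second_list - first_list`.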

Maintaining a list:

  • Edit a list's properties (name, description, tags, etc).
  • Delete one of your lists.
  • Share a list with another user.

Using List Analysis Tools

List analysis tools are intended to give a useful overview of a set of items - they are generally accessed on the list analysis page.

Tasks, for a given list (such as the putative Fly TFs):

  • Which GO terms/publications/protein domains are most characteristic?
  • How does changing the reference population make a difference?
  • How does changing the error correction method make a difference?
  • Can anything in general be said about expression?
  • Make a list of just those genes expressed in stage 4-5.
  • View the genes annotated with the "transcription" GO term.
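
To see why the reference population and the error correction method matter, here is a minimal sketch of an enrichment calculation. It assumes enrichment is scored with a hypergeometric test, which this page does not state, and it shows two common multiple-testing corrections (Bonferroni and Benjamini-Hochberg); the options offered on the list analysis page may differ. All function names and numbers are illustrative.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n items from a population of N that contains K annotated items."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that an adjusted value
    # never exceeds the adjusted value of the next larger p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical case: 3 of the 5 genes in a list carry a term that
# 10 of the 100 genes in the reference population carry.
p = hypergeom_tail(k=3, n=5, K=10, N=100)
print(p)

# Hypothetical raw p-values for four terms tested on the same list.
raw = [0.01, 0.02, 0.03, 0.20]
strict = bonferroni(raw)
lenient = benjamini_hochberg(raw)
print(sum(x < 0.05 for x in strict), sum(x < 0.05 for x in lenient))
```

With these made-up p-values, fewer terms pass a 0.05 threshold under Bonferroni than under Benjamini-Hochberg. Changing `N` and `K` in `hypergeom_tail` plays the role of changing the reference population.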

Using the API

Materials: https://github.com/alexkalderimis/gmod-malaysia-2014

The REST API can be accessed by any tool capable of making HTTP calls, but to make it easier we also publish libraries for several languages. These enable the same things that can be done in the web interface to be automated.

All of the web interface tasks could be accomplished with web services, and the course materials provide some examples of how the API can be used from Python and JavaScript.
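
As a rough illustration of calling the REST API directly, the sketch below builds a query as XML and turns it into a results URL using only the Python standard library, not the published client library. The service root in `BASE_URL`, the helper names, and the example query (gene symbol "eve") are assumptions for illustration; check the service address of the mine you are using. Nothing is sent over the network here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: service root of a FlyMine instance; adjust for your mine.
BASE_URL = "https://www.flymine.org/flymine/service"


def build_query_xml(root_class, views, constraints):
    """Return a query as XML: the output columns plus simple attribute constraints."""
    query = ET.Element("query", {
        "model": "genomic",
        # Output columns are space-separated paths such as "Gene.symbol".
        "view": " ".join(f"{root_class}.{v}" for v in views),
    })
    for path, op, value in constraints:
        ET.SubElement(query, "constraint", {
            "path": f"{root_class}.{path}",
            "op": op,
            "value": value,
        })
    return ET.tostring(query, encoding="unicode")


def build_results_url(query_xml, fmt="tab"):
    """Return a URL that asks the query results endpoint for the given format."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{BASE_URL}/query/results?{params}"


xml_text = build_query_xml(
    "Gene",
    ["symbol", "primaryIdentifier"],
    [("symbol", "=", "eve")],
)
url = build_results_url(xml_text)
print(url)
```

The resulting URL can be fetched with any HTTP tool (for example `urllib.request.urlopen(url)` in Python) to retrieve tab-separated rows.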

Suggested tasks:

  • Edit the Python scripts so they behave differently (use different values, different mines, different queries, print different information).
  • Generate the Python code for templates and queries - run it as you would these scripts.
  • Complete the "proteins" stub for the js demo web-app.