Difference between revisions of "GSOC Project Ideas 2017"

From GMOD
Revision as of 02:49, 9 February 2017

There are plenty of challenging and interesting project ideas this year. These projects span a broad set of skills, technologies, and domains, such as GUIs, database integration and algorithms.

Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and have an interest in biology or bioinformatics, you should definitely apply! Do not hesitate to propose your own project idea: some of the best applications we see are by students who go this route. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers.

  • Project Idea Name (Project Name/Lab Name)
    • Brief explanation: Brief description of the idea, including any relevant links, etc.
    • Expected results: describe the outcome of the project idea.
    • Knowledge prerequisites: programming language(s) to be used, plus any other particular computer science skills needed.
    • Skill level: Basic, Medium or Advanced.
    • Mentors: name + contact details of the lead mentor, name + contact details of backup mentor.


Here is a list of the proposed project ideas for 2017:

  • Project Idea 1: Project Publication Reference Tracking (Galaxy)
    • Brief explanation: Open source projects need ways to demonstrate relevance and viability to funders, users, and developers. One way to do that is to track publications that use and/or reference a project's products. This is typically done by setting up email alerts or RSS feeds from sources such as Google Scholar, Web of Science, and ScienceDirect. This effort would create software that helps projects track publications that reference them.
    • Expected results: The software would integrate notifications from many sources into a coherent list of publications, report which ones are not yet known, and provide support for adding new ones to online reference managers such as CiteULike and Mendeley. The software would be extensible, making it easy to add support for new publication sources and additional online reference managers. The software would be usable by any project to create and maintain publication lists.
    • Knowledge prerequisites: Python or Java experience is preferred, as those are the languages of choice of the two mentor projects.
    • Skill level: Basic
    • Mentors: Dave Clements, Galaxy Project, Johns Hopkins University, clements@galaxyproject.org, Robin Haw, Reactome, Ontario Institute for Cancer Research.
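
The merge-and-report step described in Project Idea 1 could be sketched roughly as follows. This is only an illustration: the record layout (a dict with a `title` key), the source names, and the title-normalization rule are assumptions, not part of any existing tool, and a real implementation would probably prefer DOIs over titles when they are available.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace so that
    near-identical titles from different sources compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_publications(feeds):
    """Merge records from several sources into one dict keyed by normalized title.

    `feeds` maps a (hypothetical) source name to a list of {"title": ...} records.
    Each merged entry remembers which sources reported it.
    """
    merged = {}
    for source, records in feeds.items():
        for record in records:
            key = normalize_title(record["title"])
            entry = merged.setdefault(key, {"title": record["title"], "sources": []})
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return merged


def find_unknown(merged, known_titles):
    """Return merged entries whose titles are not in the project's existing list."""
    known = {normalize_title(title) for title in known_titles}
    return [entry for key, entry in merged.items() if key not in known]
```

Calling `find_unknown(merge_publications(feeds), known_titles)` would then yield only the publications that still need to be added to the reference manager.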


  • Project Idea 2: Reactome Diagrams WebGL (Reactome)
    • Brief explanation: Implementing WebGL support in the renderer layer of Reactome's new DiagramViewer (https://github.com/reactome-pwp/diagram) using the Parallax project (http://parallax3d.org/) or similar.
    • Expected results: Faster rendering, nicer and smoother transitions, the ability to overlay more data at any zoom level, textures that make pathway elements more realistic, and, ultimately, multi-platform WebGL support.
    • Knowledge prerequisites: Java, GWT, Git, Maven, HTML5 Canvas, WebGL.
    • Skill level: Medium-Advanced.
    • Mentors: Antonio Fabregat (fabregat@ebi.ac.uk) (lead mentor), Kostas Sidiropoulos (ksidiro@ebi.ac.uk) (backup mentor)


  • Project Idea 3: iOS InterMine App (InterMine)
    • Brief explanation: InterMine already has an Android application that allows users to search for genes across most of the 29 public InterMine instances, with a well-documented API (http://iodocs.apps.intermine.org/). We’d love to see this reflected in an iOS application, designed using HTML5 or native technologies. As a minimum we'd like to see the Android app features replicated whilst querying a single re-badgeable InterMine. A great stretch goal would be to query multiple mines simultaneously.
    • Knowledge prerequisites:
      • iOS app development, whether native or HTML5 based.
      • Understanding of working with REST APIs.
      • Biology knowledge an advantage, but not required.
      • Git or other version control
    • Mentors:
      • Yo yo@intermine.org
      • Josh josh@intermine.org
    • Expected results: an iOS application with functionality similar to https://play.google.com/store/apps/details?id=org.intermine.app that is ready to be submitted to the Apple store.
    • Skill level: Medium.


  • Project Idea 4: Similarity project (InterMine)
    • Brief explanation: InterMine is a large graph (matrix) of entities and the relationships between them, and this structure holds potentially valuable data. For instance, entities that share a large number of neighbours in the graph might be biologically similar, and entities that many paths between other entities pass through might be biologically important. Precalculating this information and serving it via a web service would greatly enhance InterMine’s discovery potential.
    • Knowledge prerequisites:
      • Development experience; most languages ok
      • Math skills (matrix theory?)
      • Some database experience (ability to query using SQL)
      • Biology knowledge an advantage, but not required.
      • Git or other version control
    • Expected results: A script or program that builds statistics about the relationships of objects in our database.
    • Mentors:
      • Justin justincc@intermine.org
      • Julie julie@intermine.org
      • Yo yo@intermine.org
      • Josh josh@intermine.org
    • Skill level: Medium.
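
As a minimal sketch of the "shared neighbours" idea in Project Idea 4, the snippet below scores pairs of entities by the Jaccard index of their neighbour sets. The edge-list input and the choice of Jaccard similarity are assumptions made for illustration; the project description does not prescribe a particular measure, and the "pass through" notion of importance (closer to betweenness centrality) is not covered here.

```python
from collections import defaultdict
from itertools import combinations


def build_neighbours(edges):
    """Turn an undirected edge list into a mapping of entity -> set of neighbours."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def jaccard(neighbours, a, b):
    """Jaccard index of two neighbour sets: |shared| / |combined|."""
    set_a = neighbours.get(a, set())
    set_b = neighbours.get(b, set())
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def rank_similar_pairs(edges):
    """Return (score, a, b) tuples for all pairs with a non-zero score, best first."""
    neighbours = build_neighbours(edges)
    scores = [
        (jaccard(neighbours, a, b), a, b)
        for a, b in combinations(sorted(neighbours), 2)
    ]
    return sorted((s for s in scores if s[0] > 0), reverse=True)
```

Comparing every pair is quadratic in the number of entities, so a precalculation over a full InterMine database would likely need to restrict candidates, for example to entities of the same class.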


  • Project Idea 5: ElasticSearch and InterMine (InterMine)
    • Brief explanation: Open-source search engines, such as the Lucene-based Elasticsearch, are providing increasingly sophisticated search capabilities, such as graphing engines and distributed scaling. We plan to do initial beta work on using ElasticSearch in InterMine, but in the long term want to make this a first class part of the system and further explore the possibilities offered by this technology, such as enhanced search, similarity analysis and connection visualization between biological entities.
    • Knowledge prerequisites:
      • Java, Python
      • Linux
      • Git
      • Docker
      • Interest in search technology and algorithms
      • Biology knowledge an advantage, but certainly not required
    • Expected results: InterMine is fully updated to use ElasticSearch, with an exploration of the enhancements this can bring for analyzing biological data.
    • Mentors:
      • Justin justincc@intermine.org
    • Skill level: Medium.


  • Project Idea 6: Create a set of exciting bioinformatics R demos using the InterMineR package (InterMine)
    • Brief explanation: Fan of R and biology? InterMine has recently created an R package (https://github.com/intermine/interminer) to take advantage of InterMine’s biological data warehouse web services, but we could use someone who is familiar with R and biology to create and document/blog some interesting code examples based on use cases we provide, with a focus on well-explained code and thorough documentation. A stretch goal would be to extend the core InterMineR package to provide additional services.
    • Knowledge prerequisites:
      • Knowledge of the R programming language
      • Proven writing / documentation or blogging skills
      • Understanding of biology / bioinformatics a significant advantage
    • Expected results: 3-10 help articles with well documented code samples demonstrating the use of the InterMineR R package.
    • Mentors:
      • Rachel rachel@intermine.org
      • Julie julie@intermine.org
    • Skill level: Basic


  • Project Idea 7: InterMine Registry (InterMine)
    • Brief explanation: Currently there are 29 different instances of InterMine, a bioinformatics data warehouse, available on the web (see footer of http://intermine.org/). We’d like to create a registry for all public instances - essentially an API that exposes the names, URLs, datatypes, and other useful information regarding an InterMine instance. This could be served from a manually curated list of InterMines, but a stretch goal might be to also include API methods to create and administer registry entries.
    • Knowledge prerequisites:
      • Good understanding of RESTful APIs, how they work, and how to implement one in a language of your choice.
      • No biology skills needed.
      • Git or other version control
    • Expected results: A read-only API that provides basic information about existing InterMine instances around the world.
    • Mentors:
      • Daniela daniela@intermine.org
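
The read-only registry in Project Idea 7 could start from something as small as the sketch below, which serves a manually curated list through two GET routes. Everything specific here is hypothetical: the route names (`/mines`, `/mines/<name>`), the field names, and the placeholder entries are invented for illustration and do not describe real InterMine instances or an existing API.

```python
import json

# Manually curated list; entries are placeholders, not real InterMine instances.
REGISTRY = [
    {"name": "ExampleMine", "url": "https://example.org/examplemine",
     "datatypes": ["Gene", "Protein"]},
    {"name": "DemoMine", "url": "https://example.org/demomine",
     "datatypes": ["Gene"]},
]


def handle_get(path):
    """Map a GET path to a (status code, JSON body) pair.

    /mines         -> every registry entry
    /mines/<name>  -> one entry, matched case-insensitively by name
    anything else  -> 404
    """
    if path == "/mines":
        return 200, json.dumps(REGISTRY)
    if path.startswith("/mines/"):
        name = path[len("/mines/"):].lower()
        for mine in REGISTRY:
            if mine["name"].lower() == name:
                return 200, json.dumps(mine)
    return 404, json.dumps({"error": "not found"})
```

Keeping the routing in a plain function makes it easy to test without a running server; it could later be wrapped by whichever web framework the student chooses.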


  • Project Idea 8: Query Visualiser (InterMine)
    • Brief explanation: InterMine’s biological data warehouse has an extensible XML data model designed to be heavily queryable. Similar to SQL, most queries have a combination of views and constraints. This can probably be visualised as an interactive network graph and should offer some good opportunities for creative data visualisation, and would complement InterMine’s existing Query Builder (user documentation for the query builder is here: http://flymine.readthedocs.io/en/latest/query-builder/Documentationquerybuilder.html)
    • Knowledge prerequisites:
      • Good understanding of RESTful APIs
      • Client-side dev skills (JS or a language which compiles to JS).
      • No biology skills needed, but advantageous.
      • Git or other version control
    • Expected results: An interactive web-based data visualisation tool to visualise simple InterMine queries.
    • Mentors:
      • Yo yo@intermine.org
      • Josh josh@intermine.org
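
To make the "views and constraints as a network graph" idea in Project Idea 8 more concrete, here is a rough sketch that turns a query into nodes and edges a visualisation library could draw. It assumes that views and constraint paths are dotted strings such as `Gene.proteins.name`; the dict layout for constraints and the `view`/`constraint` role labels are invented for this example.

```python
def query_to_graph(views, constraints):
    """Build a graph from a query's dotted paths.

    Every prefix of a path becomes a node (Gene, Gene.proteins, ...), and each
    step from a prefix to its extension becomes an edge. Nodes record whether
    they are used as a view, a constraint, or both.
    """
    nodes = {}
    edges = set()

    def add_path(path, role):
        parts = path.split(".")
        for i in range(1, len(parts) + 1):
            prefix = ".".join(parts[:i])
            nodes.setdefault(prefix, set())
            if i > 1:
                edges.add((".".join(parts[:i - 1]), prefix))
        nodes[path].add(role)

    for path in views:
        add_path(path, "view")
    for constraint in constraints:
        add_path(constraint["path"], "constraint")
    return nodes, sorted(edges)
```

The returned node and edge lists are deliberately library-agnostic, so they could be handed to a client-side graph renderer after being serialized to JSON.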


  • Project Idea 9: Prototype a new RESTful API querying Neo4j database (InterMine)
    • Brief explanation: Given an instance of Neo4j loaded with testmodel data, implement a new RESTful API that takes a query in Path-Query XML format as input and returns the result using the Neo4j Java API or the traversal framework.
    • Knowledge prerequisites:
      • Good understanding of Java
      • Good understanding of RESTful API
      • Basic understanding of graph databases
      • Git or other version control
      • No biology skills needed
    • Expected results:
      • Verify and, if necessary, adapt the XML model to represent the relationships and their properties in Neo4j
      • Prototype a parser that reads from a simple data source and loads the data into Neo4j
      • Prototype a new RESTful API that returns a query result using the Neo4j Java API or the traversal framework
    • Mentors:
      • Daniela daniela@intermine.org


  • Project Idea 10: Use Galaxy to run Reactome analysis and processes on genomic data (Reactome)
    • Brief explanation: Reactome is a free, open-source, curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education. Galaxy is an open, web-based platform for data-intensive biomedical research, which allows users to perform, reproduce, and share complete analyses.
    • Expected results: There are two potential sub-projects. 1) Adding Reactome as a data resource in Galaxy, to enable Galaxy users to use Reactome reaction and pathway annotation data, and 2) Performing identifier mapping and over-representation analysis workflows from Reactome in Galaxy. Reactome Github: https://github.com/reactome/
    • Knowledge prerequisites: Galaxy, Java, web services
    • Skill level: Medium
    • Mentor: Joel Weiser (joel.weiser@oicr.on.ca)
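
For readers unfamiliar with over-representation analysis, mentioned in Project Idea 10, the sketch below shows one textbook form of it: a one-sided hypergeometric test that asks how surprising the overlap between a submitted identifier list and each pathway is. This is a generic illustration under stated assumptions, not a description of the statistics Reactome's own analysis uses; the function names and the toy pathway sets in the usage note are invented.

```python
from math import comb


def hypergeom_pvalue(population, pathway_size, sample_size, hits):
    """P(X >= hits) when `sample_size` items are drawn without replacement
    from `population` items, of which `pathway_size` belong to the pathway."""
    total = comb(population, sample_size)
    upper = min(pathway_size, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def over_representation(sample, pathways, population_size):
    """Score every pathway against the sample; smallest p-value first.

    `pathways` maps a pathway name to its member identifiers. No multiple-testing
    correction is applied here, which a real analysis would need.
    """
    sample = set(sample)
    results = []
    for name, members in pathways.items():
        hits = len(sample & set(members))
        p = hypergeom_pvalue(population_size, len(members), len(sample), hits)
        results.append((p, name, hits))
    return sorted(results)
```

For example, `over_representation(["g1", "g2", "g3"], {"pathway_A": ["g1", "g2", "g3"], "pathway_B": ["g7", "g8", "g9"]}, 20)` ranks the made-up `pathway_A` first, since all three submitted identifiers fall inside it.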


  • Project Idea 11: Stand-alone Reactome server in a Docker image (Reactome)
    • Brief explanation: Reactome is a free, open-source, curated and peer-reviewed pathway database. The goal of this project is to produce a Docker image that contains everything that is needed for a user to run Reactome on their own workstation. This includes the web applications, databases, scripts, and other supporting infrastructure components that make up Reactome.
    • Expected results: A Docker image that can be pulled from an image repository such as Docker Hub or quay.io, contains the latest Reactome data and software, and can be run on any Docker-capable workstation. A process by which such Docker images could be automatically built as part of the Reactome data-release cycle would also be a goal.
    • Knowledge prerequisites: Linux, Docker, Apache web servers, Tomcat, Bash
    • Skill level: Medium
    • Mentor: Solomon Shorser (solomon.shorser@oicr.on.ca)


  • Project Idea 12: Pan-Genome Module for the Genome Context Viewer (GMOD)
    • Brief explanation: With the number of sequenced and annotated genomes continuously increasing, there is a need for new algorithms and tools for comparative analyses both at the nucleotide and genic levels. The Genome Context Viewer (GCV, https://goo.gl/trvfg1) is an OSS tool that enables comparative genomics by using gene families as a unit of search and comparison. It currently uses Chado as a reference implementation for its data services and can be integrated with other GMOD components via a service layer/API. This work will create an extension module that will integrate new and existing pan-genomics algorithms into GCV while leveraging the existing UI for visualization purposes. This will help to serve communities facing the challenges of having multiple reference genomes within a single species, as well as improving GCV’s utility for clade-oriented resources.
    • Expected results: The module would implement the Approximate Frequent Subpaths algorithm (Cleary et al., 2017) for finding candidate GCV search queries and generating context-sensitive chromosome-scale synteny blocks, that is, synteny blocks derived from a chromosome’s gene family content and the GCV search parameters. It would also implement the Frequented Regions algorithm (Cleary et al., in review) for identifying syntenic regions in pan-genome gene family graphs. As with current GCV algorithms, these implementations would be capable of aggregating data from multiple sources, enabling analyses of data distributed across multiple data-stores. The results of these algorithms would be displayed in the GCV UI, where users can interactively explore the results, perform new searches, and interlink to other relevant tools.
    • Knowledge prerequisites: Python (Django and Spark) and JavaScript (Angular 2 and D3) experience is preferred, as those are the languages used to implement GCV.
    • Skill level: Advanced
    • Mentors: Andrew Farmer, Legume Information System, National Center for Genome Resources, adf@ncgr.org; Steven Cannon, Legume Information System, US Department of Agriculture Agricultural Research Service.