GSOC Project Ideas 2020

Revision as of 20:54, 28 January 2020 by Robin.haw


Got an idea for GSOC 2020?

Then please post it by adding it directly to this page: copy, paste and update the template below. Editing this page requires a login, so create one if you do not already have one.

Projects can use a broad set of skills, technologies, and domains, such as GUIs, database integration and algorithms.

Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and an interest in biology or bioinformatics, you should definitely apply! Some of the best applications we see come from students who propose their own project, and as long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open-source programmers.

Proposed project ideas for 2020



  • Project Idea Name (Project Name/Lab Name)
    • Brief explanation: Brief description of the idea, including any relevant links, etc.
    • Expected results: describe the outcome of the project idea.
    • Project Home Page URL: if there is one.
    • Project paper reference and URL: Is there a paper about the project this effort will be a part of?
    • Knowledge prerequisites: programming language(s) to be used, plus any other particular computer science skills needed.
    • Skill level: Basic, Medium or Advanced.
    • Mentors: name + contact details of the lead mentor, name + contact details of 1 or 2 backup mentors.

Automated Bioinformatics Help in Galaxy

  • Brief explanation:
    • Galaxy users often encounter errors when trying to run a bioinformatics analysis. These errors may be user or data errors (e.g. a misformatted dataset) or errors caused by the underlying computing hardware (e.g. a full disk). Helping users and Galaxy support staff determine which kind of error occurred would be useful, because a user can usually fix the first type of error, while the second type requires expert intervention.
    • This project will improve Galaxy’s error system by using heuristics or machine learning to identify common types of user/data errors and suggest their likely causes and possible fixes. This will benefit Galaxy users, who will get clear and actionable error messages, and support staff, who will receive fewer reports of non-system errors.
  • Expected results:
    • Create a tool for analyzing, identifying, and classifying common error messages, drawing on the extensive archive of error messages from the main public Galaxy server.
      • The diversity and size of this data suggest a machine learning approach, but the specific approach will be decided by the student and mentor.
    • Extend Galaxy’s tool definition syntax to support defining common error classes and suggested resolutions.
    • Update Galaxy’s user interface to display potential resolutions and suggested actions based on the types of errors found in an analysis.
  • Project Home Page URL:
  • Project paper reference and URL: The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update, Enis Afgan et al., Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W537–W544.
  • Knowledge prerequisites: programming language(s) to be used, plus any other particular computer science skills needed.
  • Skill level: Medium.
  • Mentors:
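
As a rough illustration of the heuristic end of this project, the sketch below maps raw tool error text to a coarse error class and a suggested resolution using hand-written regular-expression rules. The rule set, the class names (`user_data`, `system`) and the messages are hypothetical examples rather than anything in Galaxy today. A real solution might instead learn these classes from the error archive described above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRule:
    pattern: re.Pattern
    kind: str        # "user_data" (user can fix it) or "system" (needs an admin)
    suggestion: str  # actionable hint shown to the user


# Hypothetical rules; a real rule set would be mined from Galaxy's error history.
# Rules are checked in order, so system-level patterns are listed first.
RULES = [
    ErrorRule(re.compile(r"No space left on device", re.I), "system",
              "The server ran out of disk space; please report this error."),
    ErrorRule(re.compile(r"MemoryError|out of memory", re.I), "system",
              "The job exceeded the available memory; please report this error."),
    ErrorRule(re.compile(r"not a valid (fasta|fastq)", re.I), "user_data",
              "Check that the input dataset's format matches its assigned datatype."),
    ErrorRule(re.compile(r"KeyError|column .* not found", re.I), "user_data",
              "Check that the selected column exists in the input dataset."),
]


def classify(stderr: str) -> tuple[str, str]:
    """Return (kind, suggestion) for the first matching rule, else ("unknown", ...)."""
    for rule in RULES:
        if rule.pattern.search(stderr):
            return rule.kind, rule.suggestion
    return "unknown", "No known cause was found; please report this error."


print(classify("OSError: [Errno 28] No space left on device"))
```

The same (kind, suggestion) pairs could later be declared per tool, in line with the proposed extension to Galaxy's tool definition syntax.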

Use Galaxy to run Reactome analysis and processes on genomic data (Reactome)

  • Brief explanation: Reactome is a free, open-source, curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Galaxy is an open, web-based platform for data-intensive biomedical research that allows users to perform, reproduce and share complete analyses.
  • Expected results: There are two potential sub-projects: 1) adding Reactome as a data resource in Galaxy, so that Galaxy users can use Reactome reaction and pathway annotation data; and 2) running Reactome's identifier mapping and over-representation analysis workflows in Galaxy. See Reactome on GitHub.
  • Project Home Page URL:
  • Project paper reference and URL:
  • Knowledge prerequisites: Galaxy, Java, web services.
  • Skill level: Medium.
  • Mentors: Robin Haw (robin.haw[AT] and Joel Weiser (joel.weiser[AT]
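
Over-representation analysis, the second sub-project, generally asks whether a submitted gene list hits a given pathway more often than chance would predict. The sketch below computes a one-sided hypergeometric p-value using only the Python standard library. The function name `ora_pvalue` and its parameters are hypothetical. Reactome's own analysis service may differ in its details, for example by adding multiple-testing correction across pathways.

```python
from math import comb


def ora_pvalue(universe: int, pathway_size: int, list_size: int, hits: int) -> float:
    """One-sided over-representation p-value, P(X >= hits).

    X ~ Hypergeometric(universe, pathway_size, list_size), where:
      universe     -- number of genes that could be annotated (N)
      pathway_size -- genes annotated to the pathway (K)
      list_size    -- genes in the submitted list (n)
      hits         -- submitted genes that fall in the pathway (k)
    """
    total = comb(universe, list_size)
    upper = min(pathway_size, list_size)
    # Sum the probabilities of seeing `hits` or more pathway genes in the list.
    tail = sum(
        comb(pathway_size, i) * comb(universe - pathway_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 of 200 submitted genes fall in a 100-gene pathway,
# out of 20,000 genes in total (about 1 hit would be expected by chance).
print(ora_pvalue(20000, 100, 200, 10))
```

Exact integer arithmetic via `math.comb` avoids overflow in the intermediate binomial coefficients. Only the final ratio is converted to a float.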

Create a software package for use in R to query Reactome’s graph database in Neo4j

  • Brief explanation: The R programming language has an existing package for connecting to Neo4j databases. This project would use that package as a base to build an interface for querying Reactome’s Neo4j graph database that returns data structures for manipulating Reactome pathway and reaction data.
  • Expected results: Allow R users to retrieve Reactome pathway and reaction data for analysis, through both pre-written functions and custom queries. Such functions might, for example, find pathways and reactions that contain particular genes, proteins or Gene Ontology terms, or that have cross-references to other external databases, along with other queries useful to Reactome users.
  • Project Home Page URL:
  • Knowledge prerequisites: R, Neo4j.
  • Skill level: Medium.
  • Mentors: Joel Weiser (joel.weiser[AT] and Antonio Fabregat (fabregat[AT]