GSOC Project Ideas 2021

Revision as of 18:18, 26 February 2021 by Robin.haw

Got an idea for GSOC 2021?

Then please post it:

  1. Add it here, by directly editing this page. Just copy, paste and update the template below. This requires that you have or create a login.

Projects can use a broad set of skills, technologies, and domains, such as GUIs, database integration and algorithms.

Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and an interest in biology or bioinformatics, you should definitely apply! Do not hesitate to propose your own project idea: some of the best applications we see come from students who go this route. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open-source programmers.

Proposed project ideas for 2021

  • JBrowse 2 Plugins for Additional Synteny Formats
    • Brief explanation: Write a new JBrowse 2 plugin to support MSPcrunch or MUMmer data input
    • Expected results: A new JBrowse 2 plugin that adds support for one of the data formats listed above
    • Project Home Page URL:
    • Knowledge prerequisites: JavaScript
    • Skill level: Medium
    • Mentors: JBrowse development team
  • Interactive viewer for systems-biology variant interpretation (UI)
    • Brief explanation: Develop an interactive DAG linking user-specified variants to genes, cell-type expression, disease associations/known cancer mutations, and known drug targets.
    • Expected results: Website powered by Cytoscape.js which shows input variants as nodes, linked to different levels of system organization.
    • Project Home Page URL:
    • Knowledge prerequisites: JavaScript
    • Skill level: Medium
    • Mentors: Shraddha Pai
  • Interactive viewer for systems-biology variant interpretation (Server-side)
    • Brief explanation: Create a server-side database and application for system-level annotation of variants/genes, to connect to the interactive UI (e.g. selected single-cell marker datasets, known disease associations, drug targets).
    • Expected results: A website that allows users to visualize systems-level variant/gene annotation with interactive linkouts to data sources.
    • Project Home Page URL:
    • Knowledge prerequisites: Experience with document-oriented databases (e.g. MongoDB), GraphQL
    • Skill level: Medium
    • Mentors: Shraddha Pai
  • Style Guides for Biological Information Portal (WormBase / Alliance of Genome Resources)
    • Brief explanation: The Alliance of Genome Resources was founded to unify access to research knowledge across different model organism systems (such as worms, flies, mice, etc.). It provides ways for published research knowledge to be categorized, aggregated and searched. The Alliance was founded by members that each specialize in a specific model organism system. They each have their own existing websites and user base (more detail on these members and their sites here:). At the Alliance, we look to support the existing use cases of the member sites while furthering usability and consistency. To achieve those goals, we need style guides that can be applied to the development of the Alliance website.
    • Expected results: Design prototypes and guidelines resulting from several iterations of the design lifecycle.
    • Project Home Page URL:
    • Project paper reference and URL: The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases
    • Knowledge prerequisites: Design or HCI. Knowledge of biology is preferred.
    • Skill level: Advanced.
    • Mentors: Sibyl Gao (
  • Bioinformatics with Jupyter Notebooks (WormBase)
    • Brief explanation: WormBase is an informational portal for curated biological research knowledge. In addition to the website, we offer programmatic access to the data through a REST API and downloadable files. This programmatic access is intended to support bioinformatics work. We believe working examples would augment the existing documentation, making it easier for bioinformaticians to access WormBase data programmatically.
    • Expected results: A series of Jupyter Notebooks that demonstrate how WormBase data can be used in bioinformatics.
    • Project Home Page URL:
    • Project paper reference and URL: WormBase: a modern Model Organism Information Resource
    • Knowledge prerequisites: Python or R, and bioinformatics knowledge.
    • Skill level: Advanced.
    • Mentors: Sibyl Gao (

Use Galaxy to run Reactome analysis and processes on proteomics data (Reactome)

    • Brief explanation: Reactome is a free, open-source, curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Galaxy is an open, web-based platform for data-intensive biomedical research, which allows users to perform, reproduce, and share complete analyses.
    • Expected results: There are two potential sub-projects. 1) Adding Reactome as a data resource in Galaxy, to enable Galaxy users to use Reactome reaction and pathway annotation data, and 2) Performing identifier mapping and over-representation analysis workflows from Reactome in Galaxy. Reactome Github.
    • Project Home Page URL: if there is one.
    • Project paper reference and URL: ProteoRE (Proteomics Research Environment)
    • Knowledge prerequisites: Galaxy, Java, web services.
    • Skill level: Medium.
    • Mentors: Robin Haw (robin.haw[AT] and Joel Weiser (joel.weiser[AT]
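
For applicants unfamiliar with the over-representation analysis named in sub-project 2, the idea can be sketched with a generic hypergeometric test. This is an illustration only, not Reactome's implementation: the function names, the toy pathway table and the identifiers are all made up, and the statistical test that Reactome itself applies may differ.

```python
from math import comb
from typing import Dict, Iterable, Set, Tuple


def hypergeom_upper_tail(population: int, successes: int, draws: int, observed: int) -> float:
    """Return P(X >= observed) when `draws` items are taken without replacement
    from `population` items, `successes` of which belong to the pathway."""
    first = max(observed, 0)
    last = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(first, last + 1)
    )
    return tail / total


def over_representation(
    submitted: Iterable[str],
    pathways: Dict[str, Set[str]],
    background_size: int,
) -> Dict[str, Tuple[int, float]]:
    """Map each pathway name to (number of submitted identifiers found, p-value).

    Assumes every submitted identifier is part of the background set.
    """
    submitted_set = set(submitted)
    results = {}
    for name, members in pathways.items():
        hits = len(submitted_set & members)
        p_value = hypergeom_upper_tail(
            background_size, len(members), len(submitted_set), hits
        )
        results[name] = (hits, p_value)
    return results


# Toy data: hypothetical identifiers, not real Reactome content.
toy_pathways = {
    "pathway A": {"p1", "p2", "p3"},
    "pathway B": {"p8", "p9"},
}
toy_results = over_representation(["p1", "p2", "p3"], toy_pathways, background_size=10)
for pathway_name, (hits, p_value) in toy_results.items():
    print(f"{pathway_name}: hits={hits}, p={p_value:.4f}")
```

A real workflow would also map identifiers to a common namespace first and correct the p-values for multiple testing; both steps are omitted here.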

GraphDB API (Reactome)

  • Brief explanation: Reactome uses both a relational database (MySQL) and a graph database (Neo4j). There is an existing API that uses the relational database, and many Reactome components use this API. To make it easier to transition these components to using the graph database, a new API with equivalent functionality needs to be created.
  • Expected results: A new Java API that interacts with the graph database, with functionality such that it could be used as a drop-in replacement for the relational database API.
  • Project Home Page URL:
  • Project paper reference and URL:
  • Knowledge prerequisites: Java, MySQL. Neo4j would be good, but not necessary.
  • Skill level: Advanced.
  • Mentors: Solomon Shorser (solomon.shorser[AT]
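
The "drop-in replacement" goal above can be pictured as two backends sharing one interface, so that calling code does not change when the backend is swapped. The sketch below is language-neutral in spirit and written in Python for brevity (the project itself targets Java). Every name in it is hypothetical, and both stores are in-memory stand-ins rather than real MySQL or Neo4j clients.

```python
from typing import Dict, List, Optional, Protocol


class InstanceStore(Protocol):
    """Hypothetical shared interface that both backends implement."""

    def fetch_display_name(self, db_id: int) -> Optional[str]:
        ...


class RelationalStore:
    """Stand-in for the relational backend: rows keyed by primary key."""

    def __init__(self, rows: Dict[int, Dict[str, str]]) -> None:
        self._rows = rows

    def fetch_display_name(self, db_id: int) -> Optional[str]:
        row = self._rows.get(db_id)
        return row["display_name"] if row else None


class GraphStore:
    """Stand-in for the graph backend: a list of nodes with properties."""

    def __init__(self, nodes: List[Dict[str, object]]) -> None:
        self._by_id = {node["db_id"]: node for node in nodes}

    def fetch_display_name(self, db_id: int) -> Optional[str]:
        node = self._by_id.get(db_id)
        return str(node["display_name"]) if node else None


def describe(store: InstanceStore, db_id: int) -> str:
    """Calling code depends only on the interface, not on the backend."""
    name = store.fetch_display_name(db_id)
    return f"{db_id}: {name}" if name is not None else f"{db_id}: <not found>"


# Made-up records for illustration only.
relational = RelationalStore({1: {"display_name": "Example pathway"}})
graph = GraphStore([{"db_id": 1, "display_name": "Example pathway"}])

print(describe(relational, 1))
print(describe(graph, 1))
```

Running the same calls against both stores and comparing the results is one simple way to check that the new API really is equivalent.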

Centralized dashboard or metrics system (Reactome)

  • Brief explanation: Reactome has both manual and automated statistical tracking of its quarterly release data. This project would seek to fully automate and consolidate the measurement of release data for metrics such as the number of pathways, reactions, distinct proteins (with and without UniProt isoforms), complexes, small molecules, drugs/therapeutics, literature references, etc., for human (curated) and non-human (electronically inferred) species, stratified by normal and disease biology. A centralized dashboard would be useful to the team for discussing metrics externally and for community outreach.
  • Expected results: A program that produces a standardized, visually appealing report of statistics for a Reactome release database.
  • Project Home Page URL:
  • Knowledge prerequisites: Java, MySQL and/or Neo4j, creating visuals for statistical data (preferred but not required)
  • Skill level: Medium.
  • Mentors: Robin Haw (robin.haw[AT] Joel Weiser (joel.weiser[AT]

Community access portal to Reactome Archive (Reactome)

  • Brief explanation: Reactome generates new pathway and other annotation data on a quarterly basis. With each new release, the preceding data set is archived to an AWS S3 bucket. As part of our data sharing policy, we would like to develop a web interface that lets users request specific versions of archived data and makes them available for download.
  • Expected results: A web interface for users to request data and download it via a shareable link that expires either within a certain timeframe or after the data is downloaded.
  • Project Home Page URL:
  • Knowledge prerequisites: Java, AWS, Joomla
  • Skill level: Medium.
  • Mentors: Robin Haw (robin.haw[AT] Solomon Shorser (solomon.shorser[AT]
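
The expiring, shareable link described in the expected results could work roughly as in the sketch below, which uses only the Python standard library and is independent of AWS and Joomla. It is a minimal illustration under stated assumptions: the class name, the token layout, the demo secret and the archive names are all hypothetical, and the sketch enforces both expiry rules (time limit and single use), whereas the project may need only one of them.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional, Set

DEMO_SECRET = b"demo-secret-change-me"  # hypothetical; a real key must stay private


class LinkRegistry:
    """Issues signed download tokens that expire after a time limit or one use."""

    def __init__(self, secret: bytes = DEMO_SECRET) -> None:
        self._secret = secret
        self._used: Set[str] = set()  # a real service would persist this

    def _sign(self, payload: str) -> str:
        return hmac.new(self._secret, payload.encode(), hashlib.sha256).hexdigest()

    def issue(self, archive_key: str, ttl_seconds: int, now: Optional[int] = None) -> str:
        """Return a token for `archive_key` that is valid for `ttl_seconds`."""
        now = int(time.time()) if now is None else now
        nonce = secrets.token_hex(8)  # makes every issued token unique
        payload = f"{archive_key}|{now + ttl_seconds}|{nonce}"
        return f"{payload}|{self._sign(payload)}"

    def redeem(self, token: str, now: Optional[int] = None) -> Optional[str]:
        """Return the archive key if the token is valid, else None."""
        now = int(time.time()) if now is None else now
        payload, separator, signature = token.rpartition("|")
        if not separator:
            return None
        # Constant-time comparison of the expected and supplied signatures.
        if not hmac.compare_digest(self._sign(payload).encode(), signature.encode()):
            return None
        parts = payload.rsplit("|", 2)
        if len(parts) != 3:
            return None
        archive_key, expires_text, _nonce = parts
        try:
            expires_at = int(expires_text)
        except ValueError:
            return None
        if now > expires_at or token in self._used:
            return None
        self._used.add(token)  # single use: a second download attempt fails
        return archive_key


registry = LinkRegistry()
token = registry.issue("release-75", ttl_seconds=60, now=1000)
print(registry.redeem(token, now=1030))  # first use inside the time window
print(registry.redeem(token, now=1031))  # second use is rejected
```

Passing `now` explicitly keeps the example deterministic; in a service the default of the current time would be used.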


  • Project Idea Name (Project Name/Lab Name)
    • Brief explanation: Brief description of the idea, including any relevant links, etc.
    • Expected results: describe the outcome of the project idea.
    • Project Home Page URL: if there is one.
    • Project paper reference and URL: Is there a paper about the project this effort will be a part of?
    • Knowledge prerequisites: programming language(s) to be used, plus any other particular computer science skills needed.
    • Skill level: Basic, Medium or Advanced.
    • Mentors: name + contact details of the lead mentor, name + contact details of 1 or 2 backup mentors.