GSOC Project Ideas 2020

From GMOD
Revision as of 22:04, 3 February 2020 by Sibyl


Got an idea for GSOC 2020?

Then please post it. Add it here by editing this page directly: just copy, paste and update the template below. Editing requires a GMOD.org login, so create one if you do not already have one.

Projects can use a broad set of skills, technologies, and domains, such as GUIs, database integration and algorithms.

Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and an interest in biology or bioinformatics, you should definitely apply! Do not hesitate to propose your own project idea: some of the best applications we see come from students who take this route. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open-source programmers.


Proposed project ideas for 2020


Template

  • Project Idea Name (Project Name/Lab Name)
    • Brief explanation: Brief description of the idea, including any relevant links, etc.
    • Expected results: describe the outcome of the project idea.
    • Project Home Page URL: if there is one.
    • Project paper reference and URL: Is there a paper about the project this effort will be a part of?
    • Knowledge prerequisites: programming language(s) to be used, plus any other particular computer science skills needed.
    • Skill level: Basic, Medium or Advanced.
    • Mentors: name + contact details of the lead mentor, name + contact details of 1 or 2 backup mentors.


Automated Bioinformatics Help in Galaxy

  • Brief explanation:
    • Galaxy users often encounter errors when trying to run a bioinformatics analysis. These may be user or data errors (e.g., a misformatted dataset) or errors caused by the underlying computing infrastructure (e.g., a full disk). Helping users and Galaxy support staff determine which kind of error occurred would be useful, because a user can likely fix the first kind, while the second kind requires expert intervention.
    • This project will improve Galaxy’s error system by using heuristics or machine learning to identify common types of user/data errors and suggest their likely causes and fixes. This will benefit Galaxy users through clear, actionable error messages, and support staff by reducing the number of reported non-system errors.
  • Expected results:
    • Create a tool for analyzing, identifying, and classifying common error messages from the extensive history of error messages from the main public Galaxy server (https://usegalaxy.org).
      • The diversity and size of this data suggests a machine learning approach, but the specific approach taken would be decided by the student and mentor.
    • Extend Galaxy’s tool definition syntax to support defining common error classes and suggested resolutions.
    • Update Galaxy’s user interface to display potential resolutions and suggested actions based on the types of errors found in an analysis.
  • Project Home Page URL: galaxyproject.org
  • Project paper reference and URL: The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update, Enis Afgan et al., Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W537–W544, https://doi.org/10.1093/nar/gky379
  • Knowledge prerequisites: programming language(s) to be used, plus any other particular computer science skills needed.
  • Skill level: Medium.
  • Mentors:
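A possible starting point for the heuristic side of this project, sketched in Python (the language of Galaxy's server): a rule table that maps regular-expression patterns in a job's stderr to an error class, marks whether that class is a user/data or a system error, and attaches a suggested resolution. Galaxy tool XML can already flag errors through `<regex>` elements in a tool's `<stdio>` section; the proposed syntax extension could let each tool contribute rules like these. The rule names, patterns and messages below are hypothetical placeholders, not rules mined from usegalaxy.org.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorClass:
    name: str                 # short identifier for the error class
    kind: str                 # "user/data" (user can fix it) or "system" (needs support staff)
    pattern: re.Pattern[str]  # matched against the job's stderr
    suggestion: str           # actionable message to show the user


# Hypothetical example rules; real rules would be derived from the
# usegalaxy.org error history or contributed by individual tools.
RULES = [
    ErrorClass("disk_full", "system",
               re.compile(r"no space left on device", re.IGNORECASE),
               "The server ran out of disk space. Please report this error."),
    ErrorClass("out_of_memory", "system",
               re.compile(r"MemoryError|out of memory", re.IGNORECASE),
               "The job ran out of memory. Please report this error."),
    ErrorClass("bad_fastq", "user/data",
               re.compile(r"(invalid|malformed|truncated).*fastq", re.IGNORECASE),
               "Check that the input is valid FASTQ and that its datatype is set correctly."),
    ErrorClass("missing_column", "user/data",
               re.compile(r"column \d+ (does not exist|out of range)", re.IGNORECASE),
               "The selected column is not in the dataset; check the column parameters."),
]


def classify(stderr: str) -> list[ErrorClass]:
    """Return every rule whose pattern occurs in the job's stderr, in rule order."""
    return [rule for rule in RULES if rule.pattern.search(stderr)]
```

For example, `classify("OSError: [Errno 28] No space left on device")` returns only the `disk_full` rule, so the interface could tell the user this is a system problem rather than a data problem. A machine learning classifier could later replace or supplement `classify` while keeping the same output shape.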

Use Galaxy to run Reactome analysis and processes on genomic data (Reactome)

  • Brief explanation: Reactome is a free, open-source, curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Galaxy is an open, web-based platform for data-intensive biomedical research, which allows users to perform, reproduce, and share complete analyses.
  • Expected results: There are two potential sub-projects: 1) adding Reactome as a data resource in Galaxy, so that Galaxy users can use Reactome reaction and pathway annotation data, and 2) running Reactome's identifier mapping and over-representation analysis workflows in Galaxy. See the Reactome GitHub.
  • Project Home Page URL: reactome.org, galaxyproject.org
  • Knowledge prerequisites: Galaxy, Java, web services.
  • Skill level: Medium.
  • Mentors: Robin Haw (robin.haw[AT]oicr.on.ca) and Joel Weiser (joel.weiser[AT]oicr.on.ca).
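For sub-project 2, the core of an over-representation analysis is a one-sided hypergeometric test for each pathway, followed by multiple-testing correction. The Python sketch below is the textbook version with Benjamini-Hochberg FDR adjustment; it is an assumption that this matches the statistics Reactome itself reports, and a Galaxy tool would more likely call Reactome's analysis service than reimplement it. The function names are hypothetical.

```python
from math import comb


def hypergeometric_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k) with X ~ Hypergeometric.

    N: identifiers in the background (e.g., all identifiers that can be mapped)
    K: background identifiers annotated to the pathway
    n: identifiers submitted by the user (and found in the background)
    k: submitted identifiers annotated to the pathway
    """
    hits = range(k, min(K, n) + 1)
    # math.comb(a, b) returns 0 when b > a, so impossible outcomes add nothing.
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / comb(N, n)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # caps values at 1 and keeps adjusted values monotone
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `hypergeometric_p(3, 3, 3, 10)` is the chance that all 3 submitted identifiers fall in a pathway containing 3 of 10 background identifiers, which is 1/120.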

Create a software package for use in R to query Reactome’s graph database in Neo4j (Reactome)

  • Brief explanation: The R programming language has an existing package for connecting to Neo4j databases. This project would use that package as a base to create a connection for querying Reactome’s Neo4j graph database and to return data structures for manipulating Reactome pathway and reaction data.
  • Expected results: Allow R users to retrieve Reactome pathway and reaction data for analysis, through both pre-written functions and custom queries. Examples of such functions include finding pathways and reactions that contain certain genes, proteins or Gene Ontology terms, or that have cross-references to other external databases, as well as other queries useful to Reactome users.
  • Project Home Page URL: reactome.org.
  • Knowledge prerequisites: R Programming Language, Neo4j.
  • Skill level: Medium.
  • Mentors: Joel Weiser (joel.weiser[AT]oicr.on.ca).
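Each pre-written function could wrap one parameterized Cypher query. The sketch below is written in Python only to keep this page's examples in one language; the R package would send the same query text and parameters through its Neo4j client. The node labels, relationship types and properties (`Pathway`, `ReactionLikeEvent`, `hasEvent`, `referenceEntity`, `stId`, `speciesName` and so on) are assumptions about Reactome's graph schema that must be checked against its documentation, and `pathways_for_identifier` is a hypothetical name.

```python
from __future__ import annotations

# Assumed Reactome graph schema: labels, relationship types and property
# names below must be verified against the Reactome graph database docs.
PATHWAYS_FOR_IDENTIFIER = """
MATCH (p:Pathway)-[:hasEvent*]->(r:ReactionLikeEvent),
      (r)-[:input|output|catalystActivity|physicalEntity|hasComponent|hasMember|hasCandidate*]->(pe:PhysicalEntity),
      (pe)-[:referenceEntity]->(:ReferenceEntity {identifier: $identifier})
WHERE p.speciesName = $species
RETURN DISTINCT p.stId AS stId, p.displayName AS name
ORDER BY stId
"""


def pathways_for_identifier(identifier: str,
                            species: str = "Homo sapiens") -> tuple[str, dict[str, str]]:
    """Hypothetical pre-written query: pathways containing a gene/protein identifier.

    Returns the Cypher text and its parameters; values are passed as
    parameters rather than pasted into the query string.
    """
    identifier = identifier.strip()
    if not identifier:
        raise ValueError("identifier must not be empty")
    return PATHWAYS_FOR_IDENTIFIER, {"identifier": identifier, "species": species}
```

With the official `neo4j` Python driver, the pair could be executed as `session.run(query, params)`. Passing values as parameters, rather than building the query string by hand, avoids Cypher injection and lets Neo4j reuse the cached query plan.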

Community data submission (WormBase)

  • Brief explanation: WormBase is a comprehensive research knowledgebase on the biology of nematodes. Our database is built by extracting and standardizing information from the published literature, which is time-consuming and low-throughput. Hence, we would like to encourage our users, who produce much of this knowledge in the first place, to submit their findings through our website. This would speed up the integration of knowledge into our database and diversify our data sources.
  • Expected results: Website frontend components and backend mechanisms that allow inline data submission from users, real-time updates to the website, notifications, and a mechanism for reviewing submissions and integrating the data into the WormBase database.
  • Project Home Page URL: https://wormbase.org
  • Project paper reference and URL: https://academic.oup.com/nar/article/48/D1/D762/5603222
  • Knowledge prerequisites: JavaScript; experience building cloud-native solutions preferred.
  • Skill level: Advanced
  • Mentors: Todd Harris (todd[AT]wormbase.org), Sibyl Gao (sibyl[AT]wormbase.org).
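The review-and-integrate part of the expected results can be modeled as a small state machine, so that every submission has exactly one status and an audit trail of who changed it. The states, transitions and `Submission` class below are hypothetical, sketched in Python; a real design would also decide who may perform each transition and where the real-time update and notifications hook in (for example, by emitting an event inside `advance`).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    UNDER_REVIEW = "under_review"
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    INTEGRATED = "integrated"


# Allowed transitions; any other move is refused.
TRANSITIONS: dict[Status, set[Status]] = {
    Status.SUBMITTED: {Status.UNDER_REVIEW},
    Status.UNDER_REVIEW: {Status.ACCEPTED, Status.REJECTED},
    Status.ACCEPTED: {Status.INTEGRATED},
    Status.REJECTED: set(),
    Status.INTEGRATED: set(),
}


@dataclass
class Submission:
    submitter: str
    object_id: str            # the WormBase object the data is about, e.g., a gene ID
    payload: dict[str, str]   # the submitted data fields
    status: Status = Status.SUBMITTED
    history: list[tuple[Status, str]] = field(default_factory=list)

    def advance(self, new_status: Status, actor: str) -> None:
        """Move to new_status if the transition is allowed, recording who made it."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.history.append((new_status, actor))
        self.status = new_status
```

Keeping the allowed transitions in one table makes it easy to check that nothing reaches the WormBase database without passing through review.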

Data Table functionality and performance (WormBase)

  • Brief explanation: WormBase is a comprehensive research knowledgebase on the biology of nematodes. Biologists access our vast body of information through a web portal that often presents it in many tables. For example, see the page for dat-6, a well-studied gene in C. elegans: https://wormbase.org/species/c_elegans/gene/WBGene00000912#01347b--10. These tables were developed years ago using HTML and jQuery, with certain features depending on Flash, and their limitations and usability issues are now more pronounced. Hence, we are looking for a new implementation of these tables in React, which is already used in many parts of the site.
  • Expected results: A generic and customizable table component in React for displaying WormBase data, with the ability to search, filter, sort, paginate, and export all or parts of the table.
  • Project Home Page URL: https://wormbase.org
  • Project paper reference and URL: https://academic.oup.com/nar/article/48/D1/D762/5603222
  • Knowledge prerequisites: JavaScript, CSS, React.
  • Skill level: Medium
  • Mentors: Sibyl Gao (sibyl[AT]wormbase.org).
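Whatever the React component ends up looking like, the semantics of its data operations can be pinned down independently of the UI. The Python sketch below shows one possible set: case-insensitive search across all cells, exact-match column filters, sorting, 1-based pagination, and CSV export of selected columns. The function names and parameters are hypothetical; in the component these steps would run in the browser or be delegated to a server API.

```python
from __future__ import annotations

import csv
import io
from typing import Any

Row = dict[str, Any]


def query_table(rows: list[Row], *, search: str = "",
                filters: dict[str, Any] | None = None,
                sort_by: str | None = None, descending: bool = False,
                page: int = 1, page_size: int = 25) -> tuple[list[Row], int]:
    """Search, filter, sort and paginate rows; return (rows on this page, total matches)."""
    needle = search.casefold()
    filters = filters or {}
    matched = [
        row for row in rows
        # case-insensitive substring search over every cell
        if (not needle or any(needle in str(v).casefold() for v in row.values()))
        # exact-match column filters
        and all(row.get(col) == want for col, want in filters.items())
    ]
    if sort_by is not None:
        # rows missing the sort column go last when ascending (first when descending)
        matched.sort(key=lambda row: (row.get(sort_by) is None, row.get(sort_by)),
                     reverse=descending)
    start = (page - 1) * page_size  # pages are 1-based
    return matched[start:start + page_size], len(matched)


def to_csv(rows: list[Row], columns: list[str]) -> str:
    """Export the chosen columns of rows as CSV text; other keys are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Returning the total match count alongside the current page lets the table show "page 2 of N" without sending every row to the browser, and `to_csv` can export either the current page or the full filtered result.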

GraphQL over Microservice Architecture (WormBase / Alliance of Genome Resources)

Single Sign On (WormBase)

Faster Autocompletion (WormBase Name Service)

Word and sentence completion for curatorial remarks (WormBase Name Service)