Difference between revisions of "GSoC"

From GMOD
 
(58 intermediate revisions by 2 users not shown)
== Google Summer of Code 2015 @ Genome Informatics ==
[[File:GoogleSummer_2016logo.jpg|373px|right|link=GSoC]]
  
'''[http://code.google.com/soc/ Google Summer of Code]''' is a global program that offers student developers stipends to write code for various open source software projects. We work with many open source, free software, and technology-related groups to identify and fund projects over a three month period. Since its inception in 2005, the program has brought together over 8,500 successful student participants from 101 countries and over 8,300 mentors from over 109 countries worldwide to produce over 50 million lines of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all. (''Excerpt from the [http://www.google-melange.com Google Summer of Code website]'')
== Google Summer of Code 2024 @ Open Genome Informatics ==
  
Since 2011, the Genome Informatics group has served as an "umbrella organization" for a variety of bioinformatics projects, including [http://gmod.org GMOD] and its software projects -- GBrowse, JBrowse, etc.; [http://galaxy.psu.edu Galaxy]; [http://porteco.org PortEco]; [http://www.reactome.org Reactome]; [http://seqware.github.io SeqWare]; [http://www.wormbase.org WormBase]; and others. More information about this year's participating bioinformatics groups can be found [[GSOC_Groups | here]].

'''[https://summerofcode.withgoogle.com/ Google Summer of Code]''' is a global program that offers students, developers, and other contributors stipends to write code for various open-source software projects. We work with many open-source, free software, and technology-related groups to identify and fund projects over a 12+ week period. Since its inception in 2005, the program has brought together over 20,000 successful participants from 116 countries and more than 850 open-source organizations, and has generated over 45 million lines of code. Through Google Summer of Code, accepted applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all. The program is open to students and to beginners in open-source software development. Projects are 90, 175, or 350 hours long, and the standard 12-week program can be extended to as long as 22 weeks. (''Excerpt from the [https://summerofcode.withgoogle.com/ Google Summer of Code website]'')
  
To learn more about this year's event and how GSoC works, please refer to the [http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2015/help_page#7._How_does_the_program_work GSoC FAQ].

Since 2011, the Open Genome Informatics group has served as an "umbrella organization" for a variety of bioinformatics projects, including [[Main Page|GMOD]] and its software projects -- [[JBrowse]], [[Apollo]], [[Chado]], [[Galaxy]], etc.; [http://www.informatics.jax.org/ Mouse Genome Informatics]; [https://oicr.on.ca/research-portfolio/ OICR]; [http://www.reactome.org Reactome]; [http://www.wormbase.org WormBase]; and [https://bioconda.github.io/ Bioconda].
  
'''More information about this year's participating bioinformatics groups can be found [[GSOC_Groups | here]].'''

To learn more about this year's event and how GSoC works, please refer to the [https://developers.google.com/open-source/gsoc/faq FAQ].
  
 
==Mailing lists, IRC, and other ways to get in touch  ==
 
*Email: [mailto:robin.haw@oicr.on.ca robin.haw@oicr.on.ca] '''and''' [mailto:help@gmod.org help@gmod.org] -- find out more about GSoC, a specific project, or your potential mentor(s).
*Email: [mailto:rhaw@oicr.on.ca rhaw@oicr.on.ca] '''and''' [mailto:help@gmod.org help@gmod.org] -- find out more about GSoC, a specific project, or your potential mentor(s).
 
*Discussion mailing lists: [http://groups.google.com/group/genome-informatics Genome Informatics Google Groups] - ask about our projects; join the community!
 
*IRC channel: #genomeinformatics on Freenode.
* Students and mentors can email both [[User:Robin.haw|Robin]] and [[User:Scott|Scott]] to get more information about the program.
* Mentors can email both Robin and Scott to get more information about the program and get signed up.
== [[GSOC_Project_Ideas_2024 | Project Ideas]] ==
  
== [[GSOC_Project_Ideas_2015 | Project Ideas]] ==
'''Got an idea for a GSoC project? [[GSOC_Project_Ideas_2024 |Add it here]].''' Ideas will be included in the proposal we send to GSoC, and great ideas make for a great proposal, so please add yours now.
There are plenty of challenging and interesting '''[[GSOC_Project_Ideas_2015 | project ideas]]''' this year. These projects include a broad set of skills, technologies and domains, such as GUIs, database integration and algorithms. Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and have an interest in biology or bioinformatics, you should definitely apply! '''Do not hesitate to propose your own project idea: some of the best applications we see are by students who go this route.''' As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers.

These projects can draw on a broad set of skills, technologies, and domains, such as GUIs, database integration, and algorithms. Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and have an interest in biology or bioinformatics, you should definitely apply! <br>
'''Do not hesitate to propose your own project idea: some of the best applications we see are by students who go this route.''' As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open-source programmers.
  
Project Idea 1: Using an interpreted language to develop bioinformatics workflows
== Preparing for GSoC 2024 ==
Brief explanation: SeqWare is a bioinformatics workflow engine that can be used to chain together the analysis of big data in genomics and bioinformatics. The current workflow language is Java, which is rather verbose.

GSoC is currently in its organization application phase; we won't know whether Open Genome Informatics has been accepted as a GSoC 2024 mentoring organization until [https://developers.google.com/open-source/gsoc/timeline February 22nd]. Nevertheless, this is a perfect time for students to talk to mentors about project ideas. If you are interested in mentoring, please check the Mentors section below and contact the organization admin.
  
Expected results: Use Groovy to hide the current, rather verbose, Java workflow language. Using an interpreted language also enables rapid prototyping of workflows. The goal is to make scripting SeqWare feel more like shell scripting. This effort is similar to the GATK team's Queue, but would build on SeqWare. Prototype: https://github.com/larsgt/stimpy
===Contributors===
 
Knowledge prerequisites: Java, Groovy, git

Skill level: Medium

Mentors: Lars Jorgensen, Morgan Taschuk, Pipeline team

Project Idea 2: Write a Foreign Data Wrapper for Postgres and BAM/VCF

Brief explanation: SQL is a powerful language that makes querying structured data very straightforward, and genomics produces several types of structured data. Big data from genomics usually comes in two parts: the results, stored in files, and the metadata that describe the results, usually stored in databases. For example, a VCF file may describe a variant in a particular cancer-causing gene, while the metadata describe what the sample was, where it came from, how it was processed, etc. We would like to use SQL to query both results and metadata together.

Expected results: Develop a Foreign Data Wrapper for BAM and VCF files in order to query alignment and variant information. Because a Foreign Data Wrapper for TSV files already exists, supporting VCF and SAM, which are both tab-delimited text formats, should be fairly straightforward. Accessing BAM files, which are compressed binary files, would be slightly more involved, but could provide a good example of making queries against BAM data.

http://www.postgresql.org/docs/9.1/static/fdwhandler.html

http://www.depesz.com/2011/03/14/waiting-for-9-1-foreign-data-wrapper/

Knowledge prerequisites: C, PostgreSQL

Skill level: Advanced

Mentor: Lars Jorgensen

Project Idea 3: Implement a FUSE interface to BAM/CRAM

Brief explanation: Storage of big data is an ongoing problem that will only get worse. As data moves through a processing pipeline in genomics, each output is often a lossless transformation of the previous data with additional information integrated (e.g. FASTQ is a listing of all reads; BAM is an alignment of those reads to a reference but still contains all of the reads from the FASTQ). However, data from earlier in the pipeline is often kept so that the analysis can be repeated with different tools. This duplicates data on the order of gigabytes to terabytes.

Expected results: Enable a tool to see the same BAM file as two FASTQ files, an interleaved FASTQ file, or whatever format it needs, all carrying the same information. This should be easy to prototype in Python, since fuse-python and pysam exist.

Knowledge prerequisites: C and/or Python, POSIX APIs

Skill level: Advanced

Mentor: Lars Jorgensen

Project Idea 4: Use Galaxy to run SeqWare workflows and process data

Brief explanation: SeqWare is a bioinformatics workflow engine that can be used to chain together the analysis of big data in genomics and bioinformatics. SeqWare is currently driven from the command line by skilled users. However, it would be very useful to make SeqWare's robustness and stability available to individual non-expert users. Galaxy, a user-friendly platform for analysing data, could be used for this task.

Expected results: There are two potential sub-projects: 1) adding SeqWare metadata and files as a data source in Galaxy, so that Galaxy users can work with SeqWare data, and 2) launching and monitoring SeqWare workflows from Galaxy.

Knowledge prerequisites: Galaxy, Java, web services, PostgreSQL

Skill level: Medium

Mentor: Morgan Taschuk

Project Idea 5: Barcode scanner using phone or tablet to drive LIMS

Brief explanation: In a typical genomics lab, the Laboratory Information Management System (LIMS) must keep track of many people, pieces of equipment, and samples as they interact. A typical LIMS requires a desktop computer and many drop-down menus to fulfill this task, which takes the technician away from the bench and introduces the potential for error. Large sequencing labs use barcodes instead, but barcode readers are prohibitively expensive for smaller labs.

Cameras on phones are getting quite good, so it should be fairly easy to drive barcode reading from a mobile device. This would give smaller labs a low-cost way to use barcoding in their lab workflows. Barcode-reading library: https://github.com/zxing/zxing

Expected results: A mobile LIMS application that stores a particular lab workflow and prompts the user to scan barcodes when they reach a particular step in the workflow. It would also be able to send information back to the central LIMS servers.

Knowledge prerequisites: iOS or Android development, web services, interface design

Mentors: Lars Jorgensen, Timothy Beck, and Tony DeBat

Project Idea 6: IPython Notebook on top of our infrastructure

Brief explanation: IPython Notebook is a powerful tool that enables reproducible science by letting people share their work. It would be interesting to see how IPython Notebook and SeqWare could interact. It would also be useful for OICR's users to be able to query our metadata, as well as other metadata, using Python or R.

Expected result: A Python library that can be used to query SeqWare's metadata through its RESTful web service.

Knowledge prerequisites: Python, web services

Skill level: Basic

Mentors: Timothy Beck, Lawrence Heisler, Yogi Sundaravadanam
== Preparing for GSoC 2015 ==
It is currently the off-season for GSoC; we won't know whether Genome Informatics has been accepted as a GSoC 2015 mentoring organization until March 2nd. The GSoC 2015 timeline has now been posted [https://www.google-melange.com/gsoc/events/google/gsoc2015 here]. Nevertheless, this is a perfect time for students to talk to mentors about project ideas. If you are interested in mentoring, please check the Mentors section below and contact the organization admin.

===Students===
 
More information about [[GSOC_Applications_Guide | writing your application]] will be available closer to the start of the student application period.
 
  
 
===Mentors===
 
We encourage mentors and mentoring organizations to think about new projects year-round! If you'd like help with your ideas page or your separate mentoring org application, please feel free to contact the organization admins. Links to [[GSOC_Mentoring_Guide | advice about mentoring and other resources]] are available.
  
===Source Code===
Genome and bioinformatics projects that participate in the Open Genome Informatics group maintain their own [[Source Code Repositories | source code repositories]].
[[Category:Galaxy]]
[[Category:JBrowse]]
[[Category:WormBase]]
[[Category:MGI]]
[[Category:GSoC]]
[[Category:Reactome]]
[[Category:WebApollo]]

Latest revision as of 16:07, 12 February 2024


Google Summer of Code 2024 @ Open Genome Informatics

Google Summer of Code is a global program that offers students, developers, and other contributors stipends to write code for various open-source software projects. We work with many open-source, free software, and technology-related groups to identify and fund projects over a 12+ week period. Since its inception in 2005, the program has brought together over 20,000 successful participants from 116 countries and more than 850 open-source organizations, and has generated over 45 million lines of code. Through Google Summer of Code, accepted applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all. The program is open to students and to beginners in open-source software development. Projects are 90, 175, or 350 hours long, and the standard 12-week program can be extended to as long as 22 weeks. (Excerpt from the Google Summer of Code website)

Since 2011, the Open Genome Informatics group has served as an "umbrella organization" for a variety of bioinformatics projects, including GMOD and its software projects -- JBrowse, Apollo, Chado, Galaxy, etc.; Mouse Genome Informatics; OICR; Reactome; WormBase; and Bioconda.

More information about this year's participating bioinformatics groups can be found here.

To learn more about this year's event and how GSoC works, please refer to the FAQ.

Mailing lists, IRC, and other ways to get in touch

Project Ideas

Got an idea for a GSoC project? Add it here. Ideas will be included in the proposal we send to GSoC, and great ideas make for a great proposal, so please add yours now.

These projects can draw on a broad set of skills, technologies, and domains, such as GUIs, database integration, and algorithms. Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and have an interest in biology or bioinformatics, you should definitely apply!
Do not hesitate to propose your own project idea: some of the best applications we see are by students who go this route. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open-source programmers.

Preparing for GSoC 2024

GSoC is currently in its organization application phase; we won't know whether Open Genome Informatics has been accepted as a GSoC 2024 mentoring organization until February 22nd. Nevertheless, this is a perfect time for students to talk to mentors about project ideas. If you are interested in mentoring, please check the Mentors section below and contact the organization admin.

Contributors

More information about writing your application will be available closer to the start of the student application period.

Mentors

We encourage mentors and mentoring organizations to think about new projects year-round! If you'd like help with your ideas page or your separate mentoring org application, please feel free to contact the organization admins. Links to advice about mentoring and other resources are available.

Source Code

Genome and bioinformatics projects that participate in the Open Genome Informatics group maintain their own source code repositories.