Difference between revisions of "GSoC/IDEA 10"


Revision as of 23:55, 22 March 2011


Here is a space for more details and project coordination. Please use it.


Email 01

As I see it there are two main strands: a) create a data model for the "environment/trait/SNP/propensity/citation/opinion" tuples, and b) put a user-friendly wiki-style interface on top.
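To make strand (a) a bit more concrete, here is a minimal sketch of how one such tuple could be modelled. Nothing here is a settled design: the class names (`Association`, `Opinion`, `Entry`), the field types, the convention that a propensity above 1 raises risk, and the +1/-1 voting are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Association:
    """One environment/trait/SNP/propensity/citation record (hypothetical layout)."""
    trait: str                         # the characteristic being predicted
    propensity: float                  # assumed convention: >1 raises risk, <1 lowers it
    citation: str                      # reference to the supporting study
    snp: Optional[str] = None          # genetic factor, if any
    environment: Optional[str] = None  # environmental factor, if any

    def __post_init__(self) -> None:
        # An association needs at least one factor and a positive propensity.
        if self.snp is None and self.environment is None:
            raise ValueError("need a SNP or an environmental factor")
        if self.propensity <= 0:
            raise ValueError("propensity must be positive")


@dataclass
class Opinion:
    """A contributor's view on an association (assumed +1 / -1 voting)."""
    author: str
    vote: int
    comment: str = ""


@dataclass
class Entry:
    """An association plus the opinions attached to it."""
    association: Association
    opinions: List[Opinion] = field(default_factory=list)

    def support(self) -> int:
        # Net community support: the sum of all votes.
        return sum(o.vote for o in self.opinions)
```

Keeping the opinions separate from the association means the wiki layer can collect votes without rewriting the underlying claim.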

Sadly, I really can't commit that much mentoring time to the project (sorry, but I'm just being realistic). So the main requirement would be for you to be able to work independently, with perhaps only a few hours of high-level guidance per week from me. However, you're welcome to work on this project with anyone.

If you are interested, you could try playing with Semantic MediaWiki and Semantic Forms, which I see as two useful tools for this project: http://scratchpad.referata.com


Email 02

You're absolutely right, genes (or specifically, the proteins they encode) are the functional workhorses of biology, doing the 'work' of the cell and the body. The biological differences between individuals are often attributable to changes in proteins or their regulation (such as changes in their expression level). These differences are evident at the DNA level, as DNA encodes the proteins and their regulatory elements. Now the interesting thing is that specific SNPs often serve as markers for particular blocks of DNA (called haplotypes) - when 'shuffling the deck' of human DNA, there are fewer cards than you might imagine, because DNA is only mixed in these large blocks. So... by measuring SNPs, you can predict a lot of biology by association.

This principle underlies the science of 'genome wide association studies' (GWAS). If you like I can put together some reading on this topic for you, as it's something I'd like to understand better too!


Email 03

BTW, does the idea make sense or should I try to clarify it? Personally I'd like one database of 'risk factors' including genetic and environmental factors ... that way I can see how much my risk of 'drinking alcohol increases risk of cancer' outweighs my 'low risk of colon cancer'. I figure one way to motivate people to contribute to such a DB is to score them personally and rank the 'best' ;-)
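As a rough illustration of weighing one risk factor against another, here is a small sketch. It assumes each factor is expressed as a relative risk for the same outcome and that the factors act independently, so they multiply (equivalently, their logs add); that is a simplification, not a claim about how real risks combine. The function name and the numbers in the usage note are made up.

```python
import math
from typing import Iterable


def combined_log_risk(relative_risks: Iterable[float]) -> float:
    """Sum of log relative risks, assuming the factors act independently.

    A result above 0 means the raised risks outweigh the lowered ones;
    below 0 means the opposite.
    """
    total = 0.0
    for rr in relative_risks:
        if rr <= 0:
            raise ValueError("relative risks must be positive")
        total += math.log(rr)
    return total
```

For example, with a hypothetical lifestyle factor of 1.5 and a hypothetical protective genotype of 0.8, `math.exp(combined_log_risk([1.5, 0.8]))` gives a combined relative risk of about 1.2, so in this toy case the lifestyle factor outweighs the genotype.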


Email 04

> I would like to know a bit more about what you are looking for (in
> IDEA#10: The genome game: crowd-sourcing better crops), specifically:
>  1) Whether you require ability to import/export databases

I think there will be an initial import of data from resources such as
23andMe, dbSNP, 1k and 10k genomes projects, PGP, SNPedia, 'GET
Evidence' and dbGaP. Probably others can be considered.

I can't think of a good reason not to allow export of our data.


>  2) Certain variations tend to occur together (even if not affecting
> the same characteristic); so should the rank of one influence the rank
> of its friends

Yes. This is the basis of most SNP associations, i.e. the genome chip
interrogates a 'tag' SNP that is correlated with a particular
'characteristic causing' mutation (SNP, CNV, InDel, etc.). Information
about these 'haploblocks' can be obtained from the 1k genomes project,
but it's not crucial TBH. The fact that SNP x is associated on average
with increased propensity y is enough information for our purposes.
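To show what 'correlated' means here, a minimal sketch: the squared Pearson correlation between two SNPs, computed from per-person genotype counts (0, 1 or 2 copies of an allele). This is a common rough proxy for how well one SNP tags another; the function name and the encoding are assumptions for illustration, and nothing like this is needed for the first version.

```python
from typing import Sequence


def genotype_r2(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared Pearson correlation between two SNPs' genotype counts.

    a[i] and b[i] are the allele counts (0, 1 or 2) for person i.
    Returns 0.0 if either SNP does not vary across the sample.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples of at least 2 people")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return (cov * cov) / (var_a * var_b)
```

A value near 1 means the tag SNP is a good stand-in for its 'friend'; a value near 0 means the two vary independently in the sample.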


>  3) Does the database have to be seeded initially with information or
> is just developing the interface enough (as I don't think it would be
> possible for me to furnish data to seed the database in a 3-month
> time frame)

This is where the 'game mechanics' need to be considered. No one is
going to contribute their genome to an empty database, so to get
people to contribute, we need to seed the system with some
well-studied, predictive and interesting associations, e.g. breast cancer,
diabetes, obesity, smoking, exercise, diet... Once we get a good set,
it becomes interesting for people to contribute their data to see how
they rank in the system. Hopefully (the idea is that...) people will
then start to add associations from studies that are relevant to them,
trying to improve their score or trying to create the 'best'
individual possible within the system.
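The ranking side of the game mechanics could be as simple as the sketch below. It assumes each participant already has a single numeric score where lower is better (for instance a combined risk), and it gives tied participants the same rank; the function name and these conventions are illustrative choices only.

```python
from typing import Dict, List, Tuple


def rank_participants(scores: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Return (rank, name, score) tuples, best (lowest) score first.

    Ties share a rank and the next rank is skipped (1, 2, 2, 4, ...).
    Names break ties in the ordering so the output is deterministic.
    """
    ordered = sorted(scores.items(), key=lambda item: (item[1], item[0]))
    ranked: List[Tuple[int, str, float]] = []
    rank = 0
    previous_score = None
    for position, (name, score) in enumerate(ordered, start=1):
        if score != previous_score:
            rank = position
            previous_score = score
        ranked.append((rank, name, score))
    return ranked
```

Adding a new association changes the scores, so the table would simply be recomputed whenever the seeded set grows.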

It's harder if we focus on crops, because a) there are fewer resources
available, b) phenotype data is harder to come by, and c) fewer people
are interested in crops.


>  4) What kind of access controls to the system are you looking for?

Good question... I was thinking that we would just run it like
Wikipedia... a few carefully chosen admins who can kick people around
a bit, but mostly let anyone do anything... The dream would be some
form of consensus augmented by computational argumentation, which
isn't unimaginable, but is optimistic.

I thought about a 'contributor score' as well as a 'genome' and
'lifestyle' score (nudging people towards logging in), but I'm not
sure; it's marginal.

<snip>

I have to say that (unfortunately) I won't be able to promise you much
time as a mentor on this project. If you like the idea, I strongly
suggest you try to find additional mentors (I'm really sorry about
that).