Difference between revisions of "GSoC/IDEA 10"

Revision as of 23:55, 22 March 2011


Here is a space for more details and project coordination. Please use it.

Email 01

As I see it, there are two main strands: a) create a data model for the "environment/trait/SNP/propensity/citation/opinion" tuples, and b) put a user-friendly wiki-style interface on top.
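A minimal sketch of strand a), assuming each tuple becomes one record. The class name `Association`, the field types, and the rule that a record needs either a SNP or an environmental factor are all assumptions made for illustration, not part of the original proposal; the identifier `rs_placeholder` is a made-up stand-in, not a real SNP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Association:
    """One environment/trait/SNP/propensity/citation/opinion tuple (hypothetical layout)."""

    trait: str                          # the trait or condition affected
    propensity: float                   # assumed here to be a relative risk (1.0 = no effect)
    citation: str                       # the study backing the claim
    snp: Optional[str] = None           # genetic factor, if any
    environment: Optional[str] = None   # environmental or lifestyle factor, if any
    opinion: str = ""                   # free-text comment from a contributor

    def __post_init__(self) -> None:
        # Assumption: every record must name at least one risk factor.
        if self.snp is None and self.environment is None:
            raise ValueError("an association needs a SNP or an environmental factor")


# Placeholder values only; no real data is implied.
example = Association(
    trait="example trait",
    propensity=1.2,
    citation="hypothetical citation",
    snp="rs_placeholder",
)
```

Keeping the record flat like this should map fairly directly onto one wiki page (or one Semantic Forms template) per association, though that mapping is itself a guess.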

Sadly, I really can't commit that much mentoring time to the project (sorry, but I'm just being realistic). So the main requirement would be for you to be able to work independently, with perhaps only a few hours of high-level guidance per week from me. However, you're welcome to work on this project with anyone.

If you are interested, you could try playing with Semantic MediaWiki and Semantic Forms, which I see as two useful tools for this project: http://scratchpad.referata.com

Email 02

You're absolutely right, genes (or specifically, the proteins they encode) are the functional workhorses of biology, doing the 'work' of the cell and the body. The biological differences between individuals are often attributable to changes in proteins or their regulation (such as changes in their expression level). These differences are evident at the DNA level, as DNA encodes the proteins and their regulatory elements. Now the interesting thing is that specific SNPs often serve as markers for particular blocks of DNA (called haplotypes): when 'shuffling the deck' of human DNA, there are fewer cards than you might imagine, because DNA is only mixed in these large blocks. So, by measuring SNPs, you can predict a lot of biology by association.

This principle underlies the science of 'genome wide association studies' (GWAS). If you like I can put together some reading on this topic for you, as it's something I'd like to understand better too!

Email 03

BTW, does the idea make sense or should I try to clarify it? Personally I'd like one database of 'risk factors' including genetic and environmental factors ... that way I can see how much my risk of 'drinking alcohol increases risk of cancer' outweighs my 'low risk of colon cancer'. I figure one way to motivate people to contribute to such a DB is to score them personally and rank the 'best' ;-)
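One way the 'outweighs' comparison and the ranking could work is sketched below. It assumes each factor is stored as a relative risk and that factors act independently, so they combine by multiplication; that independence assumption is a simplification introduced here, not something the proposal specifies. The function names are hypothetical.

```python
import math


def combined_relative_risk(relative_risks):
    """Multiply per-factor relative risks (assumes independent factors).

    Summing logs is equivalent to multiplying and keeps long products stable.
    An empty list yields 1.0, i.e. baseline risk.
    """
    if any(rr <= 0 for rr in relative_risks):
        raise ValueError("relative risks must be positive")
    return math.exp(sum(math.log(rr) for rr in relative_risks))


def rank_individuals(profiles):
    """Return (name, combined risk) pairs, lowest combined risk first.

    `profiles` maps a user name to that user's list of relative risks
    for a single trait.
    """
    scores = {name: combined_relative_risk(rrs) for name, rrs in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

With this scheme, a lifestyle factor that raises risk and a genetic factor that lowers it for the same trait simply multiply, so a result above 1.0 means the raised risk outweighs the protective one.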

Email 04

> I would like to know a bit more about what you are looking for (in
> IDEA#10: The genome game: crowd-sourcing better crops), specifically:
>  1) Whether you require ability to import/export databases

I think there will be an initial import of data from resources such as
23andMe, dbSNP, 1k and 10k genomes projects, PGP, SNPedia, 'GET
Evidence' and dbGap. Probably others can be considered.

I can't think of a good reason not to allow export of our data.


>  2) Certain variations tend to occur together (even if not affecting
> the same characteristic); so should the rank of one influence the rank
> of its friends?

Yes. This is the basis of most SNP associations: the genome chip
interrogates a 'tag' SNP that is correlated with a particular
'characteristic causing' mutation (SNP, CNV, InDel, etc.). Information
about these 'haploblocks' can be obtained from the 1k genomes project,
but it's not crucial TBH. The fact that SNP x is associated on average
with increased propensity y is enough information for our purposes.
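A rough sketch of what 'enough information' could look like in practice: a flat
table of (SNP, trait, propensity) rows with no haploblock data, queried with the
set of SNPs a chip reported. The function name and the row layout are
assumptions made for illustration.

```python
from collections import defaultdict


def propensities_for(genotyped_snps, associations):
    """Group the propensities that apply to one genotype by trait.

    `genotyped_snps` is a set of SNP identifiers found in a person's data.
    `associations` is an iterable of (snp, trait, propensity) rows.
    Rows for SNPs the person does not carry are ignored.
    """
    by_trait = defaultdict(list)
    for snp, trait, propensity in associations:
        if snp in genotyped_snps:
            by_trait[trait].append(propensity)
    return dict(by_trait)
```

Because the lookup is keyed on the tag SNP alone, correlated 'friends' only
influence each other if someone has entered an association for each of them.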


>  3) Does the database have to be seeded initially with information or
> is just developing the interface enough (as I don't think it would be
> possible for me to furnish data to seed the database in a 3 month
> time-frame)

This is where the 'game mechanics' need to be considered. No one is
going to contribute their genome to an empty database, so to get
people to contribute, we need to seed the system with some well
studied, predictive and interesting associations, e.g. breast cancer,
diabetes, obesity, smoking, exercise, diet... Once we get a good set,
it becomes interesting for people to contribute their data to see how
they rank in the system. Hopefully, (the idea is that...) people will
then start to add associations from studies that are relevant to them,
trying to improve their score or trying to create the 'best'
individual possible within the system.

It's harder if we focus on crops, because a) there are fewer resources
available, b) phenotype data is harder to come by, and c) fewer people
are interested in crops.


>  4)What kind of access controls to the system are you looking for?

Good question... I was thinking that we would just run it like
Wikipedia... a few carefully chosen admins who can kick people around
a bit, but mostly let anyone do anything... The dream would be some
form of computational argumentation augmented consensus, which isn't
unimaginable, but is optimistic.

I thought about a 'contributor score' as well as a 'genome' and
'lifestyle' score (nudging people towards logging in), but I'm not
sure about it; the benefit seems marginal.

<snip>

I have to say that (unfortunately) I won't be able to promise you much
time as a mentor on this project. If you like the idea, I strongly
suggest you try to find additional mentors (I'm really sorry about
that).
