Here is a space for more details and project coordination. Please use it.
As I see it, there are two main strands: a) create a data model for the "environment/trait/SNP/propensity/citation/opinion" tuples, and b) put a user-friendly wiki-style interface on top.
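To make strand a) a bit more concrete, here is a minimal Python sketch of one way such a tuple could be represented. All class and field names, the vote-summing `consensus` method, and the example values (including the SNP identifier) are made-up placeholders for illustration, not an agreed design and not real data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Citation:
    # Free-text pointer to a study (hypothetical format).
    reference: str


@dataclass(frozen=True)
class Opinion:
    contributor: str
    vote: int          # +1 = agrees with the association, -1 = disagrees
    comment: str = ""


@dataclass
class Association:
    """One environment/trait/SNP/propensity/citation/opinion tuple."""
    environment: Optional[str]   # None = no environmental condition attached
    trait: str
    snp: str
    propensity: float            # assumed to be a relative effect size
    citations: List[Citation] = field(default_factory=list)
    opinions: List[Opinion] = field(default_factory=list)

    def consensus(self) -> int:
        # Naive consensus: the sum of contributor votes.
        return sum(opinion.vote for opinion in self.opinions)


# Placeholder values only; none of this is real data.
example = Association(
    environment="smoker",
    trait="example trait",
    snp="rs0000000",
    propensity=1.5,
    citations=[Citation("placeholder citation")],
    opinions=[Opinion("alice", +1), Opinion("bob", +1), Opinion("carol", -1)],
)
print(example.consensus())  # 1
```

In Semantic MediaWiki the same fields would probably become properties on a page per association, with Semantic Forms providing the editing interface; the sketch is only meant to pin down which fields the tuple carries.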
Sadly, I really can't commit that much mentoring time to the project (sorry, but I'm just being realistic). So the main requirement is that you can work independently, with perhaps only a few hours of high-level guidance per week from me. However, you're welcome to work on this project with anyone.
If you are interested, you could try playing with Semantic MediaWiki and Semantic Forms, which I see as two useful tools for this project: http://scratchpad.referata.com
You're absolutely right: genes (or more specifically, the proteins they encode) are the functional workhorses of biology, doing the 'work' of the cell and the body. The biological differences between individuals are often attributable to changes in proteins or their regulation (such as changes in their expression level). These differences are evident at the DNA level, as DNA encodes the proteins and their regulatory elements. Now the interesting thing is that specific SNPs often serve as markers for particular blocks of DNA (called haplotypes). When 'shuffling the deck' of human DNA, there are fewer cards than you might imagine, because DNA is only mixed in these large blocks. So, by measuring SNPs, you can predict a lot of biology by association.
This principle underlies the science of 'genome-wide association studies' (GWAS). If you like, I can put together some reading on this topic for you, as it's something I'd like to understand better too!
BTW, does the idea make sense or should I try to clarify it? Personally I'd like one database of 'risk factors' including genetic and environmental factors ... that way I can see how much my 'drinking alcohol increases risk of cancer' factor outweighs my 'low risk of colon cancer'. I figure one way to motivate people to contribute to such a DB is to score them personally and rank the 'best' ;-)
> I would like to know a bit more about what you are looking for (in
> IDEA#10: The genome game: crowd-sourcing better crops), specifically:
> 1) Whether you require ability to import/export databases
I think there will be an initial import of data from resources such as 23andMe, dbSNP, 1k and 10k genomes projects, PGP, SNPedia, 'GET Evidence' and dbGap. Probably others can be considered. I can't think of a good reason not to allow export of our data.
> 2) Certain variations tend to occur together (even if not affecting
> the same characteristic); so should the rank of one influence the rank
> of its friends
Yes. This is the basis of most SNP associations. I.e. the genome chip interrogates a 'tag' SNP that is correlated with a particular 'characteristic causing' mutation (SNP, CNV, InDel, etc.). Information about these 'haploblocks' can be obtained from the 1k genomes project, but it's not crucial TBH. The fact that SNP x is associated on average with increased propensity y is enough information for our purposes.
> 3) Does the database have to be seeded initially with information or
> is just developing the interface enough (as I don't think it would be
> possible for me to furnish data to seed the database in a 3 month
> time-frame)
This is where the 'game mechanics' need to be considered. No one is going to contribute their genome to an empty database, so to get people to contribute, we need to seed the system with some well studied, predictive and interesting associations. I.e. breast cancer, diabetes, obesity, smoking, exercise, diet... Once we get a good set, it becomes interesting for people to contribute their data to see how they rank in the system. Hopefully (the idea is that...) people will then start to add associations from studies that are relevant to them, trying to improve their score or trying to create the 'best' individual possible within the system.
It's harder if we focus on crops, because a) there are fewer resources available, b) phenotype data is harder to come by, and c) fewer people are interested in crops.
> 4) What kind of access controls to the system are you looking for?
Good question... I was thinking that we would just run it like Wikipedia... a few carefully chosen admins who can kick people around a bit, but mostly let anyone do anything... The dream would be some form of computational argumentation augmented consensus, which isn't unimaginable, but is optimistic. I thought about a 'contributor score' as well as a 'genome' and 'lifestyle' score (nudging people towards logging in), but I'm not sure, it's marginal.
<snip>
I have to say that (unfortunately) I won't be able to promise you much time as a mentor on this project. If you like the idea, I strongly suggest you try to find additional mentors (I'm really sorry about that).
Here is a link to the 1k genome project, where the data files can also be found http://www.1000genomes.org/ (sorry "1k genomes" is prolly not the 'canonical' term for the project). Here is the PGP (10k genomes): http://www.personalgenomes.org/ Not sure if they have released data yet though :-( 23andMe will be a screen scrape job. I don't know anything about GET Evidence yet. Other sources could include, for example, http://www.genomesunzipped.org/
> 1. Does the proposal presented there mean that those selected would be
> trying to develop a website cum software where researchers involved in
> genome studies could post their researched genomes and define the type of
> environment best for that particular genome?
Yup! And vice versa, the best genome for a particular environment.
> 2. And then any individual who knows something about his/her genome could
> check on that site what type of environment is best for him/her.
Yes. I envision people uploading their personal genomes (and lifestyle characteristics) and 'playing the game' to see where they rank in terms of genomic fitness (and environmental fitness).
<snip>
Cheers Vibhore, will you be able to work on this regardless of Google's decision? I think it would be cool to get a group of interested students together to work on this collaboratively.
Details: I think idea 10 is twofold:
1) Create a database system within which markers for desirable or undesirable genetic traits (discovered by GWAS or other published studies) can be objectively scored with respect to certain environmental conditions. For example, when designing a potato for high-yield, disease-free growth in an arid country with no frost, which combination of genetic markers scores best? The database system will answer this query using scored data about marker-trait associations (from studies) and environmental 'rules' that condition those scores.
2) Gather data for such a system by 'crowd sourcing' knowledge and literature information from experts in the form of a game. The game sets the challenge of designing the best or worst individual given the current data in the system, and motivates players to contribute to the data in the form of a structured wiki (such as Semantic MediaWiki). For example, given that I'm a smoking, non-drinking vegetarian who takes 2-3 hours of exercise a day, and given that I have a relatively low genetic risk of cancer but a higher than average risk of venous thromboembolism, how do I rank relative to the other players in the game? What lifestyle or genetic changes boost my score? Similarly, what is the consensus on the health benefits of vegetarianism, and what sources are cited there? (And how is this 'argument' and consensus computationally encoded in the system?)
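As a rough illustration of the scoring and ranking described above, here is a small Python sketch. It assumes the simplest possible rule: each marker-trait association carries a score, optionally conditioned on one environmental factor, and an individual's total is the sum of the scores that apply. The names `ScoredRule`, `score_individual` and `rank_players`, the additive scoring, and all markers and numbers are hypothetical placeholders, not real study data or a settled design.

```python
from dataclasses import dataclass
from typing import AbstractSet, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ScoredRule:
    marker: str
    trait: str
    score: float
    # Environmental condition for the score; None = applies everywhere.
    environment: Optional[str] = None


def score_individual(
    markers: AbstractSet[str],
    environment: AbstractSet[str],
    rules: List[ScoredRule],
) -> float:
    # Add up every rule whose marker is carried and whose condition holds.
    return sum(
        (
            rule.score
            for rule in rules
            if rule.marker in markers
            and (rule.environment is None or rule.environment in environment)
        ),
        0.0,
    )


def rank_players(
    players: Dict[str, Tuple[AbstractSet[str], AbstractSet[str]]],
    rules: List[ScoredRule],
) -> List[Tuple[str, float]]:
    # Highest total score first.
    scored = [
        (name, score_individual(markers, environment, rules))
        for name, (markers, environment) in players.items()
    ]
    return sorted(scored, key=lambda entry: entry[1], reverse=True)


# Made-up rules for the potato example: arid country, no frost.
RULES = [
    ScoredRule("marker_a", "yield", 2.0),
    ScoredRule("marker_b", "drought tolerance", 3.0, environment="arid"),
    ScoredRule("marker_c", "frost tolerance", 1.0, environment="frost"),
]

PLAYERS = {
    "design_1": ({"marker_a", "marker_b"}, {"arid"}),
    "design_2": ({"marker_a", "marker_c"}, {"arid"}),
}

print(rank_players(PLAYERS, RULES))
# [('design_1', 5.0), ('design_2', 2.0)]
```

Here `marker_c` contributes nothing because there is no frost in the target environment, which is the kind of environmental 'rule' conditioning a score that the text describes. The same function could score a person's genome plus lifestyle factors; how opinions and citations should weight each rule is left open.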