
Friday, 18 October 2013

Data-driven medicine: understanding the link between genetics and disease

This post was published by: Sophie Curtis

Sophie is a technology reporter at the Daily Telegraph. She previously worked for a number of B2B technology publications including Techworld and eWeek Europe.


Source of this article: The Daily Telegraph



Activities that involve gathering vast quantities of data are often portrayed in a negative light, but Sophie Curtis reveals how 'big data' is also being used to identify the links between genetics and diseases such as cancer, diabetes and obesity.

From Google to Facebook and GCHQ to the NSA, every day we hear more about the lengths to which organisations are going to get their hands on our precious data.
Companies want access to data so they can make assumptions and predictions about their customers' behaviour and, in some cases, use this information to deliver targeted advertising. The argument is that the more you know about your customers, the better you can serve them, and the more money you can make.
Government security agencies want access to data so they can gather information about suspicious groups, monitor communications, detect anomalous patterns of behaviour and ultimately catch criminals and prevent terrorist attacks.
Retail, advertising, manufacturing, transport, energy, local government, charity – whichever sector you care to look at, data is playing an increasingly important role in our society and economy. While many people view this as an invasion of their privacy, the majority of this data is anonymous and contributes to genuinely valuable research.
The Human Genome Project, for example, whose goal was the complete mapping and understanding of all the genes of human beings, required the processing of vast amounts of data, in order to determine the sequence of chemical base pairs which make up human DNA.
Although the Human Genome Project was declared complete in April 2003, data-driven medical research is a growing field. The Wellcome Sanger Institute, which was the single largest contributor to the Human Genome Project, is now using so-called 'big data' to investigate the genetic make-up of some of the most common causes of premature death.
As one of the top five scientific institutions in the world specialising in DNA sequencing, the Sanger Institute embraces the latest technologies to research the genetic basis of global health problems, including cancer, malaria, diabetes, obesity and infectious diseases. The hope is to one day understand the link between disease and genetics.
The sequencing machines that run today produce a million times more data than the machine used in the Human Genome Project, and the Sanger Institute produces more sequences in one hour than it did in its first 10 years.
For instance, a single cancer genome produces sequence data that can require up to 10,000 hours of computer processing to analyse, and the Sanger Institute runs tens of thousands of these projects at once. The sheer scale is enormous and the computational effort required is huge.
The Sanger Institute has five or six major projects on the go at any one time, using data from a wide variety of sources. For example, one project called UK10K involves sequencing 10,000 individuals from across the UK to compare their genomes.
"Of course it’s properly consented data and it’s then done anonymously. We don’t actually know who they are, we just get the samples and then we look at them," said Tim Cutts, acting head of scientific computing at the Wellcome Trust Sanger Institute.
The data is analysed using the Sanger Institute's supercomputer, which has 17,000 Intel processors and 22 petabytes of storage from DDN and other vendors. This data is then shared within the research community and through the Sanger Institute’s website, which gets 20 million hits and 12 million impressions each week.
"We have to have the ability to access this data very quickly, and your average disk drive from PC World just isn’t going to cut it," said Cutts. "When you’re doing storage at this scale, we have to make sure that it is reliable and happens quickly enough."
He gave the example of Addenbrooke's teaching hospital in Cambridge, which recently had a few cases of MRSA in its neo-natal ward. Samples of the bacteria were sent to the Sanger Institute, which sequenced them in real time and identified that they were the same strain, and that there was therefore a single source of the infection.
Addenbrooke's was then able to go through its records and identify where the bacteria had come from, meaning that the problem could be resolved much faster than the hospital would otherwise have managed.
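The strain-matching step described above can be sketched in a few lines of Python. This is a toy illustration, not the Sanger Institute's actual pipeline: real outbreak analyses compare whole genomes and tolerate a small number of SNP differences between isolates of the same strain, and all sequences and sample names below are invented.

```python
def snp_distance(seq_a, seq_b):
    """Count positions at which two aligned sequences of equal length differ."""
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)

# Hypothetical aligned fragments from three bacterial isolates
isolates = {
    "ward_sample_1": "ACGTACGTACGT",
    "ward_sample_2": "ACGTACGTACGT",   # identical -> same strain, same source
    "unrelated":     "ACGTTCGAACGA",   # several differences -> different strain
}

reference = isolates["ward_sample_1"]
for name, seq in isolates.items():
    d = snp_distance(reference, seq)
    verdict = "same strain" if d == 0 else "different strain"
    print(f"{name}: {d} differences -> {verdict}")
```

Because the two ward samples are identical, the sketch flags them as the same strain, which is the observation that let the hospital conclude there was a single source of infection.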
Cutts also pointed to the Sanger Institute's discovery of a variant in a gene called BRAF, which was identified as present in 60 per cent of cases of malignant melanoma. This led to the development of a treatment in just nine years, showing that rapid innovation is achievable with this kind of technology.
"The sort of statistical correlation analysis that we’re doing these days is an absolute classic 'big data' problem," said Cutts. "It’s exactly the sort of thing that market research people will be doing with your Nectar points data – it’s the same sort of calculation but it’s a different problem."
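The "statistical correlation analysis" Cutts describes can be illustrated with a classic case/control association test. The sketch below, with entirely invented counts, computes a chi-square statistic for a 2x2 table of disease status against carrier status of a genetic variant; the larger the statistic, the stronger the apparent association. (Real genome-wide studies repeat this kind of test across millions of variants, with careful correction for multiple testing.)

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table:

                 carriers   non-carriers
        cases        a           b
        controls     c           d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for observed, row, col in ((a, row1, col1), (b, row1, col2),
                               (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical counts: variant carriers among 1,000 cases vs 1,000 controls
stat = chi_square_2x2(600, 400, 100, 900)
print(round(stat, 1))  # → 549.5 (a large value: strong apparent association)
```

Conceptually this is the same calculation a retailer runs on loyalty-card data ("do buyers of X also buy Y?"), which is Cutts's point: the mathematics is shared, only the problem differs.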

