The ocean of information available on the web is challenging the standard model of hypothesis-driven science. Yet that model has always borne little relation to the mucky reality of scientific research
by Elizabeth Pisani / November 17, 2010
Detail from a data map showing deaths from cholera in London in the 1840s—an early example of “big data” collection
It’s a glorious Saturday morning, a day for kayaking or sitting in a sunny courtyard with a coffee and the silly bits of the FT. But here I am, stuck in a bunker at the British Library with a bunch of Generation Y geeks telling me that my nice, tidy world of hypothesis, experiment and knowledge generation is about to end.
I am attending the Science Online conference, in which the usual scientific lexicon of sample size calculations, placebo-controlled trials and statistical significance tests is nowhere to be seen. The talk is of scraping and mining, terabytes and petabytes, of algorithms. It’s the language of Big Data—the ocean of information being generated by ever-larger telescopes, ever-cheaper genetic sequencing techniques and ever more Facebook users. As Royal Society president Martin Rees has written (Prospect, November 2010), Big Data will allow us to mine and mash our way to unexpected discoveries and insights. It allows us to ask new questions, ones that we couldn’t have asked when science depended on the work of a few people in a single lab working in a limited area of knowledge with just a few gigabytes of processing power. Some people say that Big Data also changes the way that we ask questions. Gone are the days of hypothesis-driven science as we know it. Nowadays, it’s all about pattern recognition.
David McCandless, a mildly geeky writer and designer who runs the blog Information is Beautiful, is making a presentation to the Science Online attendees. He displays a graph that runs from January to December. The line bumps along for the first few months until the largish, double-humped peak in the late spring and early summer. It drops off in the autumn and hits another sharp peak just before Christmas. He challenges the audience to guess what the graph shows. Chocolate sales, perhaps? Greeting cards? With a flourish, he adds the headline to the slide: “Peak Break-Up Times.”
Relationships melt down because of the stress of spending time together over the holidays, McCandless theorises, and the tension of meeting families. The data is gleaned from “scraping” over 10,000 random Facebook status updates for the phrases “break up” or “broken up.” His obvious excitement at this result is shared by an appreciative audience.
Then a woman behind…