Too much information

Good science starts with a hypothesis. But the human genome project didn’t have one
May 26, 2010
The devil is in the detail: a circular computer "scanner" reads sections of DNA at the California Institute of Technology
Ten years ago, the first draft of the sequence of the human genome was heralded as the dawn of a new era of genetic medicine. According to Francis Collins, leader of the International Human Genome Sequencing Consortium (IHGSC) and now head of the US National Institutes of Health, the knowledge gained by the sequencing effort would eventually allow doctors to tailor cures to a patient’s individual genetic profile, a vision he suggested could become reality by 2010. You might have noticed that it hasn’t.
The medical impact of the human genome project (HGP) has so far been negligible. Collins’s claim in 2001 that “new gene-based ‘designer drugs’ will be introduced to the market for diabetes mellitus, hypertension, mental illness and many other conditions” no longer seems an inevitable result of decoding all the 21,000 or so human genes. So were we misled? Not exactly. But the distance between promises and achievements reflects the fact that the HGP was, like the moon landings, a triumph of technological capability rather than scientific understanding.
It is normal for promised scientific advances to be slow to arrive. And there’s no question that knowing the sequence of all 3bn of the basic building blocks of our DNA will aid research into human origins and evolution, demographics and disease. One of the technological spin-offs of the HGP is a vast improvement in sequencing techniques, which brought the project to a conclusion sooner (and at lower cost) than expected. This was partly due to the galvanising force of competition between the publicly funded IHGSC project and a private effort to sequence the genome by the US company Celera Genomics, led by the entrepreneur Craig Venter.
Although we should be wary of emulating one of the justifications offered for human space flight (that it helps us learn about the effects of putting people in space), these techniques have put affordable personal genome sequencing on the horizon, and are contributing to an explosion in genomic data throughout the biological world.
But why haven’t these data been translated into new medicines, the real selling point of the HGP? Partly because shifting from knowledge of a disease-linked gene to a viable therapy has proved immensely challenging, even for a disease such as cystic fibrosis that is caused by a single faulty gene. When the genetic component involves a whole suite of genes, as it does for many common ailments such as cancer and heart disease, the task is harder still.
The real problem, however, is that we simply don’t understand what a genome does. Needless to say, this is embarrassing, especially for a project that cost some $3bn. It’s as if the Large Hadron Collider had been constructed with only the sketchiest knowledge of the fundamental forces and particles in physics. The HGP was not hypothesis-driven: there were no well-defined questions it was supposed to answer, nor even any real theory to formulate them. Rather, there was a collective delusion that the applied benefits were somehow going to fall out of all the data. This attitude did not stem from the ignorance of scientists, but rather from their mistaken presumption of understanding.
Even Venter’s recent announcement of a ‘synthetic’ bacterium with a bespoke, artificial genome made by chemical techniques does not imply knowledge of how this (unusually simple) genome functions: it was not designed from scratch, but is a stripped-down and modified version of the DNA of a natural microbe. It’s rather like making a car by copying all the parts of an existing one, without understanding how they all interact with one another.
To put it crudely, the vision of genetics pre-HGP ran something like this. Our genomes are made up of genes that direct the synthesis of the proteins which underpin the cell’s biochemistry. These genes are embedded in a lot of “junk DNA” that evolution has never weeded out. Translation from gene to protein happens on a more or less one-to-one basis via the mediation of ribonucleic acid (RNA) molecules, which act as templates for protein assembly.
Among the revisions that have either gained importance or come out of the blue since the HGP began in 1993 are the following. A particular gene may not have a unique relationship to a particular protein. Most junk DNA is not junk at all, but has a biological role as yet unknown: most of this DNA is transcribed by cells into RNA molecules in an energy-consuming effort that would not happen without good reason. Genes are not necessarily laid out in simple linear fashion on genomes. The activity of genes is affected by many things not explicitly encoded in the genome, such as how the chromosomal material is packaged up and how it is labelled with chemical markers. Even for diseases like diabetes, which have a clear inherited component, the known genes involved seem to account for only a small proportion of the inheritance. The rest has been dubbed the genomic “dark matter”: an admission of total ignorance.
To be fair, we can credit the HGP itself, and its associated technologies, with some of this new perspective. For example, a project called Encode set out to discover what just 1 per cent of the genome was actually “doing” using the kind of blanket surveying the HGP made technically possible, and uncovered the transcription of nearly all the “junk” DNA to RNA, which seems to have a role in regulating genes. But the failure to anticipate such complexity in the genome must be blamed partly on the cosy fallacies of genetic research.
After Francis Crick and James Watson cracked the riddle of DNA’s molecular structure in 1953, geneticists could not resist assuming it was all over bar the shouting. They began to see DNA as the “book of life,” which could be read like an instruction manual. It now seems that the genome might be less like a list of parts and more like the weather system, full of complicated feedbacks and interdependencies.
One of the most pernicious legacies of the “blueprint” paradigm was a new genetic determinism: that we are what our genes make us. Of course we don’t really believe that, geneticists will protest: it’s just a convenient shorthand. But Watson, one of the HGP’s key advocates, said at the project’s inception: “We used to think our fate was in our stars; now we know, in large measure, our fate is in our genes.”
Another unfortunate notion behind the HGP was that science can be done without hypotheses or ideas. As Jim Collins of Boston University, one of the few biologists to see a bigger picture, puts it: “We’ve made the mistake of equating the gathering of information with a corresponding increase in insight and understanding.”
So what now? It’s possible that what we still don’t know about the genome is a mass of horrible, intractable detail. But engineers looking at the human body would argue that the robust operation of something this complex must depend on some broad general principles. We undoubtedly know one of them (the relationship between gene sequence and protein structure), but we can no longer fool ourselves that this is the whole story. And we will not discover the others by plunging ourselves into the data ocean of the next “–ome”: the proteome, the epigenome, the metabolome. We need original thinking.

This article has been modified from its printed version