On Monday, they told you that the Human Genome Project has been completed.
It hasn't.
The gigantic international scheme to decode and figure out the order of every smidgen of DNA in a human cell has covered some serious distance since it began in earnest in 1990. But it's not there yet, even if CNN did devote pretty much the entire day's coverage to hosannas hailing imminent cures for cancer, for Alzheimer's disease, for heart disease, for pretty much everything that ails us. Wolf Blitzer declared that the announcement will affect your health "in the very near future."
It won't. The heads of the projects themselves mention that often. The public project's leader, Francis Collins of the National Human Genome Research Institute, cautions, "There is much left to be done." The private project's leader, Craig Venter of Celera Genomics Corp., said that Monday was "not a very important moment except that it's the beginning of what we can do with it."
Why all this modesty from the masters of an accomplishment that President Bill Clinton likens to Galileo's and Lewis and Clark's, that Tony Blair hails as "the first great technological triumph of the 21st century"? Because Collins and Venter know some things that CNN and its ilk aren't telling you.
For starters, nearly 66 percent of the data in the publicly funded project is in "draft" state, an acknowledgement that the DNA sequences are larded with mistakes. Collins likes to call the human genome sequence "our own instruction book." Well, your new instruction book is full of errors: factual errors and typographical errors. The original plan was to fact-check, spell-check and proofread each of the millions of DNA sequences at least 10 or a dozen times to purge the goofs in analysis -- an inevitable consequence when the project comprises more than 3.1 billion pieces of information. But in the indecorous and very public race between the taxpayer-funded and the commercial HGPs, prudence was shed along with good manners. HGP officials, public and private, have settled, albeit temporarily, for half the amount of proofreading they know must eventually be done.
Your instruction book is also missing 15 percent of its pages. In addition to the errors, there are tens of thousands of gaps where the DNA has not been sequenced at all. Public and private scientists alike have waved away the gaps as inconsequential, but that's not entirely true. "It's impossible to answer whether the gaps are important," says Samuel Aparicio, a genome scientist at Cambridge University. "There will be genes that weren't known that will turn up in the gaps, and the gaps have to be closed. But the significance of having 90 percent of the sequence online is to help to show them."
And here's a human genome factlet that hardly anybody ever mentions. The projects have completely excluded a kind of highly condensed DNA called heterochromatin from sequencing. That's about 7 percent of the human genome, and there are probably some genes in it. Not many, but some.
The unfinished state of gene knowledge is such that a few weeks ago genome scientists began to make bets on the number of human genes. Nobody knows how many there are. So far the wagers by the people who are likely to know range wildly from 27,000 all the way to 200,000. The wagering will remain open until 2003 because one thing genome scientists do know is that the human DNA sequence isn't anywhere near finished yet. Even when it is, picking out genes from the mass of other DNA will be very tough. (Genes are estimated to occupy only 3 percent of the human genome.)
On the other hand, the exact number may not matter much. This we have now learned from investigating the genes of other creatures. The mapping and sequencing of the DNA in a couple of organisms has been completed in the last year or two. The simple nematode Caenorhabditis elegans -- that's a tiny soil worm to you -- turns out to have more than 18,000 genes although it possesses a paltry 959 cells. But the fruit fly -- which, although also tiny, is much more complex and human-like than C. elegans -- has only about 13,600 genes, some 5,000 fewer than the worm. Gerald Rubin, who headed the fruit-fly genome project at Berkeley, points out, "Complexity is not in any simple way related to gene number."
This was a real shocker for the researchers, who are just as anthropocentric as the rest of us. For years they had casually estimated the human gene number at about 100,000. Most of them have now reduced their estimates drastically. At this writing, genome scientists have placed 228 bets on the human number. Half of them have wagered that there are fewer than 54,000 human genes. Two just-published estimates from highly respected genome scientists are in the 30,000s. Collins himself has joined the accelerating trend to lowball guesses; his wager is 48,011. Nobel laureate David Baltimore, perhaps taking his cue from the project leader, told the readers of Sunday's New York Times that 50,000 seemed about right to him.
In short, a great many genome scientists now believe we possess only two or maybe three times as many genes as a worm the size of a pinhead that resides in the dirt beneath our feet. And yet the planet continues to turn.
That lowered estimate has also been a shocker for capitalists who fund the research because they hope to exploit it. Uber-genomicist Venter has reported an encounter -- culture clash is more like it -- with the late Wallace Steinberg, an entrepreneur who backed early stages of Venter's work on the human genome. When Venter speculated in print that there might be only 60,000 or 70,000 human genes, he got a furious call from Steinberg. "What the hell do you think you're doing, saying there are only 60,000 genes?" Steinberg reportedly shouted. "I just sold 100,000 genes to SmithKline Beecham!" (The giant drug company had agreed to a big deal giving it access to Venter's database, which is Celera's product.) To Steinberg, individual genes were a commodity. More was better.
If not gene number, then what makes us so much cooler than worms? Part of the answer may be proteins. Proteins carry out the activities of cells, and making proteins is what genes mostly do. But a gene can be involved in the construction of a number of different proteins, each of which can perform a different job. Another part of the answer may lie in the regulation of a gene's behavior: which cells it's active in, when it's active, and how much or how little protein it churns out.
What this means is that the really interesting part of the Human Genome Project has hardly begun. Most of it still lies, iceberg-fashion, in databases and gene chips and the brains of scientists -- and scientists unborn -- all over the world. Only 7,000 or so human genes out of the hypothetical 48,011 have even been named, and genome researchers know something about the function of only a fraction of those.
The intriguing part, the hard part, the useful part -- figuring out what each gene is up to, and how each gene conducts itself with all the others -- all lies in the future. At the announcement on Monday, Venter predicted that the analysis will take most of this century. Lots more information about what specific genes do must be acquired before the HGP starts paying off in the widespread medical wonders you have been promised. That's the real human genome project. And it won't be complete in your lifetime.