Simplicity and the origin of life

When scientists refer to "complex systems," this sometimes creates a barrier, since to many people "complex" means "complicated," and therefore difficult to understand. Neither assumption is necessarily correct. A complex system is really just a system that is made up of several simpler components interacting with one another. The great triumphs of science since the time of Galileo and Newton have largely been achieved by breaking complex systems down into their simple components, and studying the way the simple components behave. In the classic example of the success of this approach to understanding the world, much of chemistry can be understood in terms of a model in which the simple components are atoms. Moving up a level, the laws which describe the behaviour of carbon dioxide gas trapped in a box can be understood in terms of roughly spherical molecules bouncing off one another and the walls of their container. Both systems are complex, in the scientific sense, but easy to understand.

If we strip complexity down to its bare essentials, we discover that it is all built on networks - interconnections between the simple parts that make up a complex system. The general importance of such networks, and the application of these ideas to the emergence of life, has been investigated by Stuart Kauffman of the Santa Fe Institute in New Mexico.

How complexity emerges
Kauffman has a striking analogy which brings out both what we mean by complexity and the importance of networks in emerging complex systems. He asks us to imagine a large number of ordinary buttons, perhaps 10,000 of them, spread out across a floor, plus a supply of thread; buttons can then be connected in pairs by the thread. This is definitely not a complicated system, in the everyday use of the term, but the way you connect the buttons turns it into a complex system. Choose a pair of buttons at random and tie them together with a single thread. Repeat the process a few times, and if you choose a button that is already connected to another button, don't worry about it; just use the thread to connect it to another button as well. After you have done this a few times, there will be a small amount of structure in the collection of buttons. Increasingly you will find that you do indeed choose a button that is already connected to another button; sometimes, you will choose a button that already has two connections, and you will link it to a third component of what has become a growing network of connections.

Each connected cluster of buttons is an example of what is known as a component in a network; the buttons are examples of nodes, points of connection in the network. The number of buttons in the largest cluster (the size of the largest component) is a measure of how complex the system has become. The size of the largest cluster grows slowly at first, more or less in a linear fashion, as the number of threads connecting pairs of buttons is increased; because most buttons don't have many connections already, there is only a small chance that each new connection will add another button or two to the existing largest cluster. But when the number of threads approaches and then exceeds half the number of buttons, the size of the largest cluster increases extremely rapidly (essentially exponentially) as each new thread is added, because with most of the buttons now in clusters there is a good chance that each new connection will link one smaller cluster (not just an individual button) to the existing largest cluster. Very quickly, a single supercluster forms, a network in which the great majority of the buttons are linked in one component. Then the growth rate tails off, as adding more threads usually just increases the connections between buttons that are already connected, and only occasionally ties in one of the few remaining outsiders to the supercluster. Although the network has stopped changing very much as new connections are made, it is by now, without doubt, a complex system.

You can see this sort of effect at work yourself, by actually playing with buttons (50 or so might be a more sensible number to choose than 10,000), but the pattern of behaviour is brought out very clearly using a simple computer model of such a system. The important point is that as the number of connections in the network exceeds half the number of nodes, it switches very quickly from one rather boring state (a lot of buttons with a few connections between them) to another stable state with a lot more structure, but in which there is little scope for further change. This is a simple example of the phenomenon known as a "phase transition," which Kauffman likens to the change in water as it freezes. In the network, complexity has emerged naturally from a very simple system just by adding more connections, with the activity associated with the changeover occurring all at once.
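The simple computer model mentioned above might look something like the following minimal sketch in Python. It is an illustration of the idea, not Kauffman's own program: the function name `largest_cluster_sizes`, the fixed random seed, and the choice to keep tying threads until there are twice as many threads as buttons are all assumptions made for the example. Clusters are tracked with a standard union-find structure, so that two clusters joined by a thread are merged into one.

```python
import random


def largest_cluster_sizes(n_buttons, n_threads, seed=0):
    """Tie random pairs of buttons together, one thread at a time.

    Returns a list whose i-th entry is the size of the largest
    cluster after i + 1 threads have been tied.
    """
    rng = random.Random(seed)
    parent = list(range(n_buttons))  # each button starts as its own cluster
    size = [1] * n_buttons           # size of the cluster rooted at each button

    def find(x):
        # Follow parent links to the root of x's cluster (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    largest = 1
    history = []
    for _ in range(n_threads):
        a, b = rng.sample(range(n_buttons), 2)  # two distinct buttons
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The thread joins two separate clusters: merge small into large.
            if size[root_a] < size[root_b]:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a
            size[root_a] += size[root_b]
            largest = max(largest, size[root_a])
        history.append(largest)
    return history


n = 10_000
history = largest_cluster_sizes(n, 2 * n)
for threads in (n // 4, n // 2, n, 2 * n):
    print(f"{threads:>6} threads: largest cluster has {history[threads - 1]} buttons")
```

Different seeds give slightly different numbers, but the pattern should be the same each time: the largest cluster is still tiny when there are a quarter as many threads as buttons, and takes in most of the buttons once the threads comfortably outnumber half the buttons.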

How life emerged from non-life
Kauffman became interested in networks because he is interested in the most important emergent question of them all - how life emerged from non-life. What was it that happened in the chemical "primordial soup," either on Earth or somewhere else, to make some of those chemicals link up into self-replicating systems?

All you have to do is to imagine that in the primordial chemical broth there were some substances that acted as catalysts for the formation of other substances. Chemical A catalyses the formation of chemical B. It is hard to see, given the variety of chemical raw materials around, how this could not be the case, and even if the encouragement A gave to the formation of B was only small, it would still increase the concentration of B in the mix. Now suppose that the presence of B encourages the formation of C, C acts as a catalyst for the formation of D, and so on. Somewhere down the line, one of the compounds involved catalyses the formation of A, and you have a self-sustaining loop of interactions, which in effect feeds off the raw material available and makes more of the compounds in the loop, with the aid of energy from sunlight or the heat from volcanic vents. It is quite easy to see how a network of connections between the chemicals in the broth can arise, an autocatalytic network which sustains itself.
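The closed loop described above can be pictured as a directed network in which an arrow from A to B means "A catalyses the formation of B". The sketch below is a toy illustration, not a model of real chemistry: the chemicals are just the letters used in the text, and the helper `find_catalytic_loop` is a hypothetical name for an ordinary depth-first search that looks for a cycle in such a network.

```python
def find_catalytic_loop(catalyses):
    """Return one closed loop of catalysis as a list of chemicals, or None.

    `catalyses` maps each chemical to the chemicals whose formation
    it encourages.
    """
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    status = {}
    for chemical, products in catalyses.items():
        status.setdefault(chemical, UNSEEN)
        for product in products:
            status.setdefault(product, UNSEEN)

    path = []  # chain of chemicals currently being followed

    def visit(chemical):
        status[chemical] = ACTIVE
        path.append(chemical)
        for product in catalyses.get(chemical, ()):
            if status[product] == ACTIVE:
                # We have come back to a chemical already on the chain.
                return path[path.index(product):]
            if status[product] == UNSEEN:
                loop = visit(product)
                if loop:
                    return loop
        path.pop()
        status[chemical] = DONE
        return None

    for start in list(status):
        if status[start] == UNSEEN:
            loop = visit(start)
            if loop:
                return loop
    return None


# A chain A -> B -> C -> D with nothing feeding back into A.
chain = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
print(find_catalytic_loop(chain))

# Add a single extra connection, D -> A, and the loop closes.
closed = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
print(find_catalytic_loop(closed))
```

The point of the two examples is that the chain and the loop differ by only one connection, yet only the second can sustain itself.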

This, argues Kauffman, is the way life arose - as a phase transition in a chemical system involving a sufficient number of connections between the nodes (chemical compounds) of the network. A crucial, and persuasive, feature of this argument is that, like the emergence of a supercluster in the button and thread network, this is an all or nothing event. If the network is insufficiently connected, there is no life, but add one or two more connections and life becomes not only possible, but inevitable. You do not have to build a long chain of unlikely chemical events one after the other in order for life to emerge, and there are no half and half states where you are not quite sure whether the system is dead or alive.

There is a great deal more to the argument, involving details of real chemical interactions, and the need to consider how the raw materials could have got sufficiently close together for the series of interactions to occur. I should emphasise that these ideas are both speculative and controversial. There are still many people who subscribe to the idea of what might be called a "linear" emergence of life, in which chemical reactions in a primordial atmosphere rich in compounds such as methane and ammonia, and energised by ultraviolet light from the sun or by lightning, built up more complicated molecules step by step. But not everybody is convinced by those other ideas either, and nobody knows for sure just what did occur when life emerged from non-life. One of the most startling discoveries from the fossil record, yet to be explained by any theory of the origin of life, is that living things, in the form of bacteria, were present on Earth some 4bn years ago, almost as soon as the planet had cooled from its initial molten state and water had begun to exist on its surface. To many, this makes the all or nothing model more attractive than the step by step one. Even in Kauffman's model, though, many details remain to be explored. But the overall package of ideas is persuasive, not least because it shifts the phenomenon of the emergence of life into the same set of complex systems based on simple laws that we find so often elsewhere.

Cells, proteins and the work of life
Kauffman also applies network ideas to the way cells work at a genetic level. Genes provide the instructions that operate what is called the machinery of the cell. The instructions are coded in DNA, the long molecules which make up the genes themselves, but both the machinery and the structure of the body are made of proteins. Things like hair and fingernails, as well as muscle, are forms of protein, as are haemoglobin, which carries oxygen around in the blood, and enzymes, which are essentially biological catalysts that promote chemical reactions important for life. Proteins themselves are large molecules made up of sub-units called amino acids. This is why the discovery that amino acids exist in the kind of interstellar clouds from which the sun and its family of planets formed is so intriguing. The genetic code in DNA contains instructions for making proteins, and those proteins then carry out the work of life. But there is another step in the process. When a gene is activated (just how and why is beyond the scope of this piece), the relevant piece of information is first copied into a piece of a very similar molecule called RNA. Then, the machinery of the cell reads the RNA and acts on the instructions to make the appropriate protein. This two-step process is probably telling us something about how life originated, and it is at least possible that RNA was "invented" before DNA.

In Kauffman's scenario, the "crystallisation" of life occurred at the protein level, in a chemical soup rich in amino acids, where the first autocatalytic networks of life emerged. The model accommodates the possibility that RNA was involved at an early stage, and that later the evolutionary pressures associated with competition between different autocatalytic networks could have led to the system we see today.

At the time Kauffman carried out this work, it was thought that there were about 100,000 different genes in human DNA. The human genome project has since shown this to be an overestimate, and suggests there are only about 30,000 genes that specify what it is to be human. All of those genes are present in the DNA of every cell in the human body, but they are not all active. Different kinds of cell are specialised for different tasks, so that liver cells, for example, do quite different things from the cells in muscles.

This differentiation of cells into specialised forms occurs during the development of the embryo, and understanding the process of development and cell specialisation is one of the most important areas of biological research. But however the process happens, in the fully developed human being there are some 256 different kinds of specialised cell. In each case, only the appropriate bits of DNA (the appropriate genes) ever get "switched on" during the normal course of life, so that a liver cell is only ever a liver cell. But all the rest of the genetic information is still there, as has been dramatically demonstrated in cloning, where the DNA from a specialised cell is transferred into an egg cell, which then develops into a new adult that is a replica of the one the DNA came from.

It's a numbers game
The network involved in running a cell can be described in terms of a system with one node for each gene, and connections linking the genes like threads linking the buttons in the earlier model. Each connection can be seen as "on" or "off," leading to a variety of different patterns of activity. With between 30,000 and 100,000 genes involved, you might guess that the problem of describing the behaviour of such a network would be too complicated to solve, even with modern computers. But it turns out that the only systems that behave in a way that is both complicated enough to be interesting and stable enough to be understood are those in which each node is connected to exactly two other nodes. With fewer connections, interesting networks cannot grow, because what happens at one node affects only a few connections; with more connections, everything affects everything else, causing instability.

In these systems with paired connections, the network does not wander through every conceivable pattern of activity. Instead it settles into one of a limited number of repeating patterns, each known as a state cycle. The number of state cycles, it turns out, is roughly equal to the square root of the number of nodes. A system with 100 nodes will settle into just 10 or so different state cycles (10 being the square root of 100).
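To give a feel for what such a network does, here is a minimal sketch of a random network in which every gene is switched on or off according to the state of two other genes. It is a toy under stated assumptions, not Kauffman's own model: the function name `state_cycle_length`, the way the two inputs and the switching rules are drawn at random, and the small network size are all choices made for the example. It follows the network from one random starting pattern until a pattern repeats, then reports how many steps the repeating pattern takes. It makes no attempt to count how many different repeating patterns a network has, so it does not test the square-root rule.

```python
import random


def state_cycle_length(n_genes, seed=0):
    """Run a random two-input on/off network until it starts repeating.

    Returns the number of steps in the repeating pattern that the
    network eventually settles into from one random starting state.
    """
    rng = random.Random(seed)
    # Assumption: each gene listens to two distinct genes chosen at random.
    inputs = [rng.sample(range(n_genes), 2) for _ in range(n_genes)]
    # Assumption: each gene's rule is a random table giving its next state
    # (0 = off, 1 = on) for each of the four combinations of its two inputs.
    rules = [[rng.randint(0, 1) for _ in range(4)] for _ in range(n_genes)]

    state = tuple(rng.randint(0, 1) for _ in range(n_genes))
    first_seen = {}  # pattern of activity -> step at which it first appeared
    step = 0
    while state not in first_seen:
        first_seen[state] = step
        state = tuple(
            rules[gene][2 * state[a] + state[b]]
            for gene, (a, b) in enumerate(inputs)
        )
        step += 1
    # The pattern just reached was first seen earlier; the gap is the cycle.
    return step - first_seen[state]


for seed in range(5):
    print(f"network {seed}: repeating pattern lasts {state_cycle_length(16, seed)} steps")
```

Because the network has only a finite number of possible patterns and each pattern completely determines the next, the loop is guaranteed to finish. The network is kept small here so that it finishes quickly whatever rules happen to be drawn.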

With 100,000 different nodes, there will be about 316 different variations. With 30,000 nodes, there will be about 173 different variations. There are between 30,000 and 100,000 genes in the human genome, and there are 256 different kinds of cell in the human body. Could it be that each cell type represents a particular state cycle for the human genome, in which specific genes are turned on and others off? To test this possibility, Kauffman has compared the number of genes and the number of cell types in different organisms. Bacteria have only one or two different kinds of cell, yeasts maybe three, the fruit fly 60, and so on. The number of cell types increases in proportion to the square root of the amount of DNA present in the organism, and it seems a reasonable rule of thumb that the number of genes is proportional to the amount of DNA. If so, the number of cell types does increase as the square root of the number of genes.
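The square roots quoted here are easy to check. The snippet below is simple arithmetic rather than a model of anything; the function name `predicted_state_cycles` is just a label chosen for the example, and it takes the square-root rule at face value.

```python
import math


def predicted_state_cycles(n_genes):
    """Square-root rule of thumb: state cycles ~ sqrt(number of genes)."""
    return math.sqrt(n_genes)


for n_genes in (100, 30_000, 100_000):
    print(f"{n_genes:>7} genes: about {predicted_state_cycles(n_genes):.0f} state cycles")
```

The figure of 256 cell types falls between the values the rule gives for 30,000 and 100,000 genes.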

Much more needs to be done to turn these provocative ideas into a complete theory of cell differentiation and development. But the story as it is hangs together, strongly suggesting that even creatures like us, the most complicated systems in the universe, are built upon very simple rules. All the apparent complexity of an interacting system of tens of thousands of genes boils down to a few hundred possible states, thanks to the deep simplicity of the world. As Kauffman says, "we are the natural expression of a deeper order."