Riddled with irregularity

Languages are extremely diverse, but they are not arbitrary. Behind the bewildering, contradictory ways in which different tongues conceptualise the world, we can sometimes discern order. Linguists have traditionally assumed that this reflects the hardwired linguistic aptitude of the human brain. Yet recent scientific studies propose that language “universals” aren’t simply prescribed by genes but that they arise from the interaction between the biology of human perception and the bustle, exchange and negotiation of human culture.

Language has a logical job to do—to convey information—and yet it is riddled with irrationality: irregular verbs, random genders, silent vowels, ambiguous homophones. You’d think languages would evolve towards an optimal state of concision, but instead they accumulate quirks that hinder learning, not only for foreigners but also for native speakers.

Linguists have explained these peculiarities by reference to the history of the people who speak each language. That’s often fascinating, but it does not yield general principles about how languages have developed, or how they will change in future. As they evolve, what guides their form?

Linguists have long suspected that language is like a game, in which individuals in a group vie to impose their way of speaking. We adopt words and phrases that we hear, and help them propagate. Through face-to-face encounters, language evolves to reconcile our conflicting needs as speakers or listeners: when speaking, we want to say our bit with minimal effort—we want language to be structurally simple. As listeners, we want the meaning to be clear—we want language to be informative. In other words, speakers try to shift the effort onto listeners, and vice versa.

All this makes language what scientists call a complex system. This means that it involves many agents interacting with each other via fairly well-defined rules. From these interactions there typically emerges an organised, global mode of behaviour, but this cannot be deduced from the local rules alone.

During the past three decades, complex systems have become widely studied by computer modelling: you define a population of agents, set the rules of engagement, and let the system run. Here the methods and concepts of the hard sciences—not so different to those that physicists use to model the behaviour of fundamental particles or molecules—are being imported into the traditionally empirical or narrative-dominated subjects of the social sciences. This approach has notched up successes in areas ranging from traffic flow to analysis of economic markets. No one pretends that a cultural artefact like language will ever be as tightly rule-bound or predictive as physics or chemistry, yet a complex-systems view might still prove key to understanding how it evolves.

A significant success was recently claimed by an Italian group of researchers led by Vittorio Loreto, a physicist at the University of Rome—La Sapienza. They looked at the favourite example among linguists of how language labels the objective world: the naming of colours.

When early anthropologists began to study non-western languages in the 19th century, particularly those of pre-literate “savages,” they discovered that the familiar European colour terms of red, yellow, blue, green and so on are not as natural as they may seem. Some indigenous people have far fewer colour terms. Many get by with just three or four, so that, for example, “red” could refer to anything from green to orange, while blue, purple and black are all lumped together as types of black.

Inevitably, this was first considered sheer backwardness. Researchers concluded that such people were at an earlier stage of evolution, with a defective sense of vision that left them unable to tell the difference between, say, black and blue. Once they started testing natives using colour charts, however, they found them perfectly capable of distinguishing blue from black—the natives just saw no need to assign them different colour words. Uncomfortably for western supremacists, we are in the same boat when it comes to blue, for Russians find it odd that an Englishman uses the same basic term for light blue (Russian: goluboy) and dark blue (siniy).

In the 1860s, the German philologist Lazarus Geiger proposed that the subdivision of colour always follows the same hierarchy. The simplest colour lexicons (such as the Dugum Dani language of New Guinea) distinguish only black/dark and white/light. The next colour to be given a separate word by cultures is always centred on the red part of the visible spectrum. Then, according to Geiger, societies will adopt a word corresponding to yellow, then green, then blue. Geiger’s colour hierarchy was forgotten until restated in almost the same form in 1969 by Brent Berlin, an anthropologist, and Paul Kay, a linguist, when it was hailed as a major discovery in modern linguistics. It showed a universal regularity underlying the apparently arbitrary way language is used to describe the world.

Berlin and Kay’s hypothesis has since fallen in and out of favour, and certainly there are exceptions to the scheme they proposed. But the fundamental colour hierarchy, at least in the early stages (black/white, red, yellow/green, blue) remains generally accepted. The problem is that no one could explain why this ordering of colour exists. Why, for example, does the blue of sky and sea, or the green of foliage, not occur as a word before the far less common red?

There are several schools of thought about how colours get named. “Nativists,” who include Berlin and Kay and also Steven Pinker, the Harvard psychologist, argue that the way in which we attach words to concepts is innately determined by how we perceive the world. As Pinker has put it, “the way we see colours determines how we learn words for them, not vice versa.” In this view, often associated with Noam Chomsky, our perceptual apparatus has evolved to ensure that we make “sensible”—that is, useful—choices of what to label with distinct words: we are hardwired for practical forms of language. “Empiricists,” in contrast, argue that we don’t need this innate programming, just the capacity to learn the conventional (but arbitrary) labels for things we can perceive.

In both cases, the categories of things to name are deemed “obvious”: language just labels them. But the conclusions of Loreto and colleagues fit with a third possibility: the “culturist” view, which says that shared communication is needed to help organise category formation, so that categories and language co-evolve in an interaction between biological predisposition and culture. In other words, the starting point for colour terms is not some inevitably distinct block of the spectrum, but neither do we just divide up the spectrum any old how, because the human eye has different sensitivity to different parts of it. Given this, we have to arrive at some consensus, not just on which label to use, but on what is being labelled.

The Italian team devised a computer model of language evolution in which new words arise through the game played by pairs of “agents”—a speaker and a listener. In this model, the speaker uses words to refer to objects in a scene, and if he or she uses a word that is new to the listener (for a new colour, say), there’s a chance that the listener will figure out what the word refers to and adopt it. Alternatively, the listener might already have a word for that colour, but choose to replace it with the new word anyway. The language of the population evolves from these exchanges.
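The exchange described above can be illustrated with a toy simulation. What follows is a minimal sketch, not the Italian team’s actual model: it assumes the colour categories are already fixed, whereas in their work the categories themselves are negotiated. The function name `naming_game`, the probabilities `p_adopt` and `p_replace`, the population size and the invented word labels are all hypothetical choices made for illustration.

```python
import random


def naming_game(n_agents=10, colours=("red", "green", "blue"),
                p_adopt=0.8, p_replace=0.5, max_steps=200_000, seed=0):
    """Toy speaker/listener game; every parameter here is an illustrative assumption."""
    rng = random.Random(seed)
    # lexicons[i][colour] is agent i's current word for that colour (None = no word yet)
    lexicons = [{c: None for c in colours} for _ in range(n_agents)]
    next_word_id = 0

    def consensus():
        # True once every agent uses the same (non-empty) word for every colour
        return all(
            lexicons[0][c] is not None
            and all(lex[c] == lexicons[0][c] for lex in lexicons)
            for c in colours
        )

    for step in range(1, max_steps + 1):
        speaker, listener = rng.sample(range(n_agents), 2)
        colour = rng.choice(colours)

        word = lexicons[speaker][colour]
        if word is None:
            # The speaker has no word for this colour yet, so invents one
            word = f"w{next_word_id}"
            next_word_id += 1
            lexicons[speaker][colour] = word

        heard = lexicons[listener][colour]
        if heard is None:
            # New word for the listener: adopted only with some probability
            if rng.random() < p_adopt:
                lexicons[listener][colour] = word
        elif heard != word:
            # Listener already has a rival word but may switch anyway
            if rng.random() < p_replace:
                lexicons[listener][colour] = word

        if consensus():
            return step, lexicons

    return max_steps, lexicons


steps, lexicons = naming_game()
print(f"steps taken: {steps}")
print("shared vocabulary:", lexicons[0])
```

No agent is told which word to use, yet the population settles on one label per colour purely through repeated pairwise exchanges. Which labels win depends on the random history of encounters, so changing `seed` typically changes the surviving words.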

For colour, our physiology influences this process, picking out some parts of the spectrum as more worthy of a distinct term than others. The crucial factor is how well we discriminate between similar colours—we do that most poorly in the red, yellowish green and purple-violet parts (we can’t distinguish reds as well as we can blues, for example).

When researchers included this bias in the colour-naming game, they found that generally accepted colour terms emerged in their population of agents in much the same order proposed by Berlin and Kay: red, then violet, yellow, green, blue and orange. (Violet doesn’t quite fit. The researchers think this is a consequence of how reddish hues occur at both ends of the spectrum.) Importantly, they didn’t get this sequence unless they incorporated the colour sensitivity of human vision, but neither was the sequence determined by that alone: it arose out of the “inter-agent negotiations.”

In other words, there’s nothing in the physiology of vision that would let you guess a priori that red is going to emerge first. And indeed, in the computer simulations there’s initially no well-defined word for red—it is only after some time that a word stably referring to the red part of the spectrum appears, followed later by violet, and so on. Culture—the discourse between agents in the population—is the filter which extracts the labels that are most useful from the biological given of colour vision. So both biology and culture are required to get it right.

The use of agent-based models to explore language evolution was pioneered by Luc Steels of the Free University of Brussels, who specialises in artificial intelligence; he wants to know how to design robots that can develop a shared language. Steels and his co-workers have also used the acquisition of colour terms as their test case, and have previously argued in favour of the “cultural” picture that Loreto’s team now supports. The computer modelling of Steels’s group deserves much of the credit for starting to shift the prevailing view of language acquisition from one that stresses genetic factors to one that stresses culture and environment.

Steels and his colleagues Joris Bleys and Joachim de Beule, for example, have presented an agent-based model of language negotiation, similar to that used by Loreto’s team, which purports to explain how a colour-language system can change from one based on differences in brightness (using words like “dark,” “light” and “shiny”) to one that makes distinctions of hue. The brightness system was used in Old English between around 600 and 1150, while Middle English (1150–1500) used hue-related words. A coeval switch was seen in other European languages, coinciding with the development of textile dyeing across the continent. This technology, the researchers say, altered the rules on what needed to be communicated: people now had to talk about a wider range of colours of similar brightness but different hue.

The modelling of Steels and colleagues showed that this sort of environmental pressure could tip the balance from a brightness-based colour terminology to a hue-based one.

It is one thing to tell that story, another to show by computer modelling that it really works in the complex give and take of discourse. It increasingly seems, then, that language is determined not simply by how we are programmed, but by how it is used and by what we need to say.