Say what?

New web software is changing the way we create language
November 18, 2009

Is the word “awesomepants” part of the English language? Don’t worry if you haven’t heard of it; almost no one has. It probably came into existence because “awesome,” meaning really good, has become so overused. In the past few years, enthusiastic types have begun adding “pants” (as in the American word for trousers) as an intensifying suffix. It crops up on Twitter a few times a day.

Does this level of usage make it a real word? Not by any traditional yardstick. It’s not one of the 650,000 or so words in the OED. Yet some time in the past year, one of the isolated uses of “awesomepants” was netted in a lexicographical trawl of the web. The software that spotted the word is busy populating a new kind of online dictionary. It’s called Wordnik and it is the work of Erin McKean, an editor who used to compile American dictionaries for Oxford University Press. Other finds include “smizing”—smiling with your eyes—and “spoofy,” as in spoof-like. Wordnik went online early this year and does not yet work as well as McKean would like. But if her ambitions are realised, it could be the most comprehensive dictionary ever created.

Wordnik has no print edition, so there are no constraints on the number of words it can contain. At the time of writing, “awesomepants” was one of around 4m entries. Some were imported from traditional sources, such as the ten-volume Century Dictionary of 1914. Others come from archives of blog posts and newspaper copy.

Wordnik can do all the things one expects from an online version of a regular dictionary. When I looked up “God,” I got definitions from conventional dictionaries, book excerpts, synonyms, an etymology, an audio recording to guide pronunciation, statistics on usage and the fact that it is worth five points in Scrabble. Wordnik’s handling of neologisms, however, makes it different. When the software finds a word it has not seen before, it adds it to the dictionary and displays the context in which it appeared. New words do not automatically have definitions in Wordnik, but example sentences, for instance from Twitter and blog posts, will be enough for users to get a sense of their meaning, says McKean. “The sad truth, sad because so many people have spent so much time writing definitions, is that most people learn a word better from half a dozen example sentences than they do from a three-line definition.”
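
For readers curious about the mechanics, the behaviour described above (notice an unfamiliar word, add it, keep the sentence it appeared in) can be sketched in a few lines of Python. This is only a toy illustration, not Wordnik's actual software: the class name `ExampleDictionary`, its methods and the sample sentences are all invented for the purpose, and a real system would need far more careful handling of spelling, punctuation and spam.

```python
import re


class ExampleDictionary:
    """Toy word list that keeps example sentences instead of definitions.

    Hypothetical sketch only; every name here is an assumption.
    """

    def __init__(self, seed_words):
        # Words imported from traditional sources count as already known.
        self.seed = {w.lower() for w in seed_words}
        # Maps each newly found word to the sentences it was seen in.
        self.examples = {}

    def ingest(self, sentence):
        """Store the sentence as context for every non-seed word in it.

        Returns the words that were added for the first time.
        """
        added = []
        for word in sorted(set(re.findall(r"[a-z]+", sentence.lower()))):
            if word in self.seed:
                continue
            if word not in self.examples:
                self.examples[word] = []
                added.append(word)
            self.examples[word].append(sentence)
        return added

    def lookup(self, word):
        """Return the stored example sentences for a word, if any."""
        return self.examples.get(word.lower(), [])


dictionary = ExampleDictionary(["that", "film", "was", "so", "what", "an", "day"])
print(dictionary.ingest("That film was so awesomepants"))
print(dictionary.ingest("What an awesomepants day"))
print(dictionary.lookup("awesomepants"))
```

The first sentence adds the unfamiliar word; the second adds nothing new but gives it another example. The entry never gets a written definition, only a growing pile of context, which is the trade McKean describes.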

Including “awesomepants” in a dictionary might sound alarming. But if the word is being used, albeit by only a handful of people, it is part of English. Of course users need to know that it’s rare slang. But the context quotes give a sense of this, and the site is also testing a function that will show how often a word is used. Once this works, users will realise what “awesomepants” means—and not use it in a CV.

Wordnik, then, could be the next evolutionary step in our use of language. The first dictionaries were written by men who saw themselves as guardians of their national tongues. In the introduction to one early dictionary, A Table Alphabeticall (1604), schoolteacher Robert Cawdrey declares it is intended for “the benefit & helpe of Ladies, Gentlewomen, or any other unskilfull persons”—so that, presumably, they might improve themselves. Daniel Defoe later called for a more comprehensive dictionary, so that it would be “as criminal to…coin words as money.” They wanted to prescribe language, not describe it.

When Samuel Johnson started compiling his dictionary he had no qualms about inserting his prejudices—and wit—into definitions. “Oats” were a “grain, which in England is generally given to horses, but in Scotland supports the people.” Johnson’s favoured party, the Tories, adhered to “the ancient constitution of the state;” the Whigs were merely a “faction.” But the process of creating his dictionary, published in 1755 after a chaotic effort that almost bankrupted him, led Johnson to realise the folly of trying to fix language. Any attempt to “enchain syllables,” he said that year, is like trying to “lash the wind.”

The subsequent history of the profession is a gentle ceding of power—and Wordnik is the latest result. It is the closest we have come to a completely descriptive dictionary. McKean and her colleagues make no judgements about the legitimacy of slang or neologisms. Words will never be removed, however redundant they have become. Definitions rely mainly on context, not the opinions of compilers. In some ways, this makes Wordnik a more honest product than its predecessors. Chambers, the OED, Collins—these are the closest thing English has to a guardian. Words left out of them are somehow diminished, even though size limits, as well as human judgement, bar entry. The internet can remove this arbitrariness: if language is a democracy, now everyone has a vote.