Numbers game

September 23, 2006
Literary statistics

Did one person write the whole of Shakespeare? Who wrote the books of the Bible? Three researchers at the University of Adelaide have come up with a statistical approach that could help to resolve these puzzles. Matthew Berryman, Andrew Allison and Derek Abbott have published some of their findings on New Testament authorship.

Their analysis is based upon inter-word spacing, defined as the word count between a word and the next occurrence of the same word in a text. So in the previous sentence, for example, the count between the first appearance of "word" and the second is four, between the second and third three, the third and fourth seven, and so on. This is calculated for every single word throughout an entire text.
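The counting rule can be checked against the article's own example with a short Python sketch. The function name `inter_word_spacings` and the tokenising rule (lower-casing and splitting on anything that is not a letter or apostrophe, so that "inter-word" yields "inter" and "word") are assumptions made for illustration, not details taken from the researchers' work.

```python
import re
from collections import defaultdict

def inter_word_spacings(text):
    """Map each repeated word to the gaps between its successive occurrences."""
    # Assumption: lower-case the text and split on non-letters, so a
    # hyphenated term such as "inter-word" counts as two words.
    tokens = re.findall(r"[a-z']+", text.lower())
    last_seen = {}            # word -> position of its latest occurrence
    gaps = defaultdict(list)  # word -> list of word counts between occurrences
    for pos, tok in enumerate(tokens):
        if tok in last_seen:
            # Number of words strictly between the two occurrences.
            gaps[tok].append(pos - last_seen[tok] - 1)
        last_seen[tok] = pos
    return dict(gaps)

sentence = ("Their analysis is based upon inter-word spacing, defined as the "
            "word count between a word and the next occurrence of the same "
            "word in a text.")
spacings = inter_word_spacings(sentence)
print(spacings["word"])  # [4, 3, 7]
```

Under this tokenising rule the gaps for "word" come out as four, three and seven, matching the counts given in the text.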

Now comes the subtle bit. Words are ranked according to how much their individual counts vary around their average. So the top-ranked word is not the most frequently appearing one, but the one whose spacing is, in a sense, the most irregular. This ranking of words, the so-called "sigma ranking," is how an author leaves a characteristic signature. He or she can be identified by the slope of the line obtained when each word's measure of variation is plotted against the logarithm of its rank.
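A minimal Python sketch of this ranking step follows. It assumes that "how much the counts vary" means the plain population standard deviation of a word's gaps; the published method may scale or normalise this differently. The function names, the `min_gaps` cut-off and the toy text are all hypothetical.

```python
import math
import re
from collections import defaultdict
from statistics import pstdev

def gap_lists(text):
    """Word -> list of word counts between successive occurrences."""
    tokens = re.findall(r"[a-z']+", text.lower())
    last_seen, gaps = {}, defaultdict(list)
    for pos, tok in enumerate(tokens):
        if tok in last_seen:
            gaps[tok].append(pos - last_seen[tok] - 1)
        last_seen[tok] = pos
    return gaps

def sigma_ranking(text, min_gaps=2):
    """Return (word, sigma) pairs, most irregular spacing first."""
    # Assumption: sigma is the population standard deviation of the gaps;
    # words with fewer than min_gaps gaps are left out of the ranking.
    sigmas = {w: pstdev(g) for w, g in gap_lists(text).items()
              if len(g) >= min_gaps}
    return sorted(sigmas.items(), key=lambda kv: kv[1], reverse=True)

def slope_vs_log_rank(ranking):
    """Least-squares slope of sigma against log10(rank), ranks from 1."""
    xs = [math.log10(r) for r in range(1, len(ranking) + 1)]
    ys = [sigma for _, sigma in ranking]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy text: "the" and "cat" each appear four times, but "the" recurs at
# perfectly regular intervals while "cat" does not.
toy = "the cat sat the dog sat the cat ran the dog ran cat cat"
ranking = sigma_ranking(toy)
slope = slope_vs_log_rank(ranking)
print(ranking[0][0], ranking[-1][0], slope < 0)
```

In the toy text both ranked words are equally frequent, yet "cat" comes out above "the" because its spacing is irregular. A real comparison of authors would need whole books rather than a single line.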

Testing it on Dickens's Great Expectations and Barnaby Rudge, and Hardy's Jude the Obscure and Tess of the d'Urbervilles, the technique shows decisively that the two pairs of books were written by two different authors. In New Testament terms, analysing not an English translation but the original Greek, the same author appears to have written both Luke and Acts of the Apostles.

Sigma ranking has applications far outside author identification. It has already been shown that words with the highest sigma ranking tend to make better search engine keywords, as opposed to words with high hit counts. And given that DNA sequences can be viewed as possessing a four-letter alphabet (A, C, G, T), it is also being used in genome identification.

NHS 99.95% successful

The Healthcare Commission, which regulates the NHS, has published figures showing that in the year to July there were 41,000 "medical errors" in prescribing medicines and drugs in 173 (out of 259) NHS trusts in England. This resulted in 36 deaths and 2,000 cases of "moderate or severe harm." This is less alarming than it sounds when seen in the perspective of roughly one death for every five trusts, and an average of about one case a month of moderate or severe harm per trust. With a best estimate of over 80m drugs and medicines being administered, the failure rate is about one in 2,000, or a success score of 99.95 per cent.
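As a rough check on the arithmetic, the quoted figures can be run through a few lines of Python. The sketch treats the "over 80m" estimate as exactly 80m, which is an assumption; a larger true total would make the error rate lower still. The variable names are illustrative only.

```python
# Figures as quoted in the text.
errors = 41_000       # reported medical errors in the year
doses = 80_000_000    # "over 80m" administered; treated as exact here
deaths = 36
harm_cases = 2_000    # cases of moderate or severe harm
trusts = 173          # trusts covered by the figures

one_in = doses / errors                      # one error per this many doses
success_pct = 100 * (1 - errors / doses)     # share administered without error
trusts_per_death = trusts / deaths           # trusts for every one death
harm_per_trust_month = harm_cases / trusts / 12

print(f"one error in {one_in:,.0f}")                     # one error in 1,951
print(f"success rate {success_pct:.2f} per cent")        # success rate 99.95 per cent
print(f"one death per {trusts_per_death:.1f} trusts")    # one death per 4.8 trusts
print(f"{harm_per_trust_month:.2f} harm cases per trust per month")
```

On these figures the error rate is about one in 1,950, which corresponds to roughly 99.95 per cent of administrations passing without a recorded error.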