Opening up public sector data is an old geek hobbyhorse. But could the man who invented the web reinvent British government?
by Tom Chatfield / January 27, 2010
It all began with a lunch. Tim Berners-Lee, the father of the world wide web, was invited to Chequers in spring 2009. A government taskforce had just published a report aimed at making Britain a digital world leader and technological reform was in the air. Even so, Berners-Lee was surprised at what came next. “The prime minister asked me what Britain should do in order to make the best use of the internet,” he told Prospect in early January. “I said, you should put all your government data onto the web. And he said, let’s do it.” A month later, Berners-Lee flew in from his base at MIT in Boston for a meeting, this time a cup of tea with Brown in the garden at No 10. He brought with him his friend and colleague Nigel Shadbolt, a professor of artificial intelligence at Southampton University, who works on next generation web technology and has piloted his work on public data. Sitting in wicker chairs, they hatched a plan for a new government team, led by Berners-Lee, to unlock Britain’s public data.
On 21st January this year, less than 12 months later, the government launched a website to do just that (you may have seen the television adverts). Modelled on a similar effort by President Obama, data.gov.uk brings together over 2,500 public data sets, ranging from abandoned vehicles and A&E stats to child tax credits and carbon indicators. And Brown has promised, in a few months’ time, to open up the jewel in Britain’s data crown: the maps made by Ordnance Survey.
“It can be tricky to explain why Tim’s work matters so much,” says dotcom entrepreneur turned government adviser Martha Lane Fox. “But the data he has been able to release can reorder the balance of power between the citizen and the state.” Such claims are often made for “e-government,” whose hype is traditionally exceeded only by the price tags attached to the (often disastrous) IT projects undertaken in its name. But Berners-Lee’s work has the potential to be different, relying as it does on an unprecedented combination of technology experts, amateurs and businesses like Dr Foster or Experian to take the new information and present it usefully. This helps people make better decisions, underpins information-age businesses, and—because it to some degree redraws the boundary between people and government—may also change the terms of politics. Yet perhaps the most remarkable fact is that it happened at all. Others have tried to unlock Britain’s data, only to run into walls of official obstinacy, vested interest and political indifference. So how did Berners-Lee do it?
If Berners-Lee is best known as the father of the web, he has a parent’s protective instincts towards his child. Born in London, he studied physics at Oxford—where he built his first computer with a soldering iron, television set and spare parts—before working in telecoms and software engineering. In 1989, while working as a fellow at the CERN research centre in Switzerland, he published the academic paper that defined the system of protocols that underpin the global network we think of today as the web. Then, in 1994, he founded the World Wide Web Consortium, a group devoted to keeping the seething mass of pages he helped to create working together. “The whole new field of web science has a lot of excitement about how humanity, connected by technology, should evolve,” he says. “And we have a duty, I think, to make sure that the web does serve humanity.” This belief was clear in March 2009, when he gave a talk at TED, the annual technology conference in California. Declaring 2009 to be the year in which the world should open up its stores of data, he whipped his audience into an unlikely football-style chant of “get raw data now!” And it was this conviction that saw him take his data-philia into a new arena: politics.
Normally, explains Nigel Shadbolt, Berners-Lee “steered well away from government.” So Shadbolt was delighted to be appointed alongside him as a government information adviser. There was, he admits, a certain logic in the move: “Tim built the first web because the scientists he worked with had so many different documents, on so many different systems, and it was just a pain to constantly have to move between them.” Shadbolt himself had begun to think that online public data could even kickstart the next stage in Berners-Lee’s web adventure.
Take one example: information about where you live. Data about local public services is currently spread across any number of websites or is often unavailable. Money flows into schools, recycling centres, hospitals and so on, but information about them is locked away in government files. Put this data online, however, and allow amateurs and businesses to build useful websites with it, and the public will be able to enter a postcode and be presented with information about such facilities at the click of a mouse. As Berners-Lee explains: “The thing people are amazed about with the web is that, when you put something online, you don’t know who is going to use it—but it does get used.” So his pitch to Gordon Brown was simple: if this data—exam results, postbox locations, weather reports, and most crucially, maps—was put online, people would find a use for it. If you build it, they will come.
Three months after meeting Brown, Berners-Lee’s role was made public unexpectedly. It was June 2009, the height of the scandal over MPs’ expenses. Berners-Lee’s appointment was included in Brown’s speech setting out his response to the crisis, couched as part of a push for more government “transparency.”
Meanwhile, inside government, Berners-Lee’s initial task was to convince the cabinet, and for this he needed a case study of how open data could help ordinary people. The story he found came via Nick Pearce, head of Brown’s No 10 policy unit. Before entering Downing Street, Pearce ran the Institute for Public Policy Research, a think tank. During his tenure in 2007, one of the IPPR’s interns, Amelia Zollner, was killed while cycling to work. In March 2009, Pearce wondered to a colleague if they could publish raw data on bicycle accidents. If they did, might someone then build a website that would help cyclists stay safe? Phone calls to the office of transport minister Andrew Adonis followed. The data existed, and Adonis saw no reason not to try out Pearce’s plan.
As Berners-Lee remembers, “the first data set got put up around 10th March.” Events then moved quickly. The file was promptly translated by helpful web users who came across it online, making it compatible with mapping applications. Then, a day later, a web developer emailed to say that he had “mashed-up” the data on Google Maps (mashing means the mixing together of two or more sets of raw data). The resulting website allowed anyone to look up a journey and instantly see any accident spots along the way. It was just the story Berners-Lee needed: “Within 48 hours the data had been turned from a pile of figures into a valuable resource: one which can save lives, and which might help people to pressure the government to deal with blackspots. Now, imagine if the government had done a bicycle accident website in the conventional way. It would have drawn up requirements, put it out to tender, eventually gone for the lowest bidder, the company would have come in, then there would have been a review…” Instead it took two days for raw data in a drawer to become a powerful public resource. (To find it, google “London bicycle accidents map.”)
Put like this, opening up data sounds easy. Yet, historically, enthusiasts have largely failed to convince public bodies to publish more of it. The latest push, 2009’s Power of Information taskforce, was born from a series of secretive ministerial seminars in February 2007, designed to pep up the dog days of Tony Blair’s premiership. A formal review followed, then the taskforce itself in 2008. It was slow work—neither a priority for politicians nor Whitehall bosses. But evidence began to stack up. In the US, a “scores on the doors” scheme put cleanliness ratings in restaurant windows, and cases of food poisoning fell. In Britain, league tables of heart-disease survival rates in hospitals saw mortality rates fall, while data on electricity usage has helped customers cut bills.
Today, the case for open data is becoming ever less theoretical. In 2008 Channel 4 set up 4iP, an internet division led by internet expert Tom Loosemore, designed to do “public service data mashing.” Initial projects include a website to compare schools, which combines exam results, government reports, and even measures on pupil happiness and teacher ability drawn out of Ofsted reports. The site could quickly demystify the decisions, and tradeoffs, inherent in picking a good school. Also under 4iP’s banner, in partnership with the charity MySociety, is Mapumental—a site that mashes together maps with data on commuting times and house prices. Users can answer in seconds the question: “if I have this much money to spend, and want to live this far away from my office, where can I live?” As Loosemore explains, moreover, this is only the beginning: both the schools and houses sites could soon include other data, like crime levels and OS maps, taken from data.gov.uk.
A final example is the site WhereDoesMyMoneyGo.org, by the Open Knowledge Foundation. Its colourful graphics already show exactly how tax money is spent. But if there was better raw data from local government too (which data.gov.uk plans to provide), the site could become far richer: offering, for example, comparative information on how much local councils spent on gritting during this January’s snow.
Boosted by the promise of such sites, and other examples abroad, the case for open data has begun to catch on. Obama has pushed the agenda, while David Cameron is also a convert, linking online mash-ups to a wider argument about his belief in a coming “post-bureaucratic age.” As of early 2009, however, Whitehall itself remained largely unmoved: civil servants had handfuls of excuses for not publishing their data, and no one had a strong incentive to force it. Opening up Ordnance Survey was especially tricky: the organisation tenaciously defended its monopoly status, and gave the government tens of millions in revenue—money that would have to be replaced somehow if its data was to be made free. If he was to change all this, Berners-Lee had his work cut out.
One thing Berners-Lee did have was star power. As Shadbolt puts it: “Secretaries of state and ministers were more interested in meeting him than the other way around.” At a meeting with the cabinet this even brought a rare moment of humour, Shadbolt recalls. Berners-Lee was introduced by the prime minister; Jack Straw then said: “Meeting the man who invented the web is like meeting the man who first invented the wheel.” Ed Miliband shot back: “And what was the wheel man like, back when you met him too, Jack?” It took a little time to restore order amidst gales of laughter.
Berners-Lee and Shadbolt were given offices in a dusty corner of Admiralty Arch, owned by the cabinet office. A civil service team was assigned to help them. The duo initially saw their task as “picking off the low-hanging fruit” of government data. No weighty reports were to be written. Instead their plan was deceptively simple: they would use Berners-Lee’s reputation to get meetings with cabinet ministers, name the data sets they wanted, and publish the results. Before all that, however, Whitehall had to cope with Berners-Lee’s working style.
Most Whitehall computers run on a heavily encrypted network called the “government secure intranet.” But Berners-Lee, used to his own laptop, demanded wireless internet access. He also wanted his team to use a web-based project management tool called Basecamp. But the biggest bone of contention came over Microsoft Word. Much government work is done by civil servants emailing Word documents back and forth. Yet Berners-Lee refuses, on principle, to use Word, whose file format is proprietary rather than open. On one occasion, one official recalled, Berners-Lee received an urgent document in Word from one of the most senior civil servants—and refused to look at it until a junior official had rushed to translate it into an acceptable format.
Once in the room, politicians were treated to a display of Berners-Lee talking animatedly and waving his hands wildly, seemingly lost in thought. Buzzwords—“metadata tagging,” “the semantic web,” “the web of data”—left the politicians puzzled. Martha Lane Fox recalls a meeting with Berners-Lee and Brown, and the mix of awe and bafflement she felt, along with most of the politicians, when he was in full flow. Yet when faced with Berners-Lee’s demands, ministers usually said yes. There were, of course, difficulties, not least a mentality Berners-Lee tactfully calls “data-hugging.” But data protection laws, at least, could be overcome: public data is usually anonymous, or can cleverly be “anonymised” by statisticians.
The team made progress over the summer but, at meeting after meeting, Berners-Lee and Shadbolt were told by web developers that raw data was not much use without geo-spatial data to go with it. Google Maps was not detailed enough. The Ordnance Survey had a reputation for frightening private and public bodies with legal threats if they used data which contained some element of Ordnance Survey’s maps. Berners-Lee had himself thought Ordnance Survey was “not something to deal with in the first instance.” But the tipping point came at a meeting, organised with the Guardian, where web developers outlined their plans to build everything from big sites on public transport to niche services allowing people to see the number of cows in their area. What the pair kept hearing was that “80 per cent” of useful data relied on information owned by Ordnance Survey—whose maps include electoral and council constituencies, building locations, land-ownership boundaries and footpaths. As Berners-Lee recalls, “We were under huge pressure from a large number of people to do something.”
Here, Berners-Lee was being drawn into tricky territory. Looked at one way, Ordnance Survey is an obscure public body packed with geographers. But its iconic orange maps inspire fanatical loyalty, especially from millions of walkers. And people like having their buildings appear on the maps—previous attempts to privatise it had run into tabloid headlines that local churches or community centres could literally be erased.
Berners-Lee himself speaks fondly of them too: “We grew up on our holidays using OS maps to avoid traffic jams, find beaches, walk through the hills.” Even so, deciding they had to act, Shadbolt and Berners-Lee wrote a long letter to Brown, outlining why they needed the OS data. Word came back from No 10: fine, but you must convince the treasury. It was here, at this final hurdle, that previous attempts had always fallen. Where was the money to come from? At a time of fiscal crisis the treasury was in no mood for signing cheques. But Berners-Lee and Shadbolt got two lucky breaks.
The first was an obscure 2007 treasury economic report, led by Cambridge academic Rufus Pollock. Pollock’s analysis argued that there were substantial economic gains to be made from opening up map data to the public—an arrangement that has made the US a world leader in online geographical business. The second was political backing. John Denham, the local government minister who “owned” Ordnance Survey, came on board, despite the body’s HQ being in his Southampton constituency. Junior ministers Stephen Timms and Michael Wills were also supportive. But most important was Liam Byrne, the new chief secretary to the treasury, and the man responsible for dishing out cash across government.
During summer 2009 Byrne made a private trip to Washington, visiting the new White House office of “social innovation,” returning impressed by the data projects underway. Byrne had been badgered on similar subjects by Tom Watson, a junior minister who became an open data convert. And at the time Byrne was hunting for big ideas to improve public services and create future economic growth. Open data fitted the bill. The £20m or so that the treasury needed to fill the black hole from Ordnance Survey, he decided, would just have to be found. With Byrne convinced, No 10 came on board. And with backing from the two most powerful voices in government, Whitehall opposition began to melt away.
Berners-Lee’s overall success was partly down to luck. Had the political panic over expenses not convinced Brown of the wisdom of transparency, progress would have been unlikely. Part of it also came down to an inheritance: obscure bits of work dotted around Whitehall provided evidence Berners-Lee and his team could marshal. As Richard Allan, who chaired the Power of Information taskforce, puts it: “Berners-Lee is a great man: but he is a giant standing on the shoulders of many determined midgets.” Finally, even in the dry world of public policy it seems celebrity does matter: Berners-Lee unlocked a previously intractable problem because people wanted to help him.
As Berners-Lee leaves government, the stakes remain high. He has long argued for a second wave of his web revolution—one in which information within web pages is classified by a common protocol, allowing pieces of data on the web to talk to each other. This would create much more interaction between, say, computers and mobile phones. This vision, known as the “semantic web,” has not yet taken off. Yet this dream for his precocious child could depend on the type of public data he has been working to unleash. As Nigel Shadbolt puts it, “I’m starting to believe that public sector information could be an incubator for this new data web.” Whether that happens or not Berners-Lee seems content. “It has been an exciting journey. We have had that top-to-bottom enthusiasm, which is something very special. But we are also capitalising on a resource, data, that at the moment is just sitting there. Making it available is so obviously a good thing that it is hard to argue against it.” Put like that, it’s hard to disagree.
The real test will be whether the government makes publication of data a rule, not an exception. This is still some way off. Reversals are possible, especially as Berners-Lee heads back to Boston and politicians move on to other issues. Local government is a particular worry: data sharing is rare among local authorities, although John Denham has asked Nigel Shadbolt to help ensure progress here too. The crucial decision on opening up Ordnance Survey (out to consultation until April 2010) could also still be reversed by a short-sighted treasury. But, assuming that doesn’t happen, what really matters is the applications and websites that follow. And here there is cause for optimism. So far, when raw data has been released it has usually been quickly reused, as in the London cyclist blackspots example.
If, as many assume, the Conservatives win the general election, their enthusiasm for open data—especially from George Osborne—also makes a U-turn unlikely. And it is here that perhaps the most important political and policy consequences of Berners-Lee’s work could lie. Brown deserves credit for finally understanding this issue, and will with some justification claim it as part of his legacy. But a world in which citizens and businesses muck in to create new data websites is equally amenable to Cameron’s vision of a post-bureaucratic age. Publishing raw data, and expecting accomplished amateurs and social entrepreneurs to find uses for it, creates an eye-catching new type of partnership between citizens and the state—what the wonks sometimes call the “co-production” of public goods. There is an important role for businesses too: just look at how companies like MoneySupermarket have prospered by repackaging financial data. But it is the creative citizens and amateurs who tend to be most innovative and add the most value, often quickly and cheaply performing functions (like dreaming up websites) previously reserved for government bodies. It is this that has led some, like Martha Lane Fox, to think of open data as the spark for a power shift between citizens and government. If so, these citizens—the new class of über-geeks and “data mashers”—could become as important to good public policy, and to helping citizens make better decisions, as Whitehall strategists or government statisticians. As Tim Berners-Lee likes to put it, the wonder of the web is its sheer serendipity: what someone else, somewhere, has already done can “save your bacon.”
Politically, the idea is far from libertarian. There is still a vital role for the state in collecting, publishing and paying for data, and also in getting the best out of developers. But a world where mashers inherit the earth is also an oddly appropriate example of Cameron’s “big society.” For once, this is an area where those irritating buzzwords—“the wisdom of crowds,” “the long tail,” “nudge,” and the rest—actually work, and where the ideas they enshrine mean citizens taking decisions for themselves rather than relying on the state.
HOW OPEN DATA IS CRUCIAL IN A CRISIS
Just after 9/11, New York’s experts in “geographical information systems” (GIS) assembled a new database by gathering together scattered public and private data, and mashed it together with maps of Manhattan. It helped emergency workers see where to remove debris, identify damaged electricity supplies and water valves, and find emergency shelters. Information about public transport routes and bottlenecks was added, along with paths where fires might spread. Having data instantly accessible, overlaid on maps and in one place, proved a crucial tool for dealing with the crisis. The same has been true more recently in Haiti, where geo-spatial information on the devastation in Port-au-Prince helped disaster relief efforts.