Above: Tim Berners-Lee at TED2009—demanding “raw data now!”
On 5th January, just before the public launch of data.gov.uk, Prospect’s Tom Chatfield spoke to Tim Berners-Lee about how Berners-Lee helped the government to open up public data. On 7th January, Chatfield spoke to Berners-Lee’s friend and colleague Nigel Shadbolt about his role in the project. The edited highlights of both conversations are below.
The full inside story of the government data initiative is told in the February issue of Prospect, and can be read online by subscribers here. A preview of the article, by James Crabtree, can be read for free on our website here.
Tim Berners-Lee in conversation
Tim Berners-Lee studied physics at Oxford, before working in telecoms and software engineering. In 1989, while working as a fellow at the CERN research centre in Switzerland, he published the academic paper that defined what would come to be known as the world wide web. In 1994, he founded the World Wide Web Consortium, a group devoted to keeping the seething mass of pages he helped to create working together. In June 2009, alongside Nigel Shadbolt, he was appointed as an information advisor to the British government.
Tom Chatfield: How did you come to be working with the British government?
Tim Berners-Lee: It began with a lunch at Chequers, when the prime minister asked me what I felt the UK should do in order to make the best use of the internet, and I said, you should put all your government data onto the web. And he said, okay then, let’s do it. [laughs] So when one has spent a lot of one’s life persuading people to put things onto the web, and persuading people to be open, it’s almost disarming to have somebody say that straight away. The result of that was a team in the Cabinet Office under Andrew Stott. Various people in the UK government had experience of this area already, so it was a question of how to accelerate this as much as possible.
TC: I know it’s sometimes felt in the world of online innovation that governments are not the best people for creating radical change. What motivated you to put this effort into working with and within the British government?
TBL: The neat thing about this is that there is such a clear win: the data we’re talking about has already seen a lot of effort expended on creating it. It’s a valuable resource that has been produced by parliament for a particular purpose, and it has typically been sitting there as a really valuable but under-used resource.
The thing people were amazed about with the web itself is that when you put something online, you don’t know who is going to use it. You’re looking for something and you think it’s impossible that somebody will have done it before, but you find that they have, and the web saves your bacon. It’s the serendipity, the unexpected reuse, that is the value of the web. When you move to data, suddenly this is not applied because data actually is really desperately boring when you look at it by itself. When you put data together you can derive very powerful new insights; so I think that realisation that the UK had got all this resource that was under-utilised means the arguments become very obvious for putting them out there for people to re-use.
TC: You’ve been working in this field for some considerable time, but is this the first time you have worked in this way with a government?
TBL: I have encouraged governments on various occasions to adopt standards and to use them for websites, but this is the first time within government. What happened is that at the beginning of 2009 I decided that this is going to be the year in which I ask people to put data on the web. I gave a talk at TED, including getting people to chant “get raw data now.”
TC: Part of the interest of the story seems to be that from this first encounter with the prime minister there was a degree of serendipity involved: the time proved incredibly ripe in government for this initiative to gather a huge amount of momentum over a very short time in government terms.
TBL: People had different points of view coming from different places, but the consistency of the encouragement that we’ve had has been very gratifying. And I suppose this is really my first experience with getting involved with policy so I don’t have very much to compare it with, which means that my impression is that if people really do want to do something, and they are excited about it, then they can. I think we have to be very careful about having a burst of apparent momentum, though, and a lot of talk; we have to realise that it is going to mean a lot of pushing of other people, staying late for a bit, putting extra effort in maybe to go to a seminar to learn how to do things with linked data. I think that within departments there is bound to be resistance from a few individuals, who are going to have to be coaxed around it.
One of the really important things of course is that we should do all this without changing the way people work. It needs a very concerted ongoing push at each level: managerial, within departments, the material level, and the grass roots. This stuff happens because people are dedicated and put a lot of time into doing it. And then it’s great to celebrate the results: things like these “hackathons” which we had a couple of, where people got together and made visualisations of the existing data. Those have been good because they give you a good sense of the data being made into something really interesting.
TC: Did you feel that people really grasped what you were talking about, intellectually, the principles and ideas?
TBL: Yes. Obviously from the point of view of the timing it connected to a concern about transparency in government, which is not just the UK, the US is also very concerned at the moment; some people connected it directly to that, and that may have helped, the sense that this was holding the government accountable. One of my fears is that people would see that as the only motivator and they wouldn’t see that we also have an enormously valuable resource.
For instance, somebody blogged on DirectGov, putting up some data which was just the grid references and years of bike accidents over three years. And that was on 10th March, I think, and later on that day somebody pointed out that it had been put up in Microsoft Excel, and said you shouldn’t do that, it’s a proprietary format, you should put it up as a comma separated file which anybody can read, and by the way here it is turned into a CSV; and then someone says they’ve turned it into a KML file which can be used with a mapping application; and then the next day someone from the Times blog says I have done the mash-up, so here is a map you can go to and zoom all over the location and find your journey to work and see where all the bike accidents have been and maybe modify your journey to take another route. That was within 48 hours: the data had been turned from a pile of figures into a really valuable resource, which can save lives, which perhaps can in the long term help the public put pressure on the government to deal with black spots, and that is immediately useful to anyone getting on a bike.
Now imagine if a government department in any country had decided they were going to have a bicycle accident website. They would probably have spent a long time drawing up a requirements document, put it out to tender, and eventually gone for the lowest bidder, and after a certain amount of time the company would have come in, and then there would have been a review, and eventually the site would have been launched, and with luck it would have been useful; but in fact the message is that there are people out there who are prepared to put the effort in to turn data around before you have gone to the trouble of doing it yourself. It’s about seeing whether the mash-up-sphere, if you like, will do it for you. And that sphere will always win because they have access to data from different departments and non-government sites and all kinds of things. Somebody who is out there mashing up data sources, or someone in government doing that, is always going to produce things that go far beyond one single data set.
TC: Coming in from the outside, how did you find the internal working practices of government?
TBL: The people I met were generally very switched on, and I have been very impressed with the way that people in the Cabinet Office have made things happen and explained to me how things work. Yes, people tend to send around word processor files in emails, where at W3C everything is on the web. The British Library has I think one of the largest public wi-fi areas in the country, possibly in the world, but the government doesn’t have open wi-fi so one has to work around that, you can’t just open your laptop and be connected. But I wasn’t there to complain or worry about that, and of course there are an awful lot of industries in the world that still operate by sending around copies of a document via email.
TC: One of the key things seems to be the Ordnance Survey data. As I understand it, you went in thinking that OS would not be your prime focus, but it ended up becoming a key component. Was there a eureka moment with OS data?
TBL: My initial feeling was that OS had a complex history, that the whole set-up for OS as a trading organisation was defined, so that it was not something to deal with in the first instance. But so many people we came to, who deal with data of almost any variety, said that government is to do with the government of the country, the place, and almost everything you do is to do with some physical place on a map. We met so many people who were very constrained by OS data or had a governmental right to use the data but as a result couldn’t pass on the information they created to the public, so we were under a huge amount of pressure from a large number of people to do something.
TC: I know historically it has proved very difficult to open up the OS data. Did you find there was a lot of opposition to that?
TBL: There had been various attempts to review the problem, some of which had been very conservative and focussed on small changes to the model, but there was one report that focused on the economic side and said very strongly that everything should just be made public.
TC: Rufus Pollock’s report?
TBL: That sounds like it. Basically, the ideas in the report are correct. But one of the problems is that the value to the individual citizen of having the data available, that return on investment, is very difficult to measure. How can you measure the value in your life of the web: how do you put a pound sticker on it? You can’t. You can try in some ways, you can say you would have wasted this much time going down to the library which I don’t now do, but the whole thing is wrapped up and you do things now that you didn’t used to do. But this is one of the problems.
One of the things that was very important to us was to preserve the OS. A lot of us remembered at school being taught to use an OS map: we grew up on our holidays using OS maps to avoid traffic jams, find beaches, walk through the hills; and there are a huge number of people in Britain who are very attached to the Ordnance Survey and who value it, they know the OS as the people who make their maps, and are a jewel in the crown of Britain’s information resources. A lot of it is sentimental attachment too, to particular maps and the particular way they are presented.
TC: What are you worried about? Are there some great obstacles remaining—and is it possible that a different government might not care for the agenda so much?
TBL: The whole openness of data thing is so non-partisan that I can’t see a different government really wanting to reel back the openness. I think that once people have seen it, too, it will be easier to see that nothing horrible has happened. The fears that people tend to have when they are managing a particular bit of data sitting at a computer are to say, well, I’m worried that people will misinterpret the data, I’m worried that people will use it for the wrong thing, or I’m worried that people will think it’s more accurate than it is. Those are the sorts of things that you hear, from the very large standard excuse set. But once the data is out there those sort of excuses won’t be there any more, because people will say, well, the data is out there and it’s not really being abused, and people do understand that it’s not very clean data and that nobody is perfect, but they are very grateful to you for making it available.
The things that I am concerned about: we need to keep the momentum going, and ensure that people are following up with data sets. There is also a temptation to mail out a DVD and say, okay, here’s the data, but obviously the data is changing and being made open is just part of the cycle of the development of data; we have to grow to learn how best to do this.
TC: Is this an area that Britain could lead the world in, or are there other countries we should be looking to and trying to emulate?
TBL: America is also at it, of course, with its data.gov site.
TC: They seem to have less emphasis on making it highly usable for developers and third-party APIs.
TBL: That’s right. I think when it comes to the quality of data and making it usable, Britain is ahead. Of course it is early days. There is an awful lot of data out there: in both countries there has been a call for a list of the things that are out there, and I think just producing that list of what data to call for is quite difficult. We need to move to an ethos where if somebody in government creates a database then by default they will create a path to making that available and usable publicly. There are different ways in which the UK and US can learn from each other. There are also efforts going on in other countries: Australia, New Zealand, Toronto, New York and the State of Massachusetts all have public data projects.
TC: Is this a movement that could change the way people think about politics and interact with political systems?
TBL: Yes, I think it will have a big effect: the accountability of government and transparency will have a very healthy effect on the way that government is run.
I felt initially that we clearly needed to do this with the most developed countries, who understand about putting stuff on the web. But people are also pushing the idea of this in developing countries, because that’s where government and data transparency is needed, and you really need to establish trust in the government in order to justify investment from outside for example. When I was recently in Uganda, talking to ministers and the prime minister there, I took the opportunity to mention the openness of data in Uganda, so it may be that some of the most important effects that you find early on actually come from developing countries.
TC: Do you see yourself having a long-term involvement with the government, or with governments, on this?
TBL: This has been a project of a certain length of time. I hope that the momentum that it has got will be self-driving, I hope this will take off exponentially, and that I will in future years be able to push other sorts of things. What should it be in 2010: should it be the year of scientific data, social networking data? There’s a lot of ways in which we have to go in how we use the web, and they all connect together. But putting government data on the web has been a very exciting journey. We have to keep pushing, though. Constant vigilance.
TC: And what about your personal motivation. Obviously you’re very driven: it would be quite possible for you to sit back if you wanted to.
TBL: It is very exciting, clearly, to make things that work and that allow computers to do things that help us. The whole new field of web science, learning about how the web as a very large system and how humanity connected by technology should evolve, has a lot of excitement. But there is also a certain amount of duty. The web is this big system which we did actually make, this artificial system created by the people who sit down and write protocol and machine specifications. The way computers interact over the web is defined by a system we invented and can change. We have a duty to make sure that at the same time as we are putting data on the web, that we look more broadly to make sure the web does serve humanity, and think about the 75 to 80 per cent of the world who don’t use it at all at the moment. As always, too, I’m motivated by meeting people who are also very fired up: people like Andrew Stott, who are excited about doing something that is going to be good and effective.
TC: What most concerns you, and most excites you, about the future of the web?
TBL: Most of my concerns are to do with the web being controlled by one party or one group, whether government or a large company that has got excessively powerful and is able to control what one sees or know what one does; if a government decides it is going to control or limit what people do, or spy on them. Those are the main fears.
Nigel Shadbolt in conversation
Nigel Shadbolt is Professor of Artificial Intelligence (AI) and Deputy Head (Research) of the School of Electronics and Computer Science at the University of Southampton. He is a Director of the Web Science Trust, and of the Web Foundation—organisations committed to advancing our understanding of the web and promoting the web’s positive impact on society.
Tom Chatfield: Where did this begin for you?
Nigel Shadbolt: I moved down to Southampton University in 2000. The whole area of the semantic web seemed to me really exciting and I was fortunate enough to lead a project started in 2001 called Advanced Knowledge Technologies or the AKT project. It involved 5 universities, really looking to try and build technology and methods in anticipation of more and more information management happening on the web. As you’ll probably be aware, Tim has an appointment at Southampton. I was talking to him at a web conference in New York in the mid-2000s and he was saying, look I really want to try and get the semantic web, the ideas of linking data to the web, firmly established in a European context and you guys at Southampton with this AKT work are really showing what the art of the possible is. That’s really where our contact, our personal collaboration, started.
What really happened then was one piece of serendipity. John Sheridan, who was at that time working for Carol Tullo at the Office for Public Sector Information, had approached me and he had been terribly interested in the idea of using these techniques for the linking of and the publication of public sector information. This was back in 2004—and we decided halfway through the AKT project that public sector information could be a really interesting test case for us because it was a perfect experimental environment for exploring our techniques.
Everybody knows about data protection, but few people know that there is also a high level, legally binding directive to make public sector information available freely across Europe. It seemed like a good context, so we launched a pilot project called Active PSI, and that was taken up by the OPSI people and they started to push it around departments, local authorities and so by 2005 or so, 2006 maybe, these experiments were being reported into parliament in the Ministry of Justice documents talking about possible use of this technology.
So we had a community within the civil service who were exploring these ideas and seeing utility—and then we had Tim, who could go out and command huge respect to evangelise this new approach for semantic technologies.
TC: His is a name that can make things happen.
NS: Yes! And it really does make a huge difference. If you remember when the first web really took off, the first place it got wider adoption was amongst scientists at CERN who were using it to share information. That’s why Tim built the first web, because they had so many different documents on so many different systems, it was just a pain to constantly have to move between them. So they built this layer that would sit across them. That was, if you like, an incubator group for this version of the web. And I’m starting to believe that public sector information could be a similar incubator for this new data web that we’re seeing emerge.
TC: And you and Tim were appointed as government advisors in June 2009.
NS: Yes. It’s an interesting category, “advisor.” What we did with our terms of reference was two things that were interesting. One, make them very specific about deliverables. This would of course have to ultimately involve policy and hard policy decisions—but it was also about a single point of access where the information assets would be catalogued and described—that is the concept of data.gov.uk. Also, the other thing was that we were going to build this thing using what people referred to as “agile” programming or project management, where we really are building small but fast.
TC: So you’re bringing in best practice from the IT and academic industries?
NS: Yes, trying to do stuff fast but sustainably, using open source where we can.
TC: I presume you’re quite adept at avoiding what we might call the public sector traps of taking a very long time to get terms or batting around committees before you begin to get progress…
NS: Yes. And that is why, so far, I think things have really moved quite quickly.
One of the really fascinating things about this work is that our claim is that you can never anticipate the services that will actually turn out to be the killer apps. You always recognise them after the fact, but never understand them before, and that is precisely because human ingenuity always outruns your idea that you should give people this service or that service. We must try to turn the service provision mentality on its head: provide the basic raw material for services and the services are the creatures of people’s commercial and social and voluntary efforts.
TC: If you want to offer people something transformative you cannot possibly plan for that: you just create the conditions in which the unexpected is able to arrive?
NS: I actually think that’s the key to a lot of innovative processes. There’s a continuum of usability here which is very hard to appreciate: when you are making access to data restrictive or expensive you don’t see the long tail of applications that will naturally arise on the web. Lots of applications will be used by a few people but because it’s easy to build those things and free and unrestricted to do so it’s worth people putting the effort in.
TC: I know the Cabinet Office people are very much up to speed, but are there notable data-huggers around or potential sticking points?
NS: There will be other key data sets for us that we’ll have to get resolved and some of these are historic. This gets us onto policy, which I think is interesting. We can do the technical piece but part of the terms of reference was to look at policy and regulatory process. One thing that’s happened is that when the rail and transport franchises were let, they let the information go with the franchise at the level of something like your railway timetable. What it means is that the train operators have built a little iPhone app to tell you about train delivery and timetables and if anybody else tries to take timetable data and publish an app they are told to cease and desist.
Information about where the bus stops are, when the buses run, when the trains run—we would assert if they are paid for by the taxpayer, this is public information. And there the question is that it’s not that the operator is doing anything particularly wrong, it’s just that the information went with the franchise. As these things get renegotiated or as we look forward we have to think about unblocking and keeping information available for unrestrictive use in these contexts. We’re serious about this and certain classes of information need to be in that space. So sometimes policy has to think a little bit about where the information should or shouldn’t reside.
TC: Will everything on the data.gov.uk site be up to the linked data standards?
NS: Not everything in the first instance, because there is a lot and we can’t convert all of that immediately. But we’ve got lots of willing hands who are willing to do that and there will be a very interesting role for crowd-sourcing in this area.
TC: And what is working with Tim like, as a colleague?
NS: Well, we’re friends as well as colleagues so it’s great to have a project that we both feel so passionately about absorbing our interest. You asked, where is Tim’s commitment to this: I think in this area the thing to say is that he’s extremely proud of being a British subject, and feels pretty strongly about this country and this country being a good place to work and be in and support. And it so happens that this is a really good crucible for a large part of what he believes we need to do for the evolution of the web.
Also, I’ve been working with Tim on our other pet project, to try and get the discipline and study of this thing we call “web science” off the ground. Web science is really the view that says, look, the public sector project is one manifestation of a huge and much bigger challenge, which is that the web is now a complete ecosystem that’s changed the world and we don’t really study it as a complete system, and we need to think about doing that.
TC: And we need a rational, analytical basis for doing that rather than just vague speculations.
NS: It’s not just engineering, it absolutely does require input from sociologists, economists, lawyers and such-like. The web is humanity connected because it’s people and advanced information processing systems; you’ve got this really interesting ecology evolving. Can we design for it, can we anticipate it, can we understand it?
Tim and I wrote a Scientific American article together about a year ago that’s got a nice summary of what this web science is and refers to the semantic web. What I would say is that working with Tim is a delight. He enables, he gets listened to, and of course opens doors. That’s the first and obvious point. But he is also passionate about it, he cares about it, he is technically extremely and extraordinarily close to developments—so he really does care about standards, it isn’t that he just sits at the top. And working with him is very much like working with an equal, he really does understand that in terms of trying to make this work in a UK context.
TC: And I think that many people in a similar position to him might feel that the cutting edge was a long way away from government.
NS: My group did the first linked data pilot with OPSI in about 2004 and over the next couple of years he kept hearing me say this public sector information is much more interesting than I expected, and he would say yes it’s interesting, because it has so many ingredients that really work as the data web incubator. I think that’s the opportunity we both saw, that actually, surprisingly, this is a place to push and with everything around transparency, accountability, that’s now in the public context as well…
TC: …and it’s in this area you can deliver the decisive proof of the power of these ideas. I was absolutely struck after seeing him at TED, and having spoken to him, by just how driven he was; considering he could spend the rest of his life collecting honorary degrees. He seems to be extraordinarily disciplined, by any standards, about how he spends and allocates his time and his energies.
NS: I think the real thing is that he’s absolutely ruthless about that—in terms of mere journalistic vanity or air-time on the TV, that just doesn’t count. And in fact he’s notoriously hard to get an interview or a quote with.
TC: And how does he tolerate fools? I guess there must be frustrations.
NS: I think it’s very striking actually that he will often devote a large amount of time to what seems to be at first quite a simple-minded question but actually is touching on something pretty basic. Tim has been around, pushing the web before the web, and as it really took off has heard so many of the standard misconceptions or worries or genuine concerns that he has acquired a very deep feel for where issues can arise.
I think, when it comes to obdurate stupidity, there he’s got a pretty good line in saying “you know, you’re just wrong… you’ve just got your nose in the wrong place!”
TC: What about the future of the world-wide web itself as more people around the world come online?
NS: There are some critical properties of the web which we want to ensure and endeavour to protect. We talk about universality of access and non-proprietorial standards, these kind of issues. There are always forces who would love to lock information up behind walls, because that gives them a natural monopoly and they can charge a lot of money for it, or for various reasons of control or power. And then there’s the dark side of the web, the way in which it can be subverted for crime or a host of other darker things; but that’s humanity. In a sense, the bit that we have to argue for and push for and be on our guard for is the fundamental building blocks: are the standards open, are they genuinely accessible by all.
The majority of the world is still not connected, too, and there’s a huge issue there about disenfranchisement for those who aren’t. So it all adds up to me again to an issue for us in web science, which is, could it [the Web] fragment? There’s no natural reason why the system should endure and persist in the way it is at the moment. But you really would be alarmed if we were back in the bad old days of serious information disaggregation.









