The future will be data-driven


Infrastructure serves a purpose: transport systems provide mobility and accessibility; utilities provide energy and water. None of this is possible without the surrounding infrastructure. But now there is a new utility, so crucial to the efficient functioning of modern life that we could not do without it: data. So much of how we live, and how the economy functions, is driven by data. And just like traditional utilities, it needs the proper infrastructure: to collect it, store it and move it around to the points of use. This needs smart administration, sensors, computer storage, fibre and wireless communication networks, analytics and display. The key point, as with other infrastructure, is to work backwards along this sequence: to ask “how do we want to use data, and what infrastructure will allow us to do that most efficiently?”

It was Claude Shannon in his famous 1948 paper on “A Mathematical Theory of Communication” who first established how to encode information in “bits,” work that provided the conceptual basis for the transmission of data. The ensuing technological advances throughout the 20th century have influenced business and government. But data and processor power have never been as prevalent and powerful as they are today. It is the internet that has transformed how businesses and services work, and how we live: administration, manufacturing, investment and planning, sales and delivery have all changed beyond recognition.

But there is more to come—the fields of data science and artificial intelligence will be transformative in new ways. What has been achieved so far is largely in specialised silos. The challenge is to explore the future of the big systems, for example in science, engineering, health, education and security; and how these combine in the places where most of us live: in cities or city regions.

If we take the growth of computing power and the continued expansion of communications bandwidth as given, there are two parts of data science that underpin future developments in applied domains: data wrangling and data analytics.

Data systems are messy. Take one example: medical data. This can range from the simple, say, a handwritten doctor’s note, through to the highly complex results of an MRI scan. Data wrangling is the task of digitising all of this and making it available in common formats. In this medical example, the organised data can then be analysed, together with social data on the patient, using machine learning algorithms (sets of computerised instructions) to provide an entirely new level of insight into the patient’s condition. Indeed, there are already examples where an algorithm can provide a quicker or better diagnosis than a doctor.

Messy data makes data wrangling a very time-consuming but necessary process. It is estimated that it can take 80 per cent of the time spent developing data analysis applications. A key research challenge, therefore, is to find a way of automating it.

And once the data has been gathered together—what then? The modern data analytics tool kit contains a broad array of useful instruments. In some cases, this may be a matter of traditional statistical analysis. In other cases, mathematical models can be used to explore different hypothetical scenarios. This is commonplace in, for example, transport planning and retail store location. The ability to test out scenarios in advance saves a huge amount of potentially wasted investment capital, thus freeing it up for use elsewhere.

These capabilities, data wrangling and analysis, have been combined to produce what are termed machine learning (ML) algorithms. These can be applied to data sets to “learn” routine business processes. They might, for example, recommend whether to accept a person for insurance and, if so, what premium they should pay each month, all without human intervention. ML-based computers become learning machines that can replace, even enhance, human judgement in a range of situations. Ultimately, data analytics provides the foundations of artificial intelligence.

What does this imply for the future development of data infrastructure? The health service provides a good example of what the future could look like. Say, for example, that a patient goes for tests, data is collected, the clinician makes a diagnosis and recommends a treatment plan. Now suppose that all this is fully digitised and recorded. Then add machine learning at the diagnostic stage and combine this with medical and social data. On a large sample, patients can be clustered into “types”—variety of disease, stage of disease, for example—and in each case, the effectiveness of the treatment plan can be evaluated. With this system in place, the doctor can draw on the past experiences of thousands, indeed millions, of previous patients with comparable conditions before making a clinical judgment. The result is that medicine becomes more personalised and increasingly effective. But it all depends on the availability and effectiveness of data infrastructure. Past and often-failed attempts to digitise medical records demonstrate the scale of the challenge.
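The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a clinical method: the records, the field names (`disease`, `stage`, `treatment`, `recovered`) and the helper functions are all invented for the example, and patients are sorted into “types” by a simple disease-and-stage key rather than by a learned clustering.

```python
from collections import defaultdict

# Hypothetical, invented records; the field names are assumptions for illustration.
RECORDS = [
    {"disease": "A", "stage": 1, "treatment": "drug_x", "recovered": True},
    {"disease": "A", "stage": 1, "treatment": "drug_x", "recovered": True},
    {"disease": "A", "stage": 1, "treatment": "drug_y", "recovered": False},
    {"disease": "A", "stage": 2, "treatment": "drug_x", "recovered": False},
    {"disease": "A", "stage": 2, "treatment": "drug_y", "recovered": True},
    {"disease": "A", "stage": 2, "treatment": "drug_y", "recovered": True},
]


def recovery_rates(records):
    """Return {(disease, stage): {treatment: observed recovery rate}}."""
    # For each patient type and treatment, keep [number recovered, number treated].
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for record in records:
        patient_type = (record["disease"], record["stage"])
        cell = counts[patient_type][record["treatment"]]
        cell[0] += int(record["recovered"])
        cell[1] += 1
    return {
        patient_type: {
            treatment: recovered / total
            for treatment, (recovered, total) in treatments.items()
        }
        for patient_type, treatments in counts.items()
    }


def best_treatment(rates, disease, stage):
    """Return the treatment with the highest observed recovery rate for a type."""
    by_treatment = rates[(disease, stage)]
    return max(by_treatment, key=by_treatment.get)


RATES = recovery_rates(RECORDS)
```

Calling `best_treatment(RATES, "A", 2)` then looks up how previous patients of the same type fared. A real system would need far larger samples and proper statistical controls before any such figure could inform a clinical judgment.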

The medical illustration is paradigmatic: the structure can be applied to any system that is client- or consumer-driven. It can be used to track students through the education system, or to analyse people’s financial needs and circumstances. Advanced manufacturers are linking data-driven demand analysis with robotic production—that is, if they know they will need more stock, automated systems increase manufacturing output accordingly.

However, all of these gains by businesses and other organisations will be made independently of one another. By contrast, the really big systems are all interlinked. Consider the agenda of the National Infrastructure Commission (NIC): this encompasses transport, utilities, telecoms, broadband and even housing. There are huge opportunities for efficiencies between these areas of activity, but in order to achieve them their major data infrastructure systems must be linked. How can it be done?

Cities as integrators

The answer lies in the functioning of cities, those colossal generators of data. There is an increasing incentive now for cities to be “smart,” that is, to make best use of their data. The rewards for understanding how to use data as produced on the scale of modern cities are potentially huge, not least in confronting the most challenging economic question of how to improve productivity growth. Cities might yield answers to that question that can be deployed across the country. To tackle that question, we must turn to the key processes of government: planning and investment, both public and private.

As Robert Skidelsky, the economist, pointed out in the September issue of Prospect, government investment in capital projects is necessary to stimulate economic growth in the long term. This is the key to stimulating productivity growth, which is necessary if stagnating wages are to rise. The challenge is therefore to identify the investment strategies with the greatest chance of meeting the productivity challenge. The analysis of data can also help government and local authorities to face questions on how services might be delivered with decreasing budgets, what levels of investment are needed in utilities and transport, whether densities should be higher and whether green belt should be re-designated. And then there is that overriding and neglected question: how can all of this be done in a sustainable, carbon-reducing way? Governance and planning structures are needed that respond to this agenda. The data infrastructure systems are key.

The frustration is that we have the capability to do this—but our approach to the collection and analysis of data is not systematic. What are the barriers and how can we overcome them?

Most cities don’t have good data technologies, or good analytics, and the challenge is to provide an effective capability to local authorities across the country. The current supply of technology to these authorities is very fragmented. Even so, cities such as London and Manchester have at least the beginnings of something good, as demonstrated in the London Infrastructure 2050 study. The big players (Atkins, Arup, IBM and Siemens, among a substantial list) are well equipped, but the UK remains a weak market, which suggests that the government could consider taking a stronger lead in driving investment. The NIC will be very important on that question. The Turing Institute, the national institute for data science, can also play an important role.

The government’s industrial strategy will also be crucially important. The challenges of providing the data science base for industrial development are substantial, especially the need for adequate human capital and the training that goes with it. Skills are needed at all levels, from the basics of coding data to post-doc research. In the Edinburgh region alone, it has been estimated that 10,000 additional data scientists will be needed over the next 10 years. Master’s courses in data science are being developed in universities all over the country. Since the field is developing so rapidly, these courses must be integrated with research opportunities and links to industry.

But if Britain does manage to construct a new data-driven infrastructure, this will raise significant worries about data ethics, especially as some of the data will concern people’s most private circumstances, not least when it comes to medical information. Questions include how privacy will be ensured and also anonymity where appropriate. Machine learning algorithms bring with them profound ethical questions. The notion that important decisions affecting people’s lives should be ceded to the cold hard calculation of inanimate machines will cause alarm. It will be crucial to demonstrate the transparency and fairness of these processes—no small challenge.

A company or a government department might use a “deep learning” algorithm, involving many layers of a neural network, to generate a decision on, say, an insurance policy or a benefits recommendation. At present, it is often not possible to give an explicit account of how the decision was reached, and it is therefore difficult to respond to an appeal against it. These are major issues, and the Nuffield Foundation, the Royal Society, the Royal Statistical Society and the Alan Turing Institute are forming a partnership to ensure that progress is made.

In terms of privacy, hacking will pose a substantial threat (a worry that extends into the wider and rapidly developing field of cyber security). Making data anonymous is a non-trivial exercise. Even once the process is done, the anonymised data can often be cross-referenced with publicly available personal data, say on social media, so that identities are revealed. This should be a soluble problem, but it is mathematically a very challenging one.
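The cross-referencing risk can be made concrete with a small Python sketch. Everything here is an assumption made for illustration: the two tiny data sets, the names in them, and the choice of postcode area, birth year and sex as the shared attributes. The idea is simply that if a combination of such attributes is unique in the “anonymised” data, anyone who holds the same attributes alongside a name can reattach the identity.

```python
from collections import defaultdict

# Hypothetical "anonymised" records: names removed, other attributes kept.
ANONYMISED = [
    {"postcode": "EH1", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "EH1", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "M1", "birth_year": 1980, "sex": "F", "diagnosis": "migraine"},
    {"postcode": "M1", "birth_year": 1980, "sex": "F", "diagnosis": "eczema"},
]

# Hypothetical public profiles, for example scraped from social media.
PUBLIC = [
    {"name": "Alice Example", "postcode": "EH1", "birth_year": 1980, "sex": "F"},
    {"name": "Carol Example", "postcode": "M1", "birth_year": 1980, "sex": "F"},
]

# Attributes assumed to appear in both data sets.
SHARED_FIELDS = ("postcode", "birth_year", "sex")


def shared_key(row):
    """Tuple of the attributes that both data sets have in common."""
    return tuple(row[field] for field in SHARED_FIELDS)


def reidentify(anonymised, public):
    """Return {name: diagnosis} wherever the shared attributes match exactly one record."""
    by_key = defaultdict(list)
    for row in anonymised:
        by_key[shared_key(row)].append(row)

    matches = {}
    for person in public:
        candidates = by_key.get(shared_key(person), [])
        if len(candidates) == 1:  # a unique combination gives the identity away
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


MATCHES = reidentify(ANONYMISED, PUBLIC)
```

In this invented data, one profile matches a single record and is re-identified, while the other matches two records and stays ambiguous. Coarsening or suppressing attributes until no combination is unique is one common line of defence, at some cost to the usefulness of the data.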

The future will be data-driven

Lives and economies will be transformed by data science and artificial intelligence. There are challenges in data wrangling as a starting point and in artificial intelligence as an end point. The future is already evident in sophisticated applications in sectors like retailing. There are also signs in the public service sector—especially in health—that the systemic use of data will bring benefits.

But the greater the degree of integration, the greater the potential rewards on offer, and it is this that makes both cities and the government’s industrial strategy so important. None of it will happen without effective data infrastructure.

On the 3rd of October, Prospect launched Data as Infrastructure. This special report grew out of a series of high-level roundtable meetings over the summer which brought together government, private businesses and the third sector to look at how data is already being used to improve people’s lives and how it has the potential to do so much more.

To find out more about how you can become involved in Prospect’s thought leadership programmes, please contact saskia.abdoh@prospect-magazine.co.uk.



Britain needs a new infrastructure for ones and zeros