Was Michael Gove right? Have we had enough of experts?

Listen: Paul Ormerod on the 12th edition of our monthly podcast, Headspace

Using evidence to assess the outcomes of policies is a vital part of good governance. Whether it is examining how a Budget will affect those on low incomes, or how well fishing quotas are managing stocks, no one but the most bumptious ideologue would deny it. The plastering of demonstrably dodgy statistics on the side of the Brexit battle bus last year stoked indignation on the part of many who think of themselves as rational and well-informed. The arrival of Donald Trump, an American president who feels no compunction about disseminating falsehood, has further darkened the mood among the liberal intelligentsia. There is a strong sense that the forces of reason must now rise up and see off the purveyors of the “post-truth” world.

We must, however, also grapple with one other contemporary reality. Underlying the great turmoil of politics at the moment is precisely the view that “the experts” are less trustworthy and objective than they purport to be. Rather, their considered opinions are seen as a self-reinforcing apparatus for putting themselves beyond challenge—to advance their holders’ status, their careers or, most damaging of all, their political views over those of the less-educated classes. The great popular suspicion is that an elite deploys its long years of schooling and “the evidence base” to make itself sound more knowledgeable as it rationalises the policies it was going to prefer all along.

Is that a fair charge? Well, that is an empirical question, and definitive evidence for answering it is in short supply. What we can usefully do, however, is interrogate where the “evidence base” comes from, and how solid it is.

Back in 2010 we wrote a piece arguing that an over-emphasis on empirical evidence in political rhetoric was alienating the public. The increasing reliance on the expert stamp of authority was eroding a sense of shared values between governors and the governed. Unless you were familiar with the latest nuance in academic evidence, we warned, you were automatically unqualified to have a valid opinion.

We thus see the current defenestration of experts as a reaction to long-term trends in public life. If it is true, as Michael Gove said during the European Union referendum debate, that people “have had enough of experts,” it is because empiricism locks non-experts out of discussions that impact on, but may not capture, their day-to-day experience. Last year, many members of the public formed an impression, whether fairly or not, of experts attempting to settle an important and emotive matter over their heads. A fault line between “the people” and those who think they know what’s good for them, which has been there for some time, became apparent. The June election was another reminder of this, as certain policies that many experts felt didn’t stack up—universal pensioner perks, free university education for students and costly nationalisations—turned out to be rather popular.

As paid-up members of the quantitative-expert class we share some of the current foreboding that a dystopian future awaits, where objective truth is not respected. There are many good examples of evidence influencing policy. But there are bad examples too, and if deference towards expert opinion goes too far, democracy ceases to operate as it should. Experts may see it as their role to uphold truth, facts and evidence, but they can only do so if they maintain public trust. That implies many things—better communication, for example. But before anything else it implies experts adopt a reflective approach to their own work, and open it up to outside scrutiny too.

There is a particular onus on social scientists here, because there is often more subjective judgment and interpretation in their fields than there is in measuring physical reality, leaving more scope for biases to entrench established views. Many social scientists are meticulous; but there are others who need to get their own house in order where “the evidence” is concerned. If it is going to be used to close down arguments it needs to be rock-solid, but how often is that the case?

Scarcely a day goes by without the press featuring some research, polished by a university PR team, purporting not only to establish that sausages cause cancer or that the people of Basingstoke are happier than the people of Burnley, but also that Something Must Be Done About It.

Academic papers from the social sciences and health are now an important foundation of what has come to be called the “evidence base.” Who could be against evidence? But this is a rather telling phrase. The “base,” when you stop and think about it, is logically superfluous; its function is purely rhetorical—suggesting that the evidence in question (unlike any other) rests on something that shores it up. But, as we shall see, there are often question marks around its solidity, especially in the social sciences.

The magic concept invoked to define it, and to separate the priesthood from the laity, has become that of “peer review.” Peer review is the process by which submissions to academic journals are scrutinised by the academic peers of the authors—the “referees.” Only papers deemed suitable by referees will be published.

The process of this scrutiny, of peer review, may conjure up images of scholars carefully examining the article line by line, checking every single piece of analysis and verifying its claims. Very occasionally, this Platonic ideal may exist. When, for example, Andrew Wiles claimed to have proved Fermat’s Last Theorem, his manuscript was subjected to the most thorough investigation imaginable by the world’s leading experts in the relevant areas of maths. An error was indeed discovered, one which Wiles was happily able to fix after months of wrestling with the problem. As a result of the peer review process in this case, we can be entirely confident that Wiles proved Fermat’s Last Theorem.

In almost all other cases, at least within the social sciences, the reality of peer review is rather different. We should think of a harassed academic, pressured by the need to do his or her own research and by the demands of both students and university administrators, and pestered by journal editors to submit the review.

Refereeing is both unpaid and anonymous. The referee receives neither monetary reward nor the esteem that comes with getting one’s name in print. The task is seen as a tedious chore, and procrastination is widespread. In the social sciences, there are frequently delays of a year, and more occasionally two, between submitting the manuscript and receiving the referee reports.

One might ask why academics agree to referee papers at all. In part, it is convention: it is simply part of the everyday life of being an academic. But once a year many journals will publish a list of the names of their referees. This incremental addition to your CV just might—perhaps, eventually—be part of the package that lands you a promotion or a job at a better university.

But serving as a referee under these conditions subjects you to a range of human temptations. Does the paper support or undermine one’s own work, for example, or does it appear to be written by a rival? Does it cite enough of the papers of the reviewer and his or her friends, because the number of citations of your own work by other academics is an important metric by which you are judged? Here, at long last, is the chance to slap down, under the cloak of anonymity, the smartarse who slapped you down at that conference five years ago.

Then there is the question of who chooses the referee. Enter the editorial board, which is made up—once again—of academics typically paid little or nothing. Again, the human factor creeps in. Years ago, one of the present authors submitted a paper to a leading American economics journal, a critique of a published article that had gained a certain kudos. One of the authors of the criticised piece was an editor at that journal—and, as was discovered by chance a few years later, he gave it to his co-author to referee. Needless to say, the negative article wasn’t accepted.

Once a paper is published, the chances of it being subjected to further scrutiny are remote. A tiny handful of articles become famous, and are downloaded thousands of times. Many receive no attention, and most will be read by very few scholars. Yet the mere fact that a paper has gone through peer review confers on it an authority in debate, which the lay person cannot challenge. So, all too often, there is no post-publication challenge within the academy, and no licence for challenge from outside. Locked out by the experts, some laypeople may start to feel like they have had enough.

So how might we improve peer review, and build “the evidence” on a firmer foundation? Economics has rightly been subjected to many criticisms, especially since the financial crisis. But the discipline has one extremely powerful insight, perhaps the only general law in the whole of the social sciences: people react to incentives. They may not always do so with the complete rationality described in economics textbooks. But thinking through the rewards on offer in any given situation helps to understand why people behave as they do.

Ideally, the incentives around research should be structured so as to maximise constructive scrutiny of every claim that is made. Instead, the rising pressure on academics to publish has created a set of incentives that exacerbates the need to negotiate the peer review process and appear in academic journals. The rising demand to publish has been met by a large increase in the supply of academic journals. One recent estimate is that there were 35,000 peer-reviewed journals at the end of 2014, many of them of decidedly doubtful quality. Why? Because incentives are everywhere.

A paper in the 23rd March edition of Nature by a group of Polish academics mercilessly exposes the problem. The title neatly captures the content of the article: “Predatory journals recruit fake editor.” The authors begin in an uncompromising manner: “Thousands of academic journals do not aspire to quality. They exist primarily to extract fees from authors.” They go on: “These predatory journals follow lax or non-existent peer-review procedures… researchers eager to publish (lest they perish) may submit papers without verifying the journal’s reputability.”

They adopted the brilliant strategy of creating a profile for a fictitious academic, Anna O Szust, and applied on her behalf to be an editor of 360 journals. “Oszust” is the Polish word for “a fraud.” Her profile was “dismally inadequate for a role as editor,” yet 48 of the journals offered to make her one, often conditional on her recruiting paid submissions to the journals.

This new study follows on from a 2013 piece by the journalist John Bohannon in which his purposefully flawed article was accepted for publication by 157 of the 304 open-access journals to which it was submitted, contingent on payment of author fees. That was a warning sign, and things have got worse since. The Nature authors state that “the number of predatory journals has increased at an alarming rate. By 2015, more than half a million papers had been published in them.”

None of this means that academic journals have moved into a post-truth world. There are clearly journals where high standards apply. The Polish academics approached 120 journals on the respected Journal Citation Reports directory as part of the 360 in their experiment. None of them accepted “Mrs Fraud” as an editor. And one can imagine specific reforms to get rid of those sorts of journals that are profiting through the equivalent of vanity publishing.

Even in serious journals, however, and even where referees do try their best, the scrutiny of just one or two people provides scant security. The mere fact that a paper has been peer reviewed is no guarantee of its quality or, indeed, its reliability.

The problem is nicely illustrated by a paper that appeared in Science at the end of 2015, in which a team of no fewer than 270 authors and co-authors attempted to replicate the results of 100 other experiments that had been published in leading psychology journals. The involvement of the original authors should have made it easier to reproduce the results. Only 36 per cent of the attempted replications led to results that were sizeable enough that one could be confident they had not arisen by chance. In other words, almost two-thirds of the attempts to replicate published, peer-reviewed results of papers in the top psychology journals failed completely.

The veneration of peer review has simply gone too far. The connected concept of “evidence based” has permeated policy discourse, and is sometimes used to lock out non-experts. But in psychology at least, as we have seen, there are papers whose findings could not be replicated that could have been flourished as part of an evidence base in support of one policy stance or another. The evidence is not “based” on any firm foundations; it rests on sand.

So conventional review is flawed; but fortunately, there are alternatives—some of them already in use. One other test of academic papers is by their ability to make successful predictions. This is not infallible. Someone may strike lucky and carry out the scientific equivalent of successfully calling heads 10 times in a row. But consider, say, coronary heart disease. A tiny handful of the thousands of papers on the topic published each year may eventually lead to the development of drugs that successfully pass all the stringent tests set out by the authorities and are licensed for use. To get that far, their insights about what makes the condition better or worse have to be borne out in clinical testing in the case histories of real patients. They do real good, and we can be confident they have some validity.

Another recent alternative is to open up the peer review process, so that it actively invites challenge, by letting scientific merit be determined by the esteem of the peer group as a whole, not just by two or three selected referees. One example is the physics e-print archive arXiv.org (pronounced “archive”). Authors can post their papers here prior to publication in a journal if they like, though some feel no need. The site has grown to embrace not just physics but maths and computer science, and, in a small way, quantitative finance.

To post a paper, an author must merely be endorsed by someone who has already published on arXiv. Moderators refuse papers which are obviously not science at all. But scientific importance emerges from the process of downloading and citation. So peer review really is carried out collectively by the relevant scientific community. The more downloads and citations a paper receives, the lower the chances of an error going undetected. The context is different, of course, but there is an echo here of the logic with which Google has conquered the world.

It is, however, only in the harder sciences where there has been a serious embrace of something approaching the marketplace—of the consumers, other academics in the field, deciding on the worth of a paper. In most disciplines the only model remains a monopoly supplier—the prestigious journal and its editorial board.

But it is in the social sciences that the suppression of challenge can have most political effect. A paper may be brandished purporting to show that all family structures are of equal merit, or that mass immigration does not reduce real wages, perhaps conflicting with religious convictions, personal experience or vernacular conceptions of how society functions. Whatever one’s views, the impression created that expert findings on such contentious political issues are immutable fact is bound to breed cynicism and “expert fatigue.”

An over-emphasis on expert opinion has already had insidious effects on democracy. One of these is a view among some in the intelligentsia, as described in Tom Clark’s Prospect piece (“Voting Out,” February), that the fundamental purpose of democracy is optimal, rational decision-making; if the electorate cannot manage this they—and by implication the democratic system—are at fault.

There are two obvious problems with this. Firstly, in order to rationally optimise society, someone would need to decide what the objectives are. And that is clearly a matter of political opinion. Secondly, it is flagrant mission creep. Democracy is, first and foremost, a mechanism for managing disagreement in society without bloodshed, chaos or repression. To boot, it allows the people to peacefully throw out those in power if they’re doing a bad job.

Expert analysis is of limited use in these tasks. Its recommendations cannot capture what the polymath Michael Polanyi called “tacit knowledge”—knowledge that is based on experience, which shapes people’s habits and beliefs without being codified. This doesn’t get a look-in.

Indeed, in modern social science it is very often only that which gets counted that is deemed to count. And who decides on that? It is, overwhelmingly, the “experts” who get to write the surveys that feed so much social science its raw material. If, for example, they are more interested in what someone’s ethnicity does to their views than they are in whether the respondent lives in the countryside rather than the city, then that is what scarce slots in the questionnaire will be used to find out. Through such means, priors and prejudices about what merits counting can colour the data, even before it has been crunched.

Democracy is a very crude system for giving decision-makers feedback about the quality of our lives. But this most basic process of consultation can never be replaced by data. For quantitative metrics are often very “lossy”—some things are not counted, and thus cease to count. Where experts imagine they can settle a fundamentally political argument through such empirical evidence, the consequences can fast become absurd.

In the UK, the Office for National Statistics has, encouraged by David Cameron in his early tieless phase, measured “well-being” and “happiness,” to guide public policy. This sort of data conflates a very great number of causal factors, which dilutes its value in guiding public policy decisions. And yet one commentator even suggested that, because well-being data showed high levels of contentment in Britain, the vote for Brexit need not have happened.

More generally, the result of putting empirical analysis on a pedestal can be intolerance towards others who start with different views. That was in evidence in some of last year’s sneering at “Leave” voters as dupes who couldn’t understand the arguments. Furthermore, if evidence is everything, but many don’t have the training to process it properly, then unscrupulous characters will spot a chance to make up the odd little, self-serving “fact” of their own—after all, only a minority will know the difference. The rise of empiricism in a world where we are bombarded with information might thus have actually contributed to the post-truth phenomenon.

Why did experts become so prominent in the decades before the crash? The narrowing of disagreement in politics after the end of the Cold War was surely important, as was the associated rise in managerialism. For example, central banks, which make hugely political decisions that shape the relative fortunes of borrowers and savers, were suddenly held to be above politics, and given independence. Huge faith was vested in their predictions, and those of associated technocrats at institutions like the International Monetary Fund, until the crash showed these could not always be depended on.

In the more austere times that have followed, the fundamental conflicts over resources and priorities—between natives and foreigners, between social classes—that probably never truly went away are now back with a vengeance. The experts certainly have misgivings about all forms of populism and especially about Jeremy Corbyn’s Labour Party, with its cavalier assumptions about how much revenue it could easily raise.

But the backlash against “experts” is, nonetheless, still principally associated with the right. The more educated, liberal-leaning section of society needs to understand why this is. It is not because, as is commonly assumed, the right is simply the political wing of the dark side.

The right’s great insight is that the left can create a political apparatus with good intentions but the wrong incentives, and that this apparatus can become impervious to challenge. It argues that political choice is based on economic self-interest, and that this can apply, perhaps unconsciously, even to people apparently motivated by the public interest. These suspicions, articulated as “public choice theory” by the Nobel Prize winner James Buchanan, have most often been applied to bureaucracies with noble theoretical aims that go awry in practice, but the same analysis can be extended to universities and research institutes too—or indeed “the evidence base.” The Buchanan analysis can easily morph into an intransigent view that pursuing practically any collective goal will lead to empire-building bureaucracies, which also fall prey to “capture” by self-serving lobbyists. Taken to extremes, it promotes a profoundly destructive, atomistic worldview that leaves society paralysed in the face of the most serious moral questions. One only has to look across the Atlantic at the way the American right is responding to climate change and healthcare to see that.

Those who reasonably resist this worldview can counter it in two ways: either through bitter “with us or against us” polarisation, or by having the foresight to avoid the charges that public choice theory would lay at the academy’s door in the first place. That means at least examining the possibility that policies that come blessed with an expert stamp are serving the interests of those who put them forward, rather than dismissing it out of hand.

Truth and evidence must obviously be upheld. But there is a real danger in expert elites studying the electorate at arm’s length and seeking a kind of proxy influence without having to worry about gaining political support. We must not denigrate evidence-based thinking, a bad habit of thuggish regimes, but we must subject it to more “sense-checking,” and in communicating it we must pause and give thought to what the broader public will make of it. The alternative is a dialogue of the deaf between the know-all minority and a general populace which some may caricature as know-nothings. In such a stand-off, real evidence soon becomes devoid of all currency.

To avert it, the experts need to show some humility: we can’t diagnose and prescribe for all of society’s ills. We also need to recognise that to be persuasive we must actually persuade—and not simply hector. The great mass of voters are not, after all, under any obligation to accept expert authority. We need to reflect critically on the problems in academia that can block the testing of ideas on the inside, and dismiss all challenge from outside our walls. And we need to show self-awareness: deep intimacy with a subject can, on occasion, lapse into a tunnel vision that blanks out culturally-rooted perceptions and the lived experience of voters. Those things can’t be ignored. They are, after all, the lifeblood and raison d’être of politics, and can only be gauged by asking people, unschooled as well as schooled, for their opinions, and ultimately relying on their decisions.