Big data and the emperor's new clothes

“London is too full of fogs and … serious people … Whether the fogs produce the serious people, or whether the serious people produce the fogs, I don’t know.”

That was one of Oscar Wilde's bon mots in Lady Windermere's Fan. It could have been quoted, and perhaps should have been, in yesterday's Observer, in a tremendously important article about big data and its threat to democracy.

Because the real question, as it was for Wilde, is: what causes what?

The article was by Evgeny Morozov, and I hope everyone reads it. It isn't so much about the threat from surveillance, important as that is (we should all own our own data, it seems to me), as about the great utilitarian dream that goes along with big data: that somehow all the issues of government can simply be measured, without debate or awkward democracy. This is what Morozov calls 'algorithmic regulation'.

The trouble is that he doesn't really go far enough. He assumes this is somehow a new phenomenon, whereas it actually goes back a long way.

As I wrote in my book The Tyranny of Numbers in 2001, and as Morozov says, all these screeds of data never take you back to causes. They never actually allow you to debate those causes, or to tackle them; all they offer is an obsession with symptoms.

It is a New Labour fantasy of handing government over to machines that measure data, but with no understanding of how inaccurate data will always be if it is chained to definitions, or if it becomes the basis for controls.

Why does data have such appeal for modern governments? Partly because of the technocratic thrill of measuring the ebb and flow of symptoms, as if government were a gigantic, though not particularly well-oiled, machine. Yet cause and effect is the one thing it is quite impossible to measure: interpreting the burgeoning wealth of data to work out what causes what is always a matter of judgement, common sense and intuition.

That's the problem with data. It won't interpret. It won't inspire, and it won't tell you what causes what. Statistics have nothing to do with causation, and they were never meant to inspire either. As the pioneering number-cruncher William Farr told Florence Nightingale in 1861: "You complain that your report would be dry. The dryer the better. Statistics should be the dryest of all reading."

But over-reliance on numbers sweeps away intuition along with ideology. It leaves policy-makers staring at screeds of figures, completely flummoxed, unable to use their common sense to interpret the babble of competing causes and effects, and unable even to tell one from the other.

If men with long ring fingers are more prone to depression, as for some reason they are, that might alert you to look for a causal link. The same is true of other peculiar numerical links: people under high stress are much more likely to catch colds, and accident rates among children double when their mothers are miserable.

These odd connections might surprise you, and inspire you to think about problems in new ways, but they won't tell you what causes what. You will have to use your intuition to work out where to look in a world of massively complex systems. "Scientists try to avoid emotions and intuition," says the biologist Stephan Harding, "but it is exactly those that give them ideas."

Too many numbers also drive out history: they give us no sense of the different ways in which people measured things in the past. They drive out creativity, locking away Keynes’ dark woolly monster of ideas. And they drive out morality too, leaving our poor beleaguered ethics committees desperately trying to measure their way to a coherent attitude towards the frightening future of genetically modified human beings, or whatever takes their place. "Where's your data?" we will demand of them. "Where's your evidence?"

And to get through the next few perilous decades, to look after each other and solve the looming problems ahead, we're going to need all the judgement, intuition, history, creativity and morality we can possibly muster. So we have to make absolutely sure our tidal wave of data doesn't drive those things out.

My antidote to the tyranny of data is to ask the question the little boy asks in The Emperor's New Clothes.

Ask simple questions, because they can devastate most political statistics. Yes, the carbon monoxide rate has fallen, but is the air cleaner? Yes, our local university professors have produced a record number of learned published papers, but is their teaching any good? Yes, the exam passes top the league tables, but what about the education? Are the children happy? Can they deal with life?

Data based on definitions is as vulnerable as the Emperor's new clothes to the incisive, intuitive human question. So is the utilitarian philosophy behind it. Go ask...

Subscribe to this blog by email: send me a message with the words "blog subscribe" to dcboyle@gmail.com.
When you want to stop, email me the word "unsubscribe".