We are in the Dark Ages of Data. It's time to evolve

For those of us who champion the power of data, the past five years have been an incredible ride thanks to the rise of big data. Consider just these three examples: by 2020, we will have created as many digital bits as there are stars in the universe; data drove President Obama’s wins in the 2008 and 2012 elections; and data is powering the incredible rise of new companies like Uber and Airbnb, allowing people to monetize their most illiquid fixed assets like cars and houses.

Of course, data hasn’t accomplished any of this. Data isn’t the protagonist in any of the stories above. Humans are. People use data. Data can show correlations and trends, but people have insights that suggest cause and effect. Insights are what enable better decisions and drive innovation.

And here’s the catch: in spite of our recent data-driven achievements, the evidence suggests that humans may well be in the dark ages of data. McKinsey, in its widely read Big Data report, estimates that there will be only 2.5 million data-literate professionals in the United States in 2018, less than 1 percent of the projected population. Surveys show that professionals today still take action the old-fashioned way: based on gut instinct, personal experience and what they think they know.

So, with all this data, technology and promise, how do we build a more data-literate world?

Consumption Requires Context

If we think of data as food for our mind, the nutrition movement might offer some clues. Today the state of labelling data for appropriate use is akin to the opaque labelling of food products over 40 years ago. Until relatively recently, we had no idea whether the food we ate contained inorganic products, genetically modified ingredients, lead or even arsenic. Today we have raised nutritional awareness by listing critical ingredients and encouraging nutritional literacy that can assist in making healthy eating a conscious behaviour.

Consuming data appropriately requires the same type of conscious evaluation of ingredients. Here is a relatively common and simple example:

At one large multinational company, it turned out that the Date of Birth field is generally not populated. Rather, it defaults to Jan. 1, 1980. As a consequence, if you did not know this fact and tried to find the average age of your customers, you would conclude that your customers are younger than they really are. This mistake happens so often that it has created a myth within the institution that it serves young customers, when its actual customers are typically middle-aged.
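To make the distortion concrete, here is a minimal sketch in Python. The customer birth dates, the reference date and the function names are all hypothetical; only the Jan. 1, 1980 placeholder comes from the example above.

```python
from datetime import date
from statistics import mean

# The placeholder described above: unpopulated birth dates default to this value.
PLACEHOLDER_DOB = date(1980, 1, 1)


def age_on(dob: date, as_of: date) -> int:
    """Whole years elapsed between dob and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)


def average_age(dobs, as_of, exclude_placeholder=False):
    """Average age, optionally ignoring records that hold the placeholder."""
    if exclude_placeholder:
        dobs = [d for d in dobs if d != PLACEHOLDER_DOB]
    return mean(age_on(d, as_of) for d in dobs)


# Hypothetical sample: three unpopulated records and two real, older customers.
sample_dobs = [
    PLACEHOLDER_DOB,
    PLACEHOLDER_DOB,
    PLACEHOLDER_DOB,
    date(1960, 1, 15),
    date(1965, 2, 10),
]
as_of = date(2015, 6, 1)  # assumed reference date for the calculation

naive_average = average_age(sample_dobs, as_of)
aware_average = average_age(sample_dobs, as_of, exclude_placeholder=True)
print(naive_average, aware_average)
```

In this made-up sample the naive average is ten years lower than the average over populated records. Filtering on the placeholder also drops any customer genuinely born on that date, a trade-off that only someone who knows the data's context can judge.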

Drawing incorrect conclusions from data often does more damage than not using data at all. Consider the spurious link drawn between vaccinations and autism, or the finding that only six of 53 landmark cancer studies could be reproduced. An Economist survey revealed that 52 percent of surveyed executives discounted data they didn’t understand, and rightfully so. The Economist reminds us that a key premise of science is “Trust, but Verify.” The corollary also holds true: if we can’t verify, we won’t trust.

Packaging Data

No one wants to consume something that they’re not expecting. If someone expects a red velvet cupcake and you feed them pizza, they might live with it, but the initial experience is going to be jarring. It takes time to adjust. So, what does this have to do with data?

Data doesn’t really speak your language. It speaks the language of the software program that produced the information. You say sales, and the dataset says rev_avg_eur. You say France, and the dataset says CTY_CD: 4. Can these labels be learned? Sure, but even in a relatively small organization, there might be 20 software programs in use every day, each of which has hundreds of different codes, attributes and tables. Good luck if you are in a multinational organization with tens of thousands of such programs.

This translation has a larger unseen cost. A recent industry study highlighted that 39 percent of organizations preparing data for analysis spend time “waiting for analysts to assemble information for use,” and another 33 percent spend time “interpreting the information for use by others.” If, every time we need an answer, it takes us hours or days to assemble and interpret the information, we’ll just ask fewer questions. There are only so many hours in a day. Making data easy to consume means ensuring that others can easily discover and comprehend it.
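One way to picture that packaging is a small data dictionary that travels with the data. The sketch below is illustrative only: the dictionary names, the "country" label and the sample figure are assumptions, while rev_avg_eur, CTY_CD and the code 4 for France come from the example above.

```python
# Hypothetical data dictionary: system field names mapped to business terms.
FIELD_LABELS = {
    "rev_avg_eur": "sales",
    "CTY_CD": "country",  # assumed label for the country-code field
}

# Hypothetical code tables: coded values mapped to readable ones, per field.
VALUE_LABELS = {
    "CTY_CD": {4: "France"},
}


def describe(record: dict) -> dict:
    """Return a copy of record with known field names and codes translated.

    Unknown fields and codes pass through unchanged rather than being guessed.
    """
    readable = {}
    for field, value in record.items():
        label = FIELD_LABELS.get(field, field)
        readable[label] = VALUE_LABELS.get(field, {}).get(value, value)
    return readable


raw_record = {"rev_avg_eur": 1250.0, "CTY_CD": 4}  # made-up sample values
readable_record = describe(raw_record)
print(readable_record)
```

The point is not the lookup itself, which is trivial, but that the translation is written down once and shared instead of being rediscovered by every person who asks a question.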

A Data-Literate World

We have an incredible opportunity in front of us. What if just 5 percent of the world’s population were data literate? What if that number reached 30 percent? How many assumptions could we challenge? And what innovations could we develop?

According to Wikipedia, the skills required to be data literate include understanding what data means, drawing correct conclusions from data and recognizing when data is used in misleading or inappropriate ways. These are the decoding skills that enable an individual to apply data analysis accurately to decision-making. Rather than focusing on making data consumers do more work, maybe we can boost literacy by surrounding the data with context and reducing the burden of understanding the information.

Metrics and statistics are wonderful, but we need to surround data with more context and lower the costs of using data. More fundamentally, we have to reward those people and systems that provide this transparency and usability. Data is just information — we need to evolve in how we use it to unlock its potential.

Credit: Satyen Sangani, CEO, Alation

This piece first appeared in the World Economic Forum’s Agenda blog.