The Evolving Data Landscape
By having granular data that captures the experiences of poor communities, along with the analytical techniques needed to decipher that data, researchers and development practitioners can improve the accuracy, effectiveness and reach of their initiatives. Practitioners in the field of economic and social development can better monitor and track the progress of their programmes in almost real time, bring projects to scale at a lower cost, gather rapid feedback from the field, collaborate more effectively with stakeholders, and demonstrate impactful outcomes.
Jake Kendall, Bill and Melinda Gates Foundation
A technology-fuelled global transformation is under way. The worldwide increase in digital connectivity, the global scale of highly personalized communications services, and advances in data analytics have coalesced to create a powerful platform for change. In this networked world, people, objects and connections are producing data at unprecedented rates, both actively and passively.
“Big Data” is the engine of this growth. A concept central to the Data Revolution, it is a term with multiple and varied definitions. For the purposes of this report, Big Data will be defined by the so-called “Four Vs”: volume (massive and passively generated); variety (originating from both individuals and institutions at multiple points in the data value chain); velocity (generally operating in real time); and veracity (referring to the uncertainty due to bias, noise or abnormality in data).
Against this backdrop, the ways in which data can be leveraged to positively impact the lives of the most vulnerable are just beginning to emerge. Because data are detailed, timely, usable for multiple purposes at scale and able to make large portions of low-income populations visible, the potential for data-driven development is unprecedented.
At the same time, it is a domain that provides keen insights into people’s lives, behaviours, health, prosperity, needs and aspirations.2 Insights this intimate raise concerns about how such data are used. To address these concerns, it is critical to clearly understand the context and nature of the local development challenges that individuals face before initiating data-driven interventions.
At present, three general criteria determine the appropriateness of using data: ethics (the underlying principles for using data); accountability (how effectively the principles are implemented and enforced); and veracity (the accuracy and completeness of the underlying data sets). The absence of broadly shared processes, paradigms and measurements that can help dissipate these tensions is an area requiring much additional work. Easy answers do not work because they simply mask the deeper complexity of interrelated challenges which will need to be continuously managed and rebalanced.
An example of these complex challenges can be seen in the 2014 Ebola crisis in West Africa. Despite months of talks between health officials, UN agencies, mobile network operators and governments, getting access to mobile network operators’ data on population movement was problematic.
A number of factors created the entanglement. Commercial interests (brand reputation risks, fear of having operating licenses revoked and disclosure of proprietary information); ethical concerns (privacy); national security concerns (releasing population movement details to third parties); regulatory uncertainty (vague legal liabilities); and knowledge and leadership gaps (lack of organizational prioritization) were just some of the factors contributing to the stalemate. As stated in The Economist magazine, “Because there was no precedent for using call detail records in an emergency like Ebola, it was hard to bring the parties together at a high-enough political or management level to make decisions.”3 There was no meta-institutional narrative of data sharing habits to help bring stakeholders together.
The consequence of this uncertainty is that the global dialogue on data for development is polarized. The optimists are advancing somewhat utopian views of the vast potential of using data for the common good. Advocates (often supported by well-funded public relations campaigns positioning technology executives as leaders) argue that with meaningful controls in place, a whole new range of digital insights can be applied to help track the outbreak of infectious diseases, strengthen resilience following natural disasters, enhance access to financial services for the poor and understand migration patterns of vulnerable populations.
In contrast, the pessimists, with equally strong voices, point to dystopian futures dominated by “digital extractive industries”, which leverage incumbent power asymmetries that are enabled by governments and industry alike. Headlines about how data have been used for private sector and government surveillance, identity theft, discrimination against minorities and a host of other harms have made this more than an academic debate. Underlying this view is the notion that the trust, transparency and control that individuals have regarding the use of data about them are significantly constrained, and that this must be addressed if the ecosystem is to be sustainable over the long term.
Big Data vs Smart Questions
One perspective on the public/private bureaucracy that prevented the sharing of population movement data to stop Ebola is that the framework for delineating the types of analysis needed, and the appropriate safeguards to prevent data abuse, is confused. Four quadrants, defined by whether the data and the question are known, help to separate the cases:
- Known Data and Known Question (lower left): This quadrant is for optimizing data for standardized processes and procedures. The questions are known and so are the data sources. Data quality, accuracy and timeliness are critical. Many of the issues for strengthening the capacities of national statistics offices fall into this quadrant. The challenges are “known knowns” and operational in nature.
- Known Data and Unknown Question (lower right): This quadrant is for domain experts to discover questions “they didn’t know to ask.” In this area data sets are known but their combined value to discover new correlations is unique. The key outcome is for experts to discover new knowledge to build sophisticated data models.
- Known Question and Unknown Data (upper left): This quadrant is about providing existing data models with access to specific data resources. In this quadrant, the data doesn’t need to move or be pooled. Innovative data models just need access to tightly controlled data sets. Much of the confusion and inertia within the Data Revolution is occurring in this quadrant. Combining large data sets for discovering new insights isn’t needed (quadrant 2). Rather, data models just need access to data that can then be turned into actionable information (quadrant 3).
- Unknown Question and Unknown Data (upper right): Quadrant 4 is about “unknown unknowns”, where machine learning, massive/passive data sets and real-time, personalized feedback loops come into play. Here, explorative, predictive and sentient computing can account for the dynamic complexity of the world and stay ahead of human decision-making, which is often slow and uninformed. These can feed into new types of data-driven decision support tools.
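The four quadrants above amount to a simple two-by-two classification, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Quadrant` and `classify` are hypothetical and not from the report, and the numbering 1 to 4 follows the order in which the quadrants are listed, which matches the report's own references to quadrants 2, 3 and 4.

```python
from enum import Enum


class Quadrant(Enum):
    """Hypothetical labels for the four quadrants described in the list above."""
    KNOWN_DATA_KNOWN_QUESTION = 1      # lower left: operational "known knowns"
    KNOWN_DATA_UNKNOWN_QUESTION = 2    # lower right: experts discover new questions
    KNOWN_QUESTION_UNKNOWN_DATA = 3    # upper left: models need access to controlled data
    UNKNOWN_QUESTION_UNKNOWN_DATA = 4  # upper right: "unknown unknowns"


def classify(data_known: bool, question_known: bool) -> Quadrant:
    """Map two yes/no judgements about a problem onto one of the quadrants."""
    if data_known and question_known:
        return Quadrant.KNOWN_DATA_KNOWN_QUESTION
    if data_known:
        return Quadrant.KNOWN_DATA_UNKNOWN_QUESTION
    if question_known:
        return Quadrant.KNOWN_QUESTION_UNKNOWN_DATA
    return Quadrant.UNKNOWN_QUESTION_UNKNOWN_DATA


# Illustrative reading only: a clear question (where are people moving?)
# paired with data held by others that analysts cannot reach.
example = classify(data_known=False, question_known=True)
```

On that reading, a case where the question is clear but the data sit in tightly controlled third-party data sets falls into quadrant 3, where models need access to the data rather than a pooled copy of it.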