Strengthening Data Governance
As noted in previous World Economic Forum reports, the global anxiety over the governance of data stems from the fact that everyone is somewhat in the dark. According to author David Brin: “We’re in a fog of data ignorance.” To unleash the potential of data for development, a number of governance issues must be addressed.
While the data economy of emerging economies is just beginning to form, its structure will most likely resemble the ongoing evolution seen in developed economies. Global-scale standards for data technology infrastructures will force consideration of matching global-scale policy and economic data standards, at least for the subset of data-use concerns that are shared. Data flows will shift constantly as they move between jurisdictions, organizations and functions, with firms entering and exiting, new analytical tools coming into use and value chains becoming increasingly complex. The growth of passive data collected from billions of sensors will add further scale and complicate efforts to manage, monitor and audit data flows.
Currently, a set of interrelated questions is emerging about the legal requirements, policies, ethics and norms that guide the use of data in both developed and developing economies. Many of the existing approaches that guide the creation, collection, storage and use of data are based on decades-old policies of developed economies, first established in the era of mainframe (un-networked) computing. While many of the underlying Fair Information Practice Principles that currently guide stakeholders in various jurisdictions and sectors are still relevant and important, some (such as engagement of the individual, use limitations and purpose specification) need to be updated and refreshed to address the new challenges of networked systems and to suit the unique needs of emerging economies.10
Additionally, new governance systems and institutions must earn and maintain the trust of individuals and organizations alike regarding their data use and handling practices. Power dynamics exacerbate the lack of trust that results from a lack of transparency in current data approaches. A steady stream of media reports worldwide reminds consumers that data often flow in ways that can intrude on individual rights, either because the flows fall outside the traditional rule of law or because they fall under the broad umbrella of state national security interests.
In every region of the world, strengthening meaningful transparency in the ways that data are collected, stored and used has been widely recognized as a shared global priority.11a It has also been noted that transparency is in many ways a paradox. Greater transparency, without controls and education, can overwhelm individuals with too much information.
Additionally, the construct of transparency is generally oriented towards strengthening externally-facing “front door” relationships with individuals. When it comes to the “back door” ways that data flow from data-collecting organizations through their suppliers within the “data-industrial complex”, transparency and the incentives for sharing are diminished or eliminated. Much greater visibility and auditability of both the public and private data supply chains are needed to avoid “transparency-washing.”
It is also important to anticipate that the proportion of personal data that is either passively observed about individuals or computationally inferred about them is growing at an ever-increasing rate. By 2020, an estimated 50 billion devices will be wirelessly connected to the internet. Because of this global change, the guidelines and protection mechanisms for governing the use of high-frequency and high-resolution data in both the Global South and North need to adapt.
Legacy privacy guidelines and data protection mechanisms currently in effect were based on an earlier presumption that data are actively collected from the individual with some level of their direct awareness. As billions of sensors come online and passively collect data (without individuals’ awareness), and as computer analytics generate and synthesize more “bits about bits” (or “meta-data”), understanding how data are generated and how engaged the individual is in their creation and collection will be essential to balance interests for effective data governance. As African Studies scholar Laura Mann notes: “In Kenya, for example, the government has awarded the telecommunications company Safaricom a lucrative security and anti-terrorism contract while Kenya Revenue Authority has begun to mine mobile transaction data to identify noncompliant taxpayers. While the sale, sharing, or indeed interception of digital data may improve states’ developmental capacities and lead to more targeted social policy, it also raises important ethical implications about privacy and the political manipulation of data by powerful groups.”
Balancing the trade-offs between the public good that can be achieved with data and the potential harm to individuals and communities is central for effective data governance.
Trust is an important variable in evaluating such trade-offs. For example, the degree to which data have been anonymized before transfer can be balanced against the trust placed in the recipient and in the recipient's processes for preventing unauthorized access. Yet discussions of anonymity rarely mention the level of trust in the recipient. The recognition of trusted third parties and systems to manage anonymized datasets, enable detailed audits and control the use of data could enable greater sharing of data among multiple parties while serving to manage and mitigate identified risks.11b While much more research is needed in computational privacy, the widespread adoption of existing techniques could enable this trend of sharing data in a privacy-conscious way.
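One way to make "the degree to which data have been anonymized" concrete is to measure it. The sketch below is illustrative only and uses k-anonymity, a technique this report does not name: k is the size of the smallest group of records that share the same values for a set of quasi-identifying fields. The field names, the sample records and the helper functions are all hypothetical. A data holder might require a higher k for a less trusted recipient and accept a lower k for a trusted third party with audited processes.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for the given quasi-identifier fields (0 if no records)."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize_age(record, bucket=10):
    """Return a copy of the record with the exact age replaced by a range."""
    low = (record["age"] // bucket) * bucket
    return {**record, "age": f"{low}-{low + bucket - 1}"}


# Hypothetical sample data, invented for illustration.
records = [
    {"age": 23, "district": "North", "spend": 12},
    {"age": 27, "district": "North", "spend": 40},
    {"age": 31, "district": "South", "spend": 7},
    {"age": 38, "district": "South", "spend": 19},
]

# With exact ages, every record is unique on (age, district).
raw_k = k_anonymity(records, ["age", "district"])

# Coarsening age into ten-year ranges makes records less distinguishable.
coarse = [generalize_age(r) for r in records]
coarse_k = k_anonymity(coarse, ["age", "district"])

print(raw_k, coarse_k)
```

In this toy dataset, generalizing the age field raises k from 1 to 2. The same measurement could in principle feed a sharing policy that sets the required k according to the trust placed in each recipient, although k-anonymity alone does not protect against every re-identification risk.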
Overcoming these challenges will require a comprehensive revision of policy frameworks that were based upon legacy information flows within hierarchical, industrialized institutions relying on centralized information distribution systems in which data and their applications were defined and limited. The internet and its global data flows are fully distributed, challenging traditional institutional and sovereign borders.
The pervasiveness of hierarchical institutions raises a question regarding the institutional appetite for a genuinely transformative “data revolution”. Reliance by stakeholders on existing hierarchical institutions is understandable but it is not clear that the mere combination of existing public and private institutions (with their centralized power structures) will capture the benefits and have aligned incentive structures for change.
Current problems, like highly centralized institutions, are artefacts of current power structures. New levels of thinking about data governance will reveal new potential governance structures. The super-structure of shared information systems among institutions compels this analysis.
Another issue shaping the governance of data is the lack of a shared taxonomy of impacts (both benefits and harms). Shared taxonomies can drive meaningful near-term progress. As Linnet Taylor of the University of Amsterdam writes on the issue of data taxonomies: “A new taxonomy of data is badly needed. Industry, government and citizens are too frequently in disagreement as to what exactly constitutes personal data and what does not – and without an understanding of how data gets positioned in each category, or flows between them, it is impossible to have a discussion about how to govern and regulate those flows.”12 Many existing privacy regulatory frameworks do not take this into account. The effect is that they indiscriminately apply the same rules to different types of data, resulting in an inefficient and less-than-trustworthy ecosystem fraught with unintended consequences that undermine reliability and predictability.
Local context is another critical governance issue. Ulrich Mans at the Peace Informatics Lab of Leiden University (Campus The Hague) comments, “We need to create and make visible a growing number of data-driven initiatives across developing economies that have a clear benefit for those living in extreme poverty.” To do this, taking account of the local context is key. Attitudes towards, and tolerance for, how data are used and what is legitimate, fair or ethical vary greatly among different geographic and social groups. While incorporating context-related nuances into regulation is difficult, it is clear that universal data use policies that treat all data equally will face significant challenges in remaining relevant in all contexts and over time.13
Emerging Principles of the Data Revolution
The UN Secretary General’s Data Revolution Independent Experts’ Advisory Group has advanced 10 principles. A preliminary digest is provided below.
Data quality and integrity
Poor quality data can mislead.
To the extent possible and with due safeguards for individual privacy and data quality, disaggregated data can provide a better comparative picture of what works and help inform and promote evidence-based policy-making.
Data delayed is data denied. The data cycle must match the decision cycle.
Publicly-funded datasets, as well as data on public spending, should be available to other public ministries or the general public. Underlying data design and sampling, methods, tools and datasets should be explained and published alongside findings to enable greater scrutiny, understanding and independent analysis.
Data should be made public in ways that encourage greater use and be complete, machine-readable, freely available for reuse without restrictions, and transparent about underlying assumptions.
Data usability and curation
Data architecture should place great emphasis on user-centered design and user-friendly interfaces. Communities should be fostered to develop new tools that can translate raw data into something meaningful to a broader constituency of non-technical potential users.
Data protection and privacy
Clear international norms and robust national policy and legal frameworks must be developed.
Data governance and independence
Data quality and national statistical offices (NSOs) should be protected and improved to ensure that NSOs are functionally autonomous and independent of political influence.
Data resources and capacity
National statistical systems should be established that are capable of producing high quality statistics in line with global standards and expectations.
Data rights
Rights include (but are not limited to) the right to be counted, the right to an identity, the right to privacy and shared control, the right to due process, the right to freedom of expression, the right to participation, the right to non-discrimination and equality, and adherence to principles of consent.