Near-Term Priorities for Strengthening Trust
Strengthening trust requires innovation along multiple fronts and over a long time horizon. Despite the long-term nature of this evolution, participants in the global dialogue stressed the need to focus on near-term, pragmatic actions to ensure progress.
The value of taxonomies
Before trust in the use of personal data can be strengthened in a meaningful way, a more efficient dialogue is needed. The current dialogue has been shaped by vague and imprecise terms that generally precede the word “data”. Big, small, open, personal: each of these qualifiers fences off a portion of the conversation about the use of data. A more holistic approach is needed. What are the concrete objectives for regulators? What new types of thinking and insights does an organization need in order to be responsible and accountable in the era of big data?
Developing shared taxonomies of how data originates is one area that can drive meaningful progress in the near term.16 As Dr Linnet Taylor writes: “A new taxonomy of data is badly needed. Industry, government and citizens are too frequently in disagreement as to what exactly constitutes personal data and what does not – and without an understanding of how data gets positioned in each category, or flows between them, it is impossible to have a discussion about how to govern and regulate those flows.”17 Many existing privacy regulatory frameworks do not take these distinctions into account. As a result, they indiscriminately apply the same rules to different types of data, producing an inefficient and less than trustworthy ecosystem.
While the call for taxonomies is not new, growing public concern about privacy makes meaningful dialogue more urgent. Adopting a common taxonomy can help build a shared understanding of how the data generated in today’s world differs, both in the quantitative growth in the amount of personal data being created and in the qualitative differences that stem from where each type of data originates.
An increasing (and accelerating) proportion of personal data is either passively observed about individuals or computationally inferred about them. By 2020, an estimated 50 billion devices will be wirelessly connected to the internet.18 At the same time, from 2012 to 2017, machine-to-machine traffic will grow an estimated 24-fold to 6 × 10¹⁷ bytes per month, a compound annual growth rate of 89%.19 Most data will be collected passively through machine-to-machine transactions. Although data actively generated by individuals is still projected to grow rapidly, its share of the overall total will decline.
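The 89% figure follows from the 24-fold growth: spreading a 24× increase over the five annual steps from 2012 to 2017 gives a compound annual growth rate of 24^(1/5) − 1 ≈ 0.888. A minimal Python sketch of that arithmetic follows; the function name `cagr` is ours, and the implied 2012 baseline is simply derived from the two figures in the text, not taken from the underlying forecast.

```python
def cagr(growth_multiple: float, years: int) -> float:
    """Compound annual growth rate implied by an overall growth multiple."""
    return growth_multiple ** (1 / years) - 1


# Machine-to-machine traffic: 24-fold growth over 2012-2017 (five annual steps).
rate = cagr(24, 2017 - 2012)
print(f"CAGR: {rate:.1%}")  # CAGR: 88.8%

# Implied 2012 level, derived by dividing the 2017 estimate by the growth multiple.
baseline_2012 = 6e17 / 24  # bytes per month
print(f"Implied 2012 traffic: {baseline_2012:.1e} bytes/month")
```

Rounded to the nearest whole percentage, 88.8% matches the 89% cited.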
Because of this change, the guidelines and protection mechanisms governing the use of personal data need to adapt. Legacy privacy guidelines and data protection mechanisms were based on the presumption that data is actively collected from the individual with some level of direct awareness.20 As billions of sensors come online that passively collect data (without individuals being aware of it) and as computer analytics generate and synthesize more “bits about bits”, understanding how data is generated and how engaged the individual is in its creation has become essential for balanced and effective governance.
Figure 10: A taxonomy of personal data by origin
Source: Information Accountability Foundation, World Economic Forum, Marc E. Davis
All too frequently, data is grouped into “types” based on whether or not its handling is subject to a law or regulatory framework. This is particularly true in the US, where valid legal distinctions can be drawn between financial data, health data and educational data depending on whether the collection, use, disposal, etc. of the data is subject to Gramm-Leach-Bliley, HIPAA or FERPA, respectively. These “types” are not, however, based on generic categories, and so they lead to analytical “stovepipes” that tend to defeat interoperable structures.
One framework that can foster a more structured dialogue, and which the World Economic Forum’s community of leaders has supported for years, is based on three categories of data:21
1. Individually provided data
Data can be “volunteered” by individuals when they explicitly share information about themselves through electronic media, for example when someone creates a social network profile or enters credit card information for an online purchase. Individuals may also be “compelled” to share data by governments or commercial entities. The individual is generally aware of the action he or she is taking, and in many instances the exchange is transactional (filling out forms, providing a medical history, providing an ID and password to install an app). When volunteered data is more “by me” than “about me”, it typically involves a deeper sense of unique ownership. These personal expressions (such as photos, videos, blog posts, tweets and emails) carry a distinct set of claims by the individuals who create them and often have strong emotional ties. Although this is the model assumed by existing data protection regulations, this category will account for the smallest share of data going forward.
2. Observed data
“Observed” data is captured by recording the activities of individuals and can be placed along a continuum according to how aware individuals are of its capture and use. Some observed data is actively generated with the individual’s general awareness (browser cookies, credit card transactions, security cameras, location data from mobile devices, etc.). Other forms of observed data are more passive and unexpected (RFID chips on automobiles, facial recognition technologies, WiFi scanners at retail establishments, etc.). In general, individuals are unaware of how much observed data is being captured about them, how it is being used and the value that can be extracted by selling (and reselling) it. The rise of mediated information systems (particularly mobile phone applications with access to address books and location data) has made it much easier to observe an array of behaviours and actions. With passively collected data, the sense of ownership and control tends to shift to the institution that originally captured it. Most data generated in an “Internet of Things” world will be observed data, produced by sensors that automatically collect it as people go about their day.
3. Inferred data
Advanced computational analytics and machine learning create a third category of data that is “inferred” and synthesized from an array of different data types (including data directly related to individuals and data that is not connected to them). Inferred data is typically an amalgam of different originating data types and is generally used for predictive purposes. Generating it involves a higher degree of capital investment and use of intellectual property (often proprietary). Inferred data is even further removed from individuals’ awareness, and individuals also lose control over how it is used. It is with inferred data that claims of personal data being “a new asset class” are strongest: institutions assert much stronger claims over the inferred data they hold about individuals on the grounds that they invested the time, energy and resources to create it. Additionally, because inferred data can provide unique, detailed and powerful insights at multiple scales (individuals, communities and societies), there are competing tensions over how it can be used, which level of impact takes priority, and who gets to determine whether those uses are fair and carried out with consent. This class of data has the greatest potential to drive innovation and economic growth.
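The three origin categories lend themselves to a simple data structure, for instance to tag entries in an organization’s data inventory. The Python sketch below is hypothetical: the `DataOrigin` enum, the split of observed data into two sub-bands, the 0 to 3 awareness scale and the example records are our own illustration, chosen only to mirror the text’s point that each category sits further from the individual’s awareness than the one before.

```python
from dataclasses import dataclass
from enum import Enum


class DataOrigin(Enum):
    """Origin categories from the taxonomy; observed data is split into
    two hypothetical sub-bands to reflect its awareness continuum."""
    VOLUNTEERED = "provided: volunteered"
    COMPELLED = "provided: compelled"
    OBSERVED_ACTIVE = "observed: general awareness"
    OBSERVED_PASSIVE = "observed: passive, unexpected"
    INFERRED = "inferred"


# Hypothetical ordinal scale of how aware an individual typically is that
# the data exists (higher = more aware), following the text's descriptions.
TYPICAL_AWARENESS = {
    DataOrigin.VOLUNTEERED: 3,
    DataOrigin.COMPELLED: 3,
    DataOrigin.OBSERVED_ACTIVE: 2,
    DataOrigin.OBSERVED_PASSIVE: 1,
    DataOrigin.INFERRED: 0,
}


@dataclass(frozen=True)
class DataItem:
    description: str
    origin: DataOrigin

    @property
    def awareness(self) -> int:
        return TYPICAL_AWARENESS[self.origin]


inventory = [
    DataItem("social network profile", DataOrigin.VOLUNTEERED),
    DataItem("details on a mandatory government form", DataOrigin.COMPELLED),
    DataItem("credit card transactions", DataOrigin.OBSERVED_ACTIVE),
    DataItem("WiFi scanner logs at a retail store", DataOrigin.OBSERVED_PASSIVE),
    DataItem("predicted purchase propensity", DataOrigin.INFERRED),
]

# Rank items so that those furthest from the individual's awareness come first.
ranked = sorted(inventory, key=lambda item: item.awareness)
for item in ranked:
    print(f"{item.awareness}  {item.origin.value:<30}  {item.description}")
```

Ranking is only one possible use. The underlying design choice is that origin is recorded explicitly, so rules could differ by category rather than applying uniformly to all data.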
From a policy perspective, the growing proportion of observed and inferred data calls for approaches that address concerns when data originates at a distance from individuals’ immediate perception and where consent, participation and awareness are seldom feasible. Additionally, given the fluid and recursive nature of data flows, upholding the principles of purpose specification and use limitation requires approaches much better suited to the increased volume, variety and velocity with which data moves.22
While a taxonomy of data types provides a functional meta-description and a high-level tool for guiding policies on acceptable data uses, it does not address the huge variety of contexts in which data is utilised, nor the contextual attributes that foster trust (i.e. purpose, risk, value exchange, honesty and transparency, and control).23 Moreover, given how extensively various types of data are mashed up and iterated upon in today’s environment, trying to identify which specific types of data were uniquely responsible for particular insights and outcomes can become an exercise in false precision. The algorithms are too complex and constantly changing.
In that light, there is a growing call for a structured vocabulary describing the different classes of data use. At their core, usage taxonomies focus on understanding the ways data is used within a particular context. Uses need to be defined somewhat generically to accommodate the evolution of new technologies, yet usage types also need sector-specific definitions: research, for example, is a very different use in a health-related context than in marketing.
A particular usage could carry a set of permissions specifying who is authorized for certain uses, as well as policies that determine the appropriateness of that use. These use policies would reflect a number of factors, including preferences stated by the individual and the capacity of organizations to comply with internal policies, codes of conduct, and jurisdictional and sectoral regulations. Understanding the various data types and uses, and how they interact, is an essential component of bringing contextual understanding into personal data governance.
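A use policy of this kind can be pictured as a lookup that combines a sector-specific usage type, the parties authorized for it and the individual’s stated preferences. Everything in the following Python sketch is a hypothetical assumption for illustration: the `UsePolicy` and `Individual` classes, the sector, use and party names, the deny-by-default rule and the rule that an individual’s opt-out overrides an otherwise permitted use. Codes of conduct and jurisdictional rules, which real policies would also encode, are omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UsePolicy:
    """Who may carry out a given use within a given sector (hypothetical)."""
    authorized_parties: frozenset[str]


@dataclass
class Individual:
    """Preferences stated by the individual, as (sector, use) opt-outs."""
    opted_out: set[tuple[str, str]] = field(default_factory=set)


# The same use name ("research") gets separate, sector-specific policies.
POLICIES = {
    ("health", "research"): UsePolicy(frozenset({"hospital_research_team"})),
    ("marketing", "research"): UsePolicy(frozenset({"analytics_vendor"})),
}


def is_permitted(sector: str, use: str, party: str, person: Individual) -> bool:
    policy = POLICIES.get((sector, use))
    if policy is None:
        return False  # no defined use policy: deny by default (an assumption)
    if party not in policy.authorized_parties:
        return False
    # The individual's stated preference can rule out an otherwise allowed use.
    return (sector, use) not in person.opted_out


alice = Individual(opted_out={("marketing", "research")})
print(is_permitted("health", "research", "hospital_research_team", alice))  # True
print(is_permitted("health", "research", "analytics_vendor", alice))        # False
print(is_permitted("marketing", "research", "analytics_vendor", alice))     # False
```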
Underlying this approach will need to be a series of innovations in the area of identity management. In particular, there will need to be approaches which can connect legally recognized online identities with individual people as well as the multiple personas they adopt in their daily lives. There is also a growing recognition that identity per se is not the issue (coming to a widespread agreement on what that means is just too difficult). Rather, it is the flow of relevant reputational attributes about an individual that can strengthen the trusted flow of goods and services.24 Many feel that the rapid growth in the area of collaborative consumption (the sharing of cars, apartments, skills, etc.) has been fueled by the rise of reputation currencies which allow strangers to connect in a trusted and contextually relevant manner.25
Focusing on impact, severity and likelihood
With nearly universal agreement that privacy is critically important yet elusive to uphold, the need for greater clarity on the underlying regulatory objectives and the specific ways to uphold it in the real world is increasing. What precise impacts in the use of data should be prioritized and acted upon? If online privacy is just as important as human rights, how can it be made easier for non-experts to uphold?26 What are the ways to resolve the trade-offs when there are competing interests at the individual, community and societal level?
A growing community of policy, private sector and civil society actors is looking to the discipline of risk management for insight into these questions.27 By improving the understanding and measurement of the risks and benefits of how data is used, risk management can serve as a near-term way to create value, strengthen global interoperability and allow existing data protection regimes to evolve incrementally. Calibrated, risk-based approaches can strengthen the ability to set concrete policy objectives and give data holders pragmatic ways to uphold those objectives in an adaptive, ethical and resource-efficient manner.28
The central idea is to expand the analysis of privacy to see it through the eyes of the individual.29 By extending the lens of analysis to a first-person perspective, institutions have an opportunity to better identify, classify and assess privacy risks in terms of their likelihood and the seriousness of their impact. An emphasis on what outcomes can be achieved can supplement the question of how to be privacy compliant.
A starting point is to ask: What is the intended impact of using data? How severe is that impact? How likely is it to occur? Who holds the risk? In pursuing this analysis, it is also important to differentiate between threats in the stewardship of data and the associated benefits or harms they could create. This provides a way to organize threats (e.g. security breaches, loss of confidentiality, inappropriate usage or inappropriate access) and classes of harm. Some harms are tangible (loss of life, loss of freedom of movement, property theft and physical injury) and some are intangible (such as restrictions on personal expression, social anxiety, emotional distress and reputational damage). The scale of the potential impact and who holds the risk also need to be addressed. Does the anticipated impact fall on a particular individual, a community or society as a whole?
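Questions like these are often operationalised as a likelihood-by-severity matrix. The Python sketch below assumes hypothetical 1 to 4 ordinal scales, a simple product score and an arbitrary prioritisation threshold of 8, and it records whether each harm is tangible and who holds the risk (individual, community or society). These scales, names and the threshold are our own assumptions, not part of the framework shown in Figure 12.

```python
from dataclasses import dataclass
from enum import IntEnum


class Likelihood(IntEnum):
    REMOTE = 1
    POSSIBLE = 2
    LIKELY = 3
    ALMOST_CERTAIN = 4


class Severity(IntEnum):
    MINIMAL = 1
    MODERATE = 2
    SERIOUS = 3
    SEVERE = 4


class RiskHolder(IntEnum):
    INDIVIDUAL = 1
    COMMUNITY = 2
    SOCIETY = 3


@dataclass(frozen=True)
class HarmAssessment:
    threat: str        # e.g. security breach, inappropriate usage
    harm: str          # the resulting tangible or intangible harm
    tangible: bool
    likelihood: Likelihood
    severity: Severity
    risk_holder: RiskHolder

    @property
    def score(self) -> int:
        # Conventional risk-matrix score: likelihood x severity, from 1 to 16.
        return int(self.likelihood) * int(self.severity)


def prioritise(assessments, threshold=8):
    """Keep assessments scoring at or above a hypothetical threshold, highest first."""
    flagged = [a for a in assessments if a.score >= threshold]
    return sorted(flagged, key=lambda a: a.score, reverse=True)


assessments = [
    HarmAssessment("security breach", "property theft", True,
                   Likelihood.POSSIBLE, Severity.SERIOUS, RiskHolder.INDIVIDUAL),
    HarmAssessment("inappropriate usage", "reputational damage", False,
                   Likelihood.LIKELY, Severity.SERIOUS, RiskHolder.COMMUNITY),
    HarmAssessment("loss of confidentiality", "emotional distress", False,
                   Likelihood.REMOTE, Severity.SEVERE, RiskHolder.INDIVIDUAL),
]

for a in prioritise(assessments):
    print(a.score, a.harm, a.risk_holder.name)  # 9 reputational damage COMMUNITY
```

In this toy example the severe but remote emotional-distress case scores 4, below the serious and likely reputational case at 9. A product score is only one common convention; in practice the scales and thresholds would need calibration.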
Figure 12: Framework for assessing benefits and harms of data processing
Source: Centre for Information Policy Leadership, World Economic Forum
Additionally, a qualitative assessment of the different threats can be useful. Perceived (but unlikely) threats can attract attention to their prevention that is disproportionate to their likelihood of actually occurring. Perceived threats can also lead to regulations based on “assume the worst” outcomes. This “dread control” effect can distort the focus of regulatory efforts, as disproportionate fear drives attempts to control unlikely but dreaded events.30
Differences in how individuals perceive harms add yet another layer of complexity. The risks and benefits of using data for one individual simply may not apply to others. Demographics, cultural norms, socioeconomic status, geography, politics and psychological profile are just some of the factors shaping the perceived harms of data use.31 More research is needed to identify these human-centred complexities.
Along with a greater understanding of the impact, severity and likelihood of a given use of personal data, the ability to measure these elements in a consistent and reliable manner is a critical enabler for strengthening trust. With commonly shared and agreed-upon metrics of impact, the discipline of risk management can be applied to address privacy concerns. Risk management can be applied across the data value chain to assess systemic reliability, codes of conduct and legal compliance more granularly.32 Valuation and risk calculation can be established. Additionally, existing regulatory statutes can be normatively cross-referenced across jurisdictional boundaries. Measurements enable reliability and trust.
The lens of risk management should not be viewed as a replacement for existing policy frameworks and regulations. Rather, it can serve as an adaptive and more granular means of moving past the vague notions of “creep” that currently guide much of the decision-making on personal data usage. The call to adopt the discipline of risk management more broadly is an ascendant theme within the privacy community. A shift to assessing the potential harms and benefits “is more intuitive, better reflects the importance of context, is more consistent with broader consumer protection law and, most importantly, it shifts the burden of protecting personal data away from individuals to the data handlers”.33 Notice and consent practices could also be developed that are easier for individuals to understand and that evolve in step with technological innovation, to the benefit of all stakeholders.
Risk-based approaches to privacy: Factors to consider
- Prioritisation based on the seriousness and likelihood of harm and impact to individuals
- Improves clarity on what it means for stakeholders to be accountable
- Reduces regulatory uncertainty
- Addresses the emerging technology challenges of data-driven economies
- Strengthens global interoperability
Source: Centre for Information Policy Leadership Project, 2014