Data treatment and score computation
This section details the process whereby individual responses are edited and aggregated in order to produce the scores of each economy on each individual question of the Survey. These results, together with other indicators obtained from other sources, feed into the GCI and other research projects.4
Data editing
Prior to aggregation, the respondent-level data are subjected to a careful editing process. A first series of tests is run to identify and exclude surveys whose answer patterns demonstrate a lack of sufficient focus on the part of the respondent. Surveys in which at least 80 percent of the answers are identical are excluded. Surveys with a completion rate below 50 percent are excluded.5 The very few duplicate surveys, which can occur, for example, when a respondent both completes the survey online and mails in a paper copy, are also excluded in this phase.
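To make these rules concrete, the sketch below applies them to a table holding one survey per row. It is illustrative only: the function, the column layout, and the pandas-based approach are assumptions, not the official processing code.

```python
import pandas as pd

def edit_surveys(df: pd.DataFrame, answer_cols: list[str]) -> pd.DataFrame:
    """Apply the three respondent-level editing rules (a sketch)."""
    answers = df[answer_cols]

    # Rule 1: drop surveys in which at least 80% of the answers are identical.
    def share_of_modal_answer(row: pd.Series) -> float:
        valid = row.dropna()
        return valid.value_counts().iloc[0] / len(valid) if len(valid) else 1.0

    flat = answers.apply(share_of_modal_answer, axis=1) >= 0.80

    # Rule 2: drop surveys with a completion rate below 50%.
    incomplete = answers.notna().mean(axis=1) < 0.50

    # Rule 3: drop duplicates (e.g., submitted both online and on paper).
    duplicated = answers.duplicated(keep="first")

    return df[~(flat | incomplete | duplicated)]
```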
In a second step, a multivariate test is applied to the data using the Mahalanobis distance method. This test estimates the probability that an individual survey in a specific country “belongs” to the sample of that country by comparing the pattern of answers of that survey against the average pattern of answers in the country sample.
More specifically, the Mahalanobis distance test estimates the likelihood that one particular point of N dimensions belongs to a set of such points. A single survey made up of N answers can be viewed as a point in N dimensions, while a particular country sample c is a set of such points. The Mahalanobis distance is used to compute the probability that any individual survey i does not belong to the sample c. If the probability is high enough (we use 99.9 percent as the threshold), we conclude that the survey is a clear outlier and does not "belong" to the sample. The implementation of this test requires that the number of responses in a country be greater than the number of answers, N, used in the test. The test uses 50 core questions, selected for their relevance and their placement in the Survey instrument.
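The report does not specify how the test is implemented, but a common approach, sketched below, compares each survey's squared Mahalanobis distance against a chi-squared distribution with N degrees of freedom; treating the country sample as roughly multivariate normal is an assumption of this sketch.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_outliers(X: np.ndarray, threshold: float = 0.999) -> np.ndarray:
    """Flag surveys that are outliers within one country sample.

    X has shape (n_surveys, N), with N = 50 core questions; the test
    requires n_surveys > N so that the covariance matrix is invertible.
    """
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

    diff = X - mean
    # Squared Mahalanobis distance of each survey from the sample mean.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

    # Under multivariate normality, d2 follows a chi-squared distribution
    # with N degrees of freedom; flag surveys beyond the 99.9th percentile.
    return chi2.cdf(d2, df=X.shape[1]) > threshold
```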
A univariate outlier test is then applied at the country level to each question of each survey. We use the standardized score, or "z-score," method, which indicates by how many standard deviations an individual answer deviates from the mean of the country sample. Individual answers with a standardized score $z_{i,q,c}$ greater than 3 in absolute value are dropped.
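A minimal sketch of this univariate screen, assuming the 3-standard-deviation threshold is applied to the absolute value of the z-score:

```python
import numpy as np

def zscore_outliers(answers: np.ndarray) -> np.ndarray:
    """Flag answers to one question in one country sample that lie
    more than 3 standard deviations from the country mean (a sketch)."""
    z = (answers - np.nanmean(answers)) / np.nanstd(answers)
    return np.abs(z) > 3
```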
Aggregation and computation of country averages
We use a simple average to compute the scores of all countries.6 Every individual response therefore carries the same implicit weight.
Formally, the country average of a Survey indicator i for country c, denoted $\bar{q}_{i,c}$, is computed as follows:

$$\bar{q}_{i,c} = \frac{1}{N_{i,c}} \sum_{j=1}^{N_{i,c}} q_{i,c,j}$$

where $q_{i,c,j}$ is the answer to question i in country c from respondent j, and $N_{i,c}$ is the number of respondents to question i in country c.
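In code, this is a plain unweighted mean per question and country. The snippet below is a sketch with hypothetical column names, using pandas:

```python
import pandas as pd

# One row per individual response; column names are illustrative.
df = pd.DataFrame({
    "country": ["ABC", "ABC", "ABC", "XYZ"],
    "question": ["q1", "q1", "q1", "q1"],
    "answer": [5.0, 6.0, 4.0, 3.0],
})

# Simple (unweighted) mean over the N_{i,c} respondents who answered
# question i in country c; mean() ignores missing answers.
country_averages = df.groupby(["country", "question"])["answer"].mean()
print(country_averages)
```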
Moving average and computation of country scores
As a final step, the country averages for 2015 are combined with the 2014 averages to produce the country scores that are used for the computation of the GCI 2015–2016 and for other projects.
This moving average technique, introduced in 2007, consists of taking a weighted average of the most recent year's Survey results and a discounted average of the previous year's results. There are several reasons for doing this. First, it makes the results less sensitive to the specific point in time when the Survey is administered. Second, it increases the amount of available information by providing a larger sample size. Finally, because the Survey is carried out during the first quarter of the year, averaging the responses from the first quarters of 2014 and 2015 better aligns the Survey data with the many indicators obtained from sources other than the Survey, which are often year-average data.
To calculate the moving average, we use a weighting scheme with two overlapping elements. On the one hand, we want to give each response an equal weight, which means placing more weight on the year with the larger sample size. On the other hand, we want to give more weight to the most recent responses because they contain more up-to-date information; that is, we also "discount the past." Table 2 reports the exact weights used in the computation of each country's scores, while Box 3 details the methodology and provides a clarifying example.
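Because the exact weights appear in Table 2 and Box 3 rather than here, the following is only a rough sketch of how the two elements could be combined: each year is weighted by its sample size, and the previous year's responses are shrunk by a discount factor that is a hypothetical parameter of this sketch, not the official value.

```python
def moving_average_score(q_curr: float, n_curr: int,
                         q_prev: float, n_prev: int,
                         discount: float = 0.5) -> float:
    """Combine two years of country averages into one score (a sketch).

    Weighting each year by its sample size gives every response an
    equal implicit weight; `discount` (hypothetical; the official
    weights appear in Table 2 and Box 3) then shrinks the weight of
    the previous year's responses.
    """
    w_curr = n_curr
    w_prev = discount * n_prev
    return (w_curr * q_curr + w_prev * q_prev) / (w_curr + w_prev)
```

With discount = 1 the formula reduces to pooling all responses from both years into one simple average; smaller values shift weight toward the most recent year.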
Trend analysis and exceptions
The two tests described above address variability issues among individual responses within a country. Yet they were not designed to track the evolution of country scores across time. We therefore carry out an analysis to assess the reliability and consistency of the Survey data over time. As part of this analysis, we run an inter-quartile range test, or IQR test, to identify large swings (positive and negative) in the country scores. More specifically, for each country we compute the year-on-year difference, d, in the average score of a core set of 66 Survey questions. We then compute the inter-quartile range (i.e., the difference between the 75th and the 25th percentiles), denoted IQR, of the sample of 140 economies. Any value d lying outside the range bounded by the 25th percentile minus 1.5 times IQR and the 75th percentile plus 1.5 times IQR is identified as a potential outlier. Formally, d is flagged when

$$d < Q_1 - 1.5 \times IQR \quad \text{or} \quad d > Q_3 + 1.5 \times IQR$$

where $Q_1$ and $Q_3$ correspond to the 25th and 75th percentiles of the sample, respectively, and $IQR = Q_3 - Q_1$ is the difference between these two values.
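These bounds are the standard Tukey fences. A minimal sketch of the test, assuming one year-on-year change d per economy:

```python
import numpy as np

def iqr_outliers(d: np.ndarray) -> np.ndarray:
    """Flag economies whose year-on-year score change d falls outside
    the Tukey fences described above (a sketch).

    `d` holds one value per economy: the change in the average score
    over the core set of 66 Survey questions.
    """
    q1, q3 = np.percentile(d, [25, 75])
    iqr = q3 - q1
    return (d < q1 - 1.5 * iqr) | (d > q3 + 1.5 * iqr)
```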
This test identifies countries that display overly large upward or downward swings, or repeated and significant changes across several editions. The IQR test is complemented by a series of additional empirical tests, including an analysis of five-year trends and a comparison of changes in the Survey results with changes in other indicators capturing similar concepts. We also conduct interviews with local experts and consider the latest developments in a country in order to assess the plausibility of the Survey results.
Based on the results of this test and additional qualitative analysis, and in light of developments in the respective countries, we decided not to use the data collected in Azerbaijan, Burundi, Guinea, the Russian Federation, Seychelles, and the United Arab Emirates. In those cases we use last year's results, which were derived from the 2013 and 2014 editions of the Survey, or the previous year's results (see the exceptions section in Box 3). This remains a remedial measure, and we will continue to investigate the situation over the coming months in an effort to better understand the Survey data in these countries. This measure does not imply that the Partner Institutes have not implemented the Survey according to the sampling guidelines.
Last year, the same analysis led us to dismiss the Survey data for Rwanda. This year, as an intermediate step toward re-establishing the standard computation method, we used a weighted average of the 2013 and 2015 Survey data for Rwanda.