Chapter 8 Regulatory Distance

8.1 Calculating the regulatory distance

In this section we will calculate the regulatory distance between different country-pairs. Regulatory distance is an indicator to measure the similarity between NTM policies across member States. This way we can identify regional trade agreements (RTAs) amongst countries. We calculate the regulatory distance between two countries by summing all the combinations where the two countries have the same product-NTM combination and divide by the total number of product-NTM combinations that exist:

\(rd_{ij} = \frac{1}{N} \sum_k \sum_l |n_{ilk} - n_{jlk}|\).

In the formula above \(N\) is the total amount of unique combinations of product codes and NTMs, \(n_{ilk}\) is a dummy variable for country \(i\) applying NTM \(l\) to product \(k\). \(n_{jlk}\) is also a dummy variable, but for country \(j\) applying NTM \(l\) to product \(k\). (For more information on regulatory distance for NTMs, you can check this UNCTAD paper.)

The purpose of this subsection is to create a regulatory distance matrix which will contain all the paired distances between ESCAP countries. To do this, we will complete the following steps:

Step 1: Create the list with ESCAP member countries which have available data.
Step 2: Create an empty regulatory distance matrix, by including ESCAP country ISO3 codes as column and row names.
Step 3: Calculate \(N\) (the total amount of unique combinations of product codes and NTMs).
Step 4: Create a nested for loop to calculate the regulatory distance per country pair.

Let’s first begin by activating the necessary packages we need. There is commented code in case you have not installed them already.

#install.packages("dplyr")
library("dplyr")
#install.packages("readstata13")
library("readstata13")

Next, let’s get the data. In the table below you can find the two data sets we will use. The link to download is in the third column. We will use once again the same NTM data as before In Section 7.

Description File name Link to download
List of all products which have an NTM ntm_hs6_2016 v.12.dta here
List of all ESCAP member States Country Categories.csv here

Let’s now load the list of all products which have an NTM. Keep in mind this might take a few minutes.

ntm_data <- read.dta13("data/ntm_hs6_2016 v.12.dta")
## Warning in read.dta13("data/ntm_hs6_2016 v.12.dta"): 
##   notTL:
##   Missing factor labels - no labels assigned.
##   Set option generate.factors=T to generate labels.

Once the data is loaded, we exclude NTM codes which are missing. We only need the reporter, NTM code and product codes (HS 6-digit codes).

ntm_data <- ntm_data[!is.na(ntm_data$ntmcode)&ntm_data$ntmcode!="",]
ntm_data <- ntm_data[,c("reporter", "ntmcode", "hs6")]

We group the data by reporter, NTM and product code (hs6) and count the number of combinations in a new variable called count.

ntm_data <- ntm_data %>% group_by(reporter, ntmcode, hs6) %>%
  summarise(count = n())
head(ntm_data)
## # A tibble: 6 x 4
## # Groups:   reporter, ntmcode [1]
##   reporter ntmcode hs6    count
##   <chr>    <chr>   <chr>  <int>
## 1 AFG      A110    010110     1
## 2 AFG      A110    010190     1
## 3 AFG      A110    010210     1
## 4 AFG      A110    010290     2
## 5 AFG      A110    010410     1
## 6 AFG      A110    010420     1

We prepare the regulatory matrix by creating a list of countries for which we want the regulatory distance. Remember the regulatory matrix shows the distance between two countries and has as column and row names the ISO3 codes of the countries.

For the moment we will do this exercise only for ESCAP member States as it is quite time consuming. Readers interested in having the analysis for all available countries are recommended to only run avail_iso3s <- unique(ntm_data$reporter).

We load the file with countries ISO3 codes and select only the ESCAP member States.

countries <- read.csv("data/Country Categories.csv", stringsAsFactors = FALSE)

ESCAP <- countries$iso3[countries$escap==1]

Then we select the unique reporter ISO3 country codes from the NTM data. This list represents all the country codes for which we have available NTM data.

avail_iso3s <- unique(ntm_data$reporter)
head(avail_iso3s)
## [1] "AFG" "ARE" "ARG" "ATG" "AUS" "BEN"

We use intersect to find the rows that are both in avail_iso3s and ESCAP. Notice in the final round we have only ESCAP members as expected.

avail_iso3s <- intersect(avail_iso3s,ESCAP)
head(avail_iso3s)
## [1] "AFG" "AUS" "BRN" "CHN" "HKG" "IDN"

We create an empty regulatory distance matrix. For column size we use the length of avail_iso3s and add 1 for the reporter column. We populate the column names with reporter and the ISO3 codes with the option dimnames.

regulatory_distance_matrix <- data.frame(matrix(vector(),0,length(avail_iso3s)+1,
                                   dimnames = list(c(), c("reporter", avail_iso3s )
                                   )),
                            stringsAsFactors=F)

Now we can move on to calculating the regulatory distance formula:

\(rd_{ij} = \frac{1}{N} \sum_k \sum_l |n_{ilk} - n_{jlk}|\).

As \(N\) is a constant, let’s start with calculating it outside of the loop. Remember, \(N\) is the total number of NTM and product codes combinations in the entire NTM data set. Hence we can use group_by. We define this number under N.

N <- ntm_data %>% group_by(ntmcode, hs6) %>% count()
N <- nrow(N)

We are ready to fill in our regulatory distance matrix with values. In order to do so, we loop through each country in ESCAP and fill in the regulatory_distance_matrix created above. Let’s go through the nested loop together.

We first record all the NTMs (ntmcode) and product codes (hs6) for the first reporter country country_a <- ntm_data[ntm_data$reporter==avail_iso3s[g],c("ntmcode", "hs6")]. We create an additional column where we mark all these combinations with 1: country_a$country_a <- 1. We include the ISO3 country code of the current country under the reporter column: regulatory_distance_matrix[g,"reporter"] <- avail_iso3s[g].

We create a second for statement because we want all possible combinations of country A with the remaining countries. In the second loop, we record all the NTMs (ntmcode) and product codes (hs6) for the second country country_b <- ntm_data[ntm_data$reporter==avail_iso3s[k],c("ntmcode", "hs6")]. We create an additional column where we mark all these combinations with 1 country_b$country_b <- 1. We merge both data sets: merged <- merge(country_a, country_b, by=c("ntmcode", "hs6"), all = TRUE) and include all rows for both data sets. If in the joined data, country A has a combination, which country B doesn’t have - we put a 0: merged[is.na(merged)] <- 0. We create a variable to represent the absolute difference between the indicators for NTMs for both countries: merged$abs_diff <- abs(merged$country_a-merged$country_b). We sum the absolute difference between the two countries and divide the sum by N (the total amount of NTM and product code combinations in the NTM data set): rd <- sum(merged$abs_diff)/N. The resulting value is the regulatory distance rd, which we include in our regulatory distance matrix: regulatory_distance_matrix[g,avail_iso3s[k]] <- rd.

To make this code more efficient, we include the if statement which skips calculating the distance for A and B, if the distance between B and A is already calculated: if (!is.na(regulatory_distance_matrix[k,avail_iso3s[g]])){next }. (This way we end up with a upper triangular matrix.)

Once you are ready, run the codes below. Keep in mind that it might take a few minutes.

for (g in 1:length(avail_iso3s)){
  country_a <- ntm_data[ntm_data$reporter==avail_iso3s[g],c("ntmcode", "hs6")]
  country_a$country_a <- 1
  regulatory_distance_matrix[g,"reporter"] <- avail_iso3s[g]

  for (k in 1:length(avail_iso3s)){

     if (!is.na(regulatory_distance_matrix[k,avail_iso3s[g]])){next }

  country_b <- ntm_data[ntm_data$reporter==avail_iso3s[k],c("ntmcode", "hs6")]
  country_b$country_b <- 1
  merged <- merge(country_a, country_b, by=c("ntmcode", "hs6"), all = TRUE)
  merged[is.na(merged)] <- 0
  merged$abs_diff <- abs(merged$country_a-merged$country_b)
  rd <- sum(merged$abs_diff)/N
  regulatory_distance_matrix[g,avail_iso3s[k]] <- rd

   }
}

Now we fill in the missing values and create a CSV file. For instance, if we have a missing value for country A and country B if (is.na(regulatory_distance_matrix[k,avail_iso3s[g]])), we fill it in with the value from country B and country A regulatory_distance_matrix[k,avail_iso3s[g]] = regulatory_distance_matrix[g,avail_iso3s[k]].

for (g in 1:length(avail_iso3s)){
  for (k in 1:length(avail_iso3s)){
    if (is.na(regulatory_distance_matrix[k,avail_iso3s[g]])){
      regulatory_distance_matrix[k,avail_iso3s[g]] <- regulatory_distance_matrix[g,avail_iso3s[k]]
      }
  }
}

write.csv(regulatory_distance_matrix, "regulatory_distance_matrix_ESCAP.csv")

Quiz

Let’s do a quick practice! Make sure you have run the code on your own.

What is the regulatory distance between Kazakhstan (KAZ) and China (CHN)?

A. 0.12
B. 0.22
C. 0.32
D. 0.02

# You need to run the code and change the path for the "write.csv" command. Once you open it, you can locate that KAZ and CHN have 0.32280137848764 or 0.32 as regulatory distance, hence Answer C is the correct answer. You can find the full table here: https://r.tiid.org/regulatory_distance_matrix_ESCAP.csv

8.2 Multidimensional scaling

Using multidimensional scaling (MDS) we can display the information from the regulatory distance matrix by performing principal coordinates analysis. We can therefore show how far or close countries are according to the regulatory distance. This way we can illustrate regional trade agreements (RTAs). Explaining MDS and principal coordinates analysis goes beyond the scope of this guide. As the commands are ready-made and easy to use in R, it is not necessary to have prior knowledge of the methodology. However, readers further interested in MDS and principal coordinates analysis are encouraged to check the help page for the magrittr package for useful references and more information.

First, we download and activate the necessary packages as shown below.

#install.packages("magrittr")
library(magrittr)
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:tidyr':
## 
##     extract
#install.packages("dplyr")
library(dplyr)
#install.packages("ggpubr")
library(ggpubr)

We first need to render the regulatory distance matrix in the correct format. We copy the regulatory distance matrix and call it regulatory_distance_matrix_MDS. Then we make the row names equal to the reporter column and exclude the reporter column.

regulatory_distance_matrix_MDS <- regulatory_distance_matrix
rownames(regulatory_distance_matrix_MDS) <- regulatory_distance_matrix_MDS$reporter
regulatory_distance_matrix_MDS <- regulatory_distance_matrix_MDS %>% select (-c(reporter))

We perform principal coordinates analysis with the command cmdscale and make a “tibble”. We call the new columns Dim.1 and Dim.2.

regulatory_distance_matrix_MDS <- regulatory_distance_matrix_MDS %>% cmdscale() %>% as_tibble()
## Warning: The `x` argument of `as_tibble.matrix()` must have column names if `.name_repair` is omitted as of tibble 2.0.0.
## Using compatibility `.name_repair`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
colnames(regulatory_distance_matrix_MDS) <- c("Dim.1", "Dim.2")

head(regulatory_distance_matrix_MDS)
## # A tibble: 6 x 2
##     Dim.1    Dim.2
##     <dbl>    <dbl>
## 1  0.0168  0.0158 
## 2 -0.0163 -0.00568
## 3  0.0149  0.0111 
## 4 -0.271  -0.0340 
## 5  0.0159  0.0107 
## 6  0.0123  0.0171

Dim.1 and Dim.2 represent the countries’ positions on a two-dimensional map such that the distances between the countries are approximately equal to the dissimilarities. The actual values do not mean anything, what is important are the distances between countries.

We can plot this and see the relative distance between countries.

We use ggscatter to plot a scatter plot. We add to our scatter plot a horizontal geom_hline and vertical lines geom_vline to show the horizontal and vertical axis, respectively.

scatterplot <- ggscatter(regulatory_distance_matrix_MDS, x = "Dim.1", y = "Dim.2",
          label = regulatory_distance_matrix$reporter,
          size = 1,
          repel = TRUE)
scatterplot + geom_hline(yintercept=0, linetype="dashed", color = "red") +
  geom_vline(xintercept = 0, linetype="dotted",
                color = "blue")

Naturally, most of the ASEAN countries are close to each other (Myanmar, Borneo, Malaysia, Thailand and Singapore). Russia, Kazakhstan and Kyrgyzstan are in the Eurasian Economic Union (EAEU). We can also see that China is quite far away, which is expected as China has one of highest number of NTMs in the world.