Chapter 7 Coverage ratio, Frequency index and Prevalence score for Viet Nam

In this section we will calculate once again the NTM indicators - coverage ratio, frequency index and prevalence score for Viet Nam.

Let’s first activate the packages we need. There is commented code in case you have not installed them already.

#install.packages("dplyr")
library("dplyr")
#install.packages("readstata13")
library("readstata13")
#install.packages("tidyr")
library("tidyr")
## Warning: package 'tidyr' was built under R version 3.5.3

Let’s download all the data we need first. Once again for the calculation of the incidence indicators for NTMs we only need the import data for Viet Nam and the list with all NTMs applied to products. In the table below you can find the two data sets we will use. The link to download is in the third column.

Description File name Link to download
WITS trade data for Viet Nam 2015-2017 _trade_data_VNM_WITS_2015-2017.csv_ here
List of all products which have an NTM ntm_hs6_2016 v.12.dta here

7.1 Uploading and cleaning the WITS trade data for Viet Nam in R

Let’s upload the data in R. (We downloaded this data from the WITS website.)

trade_data_VNM <- read.csv("data/trade_data_VNM_WITS_2015-2017.csv",
                     stringsAsFactors = FALSE)
head(trade_data_VNM)
##   reporter partner   hs6 year flow   value
## 1      VNM     AUS 10129 2016    5 211.878
## 2      VNM     AUS 10129 2017    5 681.523
## 3      VNM     MYS 10129 2017    5  49.167
## 4      VNM     WLD 10129 2015    5 249.600
## 5      VNM     WLD 10129 2016    5 899.994
## 6      VNM     WLD 10129 2017    5 730.690

We only include products NOT higher than 980000 as HS chapters 98 and 99 are not harmonized, meaning that they are reserved for national use only and therefore are not comparable between countries. To do this we make the product code variable numeric.

trade_data_VNM$hs6 <- as.numeric(trade_data_VNM$hs6)
trade_data_VNM <- trade_data_VNM[!(trade_data_VNM$hs6 > 980000),]

We change the reporter name for Romania from “ROM” to “ROU” in order to be coherent with the NTMs data. We will later in Section 7.4 merge the two data sets and calculate the coverage ratio, frequency index and prevalence score.

trade_data_VNM$reporter[trade_data_VNM$reporter=="ROM"] <- "ROU"

7.2 Uploading and cleaning the Non-Tariff Measures (NTMs) data for Viet Nam

We load the NTM data and select the reporter to be Viet Nam. (If interested, you can find the original NTMs data here.) We select the columns with countries imposing an NTM, the partner affected, the NTM code at 4 digits and 1 digit, and the product (HS) code. We rename the NTM code from ntmcode to NTM and change StartDate to year for easier use.

Keep in mind, this is a big file, so it might take a few minutes to import.

ntms_VNM <- read.dta13("data/ntm_hs6_2016 v.12.dta") 
## Warning in read.dta13("data/ntm_hs6_2016 v.12.dta"): 
##   notTL:
##   Missing factor labels - no labels assigned.
##   Set option generate.factors=T to generate labels.
ntms_VNM <- ntms_VNM[ntms_VNM$reporter=="VNM", ] ## select Viet Nam as reporter
ntms_VNM <-  ntms_VNM %>% select("partner", "ntmcode",  "hs6", "StartDate", "ntm_1_digit") %>% # select only certain variables 
  rename(NTM = ntmcode, year = StartDate) #and rename names 
head(ntms_VNM)
##      partner  NTM    hs6 year ntm_1_digit
## 2525     WLD A140 010121 2004           A
## 2754     WLD A150 010121 2004           A
## 3219     WLD A310 010121 2004           A
## 3514     WLD A810 010121 2004           A
## 6095     WLD A890 010121 2004           A
## 7193     WLD P130 010121 2004           P

We drop product codes beyond 980000, just as we did for the trade data. The reason is that HS chapters 98 and 99 are not harmonized; they are reserved for national use only and therefore are not comparable between countries.

ntms_VNM$hs6 <- as.numeric(ntms_VNM$hs6)
ntms_VNM <- ntms_VNM[!(ntms_VNM$hs6 > 980000),]

7.2.1 Make NTM data binary by separating WLD and EUN into individual countries

Next, we create the variable EUN which contains all the ISO codes of countries from the European Union and we create WLD - for all the countries in the world partaking in trade. We use these variables to create binary country relationships in the NTM data set.

We first start with the European Union member States and create manually the variable. We write up all the European countries and with paste join them together into one variable, separated by a comma.

EUN <- c('AUT', 'BEL', 'BGR', 'HRV', 'CYP', 'CZE', 'DNK', 'EST', 'FIN', 'FRA', 'DEU', 'GRC', 'HUN', 'IRL', 'ITA', 'LVA', 'LTU', 'LUX', 'MLT', 'NLD', 'POL', 'PRT', 'ROM', 'SVK', 'SVN', 'ESP', 'SWE', 'GBR')
EUN <- paste(EUN,collapse=",")

We find the unique partners from the trade data and call them All_members and exclude ISO3 codes “WLD” and “EUN”, as we want to have only individual member States.

All_members <- unique(trade_data_VNM$partner)
All_members <- All_members[!(All_members %in% c("WLD","EUN"))]
All_members <- paste0(All_members, collapse=",")

All_members now contains all the available trade world members and we can replace “WLD” with it.

We replace the ISO3 codes “WLD” and “EUN” with the lists of individual member States. We replace “WLD” with All_members and “EUN” with all the countries in the European Union EUN. We do this with gsub which replaces all matches. We separate the different lists of countries into rows using separate_rows.

ntms_VNM$partner  <- gsub("WLD", All_members, ntms_VNM$partner)
ntms_VNM$partner  <- gsub("EUN", EUN, ntms_VNM$partner)

ntms_VNM <- ntms_VNM %>% separate_rows(partner, sep = ",")
nrow(ntms_VNM)
## [1] 2726605
head(ntms_VNM)
##   partner  NTM   hs6 year ntm_1_digit
## 1     AUS A140 10121 2004           A
## 2     MYS A140 10121 2004           A
## 3     ZAF A140 10121 2004           A
## 4     LAO A140 10121 2004           A
## 5     NZL A140 10121 2004           A
## 6     THA A140 10121 2004           A

Since All_members contains “VNM”, we remove when partner equals “VNM”.

ntms_VNM <- ntms_VNM[ntms_VNM$partner !="VNM",]

We now take imports only and hence exclude “P” which stands for export related NTMs.

ntms_VNM <- ntms_VNM[ntms_VNM$ntm_1_digit !="P",]
head(ntms_VNM)
##   partner  NTM   hs6 year ntm_1_digit
## 1     AUS A140 10121 2004           A
## 2     MYS A140 10121 2004           A
## 3     ZAF A140 10121 2004           A
## 4     LAO A140 10121 2004           A
## 5     NZL A140 10121 2004           A
## 6     THA A140 10121 2004           A

Next, we remove the duplicates and keep only the unique combinations of NTMs, product code and partner. We do the usual - first group the variables NTMs, product code and partner and count how many duplicates there are. As we are not interested in the number of duplicates per combination, we only select the NTMs, product code and partner.

This method below is much quicker than using unique. We will afterwards use this data to calculate the coverage ratio, frequency index and prevalence score.

ntms_VNM <- ntms_VNM %>% group_by(NTM, hs6, partner) %>% count() %>% 
  select(NTM, hs6, partner) 
head(ntms_VNM)
## # A tibble: 6 x 3
## # Groups:   NTM, hs6, partner [6]
##   NTM     hs6 partner
##   <chr> <dbl> <chr>  
## 1 A140  10121 AGO    
## 2 A140  10121 AND    
## 3 A140  10121 ARE    
## 4 A140  10121 ARG    
## 5 A140  10121 AUS    
## 6 A140  10121 AUT

7.3 Create import data

We subset the trade data and select the trade flow to be 5 - we only take the imports. We select only the unique partners from the NTMs data. We check the number of rows.

VNM_import <- subset(trade_data_VNM, 
                     trade_data_VNM$flow==5&
                     trade_data_VNM$partner %in% unique(ntms_VNM$partner),
                     select = c(partner, hs6, year, value))
nrow(VNM_import)
## [1] 208941
head(VNM_import)
##   partner   hs6 year    value
## 1     AUS 10129 2016  211.878
## 2     AUS 10129 2017  681.523
## 3     MYS 10129 2017   49.167
## 7     ZAF 10129 2015  249.600
## 8     ZAF 10129 2016  688.116
## 9     AUS 10221 2015 8301.276

We use the above data set to calculate the average value for combinations of the partner, the product code and the year. The function summarize(value = mean(value)) in this case takes the average value of any duplicates of the combinations and includes this number under “value”.

VNM_import <- VNM_import %>% group_by(partner, hs6, year) %>% summarize(value = mean(value))
nrow(VNM_import)
## [1] 208941
head(VNM_import)
## # A tibble: 6 x 4
## # Groups:   partner, hs6 [5]
##   partner    hs6  year   value
##   <chr>    <dbl> <int>   <dbl>
## 1 AGO      30749  2017  154.  
## 2 AGO      50790  2017    7.84
## 3 AGO     120799  2017   12.6 
## 4 AGO     150420  2016  207.  
## 5 AGO     230120  2015  601   
## 6 AGO     230120  2016 2308.

The number of rows of the data does not change so we can assume there were not any duplicates.

Before we go and allocate for which year we have missing values, we create a variable where we can keep the years in, as later when we gather the data together, we will use it.

VNM_import_years <- unique(VNM_import$year)
head(VNM_import_years)
## [1] 2017 2016 2015

We next want to include zeros instead of the missing years in order to calculate the average value over years for a product code and partner country. We do this to adjust for year-specific fluctuations. We first spread the values across the three available years using spread. Whenever we have a missing value, we include a 0 instead.

VNM_import <- spread(VNM_import, year, value)
VNM_import[is.na(VNM_import)] <- 0
head(VNM_import)
## # A tibble: 6 x 5
## # Groups:   partner, hs6 [6]
##   partner    hs6 `2015` `2016`   `2017`
##   <chr>    <dbl>  <dbl>  <dbl>    <dbl>
## 1 AGO      30749      0     0    154.  
## 2 AGO      50790      0     0      7.84
## 3 AGO     120799      0     0     12.6 
## 4 AGO     150420      0   207.     0   
## 5 AGO     230120    601  2308. 13380.  
## 6 AGO     230400      0     0    544.

We now have all the years present for all HS codes. We make year and value separate columns with gather. The function gather collects a set of column names and places them into a single “key” column.

VNM_import <- gather(VNM_import, year, value, as.character(VNM_import_years), factor_key=FALSE)
head(VNM_import)
## # A tibble: 6 x 4
## # Groups:   partner, hs6 [6]
##   partner    hs6 year     value
##   <chr>    <dbl> <chr>    <dbl>
## 1 AGO      30749 2017    154.  
## 2 AGO      50790 2017      7.84
## 3 AGO     120799 2017     12.6 
## 4 AGO     150420 2017      0   
## 5 AGO     230120 2017  13380.  
## 6 AGO     230400 2017    544.

We then take an average over the years.

VNM_import <- VNM_import %>% group_by(partner, hs6) %>% summarize(value=mean(value))
head(VNM_import)
## # A tibble: 6 x 3
## # Groups:   partner [1]
##   partner    hs6   value
##   <chr>    <dbl>   <dbl>
## 1 AGO      30749   51.4 
## 2 AGO      50790    2.61
## 3 AGO     120799    4.18
## 4 AGO     150420   69.1 
## 5 AGO     230120 5429.  
## 6 AGO     230400  181.

Now we have the import data ready, so we can go onto calculating the coverage ratio, frequency index and prevalence score.

7.4 Calculating the Coverage ratio, Frequency index and Prevalence score

Let’s now calculate the coverage ratio, frequency index and prevalence score for Viet Nam.

We create a summary data matrix where we will store the latest year for which Viet Nam’s imports are recorded in WITS and the coverage ratios, frequency indices and prevalence scores as a whole and by sector. The sectors we use are agriculture, natural resources and manufacturing. “agro” stands for agriculture, “NR” for natural resources and “manu” for manufacturing, e.g. CR_agro is the coverage ratio for agriculture for Viet Nam.

We include in the data set the latest year as before defined.

summary_data_VNM <- data.frame(matrix(vector(),1,13,
                                  dimnames = list(c(), 
                                                  c("latest_year", 
                                                    "CR",
                                                    "FI",
                                                    "PS", 
                                                    "CR_agro",
                                                    "CR_NR",
                                                    "CR_manu",
                                                    "FI_agro",
                                                    "FI_NR",
                                                    "FI_manu",
                                                    "PS_agro",
                                                    "PS_NR",
                                                    "PS_manu"
                                                    )
                                  )),
                           stringsAsFactors=F)

summary_data_VNM[1,"latest_year"] <- max(VNM_import_years) 

7.4.1 Coverage ratio

Let’s now calculate the coverage ratio. Remember this is the formula:

\(CR_i = \frac{\sum^{hs}_{k=1} NTM_{ik} X_{ik}}{\sum^{hs}_{k=1} X_{ik}} 100\).

where \(NTM_{ik}\) indicates if there is an NTM or not for a traded good and \(X_{ik}\) is the trade value.

We start by just selecting the product (hs6) code and the partner from the NTM data. Then we group all combinations where the product code and partner are the same and we keep only the product code and partner name. (We exclude the count “n” as we do not need it in this case.) We add a column called NTM with values for all rows as 1 - this signifies that there is an NTM for each combination of HS 6-digit code and partner.

VNM_ntm_import_CR <- data.frame(ntms_VNM) %>% select(hs6, partner)

VNM_ntm_import_CR <- VNM_ntm_import_CR %>% group_by(hs6, partner) %>% summarise(count = n()) %>% select(hs6, partner)

VNM_ntm_import_CR$NTM <- 1

head(VNM_ntm_import_CR)
## # A tibble: 6 x 3
## # Groups:   hs6 [1]
##     hs6 partner   NTM
##   <dbl> <chr>   <dbl>
## 1 10121 AGO         1
## 2 10121 AND         1
## 3 10121 ARE         1
## 4 10121 ARG         1
## 5 10121 AUS         1
## 6 10121 AUT         1

Next we combine the data set we just created above with VNM_import in order to include the trade values. If an NTM is missing, we put the value 0.

CR_ntm_data <- merge(VNM_import,VNM_ntm_import_CR, by=c("partner", "hs6"), all.x = TRUE )

CR_ntm_data$NTM[is.na(CR_ntm_data$NTM)] <- 0

head(CR_ntm_data)
##   partner    hs6       value NTM
## 1     AGO  30749   51.361000   1
## 2     AGO  50790    2.613333   0
## 3     AGO 120799    4.183333   1
## 4     AGO 150420   69.090000   1
## 5     AGO 230120 5429.471667   1
## 6     AGO 230400  181.450000   1

Notice, this time for the overall coverage ratio, we do include all.x = TRUE, because some HS codes do not have an NTM.

Finally, we sum all the values where an NTM is present and we divide by the total sum of all the values. We multiply by 100 to have a percentage. We include this number in our summary data set.

CR <- sum(CR_ntm_data$NTM*CR_ntm_data$value)/sum(CR_ntm_data$value)*100
CR
## [1] 62.06449
summary_data_VNM[1, "CR"] <- CR

7.4.2 Frequency Index

Remember from Section 6, the frequency index is similar to the coverage ratio, except that instead of multiplying by the trade value, we multiply by a dummy variable set to show whether there was any trade value or not (\(D_{ik}\)):

\(FI_i = \frac{\sum^{hs}_{k=1} NTM_{ik} D_{ik}}{\sum^{hs}_{k=1} D_{ik}} 100\).

where \(NTM_{ik}\) is a binary variable, which shows if there is an NTM or not for a traded good \(k\) and \(D_{ik}\) is also a binary variable showing whether there is any trade value associated with a product \(k\) or not.

Hence, we can use the above created data but add a new dummy variable representing whether there was trade or not per HS code.

Let’s copy this data set and create a new variable called “valueDummy” to represent \(D_{ik}\). We equate it to 1 if there is a value, otherwise it is 0.

FI_ntm_data <- CR_ntm_data
FI_ntm_data$valueDummy <- 0
FI_ntm_data[!is.na(FI_ntm_data$value),"valueDummy"] <- 1
head(FI_ntm_data)
##   partner    hs6       value NTM valueDummy
## 1     AGO  30749   51.361000   1          1
## 2     AGO  50790    2.613333   0          1
## 3     AGO 120799    4.183333   1          1
## 4     AGO 150420   69.090000   1          1
## 5     AGO 230120 5429.471667   1          1
## 6     AGO 230400  181.450000   1          1

The formula is the same as the one for the coverage ratio, except now we substitute the trade value with the trade dummy.

FI <- sum(FI_ntm_data$NTM*FI_ntm_data$valueDummy)/sum(FI_ntm_data$valueDummy)*100
FI
## [1] 50.81422
summary_data_VNM[1, "FI"] <- FI

This means that 51 % of imported products are covered by at least one NTM.

7.4.3 Prevalence score

Let’s now calculate the prevalence score. Remember this is the formula:

\(PS_i = \frac{\sum^{hs}_{k=1} \#NTM_{ik} D_{ik}}{\sum^{hs}_{k=1} D_{ik}}\).

where \(\#NTM_{ik}\) is the number of NTMs per product \(k\) and \(D_{ik}\) is a binary variable showing whether there is any trade value associated with a product \(k\) or not.

First we start by counting the HS 6-digit codes per partner.

VNM_ntm_PS <- ntms_VNM %>% group_by(hs6, partner) %>% summarise(count = n())
head(VNM_ntm_PS)
## # A tibble: 6 x 3
## # Groups:   hs6 [1]
##     hs6 partner count
##   <dbl> <chr>   <int>
## 1 10121 AGO         5
## 2 10121 AND         5
## 3 10121 ARE         5
## 4 10121 ARG         5
## 5 10121 AUS         5
## 6 10121 AUT         5

Next, we merge this data set with VNM_import according to the HS6 code and partner, and include the mean trade value. Afterwards, for any missing value in “count”, we put a 0.

PS_ntm_data <- merge(VNM_ntm_PS, VNM_import, by=c("hs6", "partner"), all.y = TRUE)

head(PS_ntm_data)
##     hs6 partner count      value
## 1 10129     AUS    NA  297.80033
## 2 10129     MYS    NA   16.38900
## 3 10129     ZAF    NA  312.57200
## 4 10221     AUS     5 5002.01867
## 5 10221     NZL     5   17.00333
## 6 10221     THA     5  393.93000
PS_ntm_data$count[is.na(PS_ntm_data$count)] <- 0

We substitute “value” with 1 if the value is greater than zero. If missing or 0, we put a 0.

PS_ntm_data$value[!(is.na(PS_ntm_data$value))&(PS_ntm_data$value>0)] <- 1
PS_ntm_data$value[is.na(PS_ntm_data$value)|PS_ntm_data$value==0] <- 0
head(PS_ntm_data)
##     hs6 partner count value
## 1 10129     AUS     0     1
## 2 10129     MYS     0     1
## 3 10129     ZAF     0     1
## 4 10221     AUS     5     1
## 5 10221     NZL     5     1
## 6 10221     THA     5     1

Next, we sum the multiplication of the values and the count, and divide by the sum of all the values. We store this in our summary_data.

PS <- sum(PS_ntm_data$value*PS_ntm_data$count)/sum(PS_ntm_data$value)
PS
## [1] 1.86705
summary_data_VNM[1, "PS"] <- PS

7.4.4 Coverage ratios, Frequency index and Prevalence scores by broad sectors

Broad product groups are defined by the Harmonized System (HS) at 2-digit: Agriculture corresponds to HS 1-24, Natural Resources to HS 25-27, and Manufacturing to 28-97. Study and run the code below and practice at the end with the quiz.

summary_data_VNM[1, "CR_agro"] <- sum(CR_ntm_data$NTM[CR_ntm_data$hs6<250000]*CR_ntm_data$value[CR_ntm_data$hs6<250000])/sum(CR_ntm_data$value[CR_ntm_data$hs6<250000])*100
summary_data_VNM[1, "CR_NR"] <- sum(CR_ntm_data$NTM[CR_ntm_data$hs6>=250000&CR_ntm_data$hs6<280000]*CR_ntm_data$value[CR_ntm_data$hs6>=250000&CR_ntm_data$hs6<280000])/sum(CR_ntm_data$value[CR_ntm_data$hs6>=250000&CR_ntm_data$hs6<280000])*100
summary_data_VNM[1, "CR_manu"] <- sum(CR_ntm_data$NTM[CR_ntm_data$hs6>=280000]*CR_ntm_data$value[CR_ntm_data$hs6>=280000])/sum(CR_ntm_data$value[CR_ntm_data$hs6>=280000])*100

summary_data_VNM[1, "FI_agro"] <- sum(FI_ntm_data$NTM[FI_ntm_data$hs6<250000]*FI_ntm_data$valueDummy[FI_ntm_data$hs6<250000])/sum(FI_ntm_data$valueDummy[FI_ntm_data$hs6<250000])*100
summary_data_VNM[1, "FI_NR"] <- sum(FI_ntm_data$NTM[FI_ntm_data$hs6>=250000&FI_ntm_data$hs6<280000]*FI_ntm_data$valueDummy[FI_ntm_data$hs6>=250000&FI_ntm_data$hs6<280000])/sum(FI_ntm_data$valueDummy[FI_ntm_data$hs6>=250000&FI_ntm_data$hs6<280000])*100
summary_data_VNM[1, "FI_manu"] <- sum(FI_ntm_data$NTM[FI_ntm_data$hs6>=280000]*FI_ntm_data$valueDummy[FI_ntm_data$hs6>=280000])/sum(FI_ntm_data$valueDummy[FI_ntm_data$hs6>=280000])*100

rm(CR, FI) # removes CR and FI data sets
write.csv(summary_data_VNM, "summary_data_VNM.csv", row.names = FALSE)

Quiz

Time for some practice! Now having seen how we calculate the coverage ratios and frequency indices, calculate the prevalence score for the same sectors (e.g. Agriculture corresponds to HS 1-24, Natural Resources to HS 25-27, and Manufacturing to 28-97).

What is the correct answer?

A. PS_agro = 14.36, PS_NR = 4.14, PS_manu = 0.51
B. PS_agro = 16.36, PS_NR = 4.40, PS_manu = 3.99
C. PS_agro = 5.36, PS_NR = 4.14, PS_manu = 5.51
D. PS_agro = 11.36, PS_NR = 1.40, PS_manu = 0.99

# We do the same as in the above section but using the prevalence score formula
summary_data_VNM[1, "PS_agro"] <- sum(PS_ntm_data$value[PS_ntm_data$hs6<250000]*PS_ntm_data$count[PS_ntm_data$hs6<250000])/sum(PS_ntm_data$value[PS_ntm_data$hs6<250000])
summary_data_VNM[1, "PS_NR"] <- sum(PS_ntm_data$value[PS_ntm_data$hs6>=250000&PS_ntm_data$hs6<280000]*PS_ntm_data$count[PS_ntm_data$hs6>=250000&PS_ntm_data$hs6<280000])/sum(PS_ntm_data$value[PS_ntm_data$hs6>=250000&PS_ntm_data$hs6<280000])
summary_data_VNM[1, "PS_manu"] <- sum(PS_ntm_data$value[PS_ntm_data$hs6>=280000]*PS_ntm_data$count[PS_ntm_data$hs6>=280000])/sum(PS_ntm_data$value[PS_ntm_data$hs6>=280000])

#Therefore Answer D is the correct anser.