Chapter 7 Coverage ratio, Frequency index and Prevalence score for Viet Nam
In this chapter we will once again calculate the NTM indicators - the coverage ratio, the frequency index and the prevalence score - this time for Viet Nam.
Let’s first load the packages we need. The commented-out lines install them in case you have not done so already.
#install.packages("dplyr")
library("dplyr")
#install.packages("readstata13")
library("readstata13")
#install.packages("tidyr")
library("tidyr")
## Warning: package 'tidyr' was built under R version 3.5.3
Let’s download all the data we need first. Once again, for the calculation of the NTM incidence indicators we only need the import data for Viet Nam and the list of all NTMs applied to products. In the table below you can find the two data sets we will use; the link to download is in the third column.
| Description | File name | Link to download |
| --- | --- | --- |
| WITS trade data for Viet Nam 2015-2017 | _trade_data_VNM_WITS_2015-2017.csv_ | here |
| List of all products which have an NTM | ntm_hs6_2016 v.12.dta | here |
7.1 Uploading and cleaning the WITS trade data for Viet Nam in R
Let’s upload the data in R. (We downloaded this data from the WITS website.)
trade_data_VNM <- read.csv("data/trade_data_VNM_WITS_2015-2017.csv",
stringsAsFactors = FALSE)
head(trade_data_VNM)
## reporter partner hs6 year flow value
## 1 VNM AUS 10129 2016 5 211.878
## 2 VNM AUS 10129 2017 5 681.523
## 3 VNM MYS 10129 2017 5 49.167
## 4 VNM WLD 10129 2015 5 249.600
## 5 VNM WLD 10129 2016 5 899.994
## 6 VNM WLD 10129 2017 5 730.690
We only include products with codes not higher than 980000, as HS chapters 98 and 99 are not harmonized: they are reserved for national use only and therefore are not comparable between countries. To apply this filter, we first make the product code variable numeric.
trade_data_VNM$hs6 <- as.numeric(trade_data_VNM$hs6)
trade_data_VNM <- trade_data_VNM[!(trade_data_VNM$hs6 > 980000),]
We change the partner name for Romania from “ROM” to “ROU” in order to be coherent with the NTM data. Later, in Section 7.4, we will merge the two data sets and calculate the coverage ratio, frequency index and prevalence score.
trade_data_VNM$partner[trade_data_VNM$partner=="ROM"] <- "ROU"
7.2 Uploading and cleaning the Non-Tariff Measures (NTMs) data for Viet Nam
We load the NTM data and keep only the rows where the reporter is Viet Nam. (If interested, you can find the original NTM data here.) We select the columns with the partner affected, the NTM code at 4 digits and at 1 digit, the product (HS) code and the start date, and for easier use we rename ntmcode to NTM and StartDate to year.
Keep in mind that this is a big file, so it might take a few minutes to import.
ntms_VNM <- read.dta13("data/ntm_hs6_2016 v.12.dta")
## Warning in read.dta13("data/ntm_hs6_2016 v.12.dta"):
## notTL:
## Missing factor labels - no labels assigned.
## Set option generate.factors=T to generate labels.
ntms_VNM <- ntms_VNM[ntms_VNM$reporter=="VNM", ] ## select Viet Nam as reporter
ntms_VNM <- ntms_VNM %>% select("partner", "ntmcode", "hs6", "StartDate", "ntm_1_digit") %>% # select only certain variables
rename(NTM = ntmcode, year = StartDate) #and rename names
head(ntms_VNM)
## partner NTM hs6 year ntm_1_digit
## 2525 WLD A140 010121 2004 A
## 2754 WLD A150 010121 2004 A
## 3219 WLD A310 010121 2004 A
## 3514 WLD A810 010121 2004 A
## 6095 WLD A890 010121 2004 A
## 7193 WLD P130 010121 2004 P
We drop product codes beyond 980000, just as we did for the trade data. The reason is that HS chapters 98 and 99 are not harmonized; they are reserved for national use only and therefore are not comparable between countries.
ntms_VNM$hs6 <- as.numeric(ntms_VNM$hs6)
ntms_VNM <- ntms_VNM[!(ntms_VNM$hs6 > 980000),]
7.2.1 Make NTM data bilateral by separating WLD and EUN into individual countries
Next, we create the variable EUN, which contains the ISO3 codes of all European Union member States, and All_members, which contains all the countries in the world partaking in trade. We use these variables to turn the aggregate partner codes in the NTM data set into bilateral country relationships.
We start with the European Union member States and create the variable manually. We write out all the EU country codes and join them with paste into a single string, separated by commas.
EUN <- c('AUT', 'BEL', 'BGR', 'HRV', 'CYP', 'CZE', 'DNK', 'EST', 'FIN', 'FRA', 'DEU', 'GRC', 'HUN', 'IRL', 'ITA', 'LVA', 'LTU', 'LUX', 'MLT', 'NLD', 'POL', 'PRT', 'ROU', 'SVK', 'SVN', 'ESP', 'SWE', 'GBR') # 'ROU' for Romania, consistent with the trade data
EUN <- paste(EUN,collapse=",")
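To double-check the result, you can count the comma-separated codes in the string (a quick sketch):
length(strsplit(EUN, ",")[[1]]) # 28 member States in this list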
We take the unique partners from the trade data, call them All_members, and exclude the ISO3 codes “WLD” and “EUN”, as we want only individual member States.
All_members <- unique(trade_data_VNM$partner)
All_members <- All_members[!(All_members %in% c("WLD","EUN"))]
All_members <- paste0(All_members, collapse=",")
All_members now contains all the available trading partners in the world, so we can replace “WLD” with it.
We replace the aggregate ISO3 codes “WLD” and “EUN” with the lists of individual member States: “WLD” with All_members and “EUN” with the EU countries in EUN. We do this with gsub, which replaces all matches. We then split the comma-separated lists of countries into separate rows using separate_rows.
ntms_VNM$partner <- gsub("WLD", All_members, ntms_VNM$partner)
ntms_VNM$partner <- gsub("EUN", EUN, ntms_VNM$partner)
ntms_VNM <- ntms_VNM %>% separate_rows(partner, sep = ",")
nrow(ntms_VNM)
## [1] 2726605
head(ntms_VNM)
## partner NTM hs6 year ntm_1_digit
## 1 AUS A140 10121 2004 A
## 2 MYS A140 10121 2004 A
## 3 ZAF A140 10121 2004 A
## 4 LAO A140 10121 2004 A
## 5 NZL A140 10121 2004 A
## 6 THA A140 10121 2004 A
Since All_members contains “VNM”, we remove the rows where partner equals “VNM”.
ntms_VNM <- ntms_VNM[ntms_VNM$partner !="VNM",]
We now take imports only and hence exclude “P”, which stands for export-related NTMs.
ntms_VNM <- ntms_VNM[ntms_VNM$ntm_1_digit !="P",]
head(ntms_VNM)
## partner NTM hs6 year ntm_1_digit
## 1 AUS A140 10121 2004 A
## 2 MYS A140 10121 2004 A
## 3 ZAF A140 10121 2004 A
## 4 LAO A140 10121 2004 A
## 5 NZL A140 10121 2004 A
## 6 THA A140 10121 2004 A
Next, we remove the duplicates and keep only the unique combinations of NTM code, product code and partner. We first group by these three variables and count how many duplicates there are; since we are not interested in the number of duplicates per combination, we then keep only the NTM code, product code and partner columns.
The method below is much quicker than using unique. We will afterwards use this data to calculate the coverage ratio, frequency index and prevalence score.
ntms_VNM <- ntms_VNM %>% group_by(NTM, hs6, partner) %>% count() %>%
select(NTM, hs6, partner)
head(ntms_VNM)
## # A tibble: 6 x 3
## # Groups: NTM, hs6, partner [6]
## NTM hs6 partner
## <chr> <dbl> <chr>
## 1 A140 10121 AGO
## 2 A140 10121 AND
## 3 A140 10121 ARE
## 4 A140 10121 ARG
## 5 A140 10121 AUS
## 6 A140 10121 AUT
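As a side note, dplyr::distinct removes the duplicates in a single step and should give the same result; here is a sketch, stored under a hypothetical name ntms_VNM_alt so as not to overwrite the data above:
ntms_VNM_alt <- distinct(ntms_VNM, NTM, hs6, partner) # one row per unique combination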
7.3 Create import data
We subset the trade data, keeping only trade flow 5, i.e. imports, and only the partners that appear in the NTM data. We then check the number of rows.
VNM_import <- subset(trade_data_VNM,
trade_data_VNM$flow==5&
trade_data_VNM$partner %in% unique(ntms_VNM$partner),
select = c(partner, hs6, year, value))
nrow(VNM_import)
## [1] 208941
head(VNM_import)
## partner hs6 year value
## 1 AUS 10129 2016 211.878
## 2 AUS 10129 2017 681.523
## 3 MYS 10129 2017 49.167
## 7 ZAF 10129 2015 249.600
## 8 ZAF 10129 2016 688.116
## 9 AUS 10221 2015 8301.276
We use the above data set to calculate the average value for each combination of partner, product code and year. The function summarize(value = mean(value)) takes the average value over any duplicates of a combination and stores the number under “value”.
VNM_import <- VNM_import %>% group_by(partner, hs6, year) %>% summarize(value = mean(value))
nrow(VNM_import)
## [1] 208941
head(VNM_import)
## # A tibble: 6 x 4
## # Groups: partner, hs6 [5]
## partner hs6 year value
## <chr> <dbl> <int> <dbl>
## 1 AGO 30749 2017 154.
## 2 AGO 50790 2017 7.84
## 3 AGO 120799 2017 12.6
## 4 AGO 150420 2016 207.
## 5 AGO 230120 2015 601
## 6 AGO 230120 2016 2308.
The number of rows does not change, so we can conclude there were no duplicates.
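If you would rather verify this explicitly than infer it from the row count, you can check for duplicated partner-product-year combinations directly:
anyDuplicated(VNM_import[, c("partner", "hs6", "year")]) # 0 means no duplicates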
Before filling in the missing years, we store the available years in a variable; we will need it later when we gather the data back together.
VNM_import_years <- unique(VNM_import$year)
head(VNM_import_years)
## [1] 2017 2016 2015
We next want to replace the missing years with zeros in order to calculate the average value over the years for each product code and partner country; this adjusts for year-specific fluctuations. We first spread the values across the three available years using spread. Wherever a value is missing, we insert a 0 instead.
VNM_import <- spread(VNM_import, year, value)
VNM_import[is.na(VNM_import)] <- 0
head(VNM_import)
## # A tibble: 6 x 5
## # Groups: partner, hs6 [6]
## partner hs6 `2015` `2016` `2017`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AGO 30749 0 0 154.
## 2 AGO 50790 0 0 7.84
## 3 AGO 120799 0 0 12.6
## 4 AGO 150420 0 207. 0
## 5 AGO 230120 601 2308. 13380.
## 6 AGO 230400 0 0 544.
We now have all the years present for all HS codes. We turn the year columns back into separate year and value columns with gather. The function gather collects a set of column names and places them into a single “key” column.
VNM_import <- gather(VNM_import, year, value, as.character(VNM_import_years), factor_key=FALSE)
head(VNM_import)
## # A tibble: 6 x 4
## # Groups: partner, hs6 [6]
## partner hs6 year value
## <chr> <dbl> <chr> <dbl>
## 1 AGO 30749 2017 154.
## 2 AGO 50790 2017 7.84
## 3 AGO 120799 2017 12.6
## 4 AGO 150420 2017 0
## 5 AGO 230120 2017 13380.
## 6 AGO 230400 2017 544.
We then take an average over the years.
VNM_import <- VNM_import %>% group_by(partner, hs6) %>% summarize(value=mean(value))
head(VNM_import)
## # A tibble: 6 x 3
## # Groups: partner [1]
## partner hs6 value
## <chr> <dbl> <dbl>
## 1 AGO 30749 51.4
## 2 AGO 50790 2.61
## 3 AGO 120799 4.18
## 4 AGO 150420 69.1
## 5 AGO 230120 5429.
## 6 AGO 230400 181.
Now we have the import data ready, so we can go onto calculating the coverage ratio, frequency index and prevalence score.
7.4 Calculating the Coverage ratio, Frequency index and Prevalence score
Let’s now calculate the coverage ratio, frequency index and prevalence score for Viet Nam.
We create a summary data frame where we will store the latest year for which Viet Nam’s imports are recorded in WITS, together with the coverage ratios, frequency indices and prevalence scores overall and by sector. The sectors we use are agriculture (“agro”), natural resources (“NR”) and manufacturing (“manu”); e.g. CR_agro is the coverage ratio for agriculture for Viet Nam.
We store the latest year, as defined before, in the data set.
summary_data_VNM <- data.frame(matrix(vector(),1,13,
dimnames = list(c(),
c("latest_year",
"CR",
"FI",
"PS",
"CR_agro",
"CR_NR",
"CR_manu",
"FI_agro",
"FI_NR",
"FI_manu",
"PS_agro",
"PS_NR",
"PS_manu"
)
)),
stringsAsFactors=F)
summary_data_VNM[1,"latest_year"] <- max(VNM_import_years)
7.4.1 Coverage ratio
Let’s now calculate the coverage ratio. Remember this is the formula:
\[ CR_i = \frac{\sum_k NTM_{ik} \cdot X_{ik}}{\sum_k X_{ik}} \times 100 \]
where \(NTM_{ik}\) indicates whether there is an NTM for a traded good \(k\) imported by country \(i\), and \(X_{ik}\) is the corresponding trade value.
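As a quick sanity check, here is the formula applied to a tiny made-up data set: three product flows with trade values 100, 50 and 50, of which the first two face an NTM.
toy <- data.frame(value = c(100, 50, 50), NTM = c(1, 1, 0)) # hypothetical values
sum(toy$NTM * toy$value) / sum(toy$value) * 100 # 150/200 * 100 = 75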
We start by selecting just the product code (hs6) and the partner from the NTM data. We then group by product code and partner and keep only those two columns. (We exclude the count “n” as we do not need it in this case.) Finally, we add a column called NTM with the value 1 in every row, signifying that there is an NTM for each combination of HS 6-digit code and partner.
VNM_ntm_import_CR <- data.frame(ntms_VNM) %>% select(hs6, partner)
VNM_ntm_import_CR <- VNM_ntm_import_CR %>% group_by(hs6, partner) %>% summarise(count = n()) %>% select(hs6, partner)
VNM_ntm_import_CR$NTM <- 1
head(VNM_ntm_import_CR)
## # A tibble: 6 x 3
## # Groups: hs6 [1]
## hs6 partner NTM
## <dbl> <chr> <dbl>
## 1 10121 AGO 1
## 2 10121 AND 1
## 3 10121 ARE 1
## 4 10121 ARG 1
## 5 10121 AUS 1
## 6 10121 AUT 1
Next, we merge the data set we just created with VNM_import in order to include the trade values. If an NTM is missing for a combination, we set the value to 0.
CR_ntm_data <- merge(VNM_import,VNM_ntm_import_CR, by=c("partner", "hs6"), all.x = TRUE )
CR_ntm_data$NTM[is.na(CR_ntm_data$NTM)] <- 0
head(CR_ntm_data)
## partner hs6 value NTM
## 1 AGO 30749 51.361000 1
## 2 AGO 50790 2.613333 0
## 3 AGO 120799 4.183333 1
## 4 AGO 150420 69.090000 1
## 5 AGO 230120 5429.471667 1
## 6 AGO 230400 181.450000 1
Notice that this time, for the overall coverage ratio, we do include all.x = TRUE, because some HS codes do not have an NTM.
Finally, we sum all the values where an NTM is present and we divide by the total sum of all the values. We multiply by 100 to have a percentage. We include this number in our summary data set.
CR <- sum(CR_ntm_data$NTM*CR_ntm_data$value)/sum(CR_ntm_data$value)*100
CR
## [1] 62.06449
summary_data_VNM[1, "CR"] <- CR
7.4.2 Frequency Index
Remember from Section 6 that the frequency index is similar to the coverage ratio, except that instead of multiplying by the trade value, we multiply by a dummy variable showing whether there was any trade (\(D_{ik}\)):
\[ FI_i = \frac{\sum_k NTM_{ik} \cdot D_{ik}}{\sum_k D_{ik}} \times 100 \]
where \(NTM_{ik}\) is a binary variable showing whether there is an NTM for a traded good \(k\), and \(D_{ik}\) is also a binary variable, showing whether there is any trade value associated with product \(k\).
Hence, we can reuse the data set created above, adding a new dummy variable indicating whether there was any trade for each HS code.
Let’s copy the data set and create a new variable called “valueDummy” to represent \(D_{ik}\). We set it to 1 if there is a trade value and to 0 otherwise.
FI_ntm_data <- CR_ntm_data
FI_ntm_data$valueDummy <- 0
FI_ntm_data[!is.na(FI_ntm_data$value),"valueDummy"] <- 1
head(FI_ntm_data)
## partner hs6 value NTM valueDummy
## 1 AGO 30749 51.361000 1 1
## 2 AGO 50790 2.613333 0 1
## 3 AGO 120799 4.183333 1 1
## 4 AGO 150420 69.090000 1 1
## 5 AGO 230120 5429.471667 1 1
## 6 AGO 230400 181.450000 1 1
The formula is the same as the one for the coverage ratio, except now we substitute the trade value with the trade dummy.
FI <- sum(FI_ntm_data$NTM*FI_ntm_data$valueDummy)/sum(FI_ntm_data$valueDummy)*100
FI
## [1] 50.81422
summary_data_VNM[1, "FI"] <- FI
This means that roughly 51% of imported product-partner combinations are covered by at least one NTM.
7.4.3 Prevalence score
Let’s now calculate the prevalence score. Remember this is the formula:
\[ PS_i = \frac{\sum_k \#NTM_{ik} \cdot D_{ik}}{\sum_k D_{ik}} \]
where \(\#NTM_{ik}\) is the number of NTMs applied to product \(k\), and \(D_{ik}\) is a binary variable showing whether there is any trade value associated with product \(k\).
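As a quick sanity check with made-up numbers: for three traded products carrying 3, 1 and 0 NTMs respectively, the prevalence score is the average number of NTMs per traded product.
toy <- data.frame(count = c(3, 1, 0), D = c(1, 1, 1)) # hypothetical NTM counts, all products traded
sum(toy$count * toy$D) / sum(toy$D) # (3 + 1 + 0) / 3 = 1.33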
We start by counting the number of NTMs for each combination of HS 6-digit code and partner.
VNM_ntm_PS <- ntms_VNM %>% group_by(hs6, partner) %>% summarise(count = n())
head(VNM_ntm_PS)
## # A tibble: 6 x 3
## # Groups: hs6 [1]
## hs6 partner count
## <dbl> <chr> <int>
## 1 10121 AGO 5
## 2 10121 AND 5
## 3 10121 ARE 5
## 4 10121 ARG 5
## 5 10121 AUS 5
## 6 10121 AUT 5
Next, we merge this data set with VNM_import by HS 6-digit code and partner, keeping the mean trade value. Afterwards, for any missing value in “count”, we put a 0.
PS_ntm_data <- merge(VNM_ntm_PS, VNM_import, by=c("hs6", "partner"), all.y = TRUE)
head(PS_ntm_data)
## hs6 partner count value
## 1 10129 AUS NA 297.80033
## 2 10129 MYS NA 16.38900
## 3 10129 ZAF NA 312.57200
## 4 10221 AUS 5 5002.01867
## 5 10221 NZL 5 17.00333
## 6 10221 THA 5 393.93000
PS_ntm_data$count[is.na(PS_ntm_data$count)] <- 0
We replace “value” with 1 wherever the value is greater than zero, and with 0 wherever it is missing or zero.
PS_ntm_data$value[!(is.na(PS_ntm_data$value))&(PS_ntm_data$value>0)] <- 1
PS_ntm_data$value[is.na(PS_ntm_data$value)|PS_ntm_data$value==0] <- 0
head(PS_ntm_data)
## hs6 partner count value
## 1 10129 AUS 0 1
## 2 10129 MYS 0 1
## 3 10129 ZAF 0 1
## 4 10221 AUS 5 1
## 5 10221 NZL 5 1
## 6 10221 THA 5 1
Next, we sum the products of the values and the counts, and divide by the sum of all the values. We store the result in our summary data set.
PS <- sum(PS_ntm_data$value*PS_ntm_data$count)/sum(PS_ntm_data$value)
PS
## [1] 1.86705
summary_data_VNM[1, "PS"] <- PS
7.4.4 Coverage ratios, Frequency index and Prevalence scores by broad sectors
Broad product groups are defined by the Harmonized System (HS) at the 2-digit level: agriculture corresponds to HS chapters 01-24, natural resources to HS 25-27, and manufacturing to HS 28-97. Study and run the code below, then practice with the quiz at the end.
summary_data_VNM[1, "CR_agro"] <- sum(CR_ntm_data$NTM[CR_ntm_data$hs6<250000]*CR_ntm_data$value[CR_ntm_data$hs6<250000])/sum(CR_ntm_data$value[CR_ntm_data$hs6<250000])*100
summary_data_VNM[1, "CR_NR"] <- sum(CR_ntm_data$NTM[CR_ntm_data$hs6>=250000&CR_ntm_data$hs6<280000]*CR_ntm_data$value[CR_ntm_data$hs6>=250000&CR_ntm_data$hs6<280000])/sum(CR_ntm_data$value[CR_ntm_data$hs6>=250000&CR_ntm_data$hs6<280000])*100
summary_data_VNM[1, "CR_manu"] <- sum(CR_ntm_data$NTM[CR_ntm_data$hs6>=280000]*CR_ntm_data$value[CR_ntm_data$hs6>=280000])/sum(CR_ntm_data$value[CR_ntm_data$hs6>=280000])*100
summary_data_VNM[1, "FI_agro"] <- sum(FI_ntm_data$NTM[FI_ntm_data$hs6<250000]*FI_ntm_data$valueDummy[FI_ntm_data$hs6<250000])/sum(FI_ntm_data$valueDummy[FI_ntm_data$hs6<250000])*100
summary_data_VNM[1, "FI_NR"] <- sum(FI_ntm_data$NTM[FI_ntm_data$hs6>=250000&FI_ntm_data$hs6<280000]*FI_ntm_data$valueDummy[FI_ntm_data$hs6>=250000&FI_ntm_data$hs6<280000])/sum(FI_ntm_data$valueDummy[FI_ntm_data$hs6>=250000&FI_ntm_data$hs6<280000])*100
summary_data_VNM[1, "FI_manu"] <- sum(FI_ntm_data$NTM[FI_ntm_data$hs6>=280000]*FI_ntm_data$valueDummy[FI_ntm_data$hs6>=280000])/sum(FI_ntm_data$valueDummy[FI_ntm_data$hs6>=280000])*100
rm(CR, FI) # removes the intermediate CR and FI objects
write.csv(summary_data_VNM, "summary_data_VNM.csv", row.names = FALSE)
Quiz
Time for some practice! Now having seen how we calculate the coverage ratios and frequency indices, calculate the prevalence score for the same sectors (e.g. Agriculture corresponds to HS 1-24, Natural Resources to HS 25-27, and Manufacturing to 28-97).
What is the correct answer?
A. PS_agro = 14.36, PS_NR = 4.14, PS_manu = 0.51
B. PS_agro = 16.36, PS_NR = 4.40, PS_manu = 3.99
C. PS_agro = 5.36, PS_NR = 4.14, PS_manu = 5.51
D. PS_agro = 11.36, PS_NR = 1.40, PS_manu = 0.99
# We do the same as in the above section but using the prevalence score formula
summary_data_VNM[1, "PS_agro"] <- sum(PS_ntm_data$value[PS_ntm_data$hs6<250000]*PS_ntm_data$count[PS_ntm_data$hs6<250000])/sum(PS_ntm_data$value[PS_ntm_data$hs6<250000])
summary_data_VNM[1, "PS_NR"] <- sum(PS_ntm_data$value[PS_ntm_data$hs6>=250000&PS_ntm_data$hs6<280000]*PS_ntm_data$count[PS_ntm_data$hs6>=250000&PS_ntm_data$hs6<280000])/sum(PS_ntm_data$value[PS_ntm_data$hs6>=250000&PS_ntm_data$hs6<280000])
summary_data_VNM[1, "PS_manu"] <- sum(PS_ntm_data$value[PS_ntm_data$hs6>=280000]*PS_ntm_data$count[PS_ntm_data$hs6>=280000])/sum(PS_ntm_data$value[PS_ntm_data$hs6>=280000])
# Therefore answer D is the correct answer.