Estimating Ad-valorem Equivalents (AVEs) of Non-tariff Measures (NTMs)
1 Non-tariff Measures (NTMs): overview and data files
If you are not familiar with R, we highly recommend you take this short (free) self-paced online course on using R for analysis of NTMs and trade data here.
Typically, NTMs are more complex, less transparent and, due to their technical nature, are often more difficult to monitor and challenge than tariffs. To learn more about NTMs, we highly recommend this publication.
However, it is sometimes convenient to represent the costs of NTMs in units equivalent to ad-valorem tariffs. One advantage of the price-based AVE estimation method described here is that the AVEs can be used immediately in the GTAP model, without adjusting them by import-demand elasticities, permitting assessment of the impacts of NTM harmonization policies at the bilateral, plurilateral or regional level, among other scenarios studying the effects of NTMs.
This online guide is designed to accompany New Price-based Bilateral Ad-valorem Equivalent Estimates of Non-tariff Measures, allowing researchers to replicate the results as well as re-estimate AVEs under different assumptions or methods. 95% of the work has been putting the data together. You’re welcome.
In the table below you can find the datasets we will use.
Description | File name | Link to download |
---|---|---|
HS6-GTAP concordance (commodity) [~71 KB] | HS12-gtap.csv | here |
GTAP-ISO3 concordance (country) [~5 KB] | gtap_iso_concordance.csv | here |
Detailed trade and NTMs data [~283,480 KB] | merged-11-April-2020.zip | here |
1.1 Data cleaning and regression
Import the packages we will need and set some global parameters.
#import packages
library("car")
library("stringi")
library("stringr")
library("dplyr")
library("data.table")
library("sandwich")
library("lmtest")
#set global parameters
options(warn=1)
options(scipen = 999) #removes scientific notation
Next, load the concordance file with HS six-digit codes and their corresponding GTAP sectors, HS chapters (two digits) and sections.
<- read.csv("HS12-gtap.csv", stringsAsFactors = FALSE)
com_con #note stringsAsFactors is FALSE by default in the latest R version but you may still need it if you use an older version
Unzip the “merged-11-April-2020.zip” files and put them into one directory (avoid putting other files there). Import all the HS six-digit csv data file names from the unzipped folder and store them in a new dataframe variable, hs_detailed. This will help us access the corresponding files in the loops described later. dir() lists all the file names in the directory specified by its path argument; pattern is an optional argument that keeps only file names matching the regular expression it receives. stringr::str_split_fixed(files, "[.]", 2)[,1] splits each file name into two parts at the dot and retains only the first part (the six digits).
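If helpful, the unzipping itself can also be done from R; a minimal sketch, assuming the zip archive sits in your working directory:

#utils::unzip extracts into exdir, creating the folder if needed
unzip("merged-11-April-2020.zip", exdir = "merged-11-April-2020")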
files <- dir(path = "C:/Users/Alexi/OneDrive - United Nations/_R/aves/merged-11-April-2020", pattern = "\\.csv$")
#the path should specify where all the unzipped csv files are, e.g. '010121.csv','010129.csv',..
#if you get an error or an empty frame, your folder path is probably wrong

hs_detailed <- data.frame(hs6 = stringr::str_split_fixed(files, "[.]", 2)[,1])
nrow(hs_detailed) #number of CSV files you should have in the unzipped folder
## [1] 5203
head(hs_detailed)
## hs6
## 1 010121
## 2 010129
## 3 010130
## 4 010190
## 5 010221
## 6 010229
Next, we add a leading zero to the HS six-digit codes in com_con to make them all six digits long (to match the leading zeros in the unzipped file names; otherwise codes without leading zeros, e.g. chapter 9 and below, get truncated). We then merge hs_detailed with com_con by hs6. As a result, each existing csv file (corresponding to an individual HS six-digit code) has a concordance to chapter, section and GTAP sector for later use.
com_con$hs6 <- stri_pad_left(com_con$hs6, 6, 0)
hs_detailed <- merge(hs_detailed, com_con, by="hs6", all.x = TRUE)
head(hs_detailed)
## hs6 section chapter gtap
## 1 010121 1 1 ctl
## 2 010129 1 1 ctl
## 3 010130 1 1 ctl
## 4 010190 1 1 ctl
## 5 010221 1 1 ctl
## 6 010229 1 1 ctl
We create two lists to store the detailed interaction coefficients of technical and non-technical measures. We also create a new vector, chapters, that stores all the unique values of the chapter column (HS two-digit chapters) of hs_detailed. chapters will drive the for loop in the following steps, i.e. the number of regressions run will equal the number of unique HS chapters (HS two-digit codes). Note you can instead estimate by section - just change the corresponding instances from chapter to section (or even GTAP sector straight away - but the GTAP sector sizes are less balanced in product groupings).
<- vector("list", nrow(hs_detailed))
hs_beta_list names(hs_beta_list) <- hs_detailed$hs6
<- vector("list", nrow(hs_detailed))
hs_beta_list_nt names(hs_beta_list_nt) <- hs_detailed$hs6
<- unique(hs_detailed$chapter)
chapters
# defining EU countries (2015 so pre-Brexit)
<- c('AUT', 'BEL', 'BGR', 'HRV', 'CYP', 'CZE', 'DNK', 'EST', 'FIN', 'FRA', 'DEU', 'GRC', 'HUN', 'IRL', 'ITA', 'LVA', 'LTU', 'LUX', 'MLT', 'NLD', 'POL', 'PRT', 'ROM', 'SVK', 'SVN', 'ESP', 'SWE', 'GBR') EUN
Before creating the for loop, a dataframe is created to store the summary statistics of the regressions. We also define two parameters here, which you can change according to your preferences/assumptions. n denotes the percentage of derived prices (value/quantity) you want to truncate (top and/or bottom). both denotes whether you want to remove the bottom n% of derived prices as well as the top. There are, unfortunately, some real outliers that significantly affect the results if this step is skipped.
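As a toy illustration of what the truncation does (the numbers here are made up):

prices <- c(1, 2, 3, 4, 5, 1000)                     # one extreme outlier
quantile(prices, prob = 1 - 5/100)                   # 95th percentile (~751)
prices[prices < quantile(prices, prob = 1 - 5/100)]  # keeps 1 2 3 4 5, drops the top 5%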
results_summary <- data.frame(matrix(vector(), 0, 12,
                                     dimnames = list(c(),
                                                     c("chapter",
                                                       "ntm_count_t_b",
                                                       "ntm_count_nt_b",
                                                       "ntm_count_t_p",
                                                       "ntm_count_nt_p",
                                                       "ntm_count_t_int",
                                                       "ntm_count_t_int_sign",
                                                       "ntm_count_nt_int",
                                                       "ntm_count_nt_int_sign",
                                                       "n",
                                                       "r",
                                                       "adj"))),
                              stringsAsFactors = F)

n <- 5; both = FALSE
#feel free to experiment with these; do sensitivity analysis
Next, let’s load all the HS six-digit data files in the folder and run regressions on them with a loop. Please note that this chunk may take an hour or more to run (depending on your system). If you have a particularly powerful machine with a lot of RAM, you can try to merge all the CSVs and run it as one regression - let me know if you succeed, I will be most interested. In this application the regressions are run at chapter level (96 regressions).
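For reference, a hedged (untested) sketch of that all-in-one alternative, assuming your working directory holds the unzipped CSVs and you have sufficient RAM; the per-file cleaning steps from the loop below would still need to be applied:

#data.table::fread is a fast CSV reader; rbindlist stacks the per-file frames,
#fill = TRUE pads any columns missing in individual files with NA
library(data.table)
all_data <- rbindlist(lapply(files, fread), fill = TRUE)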
for(s in 1:length(chapters)){

  merged5 <- data.frame()
  subsetx <- hs_detailed[hs_detailed$chapter==chapters[s],]

  for(f in 1:nrow(subsetx)){
    hs_code <- subsetx[f,"hs6"]
    cat("--- hs code: ", hs_code, " ---\n"); flush.console()
    merged4 <- read.csv(paste0(hs_code, ".csv"), stringsAsFactors = FALSE)

    #remove Inf or NA prices
    merged4 <- merged4[!is.na(merged4$price) & merged4$price<Inf,]

    #truncate top (and bottom) n% of prices (5%)
    merged4 <- merged4[merged4$price<quantile(merged4$price, prob=1-n/100),]
    if(both==TRUE){
      #to truncate both min and max price outliers, set the "both" parameter above to TRUE
      merged4 <- merged4[merged4$price>quantile(merged4$price, prob=n/100),]
    }

    #token - unit, https://comtrade.un.org/data/MethodologyGuideforComtradePlus.pdf, table 3
    if(length(unique(merged4$QtyToken))>1){
      sum_q <- merged4 %>% group_by(QtyToken) %>% count()
      #only choose the rows with the most frequent quantity token
      merged4 <- merged4[merged4$QtyToken==as.numeric(sum_q[which.max(sum_q$n),"QtyToken"]),]
      if(length(unique(merged4$QtyToken))>1){
        stop("Too many q tokens again")
      }
    }

    if(nrow(merged4)>0){
      if(nrow(merged5)==0){
        merged5 <- merged4 #lazy way to create a new dataframe for rowbinding, sorry
      } else {
        nextrow = nrow(merged5)+1
        merged5[nextrow:(nextrow+nrow(merged4)-1),] = merged4
        #we need to ensure unique row names
        row.names(merged5) = 1:nrow(merged5)
      }
    }
  }

  cat("---CHAPTER: ", s, " : RUNNING REGRESSION---\n"); flush.console()

  #Feel free to adjust here to estimate individual ICNTM chapters - this example is based on
  #just groups of technical and non-technical measures
  merged5$ntm_count_t <- merged5$A + merged5$B + merged5$C
  merged5$ntm_count_nt <- merged5$E + merged5$F + merged5$G + merged5$H + merged5$I

  # removing intra-EU NTMs
  merged5[(merged5$ReporterISO3 %in% EUN)&(merged5$PartnerISO3 %in% EUN), c("ntm_count_t", "ntm_count_nt")] <- 0

  reg1 <- lm(log(price) ~
               log(dist) + contig + comlang_off + rta + llocked +
               applied + ntm_count_t + ntm_count_nt +
               factor(ProductCode) +
               ReporterISO3 + PartnerISO3 +
               ReporterISO3*ntm_count_t + ReporterISO3*ntm_count_nt,
             data = merged5)

  #this is the individual regression summary, not to be confused with results_summary defined above
  summary_results <- data.frame(summary(reg1)$coefficients[,c("Estimate","Pr(>|t|)")])

  #vcovCL: Clustered Covariance Matrix Estimation. To get robust errors.
  #Please refer to https://rdrr.io/rforge/sandwich/man/vcovCL.html if you are interested.
  clustered_errors <- coeftest(reg1, vcov = vcovCL, cluster = ~ dist)
  clustered_errors <- data.frame(clustered_errors[1:nrow(summary_results),c(2,4)])
  clustered_errors$coef <- rownames(clustered_errors)

  summary_results$coef <- rownames(summary_results)
  summary_results <- merge(summary_results, clustered_errors, by="coef")
  rownames(summary_results) <- summary_results$coef
  summary_results <- summary_results %>% select(Estimate, `Pr...t...y`) %>% rename(`Pr...t..` = `Pr...t...y`)
  summary_results <- summary_results[rownames(data.frame(summary(reg1)$coefficients)),]
  rm(clustered_errors)

  #row.names() is more efficient than rownames()
  rownames_1 <- row.names(summary(reg1)$coefficients)

  #regular expression to find anything with ":" - those are interactions
  rownames_1 <- which(rownames_1 %in% rownames_1[grepl("[:]", rownames_1)])
  summary_results <- summary_results[rownames_1,]
  rownames_1 <- row.names(summary_results)
  rownames_1 <- which(rownames_1 %in% rownames_1[grepl("ntm_count_t", rownames_1)])

  summary_results_t <- data.frame(summary_results[rownames_1, ])

  rownames_1 <- row.names(summary_results)
  rownames_1 <- which(rownames_1 %in% rownames_1[grepl("ntm_count_nt", rownames_1)])

  summary_results_nt <- data.frame(summary_results[rownames_1, ])

  # format the list to store and then merge later on
  summary_results_t_for_list <- summary_results_t
  summary_results_t_for_list <- summary_results_t_for_list[!is.na(summary_results_t_for_list$Estimate),]
  summary_results_t_for_list$reporter <- rownames(summary_results_t_for_list)
  summary_results_t_for_list$reporter <- str_sub(summary_results_t_for_list$reporter, -3, -1)
  rownames(summary_results_t_for_list) <- 1:nrow(summary_results_t_for_list)
  summary_results_t_for_list <- summary_results_t_for_list %>% select(reporter, Estimate, `Pr...t..`) %>% rename(p = `Pr...t..`, beta_interaction = Estimate)

  summary_results_nt_for_list <- summary_results_nt
  summary_results_nt_for_list <- summary_results_nt_for_list[!is.na(summary_results_nt_for_list$Estimate),]
  summary_results_nt_for_list$reporter <- rownames(summary_results_nt_for_list)
  summary_results_nt_for_list$reporter <- str_sub(summary_results_nt_for_list$reporter, -3, -1)
  rownames(summary_results_nt_for_list) <- 1:nrow(summary_results_nt_for_list)
  summary_results_nt_for_list <- summary_results_nt_for_list %>% select(reporter, Estimate, `Pr...t..`) %>% rename(p = `Pr...t..`, beta_interaction = Estimate)

  # store all the regression results in lists
  for(j in 1:nrow(subsetx)){
    hs_codej <- subsetx[j,"hs6"]
    hs_beta_list[[hs_codej]] <- summary_results_t_for_list
    hs_beta_list_nt[[hs_codej]] <- summary_results_nt_for_list
  }
  rm(summary_results_t_for_list, j)

  results_summary[s, "chapter"] <- chapters[s]
  results_summary[s, "ntm_count_t_b"] <- summary(reg1)$coefficients[c("ntm_count_t"),"Estimate"]
  results_summary[s, "ntm_count_t_p"] <- summary(reg1)$coefficients[c("ntm_count_t"),"Pr(>|t|)"]

  summary_results <- data.frame(summary(reg1)$coefficients[,c("Estimate","Pr(>|t|)")])
  rownames_1 <- row.names(summary_results)
  rownames_1 <- which(rownames_1 %in% "ntm_count_nt")

  if(length(rownames_1)>0){
    results_summary[s, "ntm_count_nt_b"] <- summary(reg1)$coefficients[c("ntm_count_nt"),"Estimate"]
    results_summary[s, "ntm_count_nt_p"] <- summary(reg1)$coefficients[c("ntm_count_nt"),"Pr(>|t|)"]
  }

  results_summary[s, "ntm_count_t_int"] <- nrow(summary_results_t)
  results_summary[s, "ntm_count_t_int_sign"] <- nrow(summary_results_t[summary_results_t[,2]<0.1 & !is.na(summary_results_t[,2]),])
  results_summary[s, "ntm_count_nt_int"] <- nrow(summary_results_nt)
  results_summary[s, "ntm_count_nt_int_sign"] <- nrow(summary_results_nt[summary_results_nt[,2]<0.1 & !is.na(summary_results_nt[,2]),])

  results_summary[s, "n"] <- nrow(merged5)
  results_summary[s, "r"] <- summary(reg1)$r.squared
  results_summary[s, "adj"] <- summary(reg1)$adj.r.squared

  gc() # garbage collector - frees working memory
}
In the outer for loop, we create an empty dataframe, merged5. Then, one chapter at a time, an inner for loop loads and rowbinds all the csv files whose HS six-digit codes fall within that chapter (with all the data on NTMs, trade flows, etc.), truncates the top (and optionally bottom) n% of prices, keeps the most frequent quantity token (unit), and stores the result in merged5. We calculate the total counts of technical and non-technical measures respectively and run a regression of the following form (matching the lm() call above):

\(\ln(price_{ijk}) = \theta X_{ij} + \beta_3\, tariff_{ijk} + \beta_4\, NTM^{Technical}_{ijk} + \beta_5\, NTM^{Nontechnical}_{ijk} + \beta_{6i}\,(\delta_i \times NTM^{Technical}_{ijk}) + \beta_{7i}\,(\delta_i \times NTM^{Nontechnical}_{ijk}) + \delta_i + \delta_j + \rho_k + \varepsilon_{ijk}\)

The following table describes the variables; refer to the paper for more details.
Variable Name | Description | Variable Name (Code) |
---|---|---|
\(\delta\) | Fixed effects of country \(i\) and \(j\) | ReporterISO3, PartnerISO3 |
\(\rho\) | HS6 product fixed effects | factor(ProductCode) |
\(\theta\) | Pair characteristics (i.e. distance, contiguity, landlocked status, common language, and RTA) | log(dist), contig, llocked, comlang_off, rta |
\(NTM^{Technical}_{ijk}\) | The count of separate non-tariff measures from chapters A to C imposed by \(i\) on imports of product \(k\) from \(j\) | ntm_count_t |
\(NTM^{Nontechnical}_{ijk}\) | The count of separate non-tariff measures from chapters D to O imposed by \(i\) on imports of product \(k\) from \(j\) | ntm_count_nt |
\(tariff_{ijk}\) | Applied tariff (MFN or lowest preferential, if available) imposed by \(i\) on imports of product \(k\) from \(j\) | applied |
We cluster errors by dist and store them in summary_results. Then, we extract the coefficients of the interaction terms, split the results into a technical measures summary, summary_results_t, and a non-technical measures summary, summary_results_nt, rename the row names to country ISO3 codes, and rename the columns Pr...t.. and Estimate to p and beta_interaction respectively. Next, we store all the regression results in hs_beta_list and hs_beta_list_nt. At the end of each loop iteration, we save all the information into the summary statistics dataframe we created before, results_summary.
After the loop finishes running (hours), we save results_summary to a csv file.
write.csv(results_summary,"./results_summary.csv", row.names=FALSE)
The results_summary data frame should correspond to output in Table A.4 of the journal article Annex.
1.2 Estimating AVEs using coefficients from the regressions
Let’s now proceed to estimating AVEs using the regression coefficients derived in the previous step. To begin with, we create an empty dataframe to store the results:
aves_file_final <- data.table(data.frame(matrix(vector(), 0, 8,
                                                dimnames = list(c(),
                                                                c("ReporterISO3", "PartnerISO3",
                                                                  "aves_t", "aves_nt", "TradeValue",
                                                                  "ntm_count_t", "ntm_count_nt", "hs_code"))),
                                         stringsAsFactors = F))
We then create a for loop to calculate the AVEs, using the following equations (see the paper for full details):

\(AVE_{ijk}^{Technical}[total]=\tanh\left(\exp\left(\beta_4\times NTM^{Technical}_{ijk}+\beta_{6i}\times NTM^{Technical}_{ijk}\right)-1\right)\times100\)

\(AVE_{ijk}^{Nontechnical}[total]=\tanh\left(\exp\left(\beta_5\times NTM^{Nontechnical}_{ijk}+\beta_{7i}\times NTM^{Nontechnical}_{ijk}\right)-1\right)\times100\)
Note that we restrict the highest AVEs to 100 through a hyperbolic function as exponential transformation can result in very large numbers - researchers can relax this assumption and set their own limits.
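To see the cap numerically, a quick illustration with made-up values of \(\beta \times NTM\):

# x stands for beta*count and is purely hypothetical
x <- c(0.05, 0.5, 3)
(exp(x) - 1) * 100      # uncapped effect:  ~5.1, ~64.9, ~1909
tanh(exp(x) - 1) * 100  # capped AVE:       ~5.1, ~57.1, ~100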
First, we read all the HS six-digit files again and sum all the technical and non-technical measures. We select the columns ReporterISO3, PartnerISO3, TradeValue, ntm_count_t, and ntm_count_nt. Next, we load \(\beta_4\) and \(\beta_5\) (ntm_count_t_b and ntm_count_nt_b) and their \(p\) values (ntm_count_t_p and ntm_count_nt_p) from results_summary. Before calculating AVE estimates, we set \(\beta\) values to 0 if their \(p\) values are over 0.1 (again, researchers may wish to adjust this threshold). Now we have all the variables in the equation. We calculate AVE estimates for technical and non-technical measures and save the results in the aves_file_final dataframe we created earlier.
for(f in 1:length(files)){
  hs_code <- strsplit(files[f], "[.]")[[1]][1]
  cat("--- hs code: ", hs_code, " ---\n"); flush.console()
  merged4 <- read.csv(files[f], stringsAsFactors = FALSE)

  if(nrow(merged4)==0){next()}

  merged4$ntm_count_t <- merged4$A + merged4$B + merged4$C
  merged4$ntm_count_nt <- merged4$E + merged4$F + merged4$G + merged4$H + merged4$I

  merged4 <- merged4 %>% select(ReporterISO3, PartnerISO3, TradeValue, ntm_count_t, ntm_count_nt)

  #remove intra-EU NTMs
  merged4[(merged4$ReporterISO3 %in% EUN)&(merged4$PartnerISO3 %in% EUN), c("ntm_count_t", "ntm_count_nt")] <- 0

  chapter <- as.numeric(substr(hs_code, start = 1, stop = 2))

  ntm_count_t_b <- results_summary[results_summary$chapter==chapter, c("ntm_count_t_b")]
  ntm_count_t_p <- results_summary[results_summary$chapter==chapter, c("ntm_count_t_p")]

  ntm_count_nt_b <- results_summary[results_summary$chapter==chapter, c("ntm_count_nt_b")]
  ntm_count_nt_p <- results_summary[results_summary$chapter==chapter, c("ntm_count_nt_p")]

  if(length(ntm_count_nt_p)==0){next()}
  #Why don't we test the length of ntm_count_t_p?
  #If a chapter has no technical-measure coefficient, it will not have a non-technical one either and will not be recorded here
  if(ntm_count_t_p>0.1){ntm_count_t_b <- 0}
  if(ntm_count_nt_p>0.1){ntm_count_nt_b <- 0}
  rm(ntm_count_t_p, ntm_count_nt_p)

  merged4 <- merge(merged4, hs_beta_list[[hs_code]], by.x="ReporterISO3", by.y="reporter", all.x = TRUE)
  if(nrow(merged4)==0){next()}
  merged4[!is.na(merged4$p)&merged4$p>0.1, "beta_interaction"] <- 0
  merged4 <- merged4 %>% rename(beta_interaction_t = beta_interaction) %>% select(-c(p))

  merged4 <- merge(merged4, hs_beta_list_nt[[hs_code]], by.x="ReporterISO3", by.y="reporter", all.x = TRUE)
  if(nrow(merged4)==0){next()}
  merged4[!is.na(merged4$p)&merged4$p>0.1, "beta_interaction"] <- 0
  merged4 <- merged4 %>% rename(beta_interaction_nt = beta_interaction) %>% select(-c(p))

  merged4$aves_t <- tanh(exp(ntm_count_t_b*merged4$ntm_count_t + merged4$beta_interaction_t*merged4$ntm_count_t) - 1)*100
  merged4$aves_t[merged4$aves_t<0] <- 0

  merged4$aves_nt <- tanh(exp(ntm_count_nt_b*merged4$ntm_count_nt + merged4$beta_interaction_nt*merged4$ntm_count_nt) - 1)*100
  merged4$aves_nt[merged4$aves_nt<0] <- 0

  merged4 <- merged4 %>% select(ReporterISO3, PartnerISO3, aves_t, aves_nt, TradeValue, ntm_count_t, ntm_count_nt)
  merged4$hs_code <- hs_code

  l = list(aves_file_final, merged4)
  aves_file_final <- rbindlist(l)
}
To get the global averages, run:
aves_file_final %>%
  summarise(ave_t_weighted = weighted.mean(aves_t, TradeValue, na.rm = TRUE),
            ave_nt_weighted = weighted.mean(aves_nt, TradeValue, na.rm = TRUE)
            )
## ave_t_weighted ave_nt_weighted
## 1 7.179849 6.062699
You can visualize the distributions using ggplot:
#to reshape the data file into a long format for use in ggplot next:
library("reshape2")
long <- melt(aves_file_final[,c("ReporterISO3", "PartnerISO3", "hs_code","aves_t", "aves_nt")], id.vars=c("ReporterISO3", "PartnerISO3", "hs_code"))

library("ggplot2")
ggplot(long, aes(x=value, color=variable)) +
  geom_histogram(alpha=0.5, position="identity") + scale_y_log10()
Note I used a log transformation for the y-axis here, as the vast majority of AVEs are quite close to zero.
1.3 AVE estimates for GTAP sectors and regions
Now let’s adjust the AVE database we calculated to make it usable in the GTAP database. We should already have the HS six-digit to sector concordance loaded (hs_detailed). First we merge the AVE dataset with the GTAP sector codes.
#making sure both are numeric, otherwise leading zeros sometimes cause issues
aves_file_final$hs_code <- as.numeric(aves_file_final$hs_code)
hs_detailed$hs6 <- as.numeric(hs_detailed$hs6)

gtap_aves <- merge(aves_file_final, hs_detailed[,c("hs6","gtap")], by.x="hs_code", by.y="hs6") %>% rename(sector=gtap)
head(gtap_aves)
## hs_code ReporterISO3 PartnerISO3 aves_t aves_nt TradeValue ntm_count_t
## 1: 10121 ARE AUS NA NA 133.609220 7
## 2: 10121 ARE URY NA NA 741.354996 7
## 3: 10121 ARE BEL NA NA 8.509452 7
## 4: 10121 ARE DEU NA NA 1434.923757 7
## 5: 10121 ARE ESP NA NA 22.443169 7
## 6: 10121 ARE ZAF NA NA 250.453571 8
## ntm_count_nt sector
## 1: 3 ctl
## 2: 3 ctl
## 3: 3 ctl
## 4: 3 ctl
## 5: 3 ctl
## 6: 3 ctl
Next we load the GTAP region to ISO3 concordance, and merge it for both partners and reporters (and rename them accordingly).
<- read.csv("gtap_iso_concordance.csv", stringsAsFactors = FALSE)
reg_con head(reg_con)
## iso3 gtap_r country
## 1 AFG xsa Afghanistan
## 2 ALB alb Albania
## 3 DZA xnf Algeria
## 4 ASM xoc American Samoa
## 5 AND xer Andorra
## 6 AGO xac Angola
#note you may see some double entries for countries with legacy ISO3 codes, like Romania (ROU & ROM), but this does not affect the subsequent aggregations - it only affects partner trade in some cases

gtap_aves <- merge(gtap_aves, reg_con[,c("iso3","gtap_r")], by.x="ReporterISO3", by.y="iso3", all.x = TRUE) %>% rename(reporter = gtap_r)
gtap_aves <- merge(gtap_aves, reg_con[,c("iso3","gtap_r")], by.x="PartnerISO3", by.y="iso3", all.x = TRUE) %>% rename(partner = gtap_r)
head(gtap_aves)
## PartnerISO3 ReporterISO3 hs_code aves_t aves_nt TradeValue ntm_count_t
## 1: ABW ARE 392329 NA NA 0.0272294 0
## 2: ABW ATG 950699 NA NA 1.1850000 0
## 3: ABW AUT 852349 7.471075 0.00000 0.8062939 5
## 4: ABW AUT 901839 0.000000 23.97609 1.8876370 12
## 5: ABW BEL 271019 0.000000 0.00000 0.0940000 3
## 6: ABW BEL 681599 0.000000 0.00000 0.0030000 2
## ntm_count_nt sector reporter partner
## 1: 0 crp are xcb
## 2: 0 omf xcb xcb
## 3: 1 crp aut xcb
## 4: 2 ome aut xcb
## 5: 4 p_c bel xcb
## 6: 0 nmm bel xcb
We finally summarize/group trade-weighted means of all indicators by GTAP sectors and regions:
aves_file_final_gtap <- gtap_aves %>% group_by(reporter, partner, sector) %>%
  summarise(ave_t_weighted = weighted.mean(aves_t, TradeValue, na.rm = TRUE),
            ave_nt_weighted = weighted.mean(aves_nt, TradeValue, na.rm = TRUE),
            t_count_weighted = weighted.mean(ntm_count_t, TradeValue, na.rm = TRUE),
            nt_count_weighted = weighted.mean(ntm_count_nt, TradeValue, na.rm = TRUE),
            TradeValue = sum(TradeValue, na.rm = TRUE)
            )
aves_file_final_gtap
## # A tibble: 218,788 × 8
## # Groups: reporter, partner [12,769]
## reporter partner sector ave_t_weighted ave_nt_weigh…¹ t_cou…² nt_co…³ Trade…⁴
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 are alb crp NaN NaN 2.18 1.01 4.04e+0
## 2 are alb fmp NaN NaN 0 0 1.46e+2
## 3 are alb i_s NaN NaN 0 0 1.04e+3
## 4 are alb lea NaN NaN 4.73 0 9.57e+2
## 5 are alb lum NaN NaN 0.0277 0.0830 2.30e+1
## 6 are alb mvh NaN NaN 1 3 2.20e+0
## 7 are alb nmm NaN NaN 0 0 6.59e-1
## 8 are alb ocr NaN NaN 28 6 1.59e+1
## 9 are alb ome NaN NaN 0.954 2.86 1.20e+1
## 10 are alb omf NaN NaN 1.89 0.130 3.15e+1
## # … with 218,778 more rows, and abbreviated variable names ¹ave_nt_weighted,
## # ²t_count_weighted, ³nt_count_weighted, ⁴TradeValue
1.4 Filling the gaps
As discussed in the paper, there are data gaps for various reasons (lack of NTM data for some countries, small samples in bilateral trade flows, multicollinearity, etc.). Researchers may still wish to fill in these gaps for use in GTAP. Here is one proposed way. You will need the full matrix of GTAP regions and sectors, so we create a full data matrix to start with.
Description | File name | Link to download |
---|---|---|
GTAP sector/region level bilateral trade from UN COMTRADE (2015 data) [~6,247 KB] | Totalbilater_gtap_2020.csv | here |
All GTAP commodities [~1 KB] | set_all_commodities.csv | here |
All GTAP regions [~2KB] | set_all_regions.csv | here |
Note that the sector/region files contain only the most basic concordance between ISO3 codes/HS codes and the corresponding GTAP sectors. You may wish to edit these yourself if you already have a particular aggregation in mind (or follow the steps described in the online annex of the paper).
<- read.csv("Totalbilater_gtap_2020.csv", stringsAsFactors = FALSE)
Totalbilater_gtap #This file contains total bilater trade at gtap sector/region level for 2015
#The column "TotalTrade" is different from the original "TradeValue" as "TradeValue" only includes trade values used in regressions and as noted some were omitted for various reasons
#option to change custom aggregation within these files directly
<- read.csv("set_all_commodities.csv", stringsAsFactors = FALSE)
set_com <- read.csv("set_all_regions.csv", stringsAsFactors = FALSE)
set_reg
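For example, a hypothetical custom aggregation edit ("weu" is a made-up composite code; "new" is the mapping column used in the merges below):

# map a few Western European GTAP regions into one made-up composite region
set_reg$new[set_reg$new %in% c("aut", "bel", "deu")] <- "weu"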
aves_file_final_gtap <- merge(aves_file_final_gtap, Totalbilater_gtap, by=c("reporter","partner","sector"), all.x = TRUE)

full_matrix <- data.frame(unique(set_reg$new))
names(full_matrix) <- "reporter_new"
full_matrix <- merge(full_matrix, data.frame(unique(set_reg$new)), all=TRUE)
names(full_matrix)[2] <- "partner_new"
full_matrix <- merge(full_matrix, data.frame(unique(set_com$new)), all=TRUE)
names(full_matrix)[3] <- "sector_new"
full_matrix$index <- 1:nrow(full_matrix)
nrow(full_matrix)
#1117200
length(unique(set_reg$new))*length(unique(set_com$new))*length(unique(set_reg$new))
#1117200 - full GTAP sector x region x region (now empty) matrix
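As an aside, the same (empty) full matrix can be built in a single call; a sketch with expand.grid (row ordering may differ, content is the same):

full_matrix <- expand.grid(reporter_new = unique(set_reg$new),
                           partner_new  = unique(set_reg$new),
                           sector_new   = unique(set_com$new),
                           stringsAsFactors = FALSE)
full_matrix$index <- 1:nrow(full_matrix)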
#now merging with iso3 concordances
aves_file_final_gtap <- merge(aves_file_final_gtap, set_reg, by.x = "reporter", by.y = "old", all.x = TRUE)
aves_file_final_gtap <- aves_file_final_gtap %>% dplyr::rename(reporter_new = new)

aves_file_final_gtap <- merge(aves_file_final_gtap, set_reg, by.x = "partner", by.y = "old", all.x = TRUE)
aves_file_final_gtap <- aves_file_final_gtap %>% dplyr::rename(partner_new = new)

aves_file_final_gtap <- merge(aves_file_final_gtap, set_com, by.x = "sector", by.y = "old", all.x = TRUE)
aves_file_final_gtap <- aves_file_final_gtap %>% dplyr::rename(sector_new = new)

full_matrix <- merge(full_matrix, aves_file_final_gtap,
                     by=c("reporter_new", "partner_new", "sector_new"), all.x = TRUE)
#again, you may wish to edit the sector/region files (set_reg, set_com) earlier to get your own region/sector aggregation for GTAP - hence "reporter/partner/sector _new" (or use the ready file in the online annex); this just gives you a full matrix (with lots of missing values - not all GTAP sectors/regions have NTM data, some regressions don't run due to sample sizes, etc.)
aves_file_final_gtap <- full_matrix %>% dplyr::group_by(reporter_new, partner_new, sector_new) %>%
  dplyr::summarise(ave_t_weighted = weighted.mean(ave_t_weighted, TradeValue, na.rm = TRUE),
                   ave_nt_weighted = weighted.mean(ave_nt_weighted, TradeValue, na.rm = TRUE),
                   t_count_weighted = weighted.mean(t_count_weighted, TradeValue, na.rm = TRUE),
                   nt_count_weighted = weighted.mean(nt_count_weighted, TradeValue, na.rm = TRUE),
                   TradeValue = sum(TradeValue, na.rm = TRUE),
                   TotalTrade = sum(TotalTrade, na.rm = TRUE))
nrow(aves_file_final_gtap)
#1117200
# if you changed the set_reg/set_com files you'd have a different full matrix dimension

aves_file_final_gtap$share <- aves_file_final_gtap$TradeValue/aves_file_final_gtap$TotalTrade*100
#as described in the paper, you may wish to remove NTM AVEs in sectors where they only cover a small share of total trade; I personally set the threshold at 30%, but feel free to make your own assumptions (e.g. GTAP services sectors may have only one or two HS codes linked to them in the concordance and large overall TotalTrade flows, but few codes would have been used in the NTM AVE aggregation for the whole GTAP sector)

gtap_imports_final <- aves_file_final_gtap
#personally I like backup datasets along the way in case I mess up and need to re-run the whole thing, feel free to adjust
The final part uses auxiliary regressions to estimate missing values for which NTM data exists but something (either small sample sizes at the HS-code level or collinearity) inhibited their estimation. Researchers are encouraged to examine/adjust these assumptions according to their needs.
library(lmtest)
library(sandwich)
library(stringr)

#technical measures
reg1 <- lm(ave_t_weighted ~ t_count_weighted + t_count_weighted:reporter_new +
             t_count_weighted:sector_new - 1, data = gtap_imports_final) #suppressing the intercept

summary_results <- data.frame(summary(reg1)$coefficients[,c("Estimate","Pr(>|t|)")])

clustered_errors <- coeftest(reg1, vcov = vcovHC(reg1, type="HC1"))
clustered_errors <- data.frame(clustered_errors[1:nrow(summary_results),c(2,4)])
clustered_errors$coef <- rownames(clustered_errors)

summary_results$coef <- rownames(summary_results)
summary_results <- merge(summary_results, clustered_errors, by="coef")
rownames(summary_results) <- summary_results$coef
summary_results <- summary_results %>% select(Estimate, `Pr...t...y`) %>% dplyr::rename(`Pr...t..` = `Pr...t...y`)
summary_results <- summary_results[rownames(data.frame(summary(reg1)$coefficients)),]
#rm(clustered_errors)

rownames_1 <- row.names(summary(reg1)$coefficients)
rownames_1 <- which(rownames_1 %in% rownames_1[grepl("[:]", rownames_1)])
summary_results <- summary_results[rownames_1,]

rownames_1 <- row.names(summary_results)
rownames_reporter <- which(rownames_1 %in% rownames_1[grepl("reporter_new", rownames_1)])

summary_reporter_new <- data.frame(summary_results[rownames_reporter, ])
summary_reporter_new$reporter_new <- str_sub(row.names(summary_reporter_new), start = -3)

rownames_sector <- which(rownames_1 %in% rownames_1[grepl("sector_new", rownames_1)])

summary_sector_new <- data.frame(summary_results[rownames_sector, ])
summary_sector_new$sector_new <- str_sub(row.names(summary_sector_new), start = -3)

#may choose to adjust the level of significance
sign <- 0.01

summary_reporter_new <- summary_reporter_new[summary_reporter_new$`Pr...t..` < sign, c("reporter_new", "Estimate")]
summary_sector_new <- summary_sector_new[summary_sector_new$`Pr...t..` < sign, c("sector_new", "Estimate")]

gtap_imports_final_full <- merge(gtap_imports_final, summary_reporter_new, by="reporter_new", all.x=TRUE)
gtap_imports_final_full <- merge(gtap_imports_final_full, summary_sector_new, by="sector_new", all.x=TRUE)

gtap_imports_final_full <- gtap_imports_final_full %>% dplyr::rename(reporter_coef = Estimate.x, sector_coef = Estimate.y)

summary(reg1)$coefficients["t_count_weighted", "Estimate"]

# adding estimates
gtap_imports_final_full$ave_t_weighted01 <-
  I(summary(reg1)$coefficients["t_count_weighted", "Estimate"]*gtap_imports_final_full$t_count_weighted) +
  I(gtap_imports_final_full$reporter_coef*gtap_imports_final_full$t_count_weighted) +
  I(gtap_imports_final_full$sector_coef*gtap_imports_final_full$t_count_weighted)

# creating an indicator of whether it was estimated
gtap_imports_final_full$t_estimated <- 0
gtap_imports_final_full$t_estimated[is.nan(gtap_imports_final_full$ave_t_weighted)&
                                      !is.nan(gtap_imports_final_full$t_count_weighted)&
                                      !is.nan(gtap_imports_final_full$ave_t_weighted01)] <- 1

#replacing the NA parts with estimates
gtap_imports_final_full$ave_t_weighted[is.na(gtap_imports_final_full$ave_t_weighted)&
                                         !is.na(gtap_imports_final_full$t_count_weighted)] <-
  gtap_imports_final_full$ave_t_weighted01[is.na(gtap_imports_final_full$ave_t_weighted)&!is.na(gtap_imports_final_full$t_count_weighted)]
#non-technical measures
reg1 <- lm(ave_nt_weighted ~ nt_count_weighted + nt_count_weighted:reporter_new +
             nt_count_weighted:sector_new - 1, data = gtap_imports_final) #suppressing the intercept

summary_results <- data.frame(summary(reg1)$coefficients[,c("Estimate","Pr(>|t|)")])

clustered_errors <- coeftest(reg1, vcov = vcovHC(reg1, type="HC1"))
clustered_errors <- data.frame(clustered_errors[1:nrow(summary_results),c(2,4)])
clustered_errors$coef <- rownames(clustered_errors)

summary_results$coef <- rownames(summary_results)
summary_results <- merge(summary_results, clustered_errors, by="coef")
rownames(summary_results) <- summary_results$coef
summary_results <- summary_results %>% select(Estimate, `Pr...t...y`) %>% dplyr::rename(`Pr...t..` = `Pr...t...y`)
summary_results <- summary_results[rownames(data.frame(summary(reg1)$coefficients)),]

rownames_1 <- row.names(summary(reg1)$coefficients)
rownames_1 <- which(rownames_1 %in% rownames_1[grepl("[:]", rownames_1)])
summary_results <- summary_results[rownames_1,]

rownames_1 <- row.names(summary_results)
rownames_reporter <- which(rownames_1 %in% rownames_1[grepl("reporter_new", rownames_1)])

summary_reporter_new <- data.frame(summary_results[rownames_reporter, ])
summary_reporter_new$reporter_new <- str_sub(row.names(summary_reporter_new), start = -3)

rownames_sector <- which(rownames_1 %in% rownames_1[grepl("sector_new", rownames_1)])

summary_sector_new <- data.frame(summary_results[rownames_sector, ])
summary_sector_new$sector_new <- str_sub(row.names(summary_sector_new), start = -3)

#may choose to adjust the level of significance
sign <- 0.01

summary_reporter_new <- summary_reporter_new[summary_reporter_new$`Pr...t..` < sign, c("reporter_new", "Estimate")]
summary_sector_new <- summary_sector_new[summary_sector_new$`Pr...t..` < sign, c("sector_new", "Estimate")]

gtap_imports_final_full2 <- merge(gtap_imports_final, summary_reporter_new, by="reporter_new", all.x=TRUE)
gtap_imports_final_full2 <- merge(gtap_imports_final_full2, summary_sector_new, by="sector_new", all.x=TRUE)

gtap_imports_final_full2 <- gtap_imports_final_full2 %>% dplyr::rename(reporter_coef = Estimate.x, sector_coef = Estimate.y)

gtap_imports_final_full2$ave_nt_weighted01 <-
  I(summary(reg1)$coefficients["nt_count_weighted", "Estimate"]*gtap_imports_final_full2$nt_count_weighted) +
  I(gtap_imports_final_full2$reporter_coef*gtap_imports_final_full2$nt_count_weighted) +
  I(gtap_imports_final_full2$sector_coef*gtap_imports_final_full2$nt_count_weighted)

# creating an indicator of whether it was estimated
gtap_imports_final_full2$nt_estimated <- 0
gtap_imports_final_full2$nt_estimated[is.na(gtap_imports_final_full2$ave_nt_weighted)&
                                        !is.na(gtap_imports_final_full2$nt_count_weighted)&
                                        !is.na(gtap_imports_final_full2$ave_nt_weighted01)] <- 1

sum(gtap_imports_final_full2$nt_estimated)
#14373 additional non-technical measure AVEs estimated
sum(gtap_imports_final_full$t_estimated)
#26649 additional technical measure AVEs estimated

#replacing the NA parts with estimates
gtap_imports_final_full2$ave_nt_weighted[is.na(gtap_imports_final_full2$ave_nt_weighted)&!is.na(gtap_imports_final_full2$nt_count_weighted)] <-
  gtap_imports_final_full2$ave_nt_weighted01[is.na(gtap_imports_final_full2$ave_nt_weighted)&!is.na(gtap_imports_final_full2$nt_count_weighted)]
# ----------------------- Packing it all up
# not the most efficient way, sorry, suggestions welcome
gtap_imports_final <- gtap_imports_final[order(gtap_imports_final$reporter_new,
                                               gtap_imports_final$partner_new,
                                               gtap_imports_final$sector_new),]

gtap_imports_final_full <- gtap_imports_final_full[order(gtap_imports_final_full$reporter_new,
                                                         gtap_imports_final_full$partner_new,
                                                         gtap_imports_final_full$sector_new),]

gtap_imports_final_full2 <- gtap_imports_final_full2[order(gtap_imports_final_full2$reporter_new,
                                                           gtap_imports_final_full2$partner_new,
                                                           gtap_imports_final_full2$sector_new),]

gtap_imports_final0 <- gtap_imports_final

gtap_imports_final$ave_t_weighted <- gtap_imports_final_full$ave_t_weighted
gtap_imports_final$ave_nt_weighted <- gtap_imports_final_full2$ave_nt_weighted
gtap_imports_final$t_estimated <- gtap_imports_final_full$t_estimated
gtap_imports_final$nt_estimated <- gtap_imports_final_full2$nt_estimated

#just double checking, should match as before
sum(gtap_imports_final$t_estimated)
#26649
sum(gtap_imports_final$nt_estimated)
#14373

#removing extreme values close to 100 - hyperbolic function again, feel free to adjust if you want a higher cap
gtap_imports_final$ave_nt_weighted[gtap_imports_final$nt_estimated==1] <-
  100*(tanh(gtap_imports_final$ave_nt_weighted[gtap_imports_final$nt_estimated==1]/100))

#replacing negative values with zeros:
gtap_imports_final$ave_t_weighted[gtap_imports_final$t_estimated==1 & gtap_imports_final$ave_t_weighted<0] <- 0
gtap_imports_final$ave_nt_weighted[gtap_imports_final$nt_estimated==1 & gtap_imports_final$ave_nt_weighted<0] <- 0

write.csv(gtap_imports_final, "gtap_imports_final.csv", row.names = FALSE, na = "")
gtap_imports_final
gtap_imports_final
## # A tibble: 1,117,200 × 11
## reporter_new partne…¹ secto…² ave_t…³ ave_n…⁴ t_cou…⁵ nt_co…⁶ Trade…⁷ Total…⁸
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 alb alb atp NA NA NA NA 0 0
## 2 alb alb b_t NA NA NA NA 0 0
## 3 alb alb c_b NA NA NA NA 0 0
## 4 alb alb cmn NA NA NA NA 0 0
## 5 alb alb cmt NA NA NA NA 0 0
## 6 alb alb cns NA NA NA NA 0 0
## 7 alb alb coa NA NA NA NA 0 0
## 8 alb alb crp NA NA NA NA 0 0
## 9 alb alb ctl NA NA NA NA 0 0
## 10 alb alb dwe NA NA NA NA 0 0
## # … with 1,117,190 more rows, 2 more variables: t_estimated <int>,
## # nt_estimated <int>, and abbreviated variable names ¹partner_new,
## # ²sector_new, ³ave_t_weighted, ⁴ave_nt_weighted, ⁵t_count_weighted,
## # ⁶nt_count_weighted, ⁷TradeValue, ⁸TotalTrade
hist(gtap_imports_final$ave_t_weighted, main = "Technical Measures")
hist(gtap_imports_final$ave_nt_weighted, main = "Non-technical Measures")
Congrats on making it this far! Any questions/suggestions/comments - please email kravchenkoa [at] un [dot] org