Introduction

This training is titled R for linking non-tariff measures to the Sustainable Development Goals and is the second in the two-module e-learning series on R for trade and trade policy analysis published by the Trade, Investment and Innovation Division of the Economic and Social Commission for Asia and the Pacific (TIID ESCAP).

Non-Tariff Measures (NTMs) are policy measures, other than ordinary customs tariffs, that can potentially have an economic effect on international trade in goods, changing quantities traded, or prices, or both.1 By definition, they are not inherently protectionist: while they can impose additional costs on the movement of goods, they also purport to address specific and legitimate non-trade objectives. Some of these objectives are relevant to the achievement of the long-standing internationally agreed development priorities, which were solidified in the Sustainable Development Goals (SDGs) in 2015.2

To understand and describe the linkages between NTMs imposed on internationally traded goods and the SDGs, ESCAP and UNCTAD jointly developed an SDG-HS-NTM concordance matrix based on the methodology outlined in the ESCAP working paper ‘Exploring linkages between non-tariff measures and the Sustainable Development Goals: A global concordance matrix and application to Asia and the Pacific’.3 The methodology and the Excel file containing the concordance matrix are available here.

The matrix can be used to analyze global, regional or country-level data on non-tariff measures collected and stored in UNCTAD’s global database on NTMs (TRAINS), available at https://trains.unctad.org/. The database contains information on more than 65,000 regulations imposed by over 100 countries. The data were collected following the UNCTAD Guidelines to collect data on official non-tariff measures 2016 4 and classified in accordance with the International Classification of Non-Tariff Measures (ICNTM;5 version of 2012),6 developed by the Multi-Agency Support Team (MAST) Group.7

You can download an Excel file with a complete NTM data set dated as of May 2019 from here. An example of such analysis was conducted by ESCAP, and its results were presented in the Asia-Pacific Trade and Investment Report 2019 8 and in the accompanying country briefs for Armenia, Azerbaijan and Tajikistan available from here (see bottom of the webpage).

This training module describes how to use R to transform the SDG-HS-NTM concordance matrix into a tool for conducting such an analysis of NTM data from the TRAINS database.

In particular, it describes the following:

  • what data is used to conduct the analysis, where to download it from and how to import it into R;
  • how to extract necessary data from the imported data frames;
  • how to prepare the data for analysis by putting it into the correct format, removing redundant data, unnecessary symbols or white spaces, filtering, arranging and replacing the data as necessary;
  • what code algorithm is necessary to conduct the matching of data in TRAINS database to the concordance strings in the matching matrix and how to record, save and export the results of such matching;
  • how to review and interpret the results of the matching, and to generate illustrative graphs in R or in Excel.

The material contains the R code, explanatory notes, previews of interim results and links to data sources. The complete code algorithm, which you can open in RStudio and run with or without further editing, is available in a separate script file (see Data section).

This training module also contains quizzes and practice sessions that highlight a few useful points and help the new knowledge sink in.

Note that this training module is part of an e-learning series and is provided together with the training module titled Training on using R for trade analysis, which is available in English and in Russian. Training on using R for trade analysis comes with a rated quiz; passing it satisfactorily entitles students to a certificate of completion. In contrast, this module, R for linking non-tariff measures to the Sustainable Development Goals, does not have any rated quiz. So read on and practice at your own pace and for your own pleasure!

Data

All necessary data are described in the table below.9 You can download them into the directory on your PC that you will use as your working directory in RStudio for this training module.

What | Training file name | Original source
UNCTAD’s TRAINS database on NTMs | UNCTAD_i-tip_report_ALL_Measures_x.xlsx | here
SDG-HS-NTM concordance matrix | SDG-HS-NTM-Concordance[v1.1 Sep 2019]_0.xlsx | here
UN Comtrade Commodity Classification | HSCodesAll.csv | here
Table of concordance between different versions of the Harmonized Commodity Description and Coding System | HS_concordance_full.csv | here
International Classification of non-tariff measures | MAST_all.csv | here
ISO 3 country codes and binary data on whether a country has NTM data recorded in TRAINS and on whether it belongs to a certain grouping | countries_UNCTAD_iso3.csv | here
Complete code algorithm | complete_matching_code.R | here
Files with matching results per country | countries.zip | here

The first two are the main data files used to analyze non-tariff measures and their linkages to the SDGs. Open these two Excel files to familiarize yourself with the content and the setup of the two data sets.

UNCTAD’s TRAINS database contains the following information for each recorded NTM: imposing country, partner affected, NTM code (as per ICNTM 2012), measure description, affected HS codes, description of affected goods, source and national legal basis. In the steps below, we will combine the four columns titled measure description, description of affected goods, source and national legal basis into one column titled description.
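As a rough illustration of what combining several text columns into one looks like, here is a minimal base-R sketch on toy data. The column names and values are hypothetical stand-ins, not the actual TRAINS column titles, and the module itself may use a different function for this step.

```r
# Toy stand-in for a few TRAINS text columns; names and values are hypothetical.
ntm <- data.frame(
  measure_description = c("Import permit for pesticides", "Labelling of food"),
  goods_description   = c("Pesticides", NA),
  source              = c("Official gazette", "Ministry website"),
  legal_basis         = c("Law No. 1", "Decree No. 2"),
  stringsAsFactors = FALSE
)

text_cols <- c("measure_description", "goods_description", "source", "legal_basis")

# Replace missing values with empty strings so that paste() does not
# insert the literal text "NA" into the combined description.
ntm[text_cols] <- lapply(ntm[text_cols], function(x) ifelse(is.na(x), "", x))

# Paste the four columns together, separated by a space, then drop the originals.
ntm$description <- trimws(do.call(paste, c(ntm[text_cols], sep = " ")))
ntm[text_cols] <- NULL

ntm$description
```

Keeping all descriptive text in a single column means that a keyword search only has to look in one place.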

The SDG-HS-NTM concordance matrix contains the following information for each concordance string: SDG, SDG Targets, product group description, description of HS code list, list of HS codes, HS version, NTM code (ICNTM), list of keywords, list of negative keywords, and additional notes. You will see that some rows are colored grey and have 0 in column CODE 0/1. These are concordance strings that describe potential relationships that exist between regulating trade in certain goods and achievement of certain SDGs. Due to the setup of the TRAINS database of NTMs, these concordance strings cannot be used in the analysis below. In the steps below, we will filter these rows out. Filtering will leave us with the concordance strings that describe intended direct (and positive) impact on the achievement of SDGs.
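The filtering idea can be sketched in base R as follows. The toy data frame and the short column name code are assumptions made for illustration; only the long original column name is taken from the concordance file.

```r
# Toy stand-in for the concordance matrix; the values are hypothetical.
# check.names = FALSE keeps the long original column name intact.
matching_toy <- data.frame(
  "CODE 0/1      (0: A, IND; 1: C,CwK)" = c(0, 0, 1, 1),
  SDG = c("SDG_1", "SDG_2", "SDG_2", "SDG_3"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Give the first column a short (hypothetical) name that is easier to type.
names(matching_toy)[1] <- "code"

# Keep only the concordance strings flagged with 1.
matching_direct <- matching_toy[matching_toy$code == 1, ]
matching_direct
```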

A specific HS-NTM code pair was considered to have a direct linkage to an SDG Target if (1) it has a clearly stated SDG Target-related objective (supported by relevant keywords in the descriptive information present in the TRAINS database), or (2) the examined HS-NTM code combination is not likely to have any objective other than the one that is relevant to an SDG (e.g. trade in hazardous chemicals and waste, endangered species of flora and fauna, cultural heritage items, arms and weapons, etc.).10

The code algorithm, described in detail below, allows us to check each NTM entry in the TRAINS database for simultaneous matches with at least one HS code, one NTM code and one keyword described in a given concordance string in the SDG-HS-NTM concordance matrix.
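The three-way condition described above can be sketched for a single NTM entry and a single concordance string. This is a simplified base-R illustration, not the module's actual algorithm; the helper functions any_shared and any_keyword and all values are hypothetical.

```r
# One NTM entry and one concordance string; all values are hypothetical.
entry <- list(
  hs          = c("380891", "380892"),
  ntm         = "B14",
  description = "authorization required for import of hazardous pesticides"
)
concordance <- list(
  hs       = c("380891", "290381"),
  ntm      = c("B14", "B31"),
  keywords = c("hazardous", "toxic")
)

# TRUE if at least one element of x also appears in y.
any_shared <- function(x, y) length(intersect(x, y)) > 0

# TRUE if at least one keyword occurs in the text (case-insensitive,
# matched as a literal substring rather than as a regular expression).
any_keyword <- function(keywords, text) {
  any(vapply(keywords,
             function(k) grepl(tolower(k), tolower(text), fixed = TRUE),
             logical(1)))
}

# A match requires all three conditions to hold at the same time.
is_match <- any_shared(entry$hs, concordance$hs) &&
  any_shared(entry$ntm, concordance$ntm) &&
  any_keyword(concordance$keywords, entry$description)

is_match
```

Here the entry shares one HS code and one NTM code with the concordance string and its description contains the keyword "hazardous", so is_match is TRUE. Changing any one of the three inputs so that nothing overlaps turns the result to FALSE.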

The rest of the data frames are used to filter, clean, convert, reorganize and otherwise prepare the data in the two main data frames for subsequent analysis. Further details on these data sets are provided in sections below.

complete_matching_code.R contains the complete code algorithm, and countries.zip has a set of 88 files that contain results of matching HS and NTM codes to SDGs (more on these two in sections below).

Downloading R and RStudio

R is a free software environment for statistical computing and graphics that runs on a variety of platforms. RStudio, which we use in this training, is open-source software that provides a convenient interface for entering and running code, previewing the generated outputs and saving them in different formats.

Find download instructions for PC and Mac in Annex 1 below, or refer to Training on using R for trade analysis for more detailed instructions.

Code

Now we will go through the code step by step, provide the necessary explanations and show the interim results. Should you need more information on various packages and functions, run help(package = "package_name") for a given package or ?function_name for a given function. Help information will be displayed in the Help tab in the bottom right corner of your RStudio window.

The full code algorithm is available in the “complete_matching_code.R” file (see Data section). You can open it in RStudio or as a .txt file. You can edit it or run it as is, after revising only the first line of code, which sets the working directory (we will explain how to do that below). However, do not rush to use this file. First go through the steps of this training module and familiarize yourself with what each line of code is intended to do.

General suggestions and recommendations

Going through this training module may take 1-2 weeks, depending on your learning pace and on your familiarity with R code. Completing the training module titled Training on using R for trade analysis first should make going through this module easier and faster. In any case, the process will take a few separate sessions.

To make sure that no progress is lost, follow these recommendations:

  • When you start this training, open, name and save a new script in RStudio. As you go through the module, paste and run all code from the code chunks in this module, both for training and for quizzes and practice sessions, in one script. You can use the # sign to insert comments to easily differentiate between training code and quiz code. Important: Code chunks are page-wide grey boxes that contain the code strings that are described in this module. Code in small grey boxes within the paragraphs of text is to be examined, but should not be copied into your R script. Page-wide white boxes with text preceded by ## contain the output, that is, the result of running the code string just above it. This is what you should see when you run the code in your RStudio.

  • Execution of any chunk of the training code, quiz code and some practice code is dependent on prior execution of all earlier chunks of code (including those that set the working directory and load all function packages and data files). So, as you progress through this training module, quizzes and practice sessions, make sure to copy all code into your R script, run it and save it regularly. To save the script press Ctrl+S at any convenient time. To save all the objects with variables and data frames (the workspace) make sure to press “Save” in the pop-up window asking whether you want to save the workspace image, which appears when you close the RStudio window. Next time you open the session you can continue from where you stopped.

  • If you start getting errors after restarting your session, just run all the code from all previous study sessions, and then try to run the code from the new session once again. To launch execution of multiple lines of code at once, select them all, hit Ctrl+Enter and wait for a few moments. To run one string of code, place the cursor inside that string and press Ctrl+Enter.

  • If you previously ran the code that had a mistake and that overwrote an existing object or its elements (such code string would contain <- or =) that are used in the following lines of code, running those following lines of code may result in output mistakes or error messages. In this case, after you revised the faulty code, it is better to retrace your steps to where the objects involved were originally created, and rerun that code and all code that follows. Alternatively, in some cases it may be feasible to rerun all code from the very beginning.

  • Warning messages that may be shown usually do not stop execution of code. They are intended to draw your attention to some characteristics of code execution. It may still be useful to pay attention to them.

  • Error messages state that something went wrong and stop execution of code. The text of the error message indicates what type of error it may be: a mistake in the code, absence of certain data objects in the workspace, an error in a link to a location on your PC, a wrong working directory set in earlier steps, or failure to install and/or load the function packages. The error message has to be reviewed and the issues resolved before you can move forward. In some cases, updating R and all function packages, or even uninstalling and reinstalling R and RStudio, may help, as the software is constantly updated by the developers. At the time of publishing of this training module, the current code worked well for the tasks described.

  • Should you need more information on various packages and functions, run help(package = "package_name") for a given package or ?function_name for a given function. If you run into a function that was introduced earlier in the module, you can search the module web page by pressing Ctrl+F and entering the name of the function. This way you will find its earlier mention and an example of its use to refresh your memory.

  • Notice links to footnotes and external web-resources included throughout the training module. Those may contain additional short clarifications or provide additional useful reference material on NTMs, SDGs and their linkages, as well as on R language and functions.
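The recommendation above on saving your objects between sessions can also be followed from code rather than through the pop-up window. The sketch below uses base R's save() and load() on a single object; the file name and the object progress_note are hypothetical, and a temporary directory is used only to keep the example from writing into your real working directory.

```r
# Hypothetical file name, placed in a temporary directory for this example.
ws_file <- file.path(tempdir(), "ntm_training_objects.RData")

progress_note <- "finished section 1.2"   # any object created during a session

save(progress_note, file = ws_file)  # write the selected object(s) to disk
rm(progress_note)                    # simulate closing the session
load(ws_file)                        # restore the saved object(s) later

progress_note
```

To save every object in the workspace at once instead of a selected few, base R also provides save.image().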

Quizzes and practice

As you go through the code and instructions below, you will be offered a few practice sessions as well as a few quizzes intended to help you get a better grasp of the workings of R and its code. You are not rated on these quizzes. They are only there for you to learn.

1 Preparatory steps

1.1 Start RStudio, set work directory and install code packages

Start your RStudio, and open your new R script by selecting File/New file/R script or by pressing Ctrl+Shift+N. Then save the new working file by pressing File/Save as…, entering the preferred file name and choosing the preferred directory on your computer.

Set your working directory, from which your data frames will be imported and to which the resulting files will be exported, by using function setwd. This is a very important step, as it will allow you to import files from this location without having to indicate the full path to this directory. RStudio will also save all output and backup files into that directory automatically.

Note for PC users: when inputting the path to any directory in RStudio, it is necessary to replace all backslashes \ with forward slashes /, or you will get an error message and code will not run.

setwd("C:/path/to_your/working_directory")

You can check your working directory using getwd.

getwd()

## [1] "C:/path/to_your/working_directory"
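If you copy a path from Windows File Explorer, the slashes can also be converted in R itself. The snippet below is a small sketch with a hypothetical path; note that inside an R string each backslash has to be typed twice to be read literally.

```r
# A Windows-style path (hypothetical); each backslash is doubled in the string.
win_path <- "C:\\path\\to_your\\working_directory"

# Replace every backslash with a forward slash.
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
r_path
```

The resulting string, "C:/path/to_your/working_directory", is in the form that can be passed to setwd.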

Download the data files listed in section Data above into your working directory.

Before proceeding further, it is necessary to load the following code packages that we will use later. Details on each package can be easily found online. Alternatively, use help(package = "package_name") to review package information in the Help tab in the lower right corner of your RStudio window.

library("dplyr")
library("readxl")
library("reshape2")
library("stringr")
library("stringi")
library("tidyr")
library("ggplot2")

If you get an error message indicating that the required code package does not exist, try installing the package by running code install.packages("package_name") first, and only then run library("package_name").
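One possible way to find out in advance which of these packages still need to be installed is sketched below. It relies only on base R's requireNamespace; the object names pkgs and missing_pkgs are illustrative, and the install line is left commented out so that nothing is installed without your say-so.

```r
# Packages loaded in this module.
pkgs <- c("dplyr", "readxl", "reshape2", "stringr", "stringi", "tidyr", "ggplot2")

# requireNamespace() returns FALSE, instead of raising an error, when a
# package is not installed.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]
missing_pkgs

# Uncomment to install whatever is missing (needs an internet connection):
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```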

Warning messages are usually not a problem.

1.2 Import the SDG-HS-NTM concordance matrix into RStudio

If you open the SDG-HS-NTM-Concordance[v1.1 Sep 2019]_0.xlsx file, you will see that the concordance matrix is contained in Sheet 2 titled “Concordance Matrix”. Hence, this is the sheet that needs to be imported by using function read_excel. Note that to be able to further manipulate the data frame it is necessary to create an R object, rather than just import the data file. So below we use matching <-, where matching is the name of the new object, and the arrow <- denotes the action of assignment (alternatively, you can use a single = instead of <-).

Also note that, apart from the name of the file, which has to include the file extension, the command also contains argument sheet=2 indicating sheet 2 of the imported Excel file, argument col_names=TRUE indicating that the first row is used as column names, and argument trim_ws=TRUE indicating that all leading and trailing white spaces should be trimmed.

matching <- read_excel("SDG-HS-NTM-Concordance[v1.1 Sep 2019]_0.xlsx", sheet=2, col_names = TRUE, trim_ws = TRUE)
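As an alternative to the sheet number, readxl also accepts the sheet title. The sketch below assumes the file is in your working directory and the sheet is titled "Concordance Matrix" as described above; the object name matching_by_name is hypothetical, and the code only prints a message if the file or the package is not found.

```r
file_name <- "SDG-HS-NTM-Concordance[v1.1 Sep 2019]_0.xlsx"

if (requireNamespace("readxl", quietly = TRUE) && file.exists(file_name)) {
  # List the titles of all sheets in the workbook.
  print(readxl::excel_sheets(file_name))
  # Import the sheet by its title instead of its position.
  matching_by_name <- readxl::read_excel(file_name, sheet = "Concordance Matrix")
} else {
  message("readxl is not installed or the file is not in the working directory.")
}
```

Referring to a sheet by title keeps the import working even if the order of sheets in the workbook changes.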

You can preview the elements of the imported data frame by doing the following:

  • by using function names to preview the names of the columns or function head to preview the first few rows (6 rows by default, which you can change by adding an alternative value for the relevant argument).

Note: In the output of function head, below the column names, you can see text within <>. This indicates the class of data stored in each column. <dbl> stands for numeric data, and <chr> stands for character strings. Other possible options are <fct> for categorical factors and <lgl> for logical values. Using the right class of data may affect the results of data processing.

names(matching)

##  [1] "CODE 0/1      (0: A, IND; 1: C,CwK)"
##  [2] "SDG"
##  [3] "Target"
##  [4] "Product description"
##  [5] "List description"
##  [6] "HS"
##  [7] "HS_version"
##  [8] "NTM"
##  [9] "Keywords"
## [10] "neg_Keywords"
## [11] "A/C/CwK/IND"
## [12] "Notes"

head(matching, 10)

## # A tibble: 10 x 12
##    `CODE 0/1      ~ SDG   Target `Product descri~ `List descripti~ HS
##               <dbl> <chr> <chr>  <chr>            <chr>            <chr>
##  1                0 SDG_1 NA     Potentially all~ NA               NA
##  2                0 SDG_1 NA     All products an~ NA               NA
##  3                0 SDG_1 NA     All products an~ NA               NA
##  4                0 SDG_2 NA     Agricultural pr~ HS chapters 01-~ HS c~
##  5                0 SDG_2 NA     Technologies, m~ Relevant 6- dig~ Rele~
##  6                0 SDG_2 NA     Fertilizers and~ Relevant 6- dig~ Rele~
##  7                0 SDG_2 Targe~ All intermediat~ See relevant se~ See ~
##  8                0 SDG_2 Targe~ Any product pro~ See relevant se~ See ~
##  9                0 SDG_2 Targe~ Endangered spec~ See relevant se~ See ~
## 10                1 SDG_2 Targe~ (BASIC)   Agric~ [HS-WTO_Aggri H~ 0101~
## # ... with 6 more variables: HS_version <chr>, NTM <chr>, Keywords <chr>,
## #   neg_Keywords <chr>, `A/C/CwK/IND` <chr>, Notes <chr>

  • by clicking on the name of your object in the Environment tab in the upper right section of the RStudio window, which will open a separate tab with the following preview of your data frame.
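Besides these previews, the class of each column can be checked and changed directly in code. The following is a minimal base-R sketch on a toy data frame with hypothetical values, containing one numeric, one character, one factor and one logical column.

```r
# Toy data frame with four column classes (hypothetical values).
toy <- data.frame(
  code    = c(0, 1, 1),
  sdg     = c("SDG_1", "SDG_2", "SDG_2"),
  group   = factor(c("A", "C", "CwK")),
  checked = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Class of each column: numeric, character, factor and logical.
sapply(toy, class)

# Preview only the first two rows by setting the n argument of head.
head(toy, n = 2)

# Convert a column to another class when needed, e.g. numeric codes to character.
toy$code <- as.character(toy$code)
class(toy$code)
```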