Introduction

Welcome to Trade Statistics in Policy Making: A Guide on Using R for Calculating Trade Indices and Indicators.

This Guide introduces participants to using R to calculate the trade indices and indicators most commonly used in international trade and trade policy analysis. It provides brief descriptions of the indices and indicators, including their definitions, formulas and typical data sources, along with examples of R code that can be used both to perform the calculations and to visualize the results.

The Guide is essentially a replication script based on A Handbook of Commonly Used Trade Indices and Indicators, a popular publication developed by the Trade, Investment and Innovation Division of UNESCAP and authored by Mia Mikic and John Gilbert. It follows the basic structure of the handbook and recreates most of its trade indices and indicators using R.

Lecture 1

Motivation

A trade indicator can be defined as an index or a ratio used to describe and assess the state of trade flows and trade patterns of a particular economy or group of economies. It can be used to monitor these flows and patterns over time or across economies and regions. Trade indicators can therefore support evidence-based policy making, as they shed light on national trends, both on what has happened in the past (lagging indicators) and on what may happen in the future (leading indicators).

Analyzing trade indicators is a great first step in determining trade policy choices. Trade indicators can answer the question “what?”, though more advanced strategies like modeling are often needed to answer the follow-up question of “why?”. Nonetheless, trade indicators are a great place to start. For instance, here is a list of questions that some of the trade indicators covered in this Guide (and the handbook) may help you answer:

  • How dependent is a country on regional trade?
  • Which are the most dynamic products (sectors) in the world (or a regional) market?
  • How much of the increase in intra-regional trade could be attributed to a few countries (one country) in a region?
  • How intense is trade with (regional) trading partners?
  • Are (regional) trading partners’ exports becoming more similar (more competitive) or more complementary?
  • Is there a geographic “re-orientation” of exports after an external shock (such as a financial crisis) or the enforcement of a trade agreement?

Structure of the Guide

This replication script consists of sections organized around five distinct categories of indicators:

  • Trade and economy (trade dependence, import penetration, export propensity, marginal propensity to import)
  • Trade performance (growth rate of trade (exports/imports), normalized trade balance, export/import coverage)
  • Direction of trade (trade intensity, intra-regional trade shares, trade entropy, etc.)
  • Sectoral structure of trade (major export category, index of export diversification, revealed comparative advantage, intra-industry trade, trade overlap, complementarity, export similarity, competitiveness, etc.)
  • Protection (average tariff, weighted average tariff and tariff dispersion) [1]

Although the structure and content of this Guide closely follow those of the handbook, this replication script uses more recent data than the handbook. Therefore, the data presented, the resulting indicator values, the graphs and their discussion differ from those in the handbook.

Each section contains a description of the indicators, the code script and the datasets needed for the calculations. Some of the code could be written more concisely and efficiently than presented here. However, we deliberately chose to include detailed code, so as to highlight the steps involved in the data wrangling and index calculation process.

Prerequisites and software

This Guide relies on R and RStudio. R is a free software environment for statistical computing and graphics that runs on a variety of platforms, while RStudio is open-source software that provides a convenient interface for writing and running code, previewing the generated outputs and saving them in different formats. Prior familiarity with R is therefore strongly desirable.

To build some familiarity with R in the context of trade analysis, you can refer to the ESCAP Online Training on Using R for Trade Analysis, which starts with the very basics of coding in R and then moves on to more complex topics of trade analysis.

You can also search online for R packages and functions as you go through the replication script, to get a better understanding of what they do. Alternatively, you can run help(package = "package_name") in RStudio for a given package or ?function_name for a given function. The help information will then be displayed in the Help tab in the bottom right corner of your RStudio window. The code script presented here is suitable for running on both PC and macOS.
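The help lookups described above can be tried on functions that ship with R itself. The snippet below is a minimal sketch; the stats package and weighted.mean() are chosen only for illustration and are not taken from the Guide's own code.

```r
# Open the index of help pages for an installed package
# (here the built-in "stats" package, used only as an example).
help(package = "stats")

# Open the help page for a single function; both forms are equivalent.
?weighted.mean
help("weighted.mean")

# Print the argument list of a function without leaving the console.
args(weighted.mean)
```

In RStudio the first three calls open the Help tab, while args() prints directly to the console.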

Before embarking on the practical chapters of this Guide, please review the Introduction and Notation, Data Cleaning and Software sections of A Handbook of Commonly Used Trade Indices and Indicators. Also make sure to refer to the handbook for the introductory sections of each practical chapter.

Data access and other considerations

This Guide uses the datasets that can be accessed from the following databases:

This Guide provides direct links to the pre-downloaded datasets from these databases, which can be used when going through the code script. Only in Chapter 1 does this Guide make use of the WDI package to access some of the data by making API calls directly to the WDI database.

The WDI package allows users to search and download data from over 40 datasets hosted by the World Bank, including the World Development Indicators (‘WDI’), International Debt Statistics, Doing Business, Human Capital Index, and Sub-national Poverty indicators. Here you can find a description of the WDI package and the accessible datasets.
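As a rough illustration of what such an API call can look like, the sketch below uses WDI() from the WDI package. The economies (Thailand and Viet Nam), the years and the two indicator codes are assumptions made for this example only and are not taken from Chapter 1. The helper name fetch_trade_shares() is hypothetical. Because the call needs an installed WDI package and an internet connection, it is wrapped so that it returns NULL when either is missing.

```r
# Illustrative sketch only: countries, years and indicators are assumptions.
# install.packages("WDI")  # run once if the package is not yet installed

fetch_trade_shares <- function() {
  # Skip the download if the WDI package is not available.
  if (!requireNamespace("WDI", quietly = TRUE)) return(NULL)
  # Return NULL instead of stopping if the API cannot be reached.
  tryCatch(
    WDI::WDI(
      country   = c("TH", "VN"),
      # Exports and imports of goods and services as a share of GDP;
      # the vector names are used as column names in the result.
      indicator = c(exports_gdp = "NE.EXP.GNFS.ZS",
                    imports_gdp = "NE.IMP.GNFS.ZS"),
      start     = 2015,
      end       = 2020
    ),
    error = function(e) NULL
  )
}

trade_shares <- fetch_trade_shares()
if (!is.null(trade_shares)) print(head(trade_shares))
```

To look up other indicator codes, the package's WDIsearch() function can be used, for example WDI::WDIsearch("exports of goods and services").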

Some basic data cleaning was done for some of the datasets provided with this Guide. We also found that, while 2-digit commodity codes appear to be consistent across the time periods examined in this Guide, commodity names are not. This is mostly because trade flows in different years were reported in different versions of the HS (H0 through to H5). Ideally, all commodity codes should be converted to one specific HS version for consistency, which becomes especially relevant when working with 6-digit HS codes. However, in this Guide we rarely go below 2-digit HS codes. To make sure that all commodity groups have consistent names throughout, we simply assigned one commodity name to each 2-digit commodity code, disregarding the HS versions reported in the dataset. The dataset with commodity names that we use in this Guide is available here. We only use it in those sections of the Guide where we examine the dynamics of a given indicator over several years or compare indicator values across different economies.
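The idea of keeping one name per 2-digit code can be sketched in base R as follows. This is only an illustration under stated assumptions: the Guide itself relies on a prepared dataset of commodity names, and the data frame, column names and values below are invented toy examples rather than real trade records.

```r
# Toy records (made-up values): the same 2-digit code carries slightly
# different commodity names depending on the HS version reported.
trade <- data.frame(
  year           = c(2000, 2010, 2020, 2000, 2020),
  hs_version     = c("H1", "H3", "H5", "H1", "H5"),
  commodity_code = c("09", "09", "09", "10", "10"),
  commodity      = c("Coffee, tea, mate and spices",
                     "Coffee, tea, mate & spices",
                     "Coffee, tea and spices",
                     "Cereals",
                     "Cereal grains"),
  stringsAsFactors = FALSE
)

# Build a lookup table that keeps the first name seen for each code.
lookup <- trade[!duplicated(trade$commodity_code),
                c("commodity_code", "commodity")]
names(lookup)[2] <- "commodity_name"

# Drop the inconsistent names and attach the single name per code.
trade_clean <- merge(
  trade[, setdiff(names(trade), "commodity")],
  lookup,
  by    = "commodity_code",
  all.x = TRUE
)

print(trade_clean)
```

Which name is kept depends on row order here; in practice one would choose the names from a single reference HS version.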

Disclaimers

  • Given that R packages continuously evolve and new updates are issued, some of the functions may change and their arguments may be revised at a later date. Still, the basic principles of using R to handle data and calculate trade indicators will remain valid. The code script featured in this Guide is up-to-date and fully functional as of May 2023. Revised editions of the Guide may be issued as necessary.

  • Additionally, the datasets that feed into this Guide may be updated by their publishers at a later date. Thus, if you access the same data directly from the databases at some later point, your results may differ somewhat from those showcased in the Guide. Please also note that the Guide calculates indices using more recent trade data, and therefore the indicator values presented in the Guide differ from those featured in the handbook.

  • A very important part of working with any data is cleaning it. In this Guide we provide some basic cleaning of the datasets and describe those cases where we took a few shortcuts and made assumptions to keep this Guide manageable and not too long. Once you move on to conducting your own analysis, especially with data taken from other sources, you will have to do your own data checking and cleaning before calculating the indices.

  • The purpose of this Guide is to teach how to code in R to calculate various indicators. Thus, some of the code script goes into very detailed steps to show clearly what is happening with the data. Once you are familiar with the code, you can always optimize it to make it more concise and efficient.

Acknowledgements

This Guide is based on A Handbook of Commonly Used Trade Indices and Indicators authored by Mia Mikic and John Gilbert.

The Guide materials were developed and compiled by Maria Semenova, under the overall guidance of Alexey Kravchenko, and with notable contributions from Alexis Athens, Malte Mowlavi, Andrea Dalla Rosa and Ziqian Yin.


  1. Effective rate of protection is not covered by the Guide at this point.