This online course is designed to introduce participants to the basic techniques involved in machine learning and text mining, with applications focusing on international trade policy.

The course emphasizes developing an intuitive understanding of the tools involved, so that participants can identify future applications in their own work. It is a hands-on course: participants are expected to spend considerable time analyzing real-world data and replicating the modelling described in detail in the video lectures.

This training was originally delivered live by Ben Shepherd in the spring of 2021, so the video lectures contain references to live discussions; please disregard them. The materials from that live course have since been adapted for a self-paced mode of learning.

Acknowledgements and additional notes

All instructional materials of the course were developed by Ben Shepherd for the live training conducted in 2021. Ben Shepherd, the Principal of Developing Trade Consultants, is a trade economist and international development consultant. He has worked on a wide range of trade and development issues with organizations such as the World Bank, the OECD, the Asian Development Bank, the Inter-American Development Bank, the United Nations, and the Asia-Pacific Economic Cooperation. He specializes in providing policy-relevant research, as well as capacity-building seminars for researchers working in trade and development. He has published more than fifteen articles in peer-reviewed journals, and a similar number of book chapters.

The self-paced version of the course was compiled, and the quizzes were developed, by Maria Semenova, Consultant at TIID, ESCAP. The code scripts were reviewed and updated by Maria Semenova and Malte Mowlavi. Specifically, some packages used in Module 3 have deprecated certain functions since 2021, so the replication script had to be edited to reflect these changes. Additional changes were made to ensure that the code runs smoothly on both Windows and macOS. The code featured in this self-paced version of the course is up to date and fully functional as of January 2023.

Structure of the course

This course consists of three modules.

Module 1. Introduction to Machine Learning

  • What is machine learning
  • Machine learning and econometrics
  • The Lasso and other approaches
  • Workflow, tips and traps
  • Demonstration in R: The Logistics Performance Index

Module 2. Introduction to Text Mining

  • What is text mining
  • Basic tools and workflow of text mining
  • Demonstration in R: Free Trade Agreements

Module 3. Introduction to Text-Based Prediction and Classification

  • A Text-Based Classification Problem
  • Artificial Neural Networks for Prediction and Classification
  • Text as an Input
  • Demonstration in R 1: Revisiting the LPI
  • Demonstration in R 2: Classifying NTMs

Each module contains video lectures, a problem set, and a quiz to test your knowledge.

For each module you are provided with files containing the R code scripts and datasets featured in the video lectures, so that you can follow along by running and studying the code while listening to the lectures.

You are also provided with the datasets to be used when doing each of the problem sets and quizzes.

Certificate of completion

Upon successful completion of the training you will receive a Certificate of Completion issued by the Trade, Investment and Innovation Division of ESCAP.

To receive the certificate, you must pass each quiz with a score of at least 75%. When attempting the quizzes, please enter your name exactly as it should appear on your certificate. Also make sure to enter the correct email address to which the certificate should be sent, and remember to check your spam folder.

Once you successfully complete the course, the certificate will be sent to you.

Software and prerequisites

This course is conducted in R using RStudio. R is a free software environment for statistical computing and graphics that runs on a variety of platforms. RStudio, which we use for this training, is open-source software that provides a convenient interface for entering and running code, previewing the generated outputs, and saving them in different formats.

Prior familiarity with R is strongly recommended for successful completion of the course. To become familiar with the basics of using R for trade policy analysis, we recommend completing the ESCAP Online Training on Using R for Trade Analysis.

You can also search online for R packages and functions as you go through the course, to get a better understanding of what they do. Alternatively, you can run help(package = "package_name") in RStudio for a given package or ?function_name for a given function. The help information will then be displayed in the Help tab, which by default sits in the bottom right corner of your RStudio window.
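
As a small illustration of the help commands mentioned above, the sketch below uses the stats package and its lm function purely as stand-ins. They ship with every R installation and are not necessarily the ones used in the course scripts, so substitute the package or function you are curious about.

```r
# Stand-in names: "stats" and "lm" are assumptions chosen because they are
# part of every R installation, not because the course scripts use them.

# Overview of a package: its description and an index of documented functions.
help(package = "stats")

# Help page for a single function; the two forms below are equivalent.
?lm
help("lm")

# If you do not know the exact function name, search the installed help
# pages by keyword instead.
help.search("linear model")
```

In RStudio the results open in the Help tab; when R is run from a plain terminal, the same pages are typically shown in a text pager or a browser instead.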

All participants must also have some background in econometrics and statistics, typically at graduate level. This will make it easier to grasp the concepts discussed in this course and to enter the field of machine learning.