Module 1. Introduction to Machine Learning

Introduction

In this module you will learn:

  • What machine learning (ML) is
  • How ML differs fundamentally from econometric methods when applied to trade policy analysis
  • How to use the Lasso and other ML approaches for trade policy analysis
  • Workflow, tips and traps of ML methods
  • Demonstration in R: The Logistics Performance Index

Estimated time requirement

  • Video lecture: 1 hour 5 minutes
  • Studying the code: 3-6 hours¹
  • Problem set: 1-2 hours¹
  • Doing the quiz: 1-2 hours¹

¹ This estimate depends on your level of familiarity with R.

Learning materials

Download the Module 1 materials from here.

  • The Lecture 1 folder contains the presentation, dataset and R script featured in the video lecture, as well as the dataset to be used for the problem set.

Review this information on the glmnet package. It will help you understand the argument settings required for estimating different model families. Also run ?predict.glmnet and read the description of the type argument to understand the appropriate settings for the predict function used in the demonstration.
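As a minimal sketch of how these settings fit together, the R snippet below fits glmnet models with two different family values and then calls predict with several type values. The data are simulated and the penalty value s is an arbitrary choice for illustration; neither comes from the course materials.

```r
library(glmnet)

set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), nrow = n, ncol = 5)   # simulated predictors (illustrative only)
y_cont <- x[, 1] - 0.5 * x[, 2] + rnorm(n)      # continuous outcome
y_bin  <- as.integer(y_cont > 0)                # binary outcome coded 0/1

# Continuous outcome: family = "gaussian" (the default) gives penalized least squares
fit_gauss <- glmnet(x, y_cont, family = "gaussian", alpha = 1)

# Binary outcome: family = "binomial" gives penalized logistic regression
fit_binom <- glmnet(x, y_bin, family = "binomial", alpha = 1)

s <- 0.05  # arbitrary penalty value, assumed here for illustration
pred_link  <- predict(fit_binom, newx = x, s = s, type = "link")      # log-odds
pred_prob  <- predict(fit_binom, newx = x, s = s, type = "response")  # fitted probabilities
pred_class <- predict(fit_binom, newx = x, s = s, type = "class")     # predicted labels "0"/"1"
```

For a binomial model, type = "response" is the logistic transformation of type = "link", and type = "class" returns the label with the higher fitted probability.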

Video lecture

Watch the Module 1 lecture video and study the replication script featured in it as you listen.

Lecture 1

Problem set

Problem: Lecture 1

  1. The demonstration script uses WDI data to predict the LPI. Let’s imagine instead that we want to use WDI data to classify countries into those that are likely to see rapid future trade growth and those that are not. We define “rapid” trade growth as a percentage increase in total exports that is in the top quintile of observed rates over a five-year period.
  2. The file “WDI data for export booms.csv” is arranged in the same way as the demonstration data. The variable of interest is export_boom, which equals one for countries in the top quintile and zero for all others. All other variables have been lagged by one period.
  3. Import the data, clean it, and use interpolation to fill in missing values. You can do all of this by adapting the demonstration code.
  4. Separate the data into training and testing subsamples.
  5. Use the Lasso, Ridge, and 50-50 Elastic Net to estimate a model using levels and interactions of the available WDI variables to predict export booms. Use the training data only.
  6. For the three models, compute prediction accuracy for the testing sample. Compare results. Which model do you prefer, and why?
  7. How could this model be used in policy settings?
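The modelling steps above (4 to 6) can be sketched in R as follows. This is an illustration under stated assumptions, not the course's solution script: the data are simulated, the variable names gdp_growth, fdi_share and tariff are hypothetical, the 70/30 split is an arbitrary choice, and the penalty is chosen by cross-validation with cv.glmnet, which may differ from the demonstration code.

```r
library(glmnet)

set.seed(123)
n <- 300
# Simulated stand-in for the cleaned data (hypothetical variable names)
dat <- data.frame(
  gdp_growth = rnorm(n),
  fdi_share  = rnorm(n),
  tariff     = rnorm(n)
)
latent <- 1.2 * dat$gdp_growth - 0.8 * dat$tariff +
  0.5 * dat$gdp_growth * dat$fdi_share + rnorm(n)
# Observations in the top quintile of the latent measure are coded as booms
dat$export_boom <- as.integer(latent > quantile(latent, 0.8))

# Step 4: random 70/30 split into training and testing rows
train_id <- sample(seq_len(n), size = round(0.7 * n))

# Step 5: levels plus all pairwise interactions; drop the intercept column
x <- model.matrix(export_boom ~ .^2, data = dat)[, -1]
y <- dat$export_boom

# alpha = 1 is the Lasso, alpha = 0 is Ridge, alpha = 0.5 is a 50-50 Elastic Net
alphas <- c(lasso = 1, ridge = 0, elastic_net = 0.5)

# Step 6: fit on training rows only, then compute accuracy on testing rows
accuracy <- sapply(alphas, function(a) {
  cv_fit <- cv.glmnet(x[train_id, ], y[train_id], family = "binomial", alpha = a)
  pred <- predict(cv_fit, newx = x[-train_id, ], s = "lambda.min", type = "class")
  mean(as.integer(pred) == y[-train_id])
})
print(round(accuracy, 3))
```

Because only about one fifth of observations are booms by construction, a rule that always predicts "no boom" already scores roughly 80% accuracy, so it is worth comparing each model against that baseline when interpreting the results.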

Note: The link to the correct code script will be provided to you upon successful completion of Quiz 1.

Quiz

Now you can attempt QUIZ 1.

You can take the quiz multiple times. You need a score of at least 75% to pass Quiz 1.