Module 2. Introduction to Text Mining

Introduction

In this module you will learn:

  • What text mining (TM) is and how it can be applied in trade policy analysis
  • The basic tools and workflow of TM
  • A demonstration in R: Free Trade Agreements

Estimated time requirement

  • Video lecture: 1 hour 11 minutes
  • Studying the code: 3-6 hours¹
  • Problem set: 1-2 hours¹
  • Doing the quiz: 1-2 hours¹

¹ This estimate depends on your level of familiarity with R.

Learning materials

Download Module 2 materials from here.

  • The Lecture 2 folder contains the presentation, dataset and R script featured in the video lecture. The dataset for this module is a set of XML files downloaded from UNCTAD’s Texts of Trade Agreements project. Use the same dataset for the problem set in this module.

Review this information on the topicmodels package. It will help you understand topic modelling with the Latent Dirichlet Allocation (LDA) method described in this module, as well as other features of the package.

Video lecture

Watch the Module 2 lecture video and study the replication script featured in it as you listen.

Lecture 2

Problem set

Problem: Lecture 2

  1. Amend the demonstration script to use XML files 3 and 7. Both are trade agreements signed by Chile, with Japan and China respectively. We want to know to what extent the agreements a small country signs with much larger partners share common language or use different language.
  2. Using the demonstration script as a guide, conduct an exploratory text analysis of both preferential trade agreements (PTAs). At a minimum, look at word counts and frequencies.
  3. Compare word frequencies in the two agreements using a scatter plot. Which terms are noticeably more frequent in one agreement than in the other?
  4. What, if anything, can you conclude from this analysis from a policy perspective?

Note: A link to the solution script will be provided upon successful completion of Quiz 2.

Quiz

You can now attempt Quiz 2.

You can take the quiz multiple times. You need a score of at least 75% to pass Quiz 2.