Chapter 2 Partial equilibrium trade policy analysis with structural gravity

2.1 Traditional Gravity Estimates

2.1.1 Preparing the data

If you haven’t used R before, or to be more precise, you have only fitted a few regressions without much practice on transforming and cleaning data before, check chapters 5 and 18 from Wickham and Grolemund (2016).

Please see the note from page 42 in Yotov et al. (2016). It’s a really important note, which tells us that we need to:

  1. Filter observations for a range of years (1986, 1990, 1994, 1998, 2002 and 2006)
  2. Transform some variables to logarithm scale (trade and dist) and create new variables from those in the original dataset
  3. Remove cases where both the exporter and the importer are the same
  4. Drop observations where the trade flow is zero

Provided that the datasets use 3 GB on disk, we opted to use a local SQL database (DuckDB) instead of spreadsheets or even native R files. If you are not familiar with SQL, do not worry because we provide the function yotov_data(), so that yotov_data("ch1_application1") goes to the database and returns the data for this exercise.

Step 1 is straightforward:

library(yotover)
## ── yotover 0.3.5 ───────────────────────────────────────────────────────────────
## ٭ Please, read the documentation.
## ٭ Use the command citation('yotover') to cite this package in publications.
## ٭ Visit https://buymeacoffee.com/pacha if you'd like to donate to help improving this software.
## ! This package downloads a 30 MB compressed file and creates a 6 GB database.
## ! If you don't want to create a database in your home directory,
## ! run usethis::edit_r_environ() and create the environment variable YOTOV_DB_DIR with your desired location.
## ── Attaching packages ──────────────────────────────────────────────────────────
## ✓ dplyr    1.0.5      ✓ broom    0.7.6 
## ✓ tidyr    1.1.3      ✓ msm      1.6.8 
## ✓ sandwich 3.0.0      ✓ ggplot2  3.3.3 
## ✓ lmtest   0.9.38
## ── Database status ─────────────────────────────────────────────────────────────
## ✓ Local Yotov database is OK.
ch1_application1_2 <-  yotov_data("ch1_application1") %>% 
  filter(year %in% seq(1986, 2006, 4))

For step 2, this can be divided in parts, starting with the log transformation of trade and dist:

ch1_application1_2 <- ch1_application1_2 %>% 
  mutate(
    log_trade = log(trade),
    log_dist = log(dist)
  )

Continuing step 2, we can now create the variables \(Y_{i,t}\) and \(E_{i,t}\) that appear on the OLS model equation:

ch1_application1_2 <- ch1_application1_2 %>% 
  # Create Yit
  group_by(exporter, year) %>% 
  mutate(
    y = sum(trade),
    log_y = log(y)
  ) %>% 
  
  # Create Eit
  group_by(importer, year) %>% 
  mutate(
    e = sum(trade),
    log_e = log(e)
  )

The OLS model with remoteness index needs both exporter and importer index, which can be created by grouping variables:

ch1_application1_2 <- ch1_application1_2 %>%
  # Replicate total_e
  group_by(exporter, year) %>%
  mutate(total_e = sum(e)) %>%
  group_by(year) %>%
  mutate(total_e = max(total_e)) %>%

  # Replicate rem_exp
  group_by(exporter, year) %>%
  mutate(
    remoteness_exp = sum(dist *  total_e / e),
    log_remoteness_exp = log(remoteness_exp)
  ) %>%

  # Replicate total_y
  group_by(importer, year) %>%
  mutate(total_y = sum(y)) %>%
  group_by(year) %>%
  mutate(total_y = max(total_y)) %>%

  # Replicate rem_imp
  group_by(importer, year) %>%
  mutate(
    remoteness_imp = sum(dist / (y / total_y)),
    log_remoteness_imp = log(remoteness_imp)
  )

To create the variables for the OLS with Fixed Effects Model, we followed box #1 in page 44 from Yotov et al. (2016). We combine both exporter and importer variables with the year in order to create the fixed effects variables:

ch1_application1_2 <- ch1_application1_2 %>%
  # This merges the columns exporter/importer with year
  mutate(
    exp_year = paste0(exporter, year),
    imp_year = paste0(importer, year)
  )

This concludes step 2.

Now we need to perform step 3:

ch1_application1_2 <- ch1_application1_2 %>% 
  filter(exporter != importer)

Step 4 is used in some cases and we will be explicit about it.

2.1.2 OLS estimation ignoring multilateral resistance terms

The general equation for this model is: \[ \begin{align} \log X_{ij,t} =& \:\beta_0 + \beta_1 DIST_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \beta_5 \log Y_{i,t} +\\ \text{ }& \:\beta_6 \log E_{j,t} + \varepsilon_{ij,t} \end{align} \]

See page 41 in Yotov et al. (2016) for a full detail of each variable.

The model for this case is straightforward, and in this case we need to apply step 4 from thje previous section to drop cases where the trade is zero:

fit_ols <- lm(
  log_trade ~ log_dist + cntg + lang + clny + log_y + log_e,
  data = ch1_application1_2 %>% 
    filter(trade > 0)
)

summary(fit_ols)
## 
## Call:
## lm(formula = log_trade ~ log_dist + cntg + lang + clny + log_y + 
##     log_e, data = ch1_application1_2 %>% filter(trade > 0))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.5421  -0.8281   0.1578   1.0476   7.6585 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -11.283080   0.151732  -74.36  < 2e-16 ***
## log_dist     -1.001607   0.014159  -70.74  < 2e-16 ***
## cntg          0.573805   0.074427    7.71 1.31e-14 ***
## lang          0.801548   0.033748   23.75  < 2e-16 ***
## clny          0.734853   0.070387   10.44  < 2e-16 ***
## log_y         1.190236   0.005402  220.32  < 2e-16 ***
## log_e         0.907588   0.005577  162.73  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.743 on 25682 degrees of freedom
## Multiple R-squared:  0.7585, Adjusted R-squared:  0.7585 
## F-statistic: 1.345e+04 on 6 and 25682 DF,  p-value: < 2.2e-16

Now the model is almost ready! Now we only need to stick to the methodology from Yotov et al. (2016) and cluster the standard errors by country pair (see the note in page 42, it is extremely important). This is not straightforward and requires additional work.

The yotover package provides a nice function to do this and more. Please read the documentation of the package and look the yotov_model_summary() function, it summarises the model in the exact way as reported in the book by providing:

  • Clustered standard errors
  • Number of observations
  • \(R^2\) (if applicable)
  • Presence (or absence) of exporter and exporter time fixed effects
  • RESET test p-value

This is returned as a list to keep it simple.

Finally, here is the model as reported in the book:

yotov_model_summary(
  formula = "log_trade ~ log_dist + cntg + lang + clny + log_y + log_e",
  data = filter(ch1_application1_2, trade > 0),
  method = "lm"
)
## $tidy_coefficients
## # A tibble: 7 x 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)  -11.3     0.296      -38.1  1.17e-309
## 2 log_dist      -1.00    0.0273     -36.6  1.85e-286
## 3 cntg           0.574   0.185        3.11 1.90e-  3
## 4 lang           0.802   0.0821       9.76 1.78e- 22
## 5 clny           0.735   0.144        5.10 3.49e-  7
## 6 log_y          1.19    0.00946    126.   0        
## 7 log_e          0.908   0.00991     91.6  0        
## 
## $nobs
## [1] 25689
## 
## $rsquared
## [1] 0.7585251
## 
## $etfe
## [1] FALSE
## 
## $itfe
## [1] FALSE
## 
## $reset_pval
## [1] 4.346285e-15

Please notice that the summary hides the exporter/importer fixed effects.

2.1.3 OLS estimation controlling for multilateral resistance terms with remote indexes

The remoteness model adds variables to the OLS model. The general equation for this model is: \[ \begin{align} \log X_{ij,t} =& \:\beta_0 + \beta_1 DIST_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \beta_5 \log Y_{i,t} +\\ \text{ }& \beta_6 \log E_{j,t} + \beta_7 \log(REM\_EXP_i,t) + \beta_8 \log(REM\_IMP_i,t) + \varepsilon_{ij,t} \end{align} \]

Where \[ \log(REM\_EXP_{i,t}) = \log \left( \sum_j \frac{DIST_{i,j}}{E_{j,t} / Y_t} \right)\\ \log(REM\_IMP_{i,t}) = \log \left( \sum_i \frac{DIST_{i,j}}{E_{i,t} / Y_t} \right) \]

See page 43 in Yotov et al. (2016) for a full detail of each variable.

Our approach follows box #1 in page 43 from Yotov et al. (2016). Fitting the regression is straightforward, it’s just about adding more regressors to what we did in the last section, and we can create a list with a summary for the model:

yotov_model_summary(
  formula = "log_trade ~ log_dist + cntg + lang + clny + log_y + log_e +
    log_remoteness_exp + log_remoteness_imp",
  data = filter(ch1_application1_2, trade > 0),
  method = "lm"
)
## $tidy_coefficients
## # A tibble: 9 x 5
##   term               estimate std.error statistic   p.value
##   <chr>                 <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)         -35.2     1.99       -17.7  6.28e- 70
## 2 log_dist             -1.18    0.0313     -37.9  9.17e-306
## 3 cntg                  0.247   0.177        1.39 1.63e-  1
## 4 lang                  0.739   0.0784       9.43 4.48e- 21
## 5 clny                  0.842   0.150        5.61 2.08e-  8
## 6 log_y                 1.16    0.00948    123.   0        
## 7 log_e                 0.903   0.00991     91.1  0        
## 8 log_remoteness_exp    0.972   0.0682      14.3  6.66e- 46
## 9 log_remoteness_imp    0.274   0.0598       4.58 4.71e-  6
## 
## $nobs
## [1] 25689
## 
## $rsquared
## [1] 0.765028
## 
## $etfe
## [1] FALSE
## 
## $itfe
## [1] FALSE
## 
## $reset_pval
## [1] 7.672904e-14

2.1.4 OLS estimation controlling for multilateral resistance terms with fixed effects

The general equation for this model is: \[ \begin{align} \log X_{ij,t} =& \:\pi_{i,t} + \chi_{i,t} + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} +\\ \text{ }& \:\beta_4 CLNY_{i,j} + \varepsilon_{ij,t} \end{align} \]

Where the added terms, with respect to the OLS model, are \(\pi_{i,t}\) and \(\chi_{i,t}\) that account for exporter-time and importer-time fixed effects respectively. See page 44 in Yotov et al. (2016) for a full detail of each variable.

Now we can easily generate a list as we did with the previous models:

yotov_model_summary(
  formula = "log_trade ~ log_dist + cntg + lang + clny + exp_year + imp_year",
  data = filter(ch1_application1_2, trade > 0),
  method = "lm"
)
## $tidy_coefficients
## # A tibble: 5 x 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)   13.1      0.508      25.7  4.44e-144
## 2 log_dist      -1.22     0.0382    -31.8  3.77e-218
## 3 cntg           0.223    0.203       1.10 2.71e-  1
## 4 lang           0.661    0.0821      8.05 8.41e- 16
## 5 clny           0.670    0.149       4.49 7.24e-  6
## 
## $nobs
## [1] 25689
## 
## $rsquared
## [1] 0.8432398
## 
## $etfe
## [1] TRUE
## 
## $itfe
## [1] TRUE
## 
## $reset_pval
## [1] 2.473022e-231

2.1.5 PPML estimation controlling for multilateral resistance terms with fixed effects

The general equation for this model is:

\[ \begin{align} X_{ij,t} =& \:\exp\left[\pi_{i,t} + \chi_{i,t} + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} +\right.\\ \text{ }& \:\left.\beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j}\right] \times \varepsilon_{ij,t} \end{align} \]

The reason to compute this model even in spite of speed is that PPML is the only estimator that is perfectly consistent with the theoretical gravity model. By estimating with PPML the fixed effects correspond exactly to the corresponding theoretical terms.

The data for this model is exactly the same as for the fixed effects model.

One option in R is to use the glm() function and a quasi-poisson family to avoid overdispersion problems:

fit_ppml <- glm(trade ~ log_dist + cntg + lang + clny + exp_year + imp_year,
  family = quasipoisson(link = "log"),
  data = ch1_application1_2,
  y = FALSE,
  model = FALSE
)

In the previous model, a glm model object carries a copy of its training data by default. We used the settings y = FALSE and model = FALSE to turn this off, which decreases the size of the model without affecting the model’s predictions (Zumel 2014).

If you decide to run this model and print the summary yourself, you’ll notice that it doesn’t report \(R^2\) and that it shows a large list of fixed effects. The \(R^2\) needs to be computed afterwards as a function of the correlation between the observed and predicted values. Please see Silva and Tenreyro (2006) for the details as well as for the RESET test for PPML (GLM) models.

Software such as Stata, without dedicated functions, reports an incorrect \(R^2\) for PPML model, it actually reports a pseudo-\(R^2\). To construct a proper \(R^2\), yotov_model_summary() takes the correlation between actual and predicted trade flows.

We can obtain a detailed list as in the previous examples:

yotov_model_summary(
  formula = "trade ~ log_dist + cntg + lang + clny + exp_year + imp_year",
  data = ch1_application1_2,
  method = "glm"
)
## $tidy_coefficients
## # A tibble: 5 x 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)   10.4      0.452      23.0  2.29e-117
## 2 log_dist      -0.841    0.0317    -26.6  1.80e-155
## 3 cntg           0.437    0.0832      5.26 1.44e-  7
## 4 lang           0.247    0.0765      3.23 1.22e-  3
## 5 clny          -0.222    0.116      -1.91 5.56e-  2
## 
## $nobs
## [1] 28152
## 
## $rsquared
## [1] 0.5859927
## 
## $etfe
## [1] TRUE
## 
## $itfe
## [1] TRUE
## 
## $reset_pval
## [1] 0.6416118

Please notice that the previous summary intentionally doesn’t show time exporter/importer fixed effects.

2.2 The “distance puzzle” resolved

2.2.1 Preparing the data

Please see the note from page 47 in Yotov et al. (2016). We need to proceed with similar steps as in the previous section.

Unlike the previous section, we will create different tables for OLS, PPML and its variations, because the solution for the “distance puzzle” implies different transformations and filters for each case.

The distance puzzle proposes this gravity specification:

\[ \begin{align} X_{ij,t} =& \:\exp\left[\pi_{i,t} + \chi_{i,t} + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j}\right]\times\\ \text{ }& \:\exp\left[\beta_4 CLNY_{i,j} + \beta_5 \log(DIST\_INTRA_{i,i})\right] \times \varepsilon_{ij,t} \end{align} \] The difference with respect to the last section is that now we need to separate the log_dist variable into multiple columns that account for discrete time effects. This is expressed into the \(\beta_T\) terms of the equation. Perhaps the easiest option to do this is to transform year into a text column and then use the spread() function.

For the OLS model we need to remove cases where the exporter is the same as the importer and cases where trade is zero. For the PPML models we need to mark rows where the exporter and the importer are the same, and we need to create the smctry column, which is also required to transform the log_dist_* variables as shown in box #1 in page 48 from Yotov et al. (2016):

In order to avoid creating two datasets that are very similar, we shall create one dataset to cover both OLS and PPML:

ch1_application2_2 <- yotov_data("ch1_application2") %>% 
  # this filter covers both OLS and PPML
  filter(year %in% seq(1986, 2006, 4)) %>% 
  mutate(
    # variables for both OLS and PPML
    exp_year = paste0(exporter, year),
    imp_year = paste0(importer, year),
    year = paste0("log_dist_", year),
    log_trade = log(trade),
    log_dist = log(dist),
    smctry = ifelse(importer != exporter, 0, 1),
    
    # PPML specific variables
    log_dist_intra = log_dist * smctry,
    intra_pair = ifelse(exporter == importer, exporter, "inter")
  ) %>% 
  spread(year, log_dist, fill = 0) %>% 
  mutate(across(log_dist_1986:log_dist_2006, ~ .x * (1 - smctry)))

Here the across() function is a shortcut to avoid writing something like:

ch1_application2_2 %>% 
  mutate(
    log_dist_1986 =  log_dist_1986 * (1 - smctry),
    log_dist_1990 =  log_dist_1990 * (1 - smctry),
    ... REPEAT log_dist_T many_times ....
    log_dist_2006 =  log_dist_2006 * (1 - smctry)
  )

Also notice that the OLS model shall require filtering when we specify the model, because we skipped filtering the cases where trade is equal to zero and both the importer and the exporter are the same.

2.2.2 OLS solution for the “distance puzzle”

The gravity specification, which includes \(\pi_{i,t} + \chi_{i,t}\), means that we need to do something very similar to what we did in the last section.

With the data from above, the model specification is straightforward:

yotov_model_summary2(
  formula = "log_trade ~ 0 + log_dist_1986 + log_dist_1990 + log_dist_1994 +
    log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg +
    lang + clny + exp_year + imp_year",
  data = filter(ch1_application2_2, importer != exporter, trade > 0),
  method = "lm"
)
## $tidy_coefficients
## # A tibble: 9 x 5
##   term          estimate std.error statistic   p.value
##   <chr>            <dbl>     <dbl>     <dbl>     <dbl>
## 1 log_dist_1986   -1.17     0.0436    -26.8  9.33e-156
## 2 log_dist_1990   -1.16     0.0423    -27.3  1.10e-161
## 3 log_dist_1994   -1.21     0.0457    -26.5  1.07e-152
## 4 log_dist_1998   -1.25     0.0428    -29.2  4.14e-184
## 5 log_dist_2002   -1.24     0.0441    -28.1  1.34e-171
## 6 log_dist_2006   -1.26     0.0437    -28.9  4.02e-180
## 7 cntg             0.223    0.203       1.10 2.71e-  1
## 8 lang             0.661    0.0821      8.06 8.20e- 16
## 9 clny             0.670    0.149       4.49 7.25e-  6
## 
## $nobs
## [1] 25689
## 
## $pct_chg_log_dist
## [1] 7.950156
## 
## $pcld_std_err
## [1] 3.75886
## 
## $pcld_std_err_pval
## [1] 0.03442616
## 
## $intr
## [1] FALSE
## 
## $csfe
## [1] FALSE

Notice that, unlike the previous section, we used the notation y ~ 0 + .... The zero means not to include a constant.

2.2.3 PPML solution for the “distance puzzle”

This model is very similar to the one specified in the PPML section from the last section. We can fit the model in a direct way:

yotov_model_summary2(
  formula = "trade ~ 0 + log_dist_1986 + log_dist_1990 + 
    log_dist_1994 + log_dist_1998 + log_dist_2002 + log_dist_2006 + 
    cntg + lang + clny + exp_year + imp_year",
  data = filter(ch1_application2_2, importer != exporter),
  method = "glm"
)
## $tidy_coefficients
## # A tibble: 9 x 5
##   term          estimate std.error statistic   p.value
##   <chr>            <dbl>     <dbl>     <dbl>     <dbl>
## 1 log_dist_1986   -0.859    0.0370    -23.2  5.09e-119
## 2 log_dist_1990   -0.834    0.0377    -22.1  1.42e-108
## 3 log_dist_1994   -0.835    0.0351    -23.8  4.09e-125
## 4 log_dist_1998   -0.847    0.0354    -23.9  9.44e-127
## 5 log_dist_2002   -0.848    0.0316    -26.8  2.66e-158
## 6 log_dist_2006   -0.836    0.0312    -26.7  1.53e-157
## 7 cntg             0.437    0.0832      5.26 1.46e-  7
## 8 lang             0.248    0.0765      3.23 1.22e-  3
## 9 clny            -0.222    0.116      -1.91 5.60e-  2
## 
## $nobs
## [1] 28152
## 
## $pct_chg_log_dist
## [1] -2.750345
## 
## $pcld_std_err
## [1] 3.003899
## 
## $pcld_std_err_pval
## [1] 0.3598811
## 
## $intr
## [1] FALSE
## 
## $csfe
## [1] FALSE

2.2.4 Internal distance solution for the “distance puzzle”

This model just requires us to add the log_dist_intra variable to the PPML model and not to filter the rows where the exporter and the importer are the same:

yotov_model_summary2(
  formula = "trade ~ 0 + log_dist_1986 + log_dist_1990 + 
    log_dist_1994 + log_dist_1998 + log_dist_2002 + log_dist_2006 +
    cntg + lang + clny + exp_year + imp_year + log_dist_intra",
  data = ch1_application2_2,
  method = "glm"
)
## $tidy_coefficients
## # A tibble: 10 x 5
##    term           estimate std.error statistic  p.value
##    <chr>             <dbl>     <dbl>     <dbl>    <dbl>
##  1 log_dist_1986   -0.980     0.0720   -13.6   3.78e-42
##  2 log_dist_1990   -0.940     0.0731   -12.9   7.98e-38
##  3 log_dist_1994   -0.915     0.0720   -12.7   5.71e-37
##  4 log_dist_1998   -0.887     0.0710   -12.5   9.36e-36
##  5 log_dist_2002   -0.884     0.0706   -12.5   6.18e-36
##  6 log_dist_2006   -0.872     0.0713   -12.2   2.07e-34
##  7 cntg             0.371     0.140      2.66  7.81e- 3
##  8 lang             0.337     0.168      2.01  4.49e- 2
##  9 clny             0.0192    0.156      0.123 9.02e- 1
## 10 log_dist_intra  -0.488     0.101     -4.85  1.23e- 6
## 
## $nobs
## [1] 28566
## 
## $pct_chg_log_dist
## [1] -10.96483
## 
## $pcld_std_err
## [1] 1.058038
## 
## $pcld_std_err_pval
## [1] 1.819694e-25
## 
## $intr
## [1] TRUE
## 
## $csfe
## [1] FALSE

2.2.5 Internal distance and home bias solution for the “distance puzzle”

This model just requires us to add the smctry variable to the internal distance model and repeat the rest of the steps from the last section:

yotov_model_summary2(
  formula = "trade ~ 0 + log_dist_1986 + log_dist_1990 + 
    log_dist_1994 + log_dist_1998 + log_dist_2002 + log_dist_2006 +
    cntg + lang + clny + exp_year + imp_year + log_dist_intra + smctry",
  data = ch1_application2_2,
  method = "glm"
)
## $tidy_coefficients
## # A tibble: 11 x 5
##    term           estimate std.error statistic  p.value
##    <chr>             <dbl>     <dbl>     <dbl>    <dbl>
##  1 log_dist_1986   -0.857     0.0627   -13.7   1.37e-42
##  2 log_dist_1990   -0.819     0.0633   -12.9   2.71e-38
##  3 log_dist_1994   -0.796     0.0635   -12.5   4.84e-36
##  4 log_dist_1998   -0.770     0.0630   -12.2   2.43e-34
##  5 log_dist_2002   -0.767     0.0626   -12.2   1.87e-34
##  6 log_dist_2006   -0.754     0.0622   -12.1   7.42e-34
##  7 cntg             0.574     0.155      3.70  2.16e- 4
##  8 lang             0.352     0.137      2.57  1.01e- 2
##  9 clny             0.0269    0.125      0.215 8.29e- 1
## 10 log_dist_intra  -0.602     0.109     -5.52  3.42e- 8
## 11 smctry           1.69      0.574      2.94  3.24e- 3
## 
## $nobs
## [1] 28566
## 
## $pct_chg_log_dist
## [1] -11.96934
## 
## $pcld_std_err
## [1] 1.173043
## 
## $pcld_std_err_pval
## [1] 9.546682e-25
## 
## $intr
## [1] TRUE
## 
## $csfe
## [1] FALSE

2.2.6 Fixed effects solution for the “distance puzzle”

This model just requires us to remove the variables log_dist_intra and smctry from the last model and include the intra_pair variable to account for the intra-national fixed effects:

yotov_model_summary2(
  formula = "trade ~ 0 + log_dist_1986 + log_dist_1990 + 
    log_dist_1994 + log_dist_1998 + log_dist_2002 + log_dist_2006 +
    cntg + lang + clny + exp_year + imp_year + intra_pair",
  data = ch1_application2_2,
  method = "glm"
)
## $tidy_coefficients
## # A tibble: 9 x 5
##   term          estimate std.error statistic   p.value
##   <chr>            <dbl>     <dbl>     <dbl>     <dbl>
## 1 log_dist_1986   -0.910    0.0323    -28.2  8.27e-175
## 2 log_dist_1990   -0.879    0.0321    -27.4  3.65e-165
## 3 log_dist_1994   -0.860    0.0318    -27.0  1.35e-160
## 4 log_dist_1998   -0.833    0.0317    -26.3  1.51e-152
## 5 log_dist_2002   -0.829    0.0320    -25.9  6.16e-148
## 6 log_dist_2006   -0.811    0.0320    -25.3  1.91e-141
## 7 cntg             0.442    0.0816      5.41 6.13e-  8
## 8 lang             0.241    0.0760      3.16 1.55e-  3
## 9 clny            -0.220    0.117      -1.89 5.86e-  2
## 
## $nobs
## [1] 28566
## 
## $pct_chg_log_dist
## [1] -10.93109
## 
## $pcld_std_err
## [1] 0.768795
## 
## $pcld_std_err_pval
## [1] 3.51897e-46
## 
## $intr
## [1] TRUE
## 
## $csfe
## [1] TRUE

2.3 Regional trade agreements effects

2.3.1 Preparing the data

This model specification includes gravity covariates, including both importer and exporter time fixed effects:

\[ \begin{align} X_{ij,t} =& \:\exp\left[\pi_{i,t} + \chi_{i,t} + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} +\right.\\ \text{ }& \:\left.\beta_4 CLNY_{i,j} + \beta_5 RTA_{ij,t}\right] \times \varepsilon_{ij,t} \end{align} \] We need to create additional variables, in comparison to the previous examples, to include fixed effects that account for the observations where the exporter and the importer are the same. These variables are intl_brdr, pair_id_2 and the columns of the form intl_border_Y where Y corresponds to the year.

The direct way of obtaining the desired variables is quite similar to what we did in the previous sections:

ch1_application3_2 <- yotov_data("ch1_application3") %>% 
  filter(year %in% seq(1986, 2006, 4)) %>% 
  mutate(
    exp_year = paste0(exporter, year),
    imp_year = paste0(importer, year),
    year = paste0("intl_border_", year),
    log_trade = log(trade),
    log_dist = log(dist),
    intl_brdr = ifelse(exporter == importer, pair_id, "inter"),
    intl_brdr_2 = ifelse(exporter == importer, 0, 1),
    pair_id_2 = ifelse(exporter == importer, "0-intra", pair_id)
  ) %>% 
  spread(year, intl_brdr_2, fill = 0)

Notice that we used 0-intra and not just intra. This is because the rest of the observations in pair_id_2 are numbers 1,…,N, and R internals shall consider 0-intra as the reference factor for being the first item when it orders the unique observations alphabetically. This makes the difference between the expected behavior or any behavior in the next chapter.

In addition, we need to create the variable sum_trade to filter the cases where the sum by pair_id is zero:

ch1_application3_2 <- ch1_application3_2 %>% 
  group_by(pair_id) %>% 
  mutate(sum_trade = sum(trade)) %>% 
  ungroup()

2.3.2 OLS standard RTA estimates with international trade only

The gravity specification, which includes \(\pi_{i,t} + \chi_{i,t}\), means that we need to do something very similar to what we did in the last section.

With the data from above, the model specification is straightforward:

yotov_model_summary3(
  formula = "log_trade ~ 0 + log_dist + cntg + lang + clny + 
    rta + exp_year + imp_year",
  data = filter(ch1_application3_2, trade > 0, importer != exporter),
  method = "lm"
)
## $tidy_coefficients
## # A tibble: 5 x 5
##   term     estimate std.error statistic   p.value
##   <chr>       <dbl>     <dbl>     <dbl>     <dbl>
## 1 log_dist -1.22       0.0390  -31.2    1.92e-209
## 2 cntg      0.223      0.203     1.10   2.72e-  1
## 3 lang      0.661      0.0821    8.05   8.94e- 16
## 4 clny      0.670      0.149     4.49   7.24e-  6
## 5 rta      -0.00439    0.0540   -0.0813 9.35e-  1
## 
## $nobs
## [1] 25689
## 
## $total_rta_effect
## [1] -0.004389628
## 
## $trta_std_err
## [1] 0.05398407
## 
## $trta_std_err_pval
## [1] 0.9351927
## 
## $intr
## [1] FALSE

2.3.3 PPML standard RTA estimates with international trade only

The model specification is very similar to OLS and we only need to change the function lm():

yotov_model_summary3(
  formula = "trade ~ 0 + log_dist + cntg + lang + clny + 
    rta + exp_year + imp_year",
  data = filter(ch1_application3_2, importer != exporter),
  method = "glm"
)
## $tidy_coefficients
## # A tibble: 5 x 5
##   term     estimate std.error statistic   p.value
##   <chr>       <dbl>     <dbl>     <dbl>     <dbl>
## 1 log_dist   -0.822    0.0310    -26.5  5.87e-155
## 2 cntg        0.416    0.0828      5.02 5.20e-  7
## 3 lang        0.250    0.0767      3.26 1.12e-  3
## 4 clny       -0.205    0.114      -1.80 7.23e-  2
## 5 rta         0.191    0.0658      2.90 3.75e-  3
## 
## $nobs
## [1] 28152
## 
## $total_rta_effect
## [1] 0.1907176
## 
## $trta_std_err
## [1] 0.06579676
## 
## $trta_std_err_pval
## [1] 0.003748494
## 
## $intr
## [1] FALSE

2.3.4 Addressing potential domestic trade diversion

The model specification is quite the same as PPML and we only need to add the variable intl_brdr but using the full dataset instead of removing rows where the importer and the exporter are the same:

yotov_model_summary3(
  formula = "trade ~ 0 + log_dist + cntg + lang + clny + 
    rta + exp_year + imp_year + intl_brdr",
  data = ch1_application3_2,
  method = "glm"
)
## $tidy_coefficients
## # A tibble: 5 x 5
##   term     estimate std.error statistic   p.value
##   <chr>       <dbl>     <dbl>     <dbl>     <dbl>
## 1 log_dist   -0.800    0.0303    -26.4  4.21e-154
## 2 cntg        0.393    0.0789      4.99 6.19e-  7
## 3 lang        0.244    0.0771      3.16 1.57e-  3
## 4 clny       -0.182    0.113      -1.61 1.08e-  1
## 5 rta         0.409    0.0688      5.94 2.93e-  9
## 
## $nobs
## [1] 28566
## 
## $total_rta_effect
## [1] 0.4085219
## 
## $trta_std_err
## [1] 0.06882661
## 
## $trta_std_err_pval
## [1] 2.929108e-09
## 
## $intr
## [1] TRUE

2.3.5 Addressing potential endogeneity of RTAs

The model specification consists in including the rta variable and the fixed effects exp_year, imp_year and pair_id_2 to account for domestic trade:

yotov_model_summary3(
  formula = "trade ~ 0 + rta + exp_year + imp_year + pair_id_2",
  data = filter(ch1_application3_2, sum_trade > 0),
  method = "glm"
)
## $tidy_coefficients
## # A tibble: 1 x 5
##   term  estimate std.error statistic  p.value
##   <chr>    <dbl>     <dbl>     <dbl>    <dbl>
## 1 rta      0.557    0.0143      39.0 1.2e-322
## 
## $nobs
## [1] 28482
## 
## $total_rta_effect
## [1] 0.5571853
## 
## $trta_std_err
## [1] 0.1022353
## 
## $trta_std_err_pval
## [1] 5.036185e-08
## 
## $intr
## [1] TRUE

2.3.6 Testing for potential “reverse causality” between trade and RTAs

We need to modify the previous model in order to include the variable rta_lead4 and to consider where sum_trade is greater than zero:

yotov_model_summary3(
  formula = "trade ~ 0 + rta + rta_lead4 + exp_year + imp_year + pair_id_2",
  data = filter(ch1_application3_2, sum_trade > 0),
  method = "glm"
)
## $tidy_coefficients
## # A tibble: 2 x 5
##   term      estimate std.error statistic       p.value
##   <chr>        <dbl>     <dbl>     <dbl>         <dbl>
## 1 rta         0.520     0.0859     6.06  0.00000000140
## 2 rta_lead4   0.0774    0.0921     0.841 0.401        
## 
## $nobs
## [1] 28482
## 
## $total_rta_effect
## [1] 0.5974906
## 
## $trta_std_err
## [1] 0.1380265
## 
## $trta_std_err_pval
## [1] 1.49917e-05
## 
## $intr
## [1] TRUE

2.3.7 Addressing potential non-linear and phasing-in effects of RTAs

Instead of future-lagged rta variable, as in the previous model, we modify the previous model and include the rta_lagN past-lagged variables instead:

yotov_model_summary3(
  formula = "trade ~ 0 + rta + rta_lag4 + rta_lag8 + rta_lag12 +
    exp_year + imp_year + pair_id_2",
  data = filter(ch1_application3_2, sum_trade > 0),
  method = "glm"
)
## $tidy_coefficients
## # A tibble: 4 x 5
##   term      estimate std.error statistic  p.value
##   <chr>        <dbl>     <dbl>     <dbl>    <dbl>
## 1 rta          0.291    0.0892      3.27 1.08e- 3
## 2 rta_lag4     0.414    0.0672      6.15 7.76e-10
## 3 rta_lag8     0.169    0.0431      3.91 9.17e- 5
## 4 rta_lag12    0.119    0.0301      3.96 7.65e- 5
## 
## $nobs
## [1] 28482
## 
## $total_rta_effect
## [1] 0.9926189
## 
## $trta_std_err
## [1] 0.09425262
## 
## $trta_std_err_pval
## [1] 3.09295e-26
## 
## $intr
## [1] TRUE

2.3.8 Addressing globalization effects

Just as an addition to the previous model, we include the intl_border_T variables in addition to rta_lagN:

yotov_model_summary3(
  formula = "trade ~ 0 + rta + rta_lag4 + rta_lag8 + rta_lag12 +
    intl_border_1986 + intl_border_1990 + intl_border_1994 +
    intl_border_1998 + intl_border_2002 +
    exp_year + imp_year + pair_id_2",
  data = filter(ch1_application3_2, sum_trade > 0),
  method = "glm"
)
## $tidy_coefficients
## # A tibble: 9 x 5
##   term             estimate std.error statistic  p.value
##   <chr>               <dbl>     <dbl>     <dbl>    <dbl>
## 1 rta               0.116      0.0867    1.33   1.82e- 1
## 2 rta_lag4          0.288      0.0617    4.67   3.06e- 6
## 3 rta_lag8          0.0693     0.0482    1.44   1.50e- 1
## 4 rta_lag12         0.00236    0.0291    0.0809 9.36e- 1
## 5 intl_border_1986 -0.706      0.0478  -14.8    2.49e-49
## 6 intl_border_1990 -0.480      0.0429  -11.2    4.94e-29
## 7 intl_border_1994 -0.367      0.0335  -11.0    6.29e-28
## 8 intl_border_1998 -0.158      0.0233   -6.80   1.08e-11
## 9 intl_border_2002 -0.141      0.0169   -8.35   7.10e-17
## 
## $nobs
## [1] 28482
## 
## $total_rta_effect
## [1] 0.4750185
## 
## $trta_std_err
## [1] 0.1094833
## 
## $trta_std_err_pval
## [1] 1.433082e-05
## 
## $intr
## [1] TRUE

References

Silva, JMC Santos, and Silvana Tenreyro. 2006. “The Log of Gravity.” The Review of Economics and Statistics 88 (4): 641–58.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. " O’Reilly Media, Inc.".
Yotov, Yoto V, Roberta Piermartini, José-Antonio Monteiro, and Mario Larch. 2016. An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model. World Trade Organization Geneva.
Zumel, Nina. 2014. “Trimming the Fat from Glm() Models in r.” https://win-vector.com/2014/05/30/trimming-the-fat-from-glm-models-in-r/.