Chapter 2 Partial equilibrium trade policy analysis with structural gravity

2.1 Traditional Gravity Estimates

2.1.1 Preparing the data

If the reader has never used R before, please check chapters 1 to 25 from Wickham and Grolemund (2016).

If the reader has only fitted a few regressions in R, without much practice on transforming and cleaning data before, please check chapters 5 and 18 from Wickham and Grolemund (2016).

Please see the note from page 42 in Yotov et al. (2016). It is a really important note, which tells us that we need to:

Filter observations for a range of years (1986, 1990, 1994, 1998, 2002 and 2006)
Transform some variables to logarithm scale (trade and dist) and create new variables from those in the original dataset
Remove cases where both the exporter and the importer are the same
Drop observations where the trade flow is zero

Unlike Yotov et al. (2016), here we shall use a single dataset for all the applications and subset its columns depending on what we need. This decision kept the tradepolicy R package as light as possible.

Step 1, including subsetting columns for this application, is straightforward.

library(tradepolicy)

ch1_application1 <- agtpa_applications %>%
  select(exporter, importer, pair_id, year, trade, dist, cntg, lang, clny) %>%
  filter(year %in% seq(1986, 2006, 4))

For step 2, this can be divided in parts, starting with the log transformation of trade and distance.

ch1_application1 <- ch1_application1 %>%
  mutate(
    log_trade = log(trade),
    log_dist = log(dist)
  )

Continuing step 2, we can now create the variables \(Y_{i,t}\) and \(E_{i,t}\) that appear on the OLS model equation in the book.

ch1_application1 <- ch1_application1 %>%
  # Create Yit
  group_by(exporter, year) %>%
  mutate(
    y = sum(trade),
    log_y = log(y)
  ) %>%

  # Create Eit
  group_by(importer, year) %>%
  mutate(
    e = sum(trade),
    log_e = log(e)
  )

The OLS model with remoteness index needs both exporter and importer index, which grouping variables can create. We divide it into sub-steps: Replicate the computation of total exports, then the remoteness index for exporters, and finally the total imports with the corresponding remoteness index for importers.

ch1_application1 <- ch1_application1 %>%
  # Replicate total_e
  group_by(exporter, year) %>%
  mutate(total_e = sum(e)) %>%
  group_by(year) %>%
  mutate(total_e = max(total_e)) %>%

  # Replicate rem_exp
  group_by(exporter, year) %>%
  mutate(
    remoteness_exp = sum(dist *  total_e / e),
    log_remoteness_exp = log(remoteness_exp)
  ) %>%

  # Replicate total_y
  group_by(importer, year) %>%
  mutate(total_y = sum(y)) %>%
  group_by(year) %>%
  mutate(total_y = max(total_y)) %>%

  # Replicate rem_imp
  group_by(importer, year) %>%
  mutate(
    remoteness_imp = sum(dist / (y / total_y)),
    log_remoteness_imp = log(remoteness_imp)
  )

To create the variables for the OLS with Fixed Effects Model, we followed box #1 on page 44 from Yotov et al. (2016). We combine both exporter and importer variables with the year to create the fixed effects variables.

ch1_application1 <- ch1_application1 %>%
  # This merges the columns exporter/importer with year
  mutate(
    exp_year = paste0(exporter, year),
    imp_year = paste0(importer, year)
  )

The addition of exporter/importer time fixed effects concludes step 2, and now we need to perform step 3.

ch1_application1 <- ch1_application1 %>%
  filter(exporter != importer)

Some cases require conducting step 4, and we will be explicit about it when needed.

2.1.2 OLS estimation ignoring multilateral resistance terms

The general equation for this model is \[ \begin{align} \log X_{ij,t} =& \:\beta_0 + \beta_1 DIST_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \beta_5 \log Y_{i,t} +\\ \text{ }& \:\beta_6 \log E_{j,t} + \varepsilon_{ij,t}. \end{align} \]

Please see page 41 in Yotov et al. (2016) for full detail of each variable.

The model for this case is straightforward, and in this case, we need to apply step 4 from the previous section to drop cases where the trade is zero.

fit_ols <- feols(
  log_trade ~ log_dist + cntg + lang + clny + log_y + log_e,
  data = filter(ch1_application1, trade > 0)
)

summary(fit_ols)

## OLS estimation, Dep. Var.: log_trade
## Observations: 25,689 
## Standard-errors: IID 
##               Estimate Std. Error   t value   Pr(>|t|)    
## (Intercept) -11.283080   0.151732 -74.36173  < 2.2e-16 ***
## log_dist     -1.001607   0.014159 -70.74094  < 2.2e-16 ***
## cntg          0.573805   0.074427   7.70961 1.3076e-14 ***
## lang          0.801548   0.033748  23.75115  < 2.2e-16 ***
## clny          0.734853   0.070387  10.44025  < 2.2e-16 ***
## log_y         1.190236   0.005402 220.32012  < 2.2e-16 ***
## log_e         0.907588   0.005577 162.72688  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 1.74274   Adj. R2: 0.758469

The employed function, feols(), does not carry a copy of its training data by default besides providing faster fitting for models with fixed effects. This is not the case in base R, where glm() outputs include this data, increasing the model’s size, but this does not affect the model’s predictions and can be changed as the user needs it (Zumel 2014).

The model is almost ready. We only need to stick to the methodology from Yotov et al. (2016) and cluster the standard errors by country pair (see the note on page 42, it is imperative).

fit_ols <- feols(
  log_trade ~ log_dist + cntg + lang + clny + log_y + log_e,
  data = filter(ch1_application1, trade > 0),
  cluster = ~pair_id
)

summary(fit_ols)

## OLS estimation, Dep. Var.: log_trade
## Observations: 25,689 
## Standard-errors: Clustered (pair_id) 
##               Estimate Std. Error   t value   Pr(>|t|)    
## (Intercept) -11.283080   0.295827 -38.14076  < 2.2e-16 ***
## log_dist     -1.001607   0.027340 -36.63526  < 2.2e-16 ***
## cntg          0.573805   0.184710   3.10652 1.9158e-03 ** 
## lang          0.801548   0.082102   9.76286  < 2.2e-16 ***
## clny          0.734853   0.144193   5.09632 3.7405e-07 ***
## log_y         1.190236   0.009456 125.87160  < 2.2e-16 ***
## log_e         0.907588   0.009910  91.58459  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 1.74274   Adj. R2: 0.758469

The tradepolicy package provides functions to provide more informative summaries. Please read the documentation of the package and look for the tp_summary_app_1() function, it summarises the model in the exact way as reported in the book by providing:

Clustered standard errors.
Number of observations.
\(R^2\) (if applicable).
Presence (or absence) of exporter and exporter-time fixed effects.
RESET test p-value.

These statistical results are returned as a list to keep it simple, which we can see for the model in the same format as reported in the book.

tp_summary_app_1(
  formula = "log_trade ~ log_dist + cntg + lang + clny + log_y + log_e",
  data = filter(ch1_application1, trade > 0),
  method = "ols"
)

## $tidy_coefficients
## # A tibble: 7 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)  -11.3       0.296    -38.1    0    
## 2 log_dist      -1.00      0.027    -36.6    0    
## 3 cntg           0.574     0.185      3.11   0.002
## 4 lang           0.802     0.082      9.76   0    
## 5 clny           0.735     0.144      5.10   0    
## 6 log_y          1.19      0.009    126.     0    
## 7 log_e          0.908     0.01      91.6    0    
## 
## $nobs
## [1] 25689
## 
## $rsquared
##    r2 
## 0.759 
## 
## $etfe
## [1] FALSE
## 
## $itfe
## [1] FALSE
## 
## $reset_pval
## [1] 0

Please notice that the summary hides the exporter/importer fixed effects.

2.1.3 OLS estimation controlling for multilateral resistance terms with remote indexes

The remoteness model adds variables to the OLS model. The general equation for this model is \[ \begin{align} \log X_{ij,t} =& \:\beta_0 + \beta_1 DIST_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \beta_5 \log Y_{i,t} +\\ \text{ }& \beta_6 \log E_{j,t} + \beta_7 \log(REM\_EXP_i,t) + \beta_8 \log(REM\_IMP_i,t) + \varepsilon_{ij,t}. \end{align} \]

In the equation above \(REM\_EXP\) and \(REM\_IMP\) are defined as \[ \begin{align} \log(REM\_EXP_{i,t}) &= \log \left( \sum_j \frac{DIST_{i,j}}{E_{j,t} / Y_t} \right) \text{ and }\\ \log(REM\_IMP_{j,t}) &= \log \left( \sum_i \frac{DIST_{i,j}}{Y_{i,t} / Y_t} \right). \end{align} \]

Please see page 43 in Yotov et al. (2016) for full detail of each variable.

Our approach follows box #1 on page 43 from Yotov et al. (2016). Fitting the regression is straightforward. It is just about adding more regressors to what we did in the last section, and we can create a list with a summary for the model.

tp_summary_app_1(
  formula = "log_trade ~ log_dist + cntg + lang + clny + log_y + log_e +
    log_remoteness_exp + log_remoteness_imp",
  data = filter(ch1_application1, trade > 0),
  method = "ols"
)

## $tidy_coefficients
## # A tibble: 9 × 5
##   term               estimate std.error statistic p.value
##   <chr>                 <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)         -35.2       1.99     -17.7    0    
## 2 log_dist             -1.18      0.031    -37.9    0    
## 3 cntg                  0.247     0.177      1.39   0.164
## 4 lang                  0.739     0.078      9.43   0    
## 5 clny                  0.842     0.15       5.61   0    
## 6 log_y                 1.16      0.009    123.     0    
## 7 log_e                 0.903     0.01      91.1    0    
## 8 log_remoteness_exp    0.972     0.068     14.3    0    
## 9 log_remoteness_imp    0.274     0.06       4.58   0    
## 
## $nobs
## [1] 25689
## 
## $rsquared
##    r2 
## 0.765 
## 
## $etfe
## [1] FALSE
## 
## $itfe
## [1] FALSE
## 
## $reset_pval
## [1] 0

2.1.4 OLS estimation controlling for multilateral resistance terms with fixed effects

The general equation for this model is \[ \begin{align} \log X_{ij,t} =& \: \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} +\\ \text{ }& \:\beta_4 CLNY_{i,j} + \pi_{i,t} + \chi_{i,t} + \varepsilon_{ij,t}. \end{align} \]

Where the added terms, concerning the OLS model, are \(\pi_{i,t}\) and \(\chi_{i,t}\) that account for exporter-time and importer-time fixed effects, respectively. See page 44 in Yotov et al. (2016) for full detail of each variable.

We can quickly generate a list as we did with the previous models. The only difference to the previous models is that in this case that the variables to the right of the “|” operator are the fixed effects, which are treated differently by the fixest package, which is used internally by the tradepolicy package, for faster model fitting.

Please notice that the summaries intentionally do not show fixed effects, because there are cases where we have thousands of fixed effects.

tp_summary_app_1(
  formula = "log_trade ~ log_dist + cntg + lang + clny | exp_year + imp_year",
  data = filter(ch1_application1, trade > 0),
  method = "ols"
)

## $tidy_coefficients
## # A tibble: 4 × 5
##   term     estimate std.error statistic p.value
##   <chr>       <dbl>     <dbl>     <dbl>   <dbl>
## 1 log_dist   -1.22      0.038    -31.8    0    
## 2 cntg        0.223     0.203      1.1    0.271
## 3 lang        0.661     0.082      8.05   0    
## 4 clny        0.67      0.149      4.49   0    
## 
## $nobs
## [1] 25689
## 
## $rsquared
##    r2 
## 0.843 
## 
## $etfe
## [1] TRUE
## 
## $itfe
## [1] TRUE
## 
## $reset_pval
## [1] 0

There is another difference when we compare feols() or fepois() against glm() in the presence of fixed effects, which we can explain with an example.

In the data used for the previous summary, we have \(T\) years (1986, 1990, 1994, 1998, 2002 and 2006). We could be interested in filtering for a single exporter and a single importer to fit the fixed effects model \[ \log X_{t} = \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \sum_{u} \beta_u FE_u + \varepsilon_{ij,t}, \] where \[ \begin{align*} &u \in \{1986, 1990, 1994, 1998, 2002, 2006\} \text{ and}\cr &FE_u = \begin{cases} 1 & \text {if } t = u \cr 0 & \text{otherwise}. \end{cases} \end{align*} \]

When we use feols(), or any of the functions in the fixest package, a formula of the form \(z \sim x_1 + x_2 \mid y\) will estimate the model described in the previous equations.

If we do the same in base R, with glm(), the equivalent formula would be of the form \(z \sim 0 + x_1 + x_2 + y\), otherwise base R estimates a model of the form \(y_t = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \sum_u y_{u}\), which means that not including a zero in the formula estimates an additional coefficient in the estimation which corresponds to a global intercept or the “grand mean”.

On the \(\beta_0\) constant, it is tough to interpret with many-way fixed effects, and it is advised against reporting it.

2.1.5 PPML estimation controlling for multilateral resistance terms with fixed effects

The general equation for this model is \[ \begin{align} X_{ij,t} =& \:\exp\left[\beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} +\right.\\ \text{ }& \:\left.\beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \pi_{i,t} + \chi_{i,t}\right] \times \varepsilon_{ij,t}. \end{align} \]

The reason to compute this model, despite the lower speed compared to OLS, is that PPML is the only estimator perfectly consistent with the theoretical gravity model. By estimating with PPML, the fixed effects correspond precisely to the corresponding theoretical terms.

The data for this model is the same as for the fixed effects model, and one option in R is to use the fepois() function.

fit_ppml <- fepois(trade ~ log_dist + cntg + lang + clny | exp_year + imp_year,
  data = ch1_application1,
  cluster = ~pair_id
)

Exactly as it was mentioned for feols(), fepois() shares the same differences regarding glm() objects.

If the reader decides to run this model and print the summary, they will notice that it does not report an \(R^2\) and displays an extensive fixed effect list. For PPML models, the \(R^2\) needs to be computed afterwards as Kendall’s correlation (or rank correlation) between the observed and predicted values. Please see Silva and Tenreyro (2006) for the details and the RESET test for PPML (GLM) models.

Beware that software such as Stata requires additional libraries such as ppmlhdfe to report a correct \(R^2\) for the PPML model. What Stata shows is a reported pseudo-\(R^2\). To construct a proper \(R^2\) in R, tp_summary_app_1() takes the rank correlation between actual and predicted trade flows.

We can obtain a detailed list as in the previous examples.

tp_summary_app_1(
  formula = "trade ~ log_dist + cntg + lang + clny | exp_year + imp_year",
  data = ch1_application1,
  method = "ppml"
)

## $tidy_coefficients
## # A tibble: 4 × 5
##   term     estimate std.error statistic p.value
##   <chr>       <dbl>     <dbl>     <dbl>   <dbl>
## 1 log_dist   -0.841     0.032    -26.2    0    
## 2 cntg        0.437     0.084      5.18   0    
## 3 lang        0.247     0.078      3.18   0.001
## 4 clny       -0.222     0.118     -1.89   0.059
## 
## $nobs
## [1] 28152
## 
## $rsquared
## [1] 0.586
## 
## $etfe
## [1] TRUE
## 
## $itfe
## [1] TRUE
## 
## $reset_pval
## [1] 0.647

2.2 The “distance puzzle” resolved

2.2.1 Preparing the data

Please see the note from page 47 in Yotov et al. (2016). We need to proceed with similar steps as in the previous section.

The distance puzzle proposes the gravity specification \[ \begin{align} X_{ij,t} =& \:\exp\left[\pi_{i,t} + \chi_{i,t} + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j}\right]\times\\ \text{ }& \:\exp\left[\beta_4 CLNY_{i,j} + \beta_5 \log(DIST\_INTRA_{i,i})\right] \times \varepsilon_{ij,t}. \end{align} \]

The difference concerning the last section is that now we need to separate the distance variable into multiple columns that account for discrete-time effects. The \(\beta_T\) terms of the equation reflect this. Perhaps the easiest option is to transform the year into a text column and then use the pivot_wider() function.

We need to remove cases where the exporter is the same as the importer and cases where trade is zero for the OLS model. For the PPML models, we need to mark rows where the exporter and the importer are the same, and we need to create the same country column, which is also required to transform the distance variables as shown in box #1 in page 48 from Yotov et al. (2016).

In order to avoid creating two very similar datasets, we shall create one dataset to cover both OLS and PPML.

ch1_application2 <- agtpa_applications %>%
  select(exporter, importer, pair_id, year, trade, dist, cntg, lang, clny) %>%
  # this filter covers both OLS and PPML
  filter(year %in% seq(1986, 2006, 4)) %>%
  mutate(
    # variables for both OLS and PPML
    exp_year = paste0(exporter, year),
    imp_year = paste0(importer, year),
    year = paste0("log_dist_", year),
    log_trade = log(trade),
    log_dist = log(dist),
    smctry = ifelse(importer != exporter, 0, 1),

    # PPML specific variables
    log_dist_intra = log_dist * smctry,
    intra_pair = ifelse(exporter == importer, exporter, "inter")
  ) %>%
  pivot_wider(names_from = year, values_from = log_dist, values_fill = 0) %>%
  mutate(across(log_dist_1986:log_dist_2006, function(x) x * (1 - smctry)))

The across() function is a shortcut to avoid repetition, as in the following example, we show it for reference without computation.

ch1_application2 %>%
  mutate(
    log_dist_1986 =  log_dist_1986 * (1 - smctry),
    log_dist_1990 =  log_dist_1990 * (1 - smctry),
    
    # repeat log_dist_T many_times for T = 1994, 1998, ...
    
    log_dist_2006 =  log_dist_2006 * (1 - smctry)
  )

Note that the OLS model shall require filtering when we specify the model because we skipped filtering the cases where trade is equal to zero and both the importer and the exporter are the same. Because the solution for the “distance puzzle” implies different transformations and filters for the OLS and PPML cases, one possibility is to filter in the same summary functions.

2.2.2 OLS solution for the “distance puzzle”

The gravity specification, which includes \(\pi_{i,t} + \chi_{i,t}\), means that we need to do something very similar to what we did in the last section.

With the data from above, the model specification is straightforward.

tp_summary_app_2(
  formula = "log_trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 +
    log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg + lang + clny | 
    exp_year + imp_year",
  data = filter(ch1_application2, importer != exporter, trade > 0),
  method = "ols"
)

## $tidy_coefficients
## # A tibble: 9 × 5
##   term          estimate std.error statistic p.value
##   <chr>            <dbl>     <dbl>     <dbl>   <dbl>
## 1 log_dist_1986   -1.17      0.044    -26.8    0    
## 2 log_dist_1990   -1.16      0.042    -27.3    0    
## 3 log_dist_1994   -1.21      0.046    -26.5    0    
## 4 log_dist_1998   -1.25      0.043    -29.2    0    
## 5 log_dist_2002   -1.24      0.044    -28.1    0    
## 6 log_dist_2006   -1.26      0.044    -28.9    0    
## 7 cntg             0.223     0.203      1.1    0.271
## 8 lang             0.661     0.082      8.06   0    
## 9 clny             0.67      0.149      4.49   0    
## 
## $nobs
## [1] 25689
## 
## $pct_chg_log_dist
## [1] 7.95
## 
## $pcld_std_err
## [1] 3.698
## 
## $pcld_std_err_pval
## [1] 0.032
## 
## $intr
## [1] FALSE
## 
## $csfe
## [1] FALSE

2.2.3 PPML solution for the “distance puzzle”

This model is very similar to the one specified in the PPML section from the last section. We can directly fit the model.

tp_summary_app_2(
  formula = "trade ~ 0 + log_dist_1986 + log_dist_1990 + log_dist_1994 + 
    log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg + lang + clny | 
    exp_year + imp_year",
  data = filter(ch1_application2, importer != exporter),
  method = "ppml"
)

## $tidy_coefficients
## # A tibble: 9 × 5
##   term          estimate std.error statistic p.value
##   <chr>            <dbl>     <dbl>     <dbl>   <dbl>
## 1 log_dist_1986   -0.859     0.038    -22.8    0    
## 2 log_dist_1990   -0.834     0.038    -21.8    0    
## 3 log_dist_1994   -0.835     0.036    -23.4    0    
## 4 log_dist_1998   -0.847     0.036    -23.6    0    
## 5 log_dist_2002   -0.848     0.032    -26.4    0    
## 6 log_dist_2006   -0.836     0.032    -26.3    0    
## 7 cntg             0.437     0.084      5.18   0    
## 8 lang             0.248     0.078      3.18   0.001
## 9 clny            -0.222     0.118     -1.88   0.06 
## 
## $nobs
## [1] 28152
## 
## $pct_chg_log_dist
## [1] -2.75
## 
## $pcld_std_err
## [1] 3.004
## 
## $pcld_std_err_pval
## [1] 0.36
## 
## $intr
## [1] FALSE
## 
## $csfe
## [1] FALSE

2.2.4 Internal distance solution for the “distance puzzle”

This model requires us to add the internal distance variable to the PPML model and not filter the rows where the exporter and the importer are the same.

tp_summary_app_2(
  formula = "trade ~ 0 + log_dist_1986 + log_dist_1990 + log_dist_1994 + 
    log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg + lang + clny + 
    log_dist_intra | exp_year + imp_year",
  data = ch1_application2,
  method = "ppml"
)

## $tidy_coefficients
## # A tibble: 10 × 5
##    term           estimate std.error statistic p.value
##    <chr>             <dbl>     <dbl>     <dbl>   <dbl>
##  1 log_dist_1986    -0.98      0.073   -13.4     0    
##  2 log_dist_1990    -0.94      0.074   -12.7     0    
##  3 log_dist_1994    -0.915     0.073   -12.5     0    
##  4 log_dist_1998    -0.887     0.072   -12.3     0    
##  5 log_dist_2002    -0.884     0.072   -12.3     0    
##  6 log_dist_2006    -0.872     0.072   -12.1     0    
##  7 cntg              0.371     0.142     2.62    0.009
##  8 lang              0.337     0.171     1.98    0.048
##  9 clny              0.019     0.159     0.121   0.904
## 10 log_dist_intra   -0.488     0.102    -4.78    0    
## 
## $nobs
## [1] 28566
## 
## $pct_chg_log_dist
## [1] -10.965
## 
## $pcld_std_err
## [1] 1.058
## 
## $pcld_std_err_pval
## [1] 0
## 
## $intr
## [1] TRUE
## 
## $csfe
## [1] FALSE

2.2.5 Internal distance and home bias solution for the “distance puzzle”

This model requires us to add the same country variable to the internal distance model and repeat the rest of the steps from the last section.

tp_summary_app_2(
  formula = "trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 + 
    log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg + lang + clny + 
    log_dist_intra + smctry | exp_year + imp_year",
  data = ch1_application2,
  method = "ppml"
)

## $tidy_coefficients
## # A tibble: 11 × 5
##    term           estimate std.error statistic p.value
##    <chr>             <dbl>     <dbl>     <dbl>   <dbl>
##  1 log_dist_1986    -0.857     0.064   -13.5     0    
##  2 log_dist_1990    -0.819     0.064   -12.7     0    
##  3 log_dist_1994    -0.796     0.064   -12.3     0    
##  4 log_dist_1998    -0.77      0.064   -12.0     0    
##  5 log_dist_2002    -0.767     0.064   -12.1     0    
##  6 log_dist_2006    -0.754     0.063   -12.0     0    
##  7 cntg              0.574     0.157     3.64    0    
##  8 lang              0.352     0.139     2.53    0.011
##  9 clny              0.027     0.127     0.212   0.832
## 10 log_dist_intra   -0.602     0.111    -5.44    0    
## 11 smctry            1.69      0.582     2.90    0.004
## 
## $nobs
## [1] 28566
## 
## $pct_chg_log_dist
## [1] -11.969
## 
## $pcld_std_err
## [1] 1.173
## 
## $pcld_std_err_pval
## [1] 0
## 
## $intr
## [1] TRUE
## 
## $csfe
## [1] FALSE

2.2.6 Fixed effects solution for the “distance puzzle”

This model requires us to remove the internal distance and same country variables from the last model and include the internal pair variable to account for the intra-national fixed effects.

tp_summary_app_2(
  formula = "trade ~ 0 + log_dist_1986 + log_dist_1990 + log_dist_1994 + 
    log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg + lang + clny + 
    intra_pair | exp_year + imp_year",
  data = ch1_application2,
  method = "ppml"
)

## $tidy_coefficients
## # A tibble: 78 × 5
##    term          estimate std.error statistic p.value
##    <chr>            <dbl>     <dbl>     <dbl>   <dbl>
##  1 log_dist_1986   -0.91      0.033   -27.7     0    
##  2 log_dist_1990   -0.879     0.033   -27.0     0    
##  3 log_dist_1994   -0.86      0.032   -26.6     0    
##  4 log_dist_1998   -0.833     0.032   -25.9     0    
##  5 log_dist_2002   -0.829     0.033   -25.5     0    
##  6 log_dist_2006   -0.811     0.033   -24.9     0    
##  7 cntg             0.442     0.083     5.33    0    
##  8 lang             0.241     0.077     3.11    0.002
##  9 clny            -0.22      0.118    -1.86    0.063
## 10 intra_pairARG    0.384     0.577     0.666   0.505
## # … with 68 more rows
## # ℹ Use `print(n = ...)` to see more rows
## 
## $nobs
## [1] 28566
## 
## $pct_chg_log_dist
## [1] -10.931
## 
## $pcld_std_err
## [1] 0.769
## 
## $pcld_std_err_pval
## [1] 0
## 
## $intr
## [1] TRUE
## 
## $csfe
## [1] TRUE

2.3 Regional trade agreements effects

2.3.1 Preparing the data

This model specification includes gravity covariates, including importer-time and exporter-time fixed effects, as in the equation

\[ \begin{align} X_{ij,t} =& \:\exp\left[\pi_{i,t} + \chi_{i,t} + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} +\right.\\ \text{ }& \:\left.\beta_4 CLNY_{i,j} + \beta_5 RTA_{ij,t}\right] \times \varepsilon_{ij,t}. \end{align} \]

In comparison to the previous examples, we need to create additional variables to include fixed effects that account for the observations where the exporter and the importer are the same. These variables are internal border, internal dyad and internal borders for different years.

The direct way of obtaining the desired variables is similar to what we did in the previous sections.

ch1_application3 <- agtpa_applications %>%
  filter(year %in% seq(1986, 2006, 4)) %>%
  mutate(
    exp_year = paste0(exporter, year),
    imp_year = paste0(importer, year),
    year = paste0("intl_border_", year),
    log_trade = log(trade),
    log_dist = log(dist),
    intl_brdr = ifelse(exporter == importer, pair_id, "inter"),
    intl_brdr_2 = ifelse(exporter == importer, 0, 1),
    pair_id_2 = ifelse(exporter == importer, "0-intra", pair_id)
  ) %>%
  pivot_wider(names_from = year, values_from = intl_brdr_2, values_fill = 0)

Notice that we used “0-intra” and not just “intra” because the rest of the observations in the internal dyads are numbers 1, …, N, and R internals shall consider “0-intra” as the reference factor for being the first item when it orders the unique observations alphabetically. Also, observe the order of the resulting table, the pivoting of the table will put “0-intra” as the first row for the first exporter-importer dyad. This makes the difference between the expected or other behaviours in the next chapter.

In addition, we need to create the variable containing the trade sum to filter the cases where the sum by dyad is zero.

ch1_application3 <- ch1_application3 %>%
  group_by(pair_id) %>%
  mutate(sum_trade = sum(trade)) %>%
  ungroup()

2.3.2 OLS standard RTA estimates with international trade only

The gravity specification, which includes \(\pi_{i,t} + \chi_{i,t}\), means that we need to do something very similar to what we did in the last section.

With the data from above, the model specification is straightforward.

tp_summary_app_3(
  formula = "log_trade ~ log_dist + cntg + lang + clny + rta | exp_year + 
    imp_year",
  data = filter(ch1_application3, trade > 0, importer != exporter),
  method = "ols"
)

## $tidy_coefficients
## # A tibble: 5 × 5
##   term     estimate std.error statistic p.value
##   <chr>       <dbl>     <dbl>     <dbl>   <dbl>
## 1 log_dist   -1.22      0.039   -31.2     0    
## 2 cntg        0.223     0.203     1.10    0.272
## 3 lang        0.661     0.082     8.04    0    
## 4 clny        0.67      0.149     4.49    0    
## 5 rta        -0.004     0.054    -0.081   0.935
## 
## $nobs
## [1] 25689
## 
## $total_rta_effect
## [1] -0.004
## 
## $trta_std_err
## [1] 0.053
## 
## $trta_std_err_pval
## [1] 0.934
## 
## $intr
## [1] FALSE

2.3.3 PPML standard RTA estimates with international trade only

The model specification is very similar to OLS, and we only need to change the method specified in the function.

tp_summary_app_3(
  formula = "trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year",
  data = filter(ch1_application3, importer != exporter),
  method = "ppml"
)

## $tidy_coefficients
## # A tibble: 5 × 5
##   term     estimate std.error statistic p.value
##   <chr>       <dbl>     <dbl>     <dbl>   <dbl>
## 1 log_dist   -0.822     0.031    -26.1    0    
## 2 cntg        0.416     0.084      4.94   0    
## 3 lang        0.25      0.078      3.21   0.001
## 4 clny       -0.205     0.116     -1.77   0.077
## 5 rta         0.191     0.067      2.86   0.004
## 
## $nobs
## [1] 28152
## 
## $total_rta_effect
## [1] 0.191
## 
## $trta_std_err
## [1] 0.066
## 
## $trta_std_err_pval
## [1] 0.004
## 
## $intr
## [1] FALSE

2.3.4 Addressing potential domestic trade diversion

The model specification is quite the same as PPML. We only need to add the international border variable but use the entire dataset instead of removing rows where the importer and the exporter are the same.

tp_summary_app_3(
  formula = "trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year +
    intl_brdr",
  data = ch1_application3,
  method = "ppml"
)

## $tidy_coefficients
## # A tibble: 5 × 5
##   term     estimate std.error statistic p.value
##   <chr>       <dbl>     <dbl>     <dbl>   <dbl>
## 1 log_dist   -0.8       0.031    -26.0    0    
## 2 cntg        0.393     0.08       4.91   0    
## 3 lang        0.244     0.078      3.11   0.002
## 4 clny       -0.182     0.115     -1.58   0.114
## 5 rta         0.409     0.07       5.84   0    
## 
## $nobs
## [1] 28566
## 
## $total_rta_effect
## [1] 0.409
## 
## $trta_std_err
## [1] 0.069
## 
## $trta_std_err_pval
## [1] 0
## 
## $intr
## [1] FALSE

2.3.5 Addressing potential endogeneity of RTAs

The model specification includes the RTA variable and the exporter-time, importer-time and internal dyad fixed effects to account for domestic trade.

tp_summary_app_3(
  formula = "trade ~ rta | exp_year + imp_year + pair_id_2",
  data = filter(ch1_application3, sum_trade > 0),
  method = "ppml"
)

## $tidy_coefficients
## # A tibble: 1 × 5
##   term  estimate std.error statistic p.value
##   <chr>    <dbl>     <dbl>     <dbl>   <dbl>
## 1 rta      0.557     0.108      5.14       0
## 
## $nobs
## [1] 28482
## 
## $total_rta_effect
## [1] 0.557
## 
## $trta_std_err
## [1] 0.102
## 
## $trta_std_err_pval
## [1] 0
## 
## $intr
## [1] FALSE

2.3.6 Testing for potential “reverse causality” between trade and RTAs

We need to modify the previous model to include the forward lagged RTA variable (by four years) and consider where the trade sum is larger than zero.

tp_summary_app_3(
  formula = "trade ~ rta + rta_lead4 | exp_year + imp_year + pair_id_2",
  data = filter(ch1_application3, sum_trade > 0),
  method = "ppml"
)

## $tidy_coefficients
## # A tibble: 2 × 5
##   term      estimate std.error statistic p.value
##   <chr>        <dbl>     <dbl>     <dbl>   <dbl>
## 1 rta          0.52      0.091     5.71    0    
## 2 rta_lead4    0.077     0.098     0.792   0.428
## 
## $nobs
## [1] 28482
## 
## $total_rta_effect
## [1] 0.597
## 
## $trta_std_err
## [1] 0.138
## 
## $trta_std_err_pval
## [1] 0
## 
## $intr
## [1] FALSE

2.3.7 Addressing potential non-linear and phasing-in effects of RTAs

Instead of future-lagged RTA variables, as in the previous model, we modify the previous model and include the RTA backwards lagged variables instead.

tp_summary_app_3(
  formula = "trade ~ rta + rta_lag4 + rta_lag8 + rta_lag12 | exp_year + 
    imp_year + pair_id_2",
  data = filter(ch1_application3, sum_trade > 0),
  method = "ppml"
)

## $tidy_coefficients
## # A tibble: 4 × 5
##   term      estimate std.error statistic p.value
##   <chr>        <dbl>     <dbl>     <dbl>   <dbl>
## 1 rta          0.291     0.095      3.08   0.002
## 2 rta_lag4     0.414     0.071      5.80   0    
## 3 rta_lag8     0.169     0.046      3.69   0    
## 4 rta_lag12    0.119     0.032      3.73   0    
## 
## $nobs
## [1] 28482
## 
## $total_rta_effect
## [1] 0.993
## 
## $trta_std_err
## [1] 0.094
## 
## $trta_std_err_pval
## [1] 0
## 
## $intr
## [1] FALSE

2.3.8 Addressing globalization effects

In addition to the previous model, we include the international borders on different years besides the lagged RTAs.

tp_summary_app_3(
  formula = "trade ~ rta + rta_lag4 + rta_lag8 + rta_lag12 + intl_border_1986 + 
    intl_border_1990 + intl_border_1994 + intl_border_1998 + intl_border_2002 |
    exp_year + imp_year + pair_id_2",
  data = filter(ch1_application3, sum_trade > 0),
  method = "ppml"
)

## $tidy_coefficients
## # A tibble: 9 × 5
##   term             estimate std.error statistic p.value
##   <chr>               <dbl>     <dbl>     <dbl>   <dbl>
## 1 rta                 0.116     0.092     1.26    0.209
## 2 rta_lag4            0.288     0.065     4.40    0    
## 3 rta_lag8            0.069     0.051     1.36    0.175
## 4 rta_lag12           0.002     0.031     0.076   0.939
## 5 intl_border_1986   -0.706     0.051   -13.9     0    
## 6 intl_border_1990   -0.48      0.046   -10.5     0    
## 7 intl_border_1994   -0.367     0.035   -10.3     0    
## 8 intl_border_1998   -0.158     0.025    -6.40    0    
## 9 intl_border_2002   -0.141     0.018    -7.87    0    
## 
## $nobs
## [1] 28482
## 
## $total_rta_effect
## [1] 0.475
## 
## $trta_std_err
## [1] 0.109
## 
## $trta_std_err_pval
## [1] 0
## 
## $intr
## [1] FALSE

References

Silva, JMC Santos, and Silvana Tenreyro. 2006. “The Log of Gravity.” The Review of Economics and Statistics 88 (4): 641–58.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. " O’Reilly Media, Inc.".

Yotov, Yoto V, Roberta Piermartini, José-Antonio Monteiro, and Mario Larch. 2016. An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model. World Trade Organization Geneva.

Zumel, Nina. 2014. “Trimming the Fat from Glm() Models in r.” https://win-vector.com/2014/05/30/trimming-the-fat-from-glm-models-in-r/.