IV Estimates via GMM with Clustering in R

In econometrics, generalized method of moments (GMM) is one estimation methodology that can be used to calculate instrumental variable (IV) estimates. Performing this calculation in R, for a linear IV model, is trivial. One simply uses the gmm() function in the excellent gmm package like an lm() or ivreg() function. The gmm() function will estimate the regression and return model coefficients and their standard errors. An interesting feature of this function, and GMM estimators in general, is that they contain a test of over-identification, often dubbed Hansen’s J-test, as an inherent feature. Therefore, in cases where the researcher is lucky enough to have more instruments than endogenous regressors, they should examine this over-identification test post-estimation.

While the gmm() function in R is very flexible, it does not (yet) allow the user to estimate a GMM model that produces standard errors and an over-identification test that is corrected for clustering. Thankfully, the gmm() function is flexible enough to allow for a simple hack that works around this small shortcoming. For this, I have created a function called gmmcl(), and you can find the code below. This is a function for a basic linear IV model. This code uses the gmm() function to estimate both steps in a two-step feasible GMM procedure. The key to allowing for clustering is to adjust the weights matrix after the second step. Interested readers can find more technical details regarding this approach here. After defining the function, I show a simple application in the code below.

gmmcl = function(formula1, formula2, data, cluster){
  library(plyr) ; library(gmm)
  # create data.frame
  data$id1 = 1:dim(data)[1]
  formula3 = paste(as.character(formula1)[3],"id1", sep=" + ")
  formula4 = paste(as.character(formula1)[2], formula3, sep=" ~ ")
  formula4 = as.formula(formula4)
  formula5 = paste(as.character(formula2)[2],"id1", sep=" + ")
  formula6 = paste(" ~ ", formula5, sep=" ")
  formula6 = as.formula(formula6)
  frame1 = model.frame(formula4, data)
  frame2 = model.frame(formula6, data)
  dat1 = join(data, frame1, type="inner", match="first")
  dat2 = join(dat1, frame2, type="inner", match="first")
  # matrix of instruments
  Z1 = model.matrix(formula2, dat2)
  # step 1
  gmm1 = gmm(formula1, formula2, data = dat2, 
             vcov="TrueFixed", weightsMatrix = diag(dim(Z1)[2]))
  # clustering weight matrix
  cluster = factor(dat2[,cluster])
  u = residuals(gmm1)
  estfun = sweep(Z1, MARGIN=1, u,'*')
  u = apply(estfun, 2, function(x) tapply(x, cluster, sum))  
  S = 1/(length(residuals(gmm1)))*crossprod(u)
  # step 2
  gmm2 = gmm(formula1, formula2, data=dat2, 
             vcov="TrueFixed", weightsMatrix = solve(S))

# generate data.frame
n = 100
z1 = rnorm(n)
z2 = rnorm(n)
x1 = z1 + z2 + rnorm(n)
y1 = x1 + rnorm(n)
id = 1:n

data = data.frame(z1 = c(z1, z1), z2 = c(z2, z2), x1 = c(x1, x1),
                  y1 = c(y1, y1), id = c(id, id))

summary(gmmcl(y1 ~ x1, ~ z1 + z2, data = data, cluster = "id"))