In econometrics, the generalized method of moments (GMM) is one estimation methodology that can be used to calculate instrumental variable (IV) estimates. Performing this calculation in R, for a linear IV model, is trivial. One simply calls the gmm() function in the excellent gmm package much as one would call lm() or ivreg(). The gmm() function will estimate the regression and return model coefficients and their standard errors. An interesting feature of this function, and of GMM estimators in general, is that a test of over-identification, often dubbed Hansen’s J-test, comes as an inherent part of the estimation. Therefore, in cases where the researcher is lucky enough to have more instruments than endogenous regressors, they should examine this over-identification test post-estimation.
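As a rough illustration of the point above, the sketch below fits an over-identified linear IV model with gmm() and then pulls out the J-test. The simulated data, the variable names and the choice of vcov = "iid" are my own assumptions for the example and are not taken from the post.

```r
library(gmm)

# Simulated data (illustrative only): x1 is endogenous because it shares
# the shock e with the outcome equation; z1 and z2 are valid instruments.
set.seed(42)
n  <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
e  <- rnorm(n)
x1 <- z1 + z2 + e + rnorm(n)
y1 <- 1 + x1 + e + rnorm(n)
d  <- data.frame(y1, x1, z1, z2)

# Linear IV by GMM: first formula is the model, second lists the instruments.
# vcov = "iid" is an assumption that suits this homoskedastic simulation.
fit <- gmm(y1 ~ x1, ~ z1 + z2, data = d, vcov = "iid")

summary(fit)   # coefficients, standard errors and the J-test
specTest(fit)  # the over-identification (J) test on its own
```

With two instruments and one endogenous regressor there is one over-identifying restriction, so the J-test here has one degree of freedom.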

While the gmm() function in R is very flexible, it does not (yet) allow the user to estimate a GMM model whose standard errors and over-identification test are corrected for clustering. Thankfully, the gmm() function is flexible enough to allow for a simple hack that works around this small shortcoming. For this, I have created a function called gmmcl(), and you can find the code below. This is a function for a basic linear IV model. The code uses the gmm() function to estimate both steps in a two-step feasible GMM procedure. The key to allowing for clustering is to adjust the weights matrix after the first step: the first-step residuals are used to build a weights matrix that sums the moment contributions within each cluster, and this matrix is then supplied to the second step. Interested readers can find more technical details regarding this approach here. After defining the function, I show a simple application in the code below.
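For readers who prefer formulas, the two steps can be written out as follows. This is my reading of the procedure rather than notation from the post: X is the regressor matrix, Z the instrument matrix (L columns), K the number of coefficients, and the index g runs over clusters.

```latex
% Step 1: identity weights give a consistent first-step estimate
\hat\beta_1 = (X'Z\,Z'X)^{-1} X'Z\,Z'y, \qquad \hat u = y - X\hat\beta_1

% Clustered moment covariance: sum the moments within each cluster g
\hat S = \frac{1}{N} \sum_{g} (Z_g'\hat u_g)(Z_g'\hat u_g)'

% Step 2: re-estimate with the weights matrix W = \hat S^{-1}
\hat\beta_2 = (X'Z\,\hat S^{-1} Z'X)^{-1} X'Z\,\hat S^{-1} Z'y

% Over-identification test based on the second-step moments
J = N\,\bar g(\hat\beta_2)'\,\hat S^{-1}\,\bar g(\hat\beta_2) \sim \chi^2_{L-K},
\qquad \bar g(\beta) = \frac{1}{N} Z'(y - X\beta)
```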

gmmcl = function(formula1, formula2, data, cluster){
  library(plyr) ; library(gmm)
  # create data.frame
  data$id1 = 1:dim(data)[1]
  formula3 = paste(as.character(formula1)[3],"id1", sep=" + ")
  formula4 = paste(as.character(formula1)[2], formula3, sep=" ~ ")
  formula4 = as.formula(formula4)
  formula5 = paste(as.character(formula2)[2],"id1", sep=" + ")
  formula6 = paste(" ~ ", formula5, sep=" ")
  formula6 = as.formula(formula6)
  frame1 = model.frame(formula4, data)
  frame2 = model.frame(formula6, data)
  dat1 = join(data, frame1, type="inner", match="first")
  dat2 = join(dat1, frame2, type="inner", match="first")

  # matrix of instruments
  Z1 = model.matrix(formula2, dat2)

  # step 1
  gmm1 = gmm(formula1, formula2, data = dat2,
             vcov="TrueFixed", weightsMatrix = diag(dim(Z1)[2]))

  # clustering weight matrix
  cluster = factor(dat2[,cluster])
  u = residuals(gmm1)
  estfun = sweep(Z1, MARGIN=1, u,'*')
  u = apply(estfun, 2, function(x) tapply(x, cluster, sum))
  S = 1/(length(residuals(gmm1)))*crossprod(u)

  # step 2
  gmm2 = gmm(formula1, formula2, data=dat2,
             vcov="TrueFixed", weightsMatrix = solve(S))
  return(gmm2)
}

# generate data.frame
n = 100
z1 = rnorm(n)
z2 = rnorm(n)
x1 = z1 + z2 + rnorm(n)
y1 = x1 + rnorm(n)
id = 1:n

data = data.frame(z1 = c(z1, z1), z2 = c(z2, z2), x1 = c(x1, x1),
                  y1 = c(y1, y1), id = c(id, id))

summary(gmmcl(y1 ~ x1, ~ z1 + z2, data = data, cluster = "id"))
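To make the weighting step less of a black box, here is a base R sketch of the same two-step idea written with plain matrix algebra instead of gmm(). It is not the gmmcl() function itself: the simulated data, the helper gmm_beta() and all object names are hypothetical, and the standard error and J formulas are the usual textbook ones for two-step linear GMM.

```r
# Simulated clustered data (illustrative): 'a' is a cluster-level shock that
# enters both x1 and the error term, making x1 endogenous and the errors
# correlated within clusters.
set.seed(1)
G  <- 50                          # number of clusters
m  <- 4                           # observations per cluster
id <- rep(seq_len(G), each = m)
z1 <- rnorm(G * m)
z2 <- rnorm(G * m)
a  <- rnorm(G)[id]
x1 <- z1 + z2 + a + rnorm(G * m)
y1 <- 1 + x1 + a + rnorm(G * m)

X <- cbind(1, x1)                 # regressors, with intercept
Z <- cbind(1, z1, z2)             # instruments, with intercept
N <- nrow(X)
XZ <- crossprod(X, Z)             # X'Z

# Hypothetical helper: linear GMM estimate for a given weights matrix W
gmm_beta <- function(W) {
  solve(XZ %*% W %*% t(XZ), XZ %*% W %*% crossprod(Z, y1))
}

# Step 1: identity weights, then first-step residuals
b1 <- gmm_beta(diag(ncol(Z)))
u1 <- drop(y1 - X %*% b1)

# Clustered moment covariance: sum z_i * u_i within each cluster,
# then take the cross-product of the cluster sums
h <- rowsum(Z * u1, group = id)   # G x L matrix of cluster sums
S <- crossprod(h) / N

# Step 2: re-estimate with W = S^{-1}
W  <- solve(S)
b2 <- gmm_beta(W)
u2 <- drop(y1 - X %*% b2)

# Cluster-robust standard errors and Hansen's J statistic
V    <- N * solve(XZ %*% W %*% t(XZ))
se   <- sqrt(diag(V))
gbar <- crossprod(Z, u2) / N
J    <- drop(N * t(gbar) %*% W %*% gbar)
p    <- pchisq(J, df = ncol(Z) - ncol(X), lower.tail = FALSE)

cbind(estimate = drop(b2), std.error = se)
c(J = J, p.value = p)
```

The rowsum() call plays the same role as the sweep() and tapply() lines in gmmcl(): it collapses the observation-level moments to one row per cluster before the cross-product is taken.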

Great post. I also enjoyed reading your other posts on modeling endogeneity in R. By any chance, do you know how to address endogeneity issues in a random-coefficient model (aka multilevel regression)?

Hello, I think there is an application in the Gelman-Hill book. A topic possibly worth a future blog post perhaps!

Thanks for this article, Alan. I’ve heard a lot of (the few) economists using R complain about how it is not straightforward to cluster in R. Did you contact Pierre (the maintainer of the gmm package) to see if support could be added? Also, since you’re speaking to an R crowd I would note that you might want to use “clustered standard errors” rather than “clustering” as I think most R users immediately think of cluster analysis. I look forward to any future posts you might have on clustered standard errors because this would make it easier for economists to start using R.

Thanks for the comment Scott. Perhaps support could be added to the gmm package. However, this would be the package maintainer’s decision. The gmm package is already very comprehensive, as well as being well documented, and I am not sure if the maintainer wants to change anything. I note your comment on distinguishing between clustering and clustered standard errors.

Thanks for the quick reply, Alan.
