In April 2012, I wrote a blog post demonstrating an approach proposed in Lewbel (2012) that identifies the coefficients of endogenous regressors in a linear triangular system. I am now happy to announce the release of the ivlewbel package, which contains a function through which Lewbel's method can be applied in R. The package is available for download on CRAN.

Please see the example from the previous blog post replicated below. It would also be very helpful if people could report bugs and suggest additional features they would like added to the package. My contact details are in the about section of the blog.

library(ivlewbel)

beta1 <- beta2 <- NULL
for (k in 1:500) {
  # generate data (including an intercept column x3)
  x1 <- rnorm(1000, 0, 1)
  x2 <- rnorm(1000, 0, 1)
  u  <- rnorm(1000, 0, 1)
  s1 <- rnorm(1000, 0, 1)
  s2 <- rnorm(1000, 0, 1)
  ov <- rnorm(1000, 0, 1)
  e1 <- u + exp(x1)*s1 + exp(x2)*s1
  e2 <- u + exp(-x1)*s2 + exp(-x2)*s2
  y1 <- 1 + x1 + x2 + ov + e2
  y2 <- 1 + x1 + x2 + y1 + 2*ov + e1
  x3 <- rep(1, 1000)
  dat <- data.frame(y1, y2, x3, x1, x2)
  # record the OLS estimate of the endogenous coefficient
  beta1 <- c(beta1, coef(lm(y2 ~ x1 + x2 + y1))[4])
  # record the Lewbel IV-GMM estimate
  beta2 <- c(beta2, lewbel(formula = y2 ~ y1 | x1 + x2 | x1 + x2,
                           data = dat)$coef.est[1, 1])
}

library(sm)
d <- data.frame(rbind(cbind(beta1, "OLS"), cbind(beta2, "IV-GMM")))
d$beta1 <- as.numeric(as.character(d$beta1))
d$V2 <- factor(d$V2)  # grouping factor (strings are no longer auto-factored in R >= 4.0)
sm.density.compare(d$beta1, d$V2, xlab = "Endogenous Coefficient")
title("Lewbel and OLS Estimates")
legend("topright", levels(d$V2), lty = 1:2, col = 2:3, bty = "n")
abline(v = 1)


Thank you for coding this up as a package. I have a question about the use of the gmm package within yours. When you input a formula, gmm seems to offer no option for White's robust variance estimator without also allowing for autocorrelation. But with cross-sectional data, allowing for such autocorrelation inflates the variance unnecessarily, and it may be harmful if the original data happen to be sorted in a way that induces spurious autocorrelation. How can we get around this?

Thanks for the comment. Do you have a reference showing that HAC standard errors are unnecessarily inflated relative to HC standard errors in cross-sections?

A simple workaround would be to use the clustered standard errors option and cluster on each observation. That is equivalent to the White formulation.
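To illustrate the idea outside of ivlewbel, here is a sketch using the sandwich package (my own example, not part of the ivlewbel code): treating each observation as its own singleton cluster reproduces the White (HC0) covariance matrix, once the finite-sample cluster adjustment is switched off.

```r
# Sketch: singleton clusters reproduce the White (HC0) covariance.
# cadjust = FALSE drops the G/(G-1) cluster correction so the two match exactly.
library(sandwich)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + x + rnorm(n) * exp(x / 2)   # heteroskedastic errors
m <- lm(y ~ x)
v_white <- vcovHC(m, type = "HC0")
v_clust <- vcovCL(m, cluster = 1:n, type = "HC0", cadjust = FALSE)
all.equal(v_white, v_clust)   # TRUE
```

With the default cluster adjustment left on, the clustered matrix differs only by the factor n/(n-1), which is negligible in reasonable samples.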

Thank you for your quick response. I have no specific paper at hand showing that the standard errors must be inflated in this case. But intuitively, if the correlation is zero and you estimate it anyway, you introduce extra noise and make the estimator noisier. (For example, GMM can be less efficient than 2SLS in finite samples under homoscedasticity if you use a robust variance matrix for the weight matrix.) Also, as I said, if the data have been sorted for some other purpose, for example with the dependent variable in ascending or descending order, they may appear autocorrelated when there is actually no autocorrelation, which would invalidate any estimator based on it. So it is a bit dangerous to use HAC directly.

I guess another way around this is to input the moment conditions directly, rather than the equations, and then use vcov = "iid".

Maybe I shouldn't say the results are invalid in the spurious-autocorrelation case. Any weight matrix will lead to a consistent GMM estimator; it is just not efficient, and the reported standard errors are wrong.

By the way, is the GMM estimator a two-step estimator? If so, what first step is used to construct the second-step weight matrix? Thank you very much!

Thanks for the input. I will revise the package when I get the time. In the meantime, you can get the usual robust standard errors by clustering on each observation.

All estimators are two-step. The first step uses W = I; the second uses W = (GG')^-1, the inverse of the estimated moment covariance, as is usually done.
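That two-step recipe can be sketched in a few lines. The following is a stand-alone illustration with made-up data and a hand-rolled linear IV-GMM estimator, not the actual ivlewbel/gmm internals; here the moment contributions are stacked as rows of G, so the second-step weight is solve(t(G) %*% G / n).

```r
# Hedged sketch of two-step linear GMM for an overidentified IV model.
# Step 1: W = I. Step 2: W = inverse of the step-1 moment covariance.
set.seed(42)
n  <- 500
z1 <- rnorm(n); z2 <- rnorm(n)
u  <- rnorm(n)
x  <- z1 + 0.5 * z2 + u + rnorm(n)   # endogenous regressor
y  <- 1 + 2 * x + u + rnorm(n)       # true coefficient on x is 2
X  <- cbind(1, x)
Z  <- cbind(1, z1, z2)

gmm_step <- function(W) {
  A <- t(X) %*% Z %*% W %*% t(Z)
  solve(A %*% X, A %*% y)
}

b1 <- gmm_step(diag(ncol(Z)))          # step 1: W = I
e  <- as.vector(y - X %*% b1)
G  <- Z * e                            # row i is the moment contribution z_i * e_i
b2 <- gmm_step(solve(t(G) %*% G / n))  # step 2: efficient weight matrix
round(b2, 2)                           # slope estimate should be near 2
```

In this just-barely-overidentified example both steps give similar point estimates; the second step matters more for efficiency and for the standard errors.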

The package has been updated. Robust now refers to the White correction.

Thanks for this package! I am not entirely clear about the appropriate heteroskedasticity test.

For a triangular system, heteroskedasticity is only necessary in the error of the endogenous regressor (y1 in your simulation). For fully simultaneous systems, heteroskedasticity is necessary for y2 as well. Your simulation actually generates heteroskedastic errors for both, although the system is triangular. So using e1 <- u + s1 works as well. Is that correct?

Now, what is the correct way to test for heteroskedasticity? Given one of your simulation runs:

library(lmtest)
m <- lm(y1 ~ x1 + x2, data = dat)

# White test as a special case of the Breusch-Pagan test
bptest(m, ~ x1*x2 + I(x1^2) + I(x2^2), data = dat)

Is that the correct test, or should I include other variables as well? If my earlier comment is correct, the same test for y2 is irrelevant in a triangular system.

I think it would be nice if the lewbel function tested for heteroskedasticity automatically. I know I can do that myself, but it would be convenient for the user.

Thanks!

Hi Greg,

Thanks for the comment. Here is a simple reply.

"So using e1 <- u + s1 works as well. Is that correct?"

Yes

"Now what is the correct way to test for heteroskedasticity?"

Any test you want. I prefer a more ad hoc White-style method: run the first stage, get the residuals, then regress the squared residuals on the X variables and see which ones matter.
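As a concrete sketch of that check, regenerating one draw from the simulation design above (the auxiliary regression is my own illustration of the suggestion, not part of the package):

```r
# Ad-hoc White-style check: regress squared first-stage residuals on the
# exogenous regressors. Data regenerated from the post's simulation design.
set.seed(123)
x1 <- rnorm(1000); x2 <- rnorm(1000)
u  <- rnorm(1000); s2 <- rnorm(1000); ov <- rnorm(1000)
e2 <- u + exp(-x1) * s2 + exp(-x2) * s2
y1 <- 1 + x1 + x2 + ov + e2

m   <- lm(y1 ~ x1 + x2)
aux <- lm(resid(m)^2 ~ x1 + x2)
# significant slopes in the auxiliary regression point to which
# X variables drive the heteroskedasticity
summary(aux)
```

Squares and interactions of the regressors can be added to the auxiliary regression for a check closer to the formal White test.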

Hope this helps.

I have used your package to compare the Lewbel (2012) estimator with the Klein and Vella (2010) estimator, which also identifies parameters through heteroscedasticity.

https://ideas.repec.org/p/pra/mprapa/65888.html

One result worth noticing is that the assumption on the form of the heteroscedasticity is not innocuous. One should justify it a priori, or use a model-selection approach to choose the better one.