The ivlewbel Package. A new way to Tackle Endogenous Regressor Models.

In April 2012, I wrote this blog post demonstrating an approach proposed in Lewbel (2012) that identifies endogenous regressor coefficients in a linear triangular system. Now I am happy to announce the release of the ivlewbel package, which contains a function through which Lewbel’s method can be applied in R. This package is now available to download on the CRAN.

Please see the example from the previous blog post replicated in the below. Additionally, it would be very helpful if people could comment on bugs and additional features they would like to add to the package. My contact details are in the about section of the blog.

lewbel2

library(ivlewbel)

beta1 <- beta2 <- NULL
for(k in 1:500){
  #generate data (including intercept)
  x1 <- rnorm(1000,0,1)
  x2 <- rnorm(1000,0,1)
  u <- rnorm(1000,0,1)
  s1 <- rnorm(1000,0,1)
  s2 <- rnorm(1000,0,1)
  ov <- rnorm(1000,0,1)
  e1 <- u + exp(x1)*s1 + exp(x2)*s1
  e2 <- u + exp(-x1)*s2 + exp(-x2)*s2
  y1 <- 1 + x1 + x2 + ov + e2 
  y2 <- 1 + x1 + x2 + y1 + 2*ov + e1
  x3 <- rep(1,1000)
  dat <- data.frame(y1,y2,x3,x1,x2)
  
  #record ols estimate
  beta1 <- c(beta1,coef(lm(y2~x1+x2+y1))[4])
  #init values for iv-gmm
  beta2 <- c(beta2,lewbel(formula = y2 ~ y1 | x1 + x2 | x1 + x2, data = dat)$coef.est[1,1])
}

library(sm)
d <- data.frame(rbind(cbind(beta1,"OLS"),cbind(beta2,"IV-GMM")))
d$beta1 <- as.numeric(as.character(d$beta1))
sm.density.compare(d$beta1, d$V2,xlab=("Endogenous Coefficient"))
title("Lewbel and OLS Estimates")
legend("topright", levels(d$V2),lty=c(1,2,3),col=c(2,3,4),bty="n")
abline(v=1)

Heteroskedastic GLM in R

A commenter on my previous blog entry has drawn my attention to an R function called hetglm() that estimates heteroskedastic probit models. This function is contained in the glmx package. The glmx package is not available on CRAN yet, but thankfully can be downloaded here.

The hetglm() function has a number of computational advantages compared with the crude method outlined in my previous post. The following example replicates the previous analysis showing the speed advantage associated with using the hetglm() function.

rm(list=ls()) # clear ws
library(maxLik) 
library(glmx)
n <- 1000 # no. obs
x1 <- runif(n,-1,1) # predictor 1
x2 <- runif(n,-1,1) # " 2
e1 <- rnorm(n,0,1) # normal error
e2 <- (1 + 0.45*(x1+x2))*e1 # hetero error
y <- ifelse(0.5 + 0.5*x1 -0.5*x2 - e2 >0, 1, 0) # outcome
# estimate normal probit
r1 <- glm(y~x1+x2,family=binomial(link="probit")) 
system.time(ml1 <- maxLik(hll,start=c(0,0,0,0,0))) # maximize
system.time(h1 <- hetglm(y ~ x1 + x2))
# output
> system.time(ml1 <- maxLik(hll,start=c(0,0,0,0,0))) # maximize
   user  system elapsed 
   4.43    0.00    4.59 
> system.time(h1 <- hetglm(y ~ x1 + x2))
   user  system elapsed 
   0.11    0.00    0.11 

The Heteroskedastic Probit Model

Specification testing is an important part of econometric practice. However, from what I can see, few researchers perform heteroskedasticity tests after estimating probit/logit models. This is not a trivial point. Heteroskedasticity in these models can represent a major violation of the probit/logit specification, both of which assume homoskedastic errors.

Thankfully, tests for heteroskedasticity in these models exist, and it is also possible to estimate modified binary choice models that are robust to heteroskedastic errors. In this blog post I present an example of how to estimate a heteroskedastic probit in R, and also test for heteroskedasticity.

The standard probit model assumes that the error distribution of the latent model has a unit variance. The heteroskedastic probit model relaxes this assumption, and allows the error variance to depend on some of the predictors in the regression model. Those interested in further details of this model, and the potential implications of this form of model misspecification, should consult these notes.

In the code below, I simulate some data, specify the log-likelihood function for the heteroskedastic probit model, estimate this model via maximum likelihood, and then perform a simple LR test of homoskedasticity. Note the log-likelihood function can be simplified from:

\ln L (\beta, \gamma | X_{i}, Z_{i}) = \sum^{N}_{i=1} \{ Y_{i} \ln \Phi [X_{i}\beta \exp(-Z_{i}\gamma)] + (1-Y_{i}) \ln [1-\Phi (X_{i}\beta \exp(-Z_{i}\gamma))] \}

to:

\ln L (\beta, \gamma | X_{i}, Z_{i}) = \sum^{N}_{i=1} \{ \ln \Phi [q_{i}(X_{i}\beta \exp(-Z_{i}\gamma))]\}

 where q_{i}=2y_{i}-1 uses the fact that the PDF of the normal distribution is symmetric to cut out some algebra (and possibly lead to an easier log-likelihood function to maximize). The log-liklihood function in the above may prove difficult to maximize, although there is an LM test for homoskedasticity, that only requires the restricted (standard probit model) to be estimated. An interesting application would be to see if Bayesian methods estimated using a Gibbs sampling (via STAN, Bugs, or JAGS) can help deal with computationally difficult log likelihoods in this regard. A topic, I will hopefully turn to in a future blog post.

# hetprob
rm(list=ls()) # clear ws
library(maxLik) # use max lik package

# dgp
n <- 1000 # no. obs
x1 <- runif(n,-1,1) # predictor 1
x2 <- runif(n,-1,1) # " 2
e1 <- rnorm(n,0,1) # normal error
e2 <- (1 + 0.45*(x1+x2))*e1 # hetero error
y <- ifelse(0.5 + 0.5*x1 -0.5*x2 - e2 >0, 1, 0) # outcome

# estimate normal probit
r1 <- glm(y~x1+x2,family=binomial(link="probit")) 

# hetprob llik
hll <- function(beta){
  beta1 <- beta[1:dim(X)[2]]
  beta2 <- beta[(1+dim(X)[2]):((dim(X)[2]+dim(X)[2])-1)]
  q <- 2*y-1 # sym tranform
  xbeta <- pnorm(q*((X %*% beta1)/exp(X[,-1] %*% beta2)))
  sum(log(xbeta))      
}

X <- cbind(1,x1,x2)
ml1 <- maxLik(hll,start=c(0,0,0,0,0)) # maximize
sqrt(diag(-solve(ml1$hessian))) # get standard errors

# LR test of homosked
# 2*(LR_{unrestriced}-LR_{restriced})
LRtest <- 2*(ml1$maximum+1/2*r1$deviance)
# LS is chi^2 distributed with 2 dof
1-pchisq(LRtest,df=2)
# REJECT (as expected)

Standard, Robust, and Clustered Standard Errors Computed in R

Where do these come from? Since most statistical packages calculate these estimates automatically, it is not unreasonable to think that many researchers using applied econometrics are unfamiliar with the exact details of their computation.

For the purposes of illustration, I am going to estimate different standard errors from a basic linear regression model: \textbf{y}=\textbf{X} \mathbf{\beta}+\textbf{u}, using the fertil2 dataset used in Christopher Baum’s book. Let’s load these data, and estimate a linear regression with the lm function (which estimates the parameters \hat{\mathbf{\beta}} using the all too familiar: ( \textbf{X}'\textbf{X})^{-1}\textbf{X}'\textbf{y} least squares estimator.

rm(list=ls())
library(foreign)
#load data
children <- read.dta("children.dta")
# lm formula and data
form <- ceb ~ age + agefbrth + usemeth
data <- children
# run regression
r1 <- lm(form, data)
# get stand errs
> summary(r1)

Call:
lm(formula = form, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.8900 -0.7213 -0.0017  0.6950  6.2657 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.358134   0.173783   7.815 7.39e-15 ***
age          0.223737   0.003448  64.888  < 2e-16 ***
agefbrth    -0.260663   0.008795 -29.637  < 2e-16 ***
usemeth      0.187370   0.055430   3.380 0.000733 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.463 on 3209 degrees of freedom
  (1148 observations deleted due to missingness)
Multiple R-squared: 0.5726,	Adjusted R-squared: 0.5722 
F-statistic:  1433 on 3 and 3209 DF,  p-value: < 2.2e-16 

When the error terms are assumed homoskedastic IID, the calculation of standard errors comes from taking the square root of the diagonal elements of the variance-covariance matrix which is formulated:

E[\textbf{uu}'|\textbf{X}] = \mathbf{\Sigma_{u}}

\mathbf{\Sigma_{u}} = \sigma^2 I_{N}

\textrm{Var}[\hat{\mathbf{\beta}}|\textbf{X}] = (\textbf{X}'\textbf{X})^{-1} (\textbf{X}' \mathbf{\Sigma_{u}} \textbf{X}) (\textbf{X}'\textbf{X})^{-1}

\textrm{Var}[\hat{\mathbf{\beta}}|\textbf{X}] = \sigma_{u}^{2}(\textbf{X}'\textbf{X})^{-1}

In practice, and in R, this is easy to do. Estimate the variance by taking the average of the ‘squared’ residuals \textbf{uu}', with the appropriate degrees of freedom adjustment. Code is below. As you can see, these standard errors correspond exactly to those reported using the lm function.

# get X matrix/predictors
X <- model.matrix(r1)
# number of obs
n <- dim(X)[1]
# n of predictors
k <- dim(X)[2]
# calculate stan errs as in the above
# sq root of diag elements in vcov
se <- sqrt(diag(solve(crossprod(X)) * as.numeric(crossprod(resid(r1))/(n-k))))
> se
(Intercept)         age    agefbrth     usemeth 
0.173782844 0.003448024 0.008795350 0.055429804 

In the presence of heteroskedasticity, the errors are not IID. Consequentially, it is inappropriate to use the average squared residuals. The robust approach, as advocated by White (1980) (and others too), captures heteroskedasticity by assuming that the variance of the residual, while non-constant, can be estimated as a diagonal matrix of each squared residual. In other words, the diagonal terms in \mathbf{\Sigma_{u}} will, for the most part, be different , so the j-th row-column element will be \hat{u}_{j}^{2}. Once again, in R this is trivially implemented.

# residual vector
u <- matrix(resid(r1))
# meat part Sigma is a diagonal with u^2 as elements
meat1 <- t(X) %*% diag(diag(crossprod(t(u)))) %*% X
# degrees of freedom adjust
dfc <- n/(n-k)    
# like before
se <- sqrt(dfc*diag(solve(crossprod(X)) %*% meat1 %*% solve(crossprod(X))))
> se
(Intercept)         age    agefbrth     usemeth 
0.167562394 0.004661912 0.009561617 0.060644558 

Adjusting standard errors for clustering can be important. For example, replicating a dataset 100 times should not increase the precision of parameter estimates. However, performing this procedure with the IID assumption will actually do this. Another example is in economics of education research, it is reasonable to expect that the error terms for children in the same class are not independent.

Clustering standard errors can correct for this. Assume m clusters. Like in the robust case, it is \textbf{X}' \mathbf{\Sigma_{u}} \textbf{X} or ‘meat’ part, that needs to be adjusted for clustering. In practice, this involves multiplying the residuals by the predictors for each cluster separately, and obtaining \tilde{\textbf{u}}_{j} = \sum^{N_{k}}_{i=1} \hat{u}_{i}\textbf{x}_{i}, an m by k matrix (where k is the number of predictors). ‘Squaring’ \tilde{\textbf{u}}_{j} results in a k by k matrix (the meat part). To get the standard errors, one performs the same steps as before, after adjusting the degrees of freedom for clusters.

# cluster name
cluster <- "children"
# matrix for loops
clus <- cbind(X,data[,cluster],resid(r1))
colnames(clus)[(dim(clus)[2]-1):dim(clus)[2]] <- c(cluster,"resid")
# number of clusters
m <- dim(table(clus[,cluster]))
# dof adjustment
dfc <- (m/(m-1))*((n-1)/(n-k))
# uj matrix
uclust <- matrix(NA, nrow = m, ncol = k)
gs <- names(table(data[,cluster]))
for(i in 1:m){
   uclust[i,] <- t(matrix(clus[clus[,cluster]==gs[i],k+2])) %*% clus[clus[,cluster]==gs[i],1:k] 
   }
# square root of diagonal on bread meat bread like before
se <- sqrt(diag(solve(crossprod(X)) %*% (t(uclust) %*% uclust) %*% solve(crossprod(X)))*dfc
> se
(Intercept)         age    agefbrth     usemeth 
 0.42485889  0.03150865  0.03542962  0.09435531 

For calculating robust standard errors in R, both with more goodies and in (probably) a more efficient way, look at the sandwich package. The same applies to clustering and this paper. However, here is a simple function called ols which carries out all of the calculations discussed in the above.

ols <- function(form, data, robust=FALSE, cluster=NULL,digits=3){
  r1 <- lm(form, data)
  if(length(cluster)!=0){
    data <- na.omit(data[,c(colnames(r1$model),cluster)])
    r1 <- lm(form, data)
  }
  X <- model.matrix(r1)
  n <- dim(X)[1]
  k <- dim(X)[2]
  if(robust==FALSE & length(cluster)==0){
    se <- sqrt(diag(solve(crossprod(X)) * as.numeric(crossprod(resid(r1))/(n-k))))
    res <- cbind(coef(r1),se)
  }
  if(robust==TRUE){
    u <- matrix(resid(r1))
    meat1 <- t(X) %*% diag(diag(crossprod(t(u)))) %*% X
    dfc <- n/(n-k)    
    se <- sqrt(dfc*diag(solve(crossprod(X)) %*% meat1 %*% solve(crossprod(X))))
    res <- cbind(coef(r1),se)
    }
  if(length(cluster)!=0){
    clus <- cbind(X,data[,cluster],resid(r1))
    colnames(clus)[(dim(clus)[2]-1):dim(clus)[2]] <- c(cluster,"resid")
    m <- dim(table(clus[,cluster]))
    dfc <- (m/(m-1))*((n-1)/(n-k))
    uclust  <- apply(resid(r1)*X,2, function(x) tapply(x, clus[,cluster], sum))
    se <- sqrt(diag(solve(crossprod(X)) %*% (t(uclust) %*% uclust) %*% solve(crossprod(X)))*dfc)   
    res <- cbind(coef(r1),se)
  }
  res <- cbind(res,res[,1]/res[,2],(1-pnorm(abs(res[,1]/res[,2])))*2)
  res1 <- matrix(as.numeric(sprintf(paste("%.",paste(digits,"f",sep=""),sep=""),res)),nrow=dim(res)[1])
  rownames(res1) <- rownames(res)
  colnames(res1) <- c("Estimate","Std. Error","t value","Pr(>|t|)")
  return(res1)
}

# with data as before
> ols(ceb ~ age + agefbrth + usemeth,children)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.358      0.174   7.815    0.000
age            0.224      0.003  64.888    0.000
agefbrth      -0.261      0.009 -29.637    2.000
usemeth        0.187      0.055   3.380    0.001
> ols(ceb ~ age + agefbrth + usemeth,children,robust=T)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.358      0.168   8.105    0.000
age            0.224      0.005  47.993    0.000
agefbrth      -0.261      0.010 -27.261    2.000
usemeth        0.187      0.061   3.090    0.002
> ols(ceb ~ age + agefbrth + usemeth,children,cluster="children")
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.358      0.425   3.197    0.001
age            0.224      0.032   7.101    0.000
agefbrth      -0.261      0.035  -7.357    2.000
usemeth        0.187      0.094   1.986    0.047

Instrumental Variables without Traditional Instruments

Typically, regression models in empirical economic research suffer from at least one form of endogeneity bias.

The classic example is economic returns to schooling, where researchers want to know how much increased levels of education affect income. Estimation using a simple linear model, regressing income on schooling, alongside a bunch of control variables, will typically not yield education’s true effect on income. The problem here is one of omitted variables – notably unobserved ability. People who are more educated may be more motivated or have other unobserved characteristics which simultaneously affect schooling and future lifetime earnings.

Endogeneity bias plagues empirical research. However, there are solutions, the most common being instrumental variables (IVs). Unfortunately, the exclusion restrictions needed to justify the use of traditional IV methodology may be impossible to find.

So, what if you have an interesting research question, some data, but endogeneity with no IVs. You should give up, right? Wrong. According to Lewbel (forthcoming in Journal of Business and Economic Statistics), it is possible to overcome the endogeneity problem without the use of a traditional IV approach.

Lewbel’s paper demonstrates how higher order moment restrictions can be used to tackle endogeneity in triangular systems. Without going into too much detail (interested readers can consult Lewbel’s paper), this method is like the traditional two-stage instrumental variable approach, except the first-stage exclusion restriction is generated by the control, or exogenous, variables which we know are heteroskedastic (interested practitioners can test for this in the usual way, i.e. a White test).

In the code below, I demonstrate how one could employ this approach in R using the GMM framework outlined by Lewbel. My code only relates to a simple example with one endogenous variable and two exogenous variables. However, it would be easy to modify this code depending on the model.

rm(list=ls())
library(gmm)
# gmm function for 1 endog variable with 2 hetero exogenous variable
# outcome in the first column of 'datmat', endog variable in second
# constant and exog variables in the next three
# hetero exog in the last two (i.e no constant)
g1 <- function(theta, datmat) {
  #set up data
  y1 <- matrix(datmat[,1],ncol=1)
  y2 <- matrix(datmat[,2],ncol=1)
  x1 <- matrix(datmat[,3:5],ncol=3)
  z1 <- matrix(datmat[,4:5],ncol=2)
  # if the variable in the 4th col was not hetero
  # this could be modified so:
  # z1 <- matrix(datmat[,5],ncol=1)

  #set up moment conditions
  in1 <- (y1 -theta[1]*x1[,1]-theta[2]*x1[,2]-theta[3]*x1[,3])
  M <- NULL
  for(i in 1:dim(z1)[2]){
    M <- cbind(M,(z1[,i]-mean(z1[,i])))
  }
  in2 <- (y2 -theta[4]*x1[,1]-theta[5]*x1[,2]-theta[6]*x1[,3]-theta[7]*y1)
  for(i in 1:dim(x1)[2]){M <- cbind(M,in1*x1[,i])}
  for(i in 1:dim(x1)[2]){M <- cbind(M,in2*x1[,i])}
  for(i in 1:dim(z1)[2]){M <- cbind(M,in2*((z1[,i]-mean(z1[,i]))*in1))}
  return(M)
}
# so estimation is easy
# gmm(function(...), data matrix, initial values vector)
# e.g : gmm(g1, x =as.matrix(dat),c(1,1,1,1,1,1,1))

I also tested the performance of Lewbel’s GMM estimator in comparison a mis-specified OLS estimator. In the code below, I perform 500 simulations of a triangular system containing an omitted variable. For the GMM estimator, it is useful to have good initial starting values. In this simple example, I use the OLS coefficients. In more complicated settings, it is advisable to use the estimates from the 2SLS procedure outlined in Lewbel’s paper. The distributions of the coefficient estimates are shown in the plot below. The true value, indicated by the vertical line, is one. It is pretty evident that the Lewbel approach works very well. I think this method could be very useful in a number of research disciplines.

beta1 <- beta2 <- NULL
for(k in 1:500){
  #generate data (including intercept)
  x1 <- rnorm(1000,0,1)
  x2 <- rnorm(1000,0,1)
  u <- rnorm(1000,0,1)
  s1 <- rnorm(1000,0,1)
  s2 <- rnorm(1000,0,1)
  ov <- rnorm(1000,0,1)
  e1 <- u + exp(x1)*s1 + exp(x2)*s1
  e2 <- u + exp(-x1)*s2 + exp(-x2)*s2
  y1 <- 1 + x1 + x2 + ov + e2 
  y2 <- 1 + x1 + x2 + y1 + 2*ov + e1
  x3 <- rep(1,1000)
  dat <- cbind(y1,y2,x3,x1,x2)
  
  #record ols estimate
  beta1 <- c(beta1,coef(lm(y2~x1+x2+y1))[4])
  #init values for iv-gmm
  init <- c(coef(lm(y2~x1+x2+y1)),coef(lm(y1~x1+x2)))
  #record gmm estimate
  beta2 <- c(beta2,coef(gmm(g1, x =as.matrix(dat),init))[7])
}

library(sm)
d <- data.frame(rbind(cbind(beta1,"OLS"),cbind(beta2,"IV-GMM")))
d$beta1 <- as.numeric(as.character(d$beta1))
sm.density.compare(d$beta1, d$V2,xlab=("Endogenous Coefficient"))
 title("Lewbel and OLS Estimates")
 legend("topright", levels(d$V2),lty=c(1,2,3),col=c(2,3,4),bty="n")
 abline(v=1)