Specification testing is an important part of econometric practice. However, from what I can see, few researchers perform heteroskedasticity tests after estimating probit/logit models. This is not a trivial point. Heteroskedasticity in these models can represent a major violation of the probit/logit specification, both of which assume homoskedastic errors.
Thankfully, tests for heteroskedasticity in these models exist, and it is also possible to estimate modified binary choice models that are robust to heteroskedastic errors. In this blog post I present an example of how to estimate a heteroskedastic probit in R, and also test for heteroskedasticity.
The standard probit model assumes that the error distribution of the latent model has a unit variance. The heteroskedastic probit model relaxes this assumption, and allows the error variance to depend on some of the predictors in the regression model. Those interested in further details of this model, and the potential implications of this form of model misspecification, should consult these notes.
In the code below, I simulate some data, specify the log-likelihood function for the heteroskedastic probit model, estimate this model via maximum likelihood, and then perform a simple LR test of homoskedasticity. Note the log-likelihood function can be simplified from:
to:
where uses the fact that the PDF of the normal distribution is symmetric to cut out some algebra (and possibly lead to an easier log-likelihood function to maximize). The log-liklihood function in the above may prove difficult to maximize, although there is an LM test for homoskedasticity, that only requires the restricted (standard probit model) to be estimated. An interesting application would be to see if Bayesian methods estimated using a Gibbs sampling (via STAN, Bugs, or JAGS) can help deal with computationally difficult log likelihoods in this regard. A topic, I will hopefully turn to in a future blog post.
# hetprob
rm(list=ls()) # clear ws
library(maxLik) # use max lik package
# dgp
n <- 1000 # no. obs
x1 <- runif(n,-1,1) # predictor 1
x2 <- runif(n,-1,1) # " 2
e1 <- rnorm(n,0,1) # normal error
e2 <- (1 + 0.45*(x1+x2))*e1 # hetero error
y <- ifelse(0.5 + 0.5*x1 -0.5*x2 - e2 >0, 1, 0) # outcome
# estimate normal probit
r1 <- glm(y~x1+x2,family=binomial(link="probit"))
# hetprob llik
hll <- function(beta){
beta1 <- beta[1:dim(X)[2]]
beta2 <- beta[(1+dim(X)[2]):((dim(X)[2]+dim(X)[2])-1)]
q <- 2*y-1 # sym tranform
xbeta <- pnorm(q*((X %*% beta1)/exp(X[,-1] %*% beta2)))
sum(log(xbeta))
}
X <- cbind(1,x1,x2)
ml1 <- maxLik(hll,start=c(0,0,0,0,0)) # maximize
sqrt(diag(-solve(ml1$hessian))) # get standard errors
# LR test of homosked
# 2*(LR_{unrestriced}-LR_{restriced})
LRtest <- 2*(ml1$maximum+1/2*r1$deviance)
# LS is chi^2 distributed with 2 dof
1-pchisq(LRtest,df=2)
# REJECT (as expected)
Hi, some time ago we started developing code for heteroskedastic GLMs on R-Forge: see the hetglm() function in the package “glmx”. This also includes the heteroskedastic probit model. We use analytical gradients and expected Hessians for the optimization. The code has been tested already; the main reason for not yet releasing it on CRAN is that we are not sure about a few aspects of the interface. But it should be fairly easy to use already. See also the examples.
Thanks.
From what I can see, the hetglm() function is (unsurprisingly) far superior than the crude method I have outlined. It runs much faster, and since it includes information about the gradients I would expect it to be far more accurate.
I am going to highlight this in a new blog post.
Great, thanks for the publicity in the new post!
Pingback: Heteroskedastic GLM in R « DiffusePrioR