# Minimizing Bias in Observational Studies

Measuring the effect of a binary treatment on a measured outcome is one of the most common tasks in applied statistics. Examples of these applications abound, like the effect of smoking on health, or the effect of low birth weight on cognitive development. In an ideal world we would like to be able to assign one group of people to receive some form of treatment, and an identical group to not receive this treatment. In this world, the average treatment effect (ATE) is the difference in outcomes between the treatment and control groups.

However, the real world is far from ideal, and the problematic nature of measuring causal effects has (justifiably) spawned a wide literature. Many solutions have been proposed to this problem. The simplest one involves controlling for the pre-treatment characteristics defining both the treatment and control groups. For example, in the case of low birth weight, a researcher may want to adjust for variation in the parent’s socioeconomic status, as one would expect that the ‘treatment’ of low birth weight not to be randomly assigned amongst different socioeconomic strata. Once controls for various confounding variables are introduced, it is then feasible to measure the causal effect of the treatment. Methods that perform this adjustment include simple linear regression modelling, or various forms of matching estimators.

The key assumption in the above is that the researcher is able to separate out differences between the treatment and control groups based on observable characteristics (selection on observables). However, there are many cases, especially in the social sciences, where it is not unreasonable to suspect this assumption does not hold (selection based on the unobservable). In this scenario, the use of instrumental variable estimators represents a viable solution.

Unfortunately, suitable instrumental variables are commonly not available to researchers. In this instance is there anything a researcher can do? Yes, according to a recently published paper by Daniel L. Millimet and Rusty Tchernis. Millimet and Tchernis examine this problem from the point of view of a researcher trying to minimize the bias associated with selection on unobservables. Their paper demonstrates how the bias of ATE can be derived from a regression model of the probability of treatment on observable characteristics. Using this regression model, it is possible to find a bias minimizing propensity score (P*). Once this score is calculated, the researcher is able to estimate a bias reduced ATE by trimming observations outside a pre-specified neighborhood around P*.

Millimet and Tchernis propose a number of estimators that one can use to produce bias minimized ATEs. Those interested in potentially using this bias minimization strategy should refer to their paper for a more detailed examination of these estimators, and their potential uses/misuses. This paper also features a neat empirical application that suggests why their approach might be better than the more conventional methods.

In the below, I have supplied an image summarizing the output produced from a Monte Carlo exercise to highlight the efficacy of the bias corrected approach. Those interested in design of this experiment should look at the Millimet and Tchernis paper, since I have simply replicated their MC design (250 datasets with 5,000 observations in each), with the assumption that the correlation between the treatment equation error term and outcome equation is −0.6 (so a really strong correlation). Additionally, I have set the trimming parameter (theta) to 0.05, so at least 5% of the treatment and 5% of the control group are contained in the trimmed sample.

Note that full descriptions of the acronyms in this image can be found in the paper, although MB.BC and IPW.BC refer to the bias corrected measures of the ATE, and from the image it is clear to see that these estimators are much closer to the true average treatment effect of 1, albeit with a higher variance. This simulation was conducted with R using a self-written function. I have benchmarked this function against Daniel Millimet’s STATA function, and the results are identical. I hope to possibly release this function as an R package in the future, although I would be happy to supply my function’s code to anyone who is interested.

# Heteroskedastic GLM in R

A commenter on my previous blog entry has drawn my attention to an R function called hetglm() that estimates heteroskedastic probit models. This function is contained in the glmx package. The glmx package is not available on CRAN yet, but thankfully can be downloaded here.

The hetglm() function has a number of computational advantages compared with the crude method outlined in my previous post. The following example replicates the previous analysis showing the speed advantage associated with using the hetglm() function.

rm(list=ls()) # clear ws
library(maxLik)
library(glmx)
n <- 1000 # no. obs
x1 <- runif(n,-1,1) # predictor 1
x2 <- runif(n,-1,1) # " 2
e1 <- rnorm(n,0,1) # normal error
e2 <- (1 + 0.45*(x1+x2))*e1 # hetero error
y <- ifelse(0.5 + 0.5*x1 -0.5*x2 - e2 >0, 1, 0) # outcome
# estimate normal probit
system.time(ml1 <- maxLik(hll,start=c(0,0,0,0,0))) # maximize
system.time(h1 <- hetglm(y ~ x1 + x2))

# output
> system.time(ml1 <- maxLik(hll,start=c(0,0,0,0,0))) # maximize
user  system elapsed
4.43    0.00    4.59
> system.time(h1 <- hetglm(y ~ x1 + x2))
user  system elapsed
0.11    0.00    0.11


# The Heteroskedastic Probit Model

Specification testing is an important part of econometric practice. However, from what I can see, few researchers perform heteroskedasticity tests after estimating probit/logit models. This is not a trivial point. Heteroskedasticity in these models can represent a major violation of the probit/logit specification, both of which assume homoskedastic errors.

Thankfully, tests for heteroskedasticity in these models exist, and it is also possible to estimate modified binary choice models that are robust to heteroskedastic errors. In this blog post I present an example of how to estimate a heteroskedastic probit in R, and also test for heteroskedasticity.

The standard probit model assumes that the error distribution of the latent model has a unit variance. The heteroskedastic probit model relaxes this assumption, and allows the error variance to depend on some of the predictors in the regression model. Those interested in further details of this model, and the potential implications of this form of model misspecification, should consult these notes.

In the code below, I simulate some data, specify the log-likelihood function for the heteroskedastic probit model, estimate this model via maximum likelihood, and then perform a simple LR test of homoskedasticity. Note the log-likelihood function can be simplified from:

$\ln L (\beta, \gamma | X_{i}, Z_{i}) = \sum^{N}_{i=1} \{ Y_{i} \ln \Phi [X_{i}\beta \exp(-Z_{i}\gamma)] + (1-Y_{i}) \ln [1-\Phi (X_{i}\beta \exp(-Z_{i}\gamma))] \}$

to:

$\ln L (\beta, \gamma | X_{i}, Z_{i}) = \sum^{N}_{i=1} \{ \ln \Phi [q_{i}(X_{i}\beta \exp(-Z_{i}\gamma))]\}$

where $q_{i}=2y_{i}-1$ uses the fact that the PDF of the normal distribution is symmetric to cut out some algebra (and possibly lead to an easier log-likelihood function to maximize). The log-liklihood function in the above may prove difficult to maximize, although there is an LM test for homoskedasticity, that only requires the restricted (standard probit model) to be estimated. An interesting application would be to see if Bayesian methods estimated using a Gibbs sampling (via STAN, Bugs, or JAGS) can help deal with computationally difficult log likelihoods in this regard. A topic, I will hopefully turn to in a future blog post.

# hetprob
rm(list=ls()) # clear ws
library(maxLik) # use max lik package

# dgp
n <- 1000 # no. obs
x1 <- runif(n,-1,1) # predictor 1
x2 <- runif(n,-1,1) # " 2
e1 <- rnorm(n,0,1) # normal error
e2 <- (1 + 0.45*(x1+x2))*e1 # hetero error
y <- ifelse(0.5 + 0.5*x1 -0.5*x2 - e2 >0, 1, 0) # outcome

# estimate normal probit

# hetprob llik
hll <- function(beta){
beta1 <- beta[1:dim(X)[2]]
beta2 <- beta[(1+dim(X)[2]):((dim(X)[2]+dim(X)[2])-1)]
q <- 2*y-1 # sym tranform
xbeta <- pnorm(q*((X %*% beta1)/exp(X[,-1] %*% beta2)))
sum(log(xbeta))
}

X <- cbind(1,x1,x2)
ml1 <- maxLik(hll,start=c(0,0,0,0,0)) # maximize
sqrt(diag(-solve(ml1$hessian))) # get standard errors # LR test of homosked # 2*(LR_{unrestriced}-LR_{restriced}) LRtest <- 2*(ml1$maximum+1/2*r1\$deviance)
# LS is chi^2 distributed with 2 dof
1-pchisq(LRtest,df=2)
# REJECT (as expected)