Estimating a least squares linear regression model with fixed effects is a common task in applied econometrics, especially with panel data. For example, one might have a panel of countries and want to control for fixed country factors. In this case the researcher effectively includes the country identifier as a factor variable and estimates a model with as many dummy variables as there are countries (minus one if an intercept is included in the modelling equation). This approach becomes computationally problematic when the factor has many levels. In our simple example, each additional country adds another column to the design matrix used in the least squares calculation.
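To make the column count concrete, here is a minimal sketch in base R. The toy panel (50 countries, 10 years each) and all variable names are made up for illustration and are not taken from any real data set.

```r
# Hypothetical toy panel: 50 countries observed for 10 years each
set.seed(1)
n_countries <- 50
n_years <- 10
country <- factor(rep(seq_len(n_countries), each = n_years))
x <- rnorm(n_countries * n_years)
country_effect <- rnorm(n_countries)[as.integer(country)]
y <- 1 + 2 * x + country_effect + rnorm(n_countries * n_years)

# Design matrix with an intercept: 1 intercept + 1 slope + (50 - 1) country dummies
X <- model.matrix(~ x + country)
dim(X)

# lm() has to estimate one coefficient per column of that matrix
fit_dummies <- lm(y ~ x + country)
length(coef(fit_dummies))
```

With 50 countries this is harmless, but the number of columns grows one-for-one with the number of levels, which is what makes the dummy approach unattractive for factors with thousands of levels.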

Fortunately, there are a number of data transformations that can be used in this panel setting. These include demeaning the observations within each unit, using first differences, or including the group means as additional explanatory variables (as suggested by Mundlak (1978)). However, these approaches only work well when there is a single factor for which the researcher wants to include fixed effects.
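As a rough illustration of why these transformations help, the sketch below (base R only, simulated data, hypothetical variable names) compares the slope from the dummy-variable regression with the slope from the demeaned data and from a Mundlak-style regression that adds the group mean of x. With a single factor, all three should give the same coefficient on x.

```r
# Simulated one-way panel: 30 units, 8 observations each (illustrative only)
set.seed(2)
id <- factor(rep(1:30, each = 8))
x <- rnorm(240) + as.integer(id) / 10
y <- 2 * x + rnorm(30)[as.integer(id)] + rnorm(240)

# (1) Dummy-variable estimate: one dummy per unit
b_lsdv <- coef(lm(y ~ x + id))[["x"]]

# (2) Within estimate: subtract unit means, then regress without an intercept
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)
b_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

# (3) Mundlak-style estimate: add the unit mean of x as an extra regressor
x_bar <- ave(x, id)
b_mundlak <- coef(lm(y ~ x + x_bar))[["x"]]

c(lsdv = b_lsdv, within = b_within, mundlak = b_mundlak)
```

The demeaned and Mundlak regressions each involve only one or two regressors, however many units there are. First differencing is left out here because it generally gives a numerically different estimate.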

Simen Gaure offers a solution to this problem that allows for multiple fixed effects without resorting to a computationally burdensome methodology. Essentially, the solution involves an elaboration of the group demeaning transformation mentioned above. More technical details can be found here or by referring to Gaure’s article in Computational Statistics & Data Analysis (2013). Those interested in implementing this estimation strategy in R can use the lfe package available on CRAN.
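To give a feel for what an "elaboration of group demeaning" can look like with two factors, here is a minimal sketch that sweeps out the means of each factor in turn until the variable stops changing (alternating projections). This is my own simplified illustration in base R, not the lfe implementation; the helper demean_two_way and the simulated data are hypothetical.

```r
# Simulated data with two crossed factors (illustrative only)
set.seed(3)
n <- 2000
f1 <- factor(sample(1:40, n, replace = TRUE))
f2 <- factor(sample(1:30, n, replace = TRUE))
x <- rnorm(n) + as.integer(f1) / 20
y <- 2.5 * x + rnorm(40)[as.integer(f1)] + rnorm(30)[as.integer(f2)] + rnorm(n)

# Hypothetical helper: demean by f1, then by f2, and repeat until stable
demean_two_way <- function(v, f1, f2, tol = 1e-10, max_iter = 1000) {
  for (i in seq_len(max_iter)) {
    v_old <- v
    v <- v - ave(v, f1)
    v <- v - ave(v, f2)
    if (max(abs(v - v_old)) < tol) break
  }
  v
}

x_t <- demean_two_way(x, f1, f2)
y_t <- demean_two_way(y, f1, f2)

# Slope from the transformed data versus the slope with both sets of dummies
b_ap <- coef(lm(y_t ~ x_t - 1))[["x_t"]]
b_dummies <- coef(lm(y ~ x + f1 + f2))[["x"]]
c(alternating = b_ap, dummies = b_dummies)
```

The two slopes should agree up to the convergence tolerance, while the transformed regression never builds the dummy columns. Standard errors need a degrees-of-freedom correction that this sketch ignores.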

Below, I have included a simple example of how the package works. In this example, the model needs to include fixed effects for two factor variables, x2 and x3. Adding 2,000 dummy columns to the data frame is not a convenient way to estimate such a model. However, the felm function tackles this problem with ease. Stata has a command similar to felm, areg, although areg only allows absorbed fixed effects for a single variable.

# clear workspace
rm(list=ls())

# load lfe package
library(lfe)

# create data frame
x1 <- rnorm(10000)
x2 <- rep(1:1000,10)
x3 <- rep(1:1000,10)
e1 <- sin(x2) + 0.02*x3^2 + rnorm(10000)
y <- 10 + 2.5*x1 + (e1-mean(e1))
dat <- data.frame(x1,x2,x3,y)

# simple lm
lm(y~x1)

# lm with fixed effects
felm(dat$y ~ dat$x1 + G(dat$x2) + G(dat$x3))

##############################################
# output
##############################################

# simple lm
> lm(y~x1)

Call:
lm(formula = y ~ x1)

Coefficients:
(Intercept)           x1
      10.47       -36.95

> # lm with fixed effects
> felm(dat$y ~ dat$x1 + G(dat$x2) + G(dat$x3))
dat$x1
 2.501
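The G() wrapper used above reflects the version of lfe available when this was written. As far as I know, more recent versions of lfe expect the fixed effects after a | in the formula instead. Note also that x2 and x3 take identical values in the example above, so the second factor adds nothing beyond the first. The sketch below is a variant under my own assumptions: x3 is drawn at random so that the two factors differ, a seed is set, and the felm call is guarded so the script still runs if lfe is not installed.

```r
# Variant of the example data; the random x3 and the seed are assumptions
set.seed(42)
n <- 10000
g2 <- rep(1:1000, times = 10)
g3 <- sample(1:1000, n, replace = TRUE)
x1 <- rnorm(n)
y <- 10 + 2.5 * x1 + sin(g2) + 0.02 * g3^2 + rnorm(n)
dat <- data.frame(y, x1, x2 = factor(g2), x3 = factor(g3))

# Fixed effects for x2 and x3 go after the | in the formula
if (requireNamespace("lfe", quietly = TRUE)) {
  fit <- lfe::felm(y ~ x1 | x2 + x3, data = dat)
  print(summary(fit))
}
```

Passing data = dat also avoids the dat$ prefixes in the formula, which keeps the coefficient names readable.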

There is a published article describing the method behind lfe in CSDA (2013): http://www.sciencedirect.com/science/article/pii/S0167947313001266

Thanks Simen. I’ve updated the post accordingly.