Negative binomial regression

In real data, we usually find that the variance of count outcomes is greater than expected under the Poisson distribution, and that even after including regressors in a Poisson regression, the variance is still greater than we would expect if the outcome were conditionally Poisson given those regressors. This excess variance is referred to as overdispersion.

This can happen for two reasons:

  1. Unobserved heterogeneity. Even though we adjusted for some covariates, there are still others that cause \(\mu\) to differ over observations. It could even be the case that \(\mu\) differs among observations for idiosyncratic reasons.

  2. Endogenous contagion. The occurrence of the event itself makes subsequent occurrences of the event more likely. If the outcome is the number of publications, for example, we might imagine that successfully publishing an article makes it easier to publish further articles.
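The first mechanism can be sketched with a quick simulation (a minimal illustration, not from the original text): if each observation's Poisson rate is multiplied by gamma-distributed heterogeneity with mean 1, the resulting counts have variance greater than their mean, while pure Poisson counts do not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson counts: variance is approximately equal to the mean.
poisson_counts = rng.poisson(lam=3.0, size=100_000)

# Overdispersed counts: give each observation its own rate (gamma-distributed
# heterogeneity with mean 1), then draw Poisson -- variance exceeds the mean.
rates = 3.0 * rng.gamma(shape=2.0, scale=0.5, size=100_000)
overdispersed = rng.poisson(lam=rates)

print(poisson_counts.mean(), poisson_counts.var())  # roughly 3.0 and 3.0
print(overdispersed.mean(), overdispersed.var())    # mean roughly 3.0, variance well above 3.0
```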

The negative binomial regression model is an alternative to the Poisson regression model that allows for overdispersion. The negative binomial model has an extra parameter, \(\alpha\), that fits the overdispersion.

Negative binomial regression model

In Poisson regression, the log of the expected count was specified as:

\[ \ln(\mu_i) = \mathbf{x}_i\mathbf{\beta} \]

In negative binomial regression, we add an individual-specific error term to this:

\[ \ln(\mu_i) = \mathbf{x}_i\mathbf{\beta} + \varepsilon_i \]

This implies that:

\[ \mu_i = \exp(\mathbf{x}_i\mathbf{\beta})\exp(\varepsilon_i) \]

where \(\delta_i\) is sometimes used to denote \(\exp(\varepsilon_i)\). We assume that the expected value of \(\delta_i\) is 1, so that the heterogeneity shifts \(\mu_i\) across observations without changing the expected count. (Note that \(E[\delta_i]=1\) is not the same as assuming \(E[\varepsilon_i]=0\), since \(E[\exp(\varepsilon_i)] \neq \exp(E[\varepsilon_i])\) in general.)
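To make the multiplicative form concrete, here is a small sketch (the coefficients and design matrix are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical coefficients and design matrix (intercept plus one regressor).
beta = np.array([0.5, 0.3])
x = np.column_stack([np.ones(5), rng.normal(size=5)])

# delta_i = exp(eps_i); drawn here from a gamma distribution with mean 1.
delta = rng.gamma(shape=4.0, scale=0.25, size=5)

# mu_i = exp(x_i beta) * delta_i -- identical to exp(x_i beta + eps_i).
mu = np.exp(x @ beta) * delta
```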

Predicted probability

The formula for the predicted probability in Poisson regression is:

\[ \Pr(y=k|\mathbf{x}) = \frac{\bigl[\exp(\mathbf{x}\mathbf{\beta})\bigr]^k \bigl[\exp\left(-\exp(\mathbf{x}\mathbf{\beta})\right)\bigr]}{k!} \]
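This is just the Poisson probability mass function evaluated at \(\mu = \exp(\mathbf{x}\mathbf{\beta})\); as a sanity check, it can be computed directly (the value of \(\mathbf{x}\mathbf{\beta}\) here is arbitrary):

```python
import math

xb = 0.7                              # arbitrary value of x*beta
mu = math.exp(xb)                     # expected count
k = 3

# Poisson pmf: mu^k * exp(-mu) / k!
pr = mu ** k * math.exp(-mu) / math.factorial(k)
```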

In negative binomial regression, the formula looks similar:

\[ \Pr(y=k|\mathbf{x}) = \frac{\bigl[\exp(\mathbf{x}\mathbf{\beta})\delta_i\bigr]^k \bigl[\exp\left(-\exp(\mathbf{x}\mathbf{\beta})\delta_i\right)\bigr]}{k!} \]

But the \(\delta_i\) in the above complicates things because it is unobserved. The standard solution is to assume that \(\delta_i\) is drawn from a gamma distribution with mean 1 and to integrate it out; the algebra is elaborate in a way that is not helpful for purposes here, but the result is:

\[ \Pr(y=k\mid\mathbf{x}) = \frac{\Gamma(k+\alpha^{-1})}{k!\,\Gamma(\alpha^{-1})}\left(\frac{\alpha^{-1}}{\alpha^{-1}+\mu}\right)^{\alpha^{-1}}\left(\frac{\mu}{\alpha^{-1}+\mu}\right)^{k} \]

The result is that the predicted probabilities of counts \(k\) for a given value of \(\mathbf{x}\) have more variance than they do under the Poisson distribution. The amount of extra dispersion is determined by the \(\alpha\) in the above formula.
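Under this parameterization the mean stays \(\mu\) but the variance becomes \(\mu + \alpha\mu^2\). This can be checked numerically with SciPy's negative binomial, which uses an \((n, p)\) parameterization with \(n = \alpha^{-1}\) and \(p = n/(n+\mu)\) (the particular \(\mu\) and \(\alpha\) below are arbitrary):

```python
from scipy.stats import nbinom

mu, alpha = 4.0, 0.5            # arbitrary illustrative values
n = 1.0 / alpha                 # SciPy's size parameter
p = n / (n + mu)                # SciPy's success probability

mean, var = nbinom.stats(n, p, moments="mv")
print(float(mean), float(var))  # mean = mu = 4.0, var = mu + alpha*mu**2 = 12.0
```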

\(\alpha\) is a parameter that is estimated along with the coefficients when the negative binomial regression model is fit by maximum likelihood.

If \(\alpha = 0\), there is no overdispersion and the negative binomial model reduces to the Poisson model.
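This limiting behavior can be checked numerically: as \(\alpha \to 0\), the negative binomial probabilities converge to the Poisson probabilities (again using SciPy's \((n, p)\) parameterization with \(n = \alpha^{-1}\)):

```python
import numpy as np
from scipy.stats import nbinom, poisson

mu = 3.0                # arbitrary expected count
k = np.arange(15)

for alpha in (1.0, 0.1, 1e-6):
    n = 1.0 / alpha
    p = n / (n + mu)
    # Largest absolute difference between the two pmfs over k = 0..14.
    gap = np.abs(nbinom.pmf(k, n, p) - poisson.pmf(k, mu)).max()
    print(alpha, gap)   # the gap shrinks toward 0 as alpha shrinks
```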

Stata: The \(\ln(\alpha)\) parameter. When the negative binomial regression model is fit, Stata estimates \(\ln(\alpha)\), which means that the estimate of \(\alpha\) itself is constrained to be positive.