When survey respondents are asked about their general health, five response categories are usually used: Excellent, Very Good, Good, Fair, and Poor.
Even though this is an ordered outcome, we could transform it into a binary outcome. Indeed, there are four ways we could do so:
Excellent = 1; Very Good, Good, Fair, or Poor = 0
Excellent or Very Good = 1; Good, Fair, or Poor = 0
Excellent, Very Good, or Good = 1; Fair or Poor = 0
Excellent, Very Good, Good, or Fair = 1; Poor = 0
We could even fit four separate binary logit models, one for each way of transforming our ordered outcome into a binary outcome.
If we did that, we would have four different sets of regression coefficients. This would be cumbersome, and could be needlessly so if a simpler model fits the data well.
The simplest version of this model would involve the logit coefficients being the same across all four logit models, so that the only difference across the different logits would be the intercept.
A model that constrains the coefficients to be the same across all four logits is the ordered logit model, also sometimes called the cumulative logit model.
Let’s think about our ordered outcome as coded in terms of consecutive integers, from worst to best in terms of health: Poor=1, Fair=2, Good=3, Very Good=4, and Excellent=5.
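With this coding, the four binary transformations listed earlier correspond to \(y > m\) for \(m = 4, 3, 2, 1\). A minimal sketch in base R, using made-up responses (the vector `health` and its values are hypothetical, not WLS data):

```r
# Hypothetical responses, coded 1 = Poor ... 5 = Excellent (made-up data)
health <- c(5, 3, 4, 1, 2, 4, 3, 5, 2, 4)

# One binary indicator per cut: y > m for m = 1, ..., 4
cuts <- 1:4
binary <- sapply(cuts, function(m) as.integer(health > m))
colnames(binary) <- paste0("gt", cuts)

# Share of responses above each cut; this can only fall as m rises
colMeans(binary)
```

Each column of `binary` is the outcome for one of the four binary logits, so fitting them separately would produce four sets of coefficients.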
The outcome here is the cumulative log odds, that is, the log odds of the outcome being greater than a given value, as opposed to the log odds of it being equal to that value.
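In symbols, a sketch of the model this describes, assuming \(k\) ordered categories, cutpoints \(\tau_1 < \dots < \tau_{k-1}\), and a coefficient vector \(\mathbf{\beta}\) with no intercept:

\[
\ln\left[\frac{\Pr(y > m \mid \mathbf{x})}{\Pr(y \le m \mid \mathbf{x})}\right] = \mathbf{x}\mathbf{\beta} - \tau_m, \qquad m = 1, \dots, k-1
\]

Equivalently, \(\Pr(y > m \mid \mathbf{x}) = \exp(\mathbf{x}\mathbf{\beta} - \tau_m) / [1 + \exp(\mathbf{x}\mathbf{\beta} - \tau_m)]\). The same \(\mathbf{\beta}\) appears in every one of the \(k-1\) equations; only \(\tau_m\) changes with \(m\).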
As written above, our vector of \(\beta\) coefficients does not include a \(\beta_0\) for the intercept term. Instead, each of the \(k-1\) contrasts between \(y > m\) and \(y \le m\) has a separate intercept term. But instead of using \(\beta_{0m}\) to denote this separate term, we are using \(\tau_m\) to keep notation consistent with the latent variable approach. For this to work out, we have to flip the sign of \(\tau_m\) when we write the model, which is why the formula is written with \(-\tau_m\).
Additional detail: \(\beta_0\) vs. \(\tau\). The way we wrote the binary logit model was that we constrained \(\tau\) to 0 and we estimated \(\beta_0\). Instead, we could have constrained \(\beta_0\) to 0 and estimated \(\tau\), and if so the estimate of \(\tau\) would have been \(-1 \times\) the estimate of \(\beta_0\) when \(\tau\) is constrained to 0.
In other words, the difference \(\beta_0 - \tau\) remains the same, and what is arbitrary is whether we estimate that difference by constraining \(\beta_0\) to 0 or \(\tau\) to 0. In the ordered logit model, the same principle holds, only now we do not constrain \(\tau_m\) to 0; instead, we effectively constrain \(\beta_0\) to 0 by fitting the model with terms for \(\tau_m\) but no intercept.
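One way to see this is to write out the binary model. The sketch below assumes the usual latent variable setup, \(y^* = \beta_0 + \mathbf{x}\mathbf{\beta} + \varepsilon\) with \(y = 1\) when \(y^* > \tau\) and \(\varepsilon\) standard logistic; here \(\mathbf{x}\mathbf{\beta}\) excludes the intercept.

\[
\Pr(y = 1 \mid \mathbf{x}) = \Pr(\varepsilon > \tau - \beta_0 - \mathbf{x}\mathbf{\beta}) = \Lambda\big((\beta_0 - \tau) + \mathbf{x}\mathbf{\beta}\big), \qquad \Lambda(z) = \frac{\exp(z)}{1 + \exp(z)}
\]

Only the difference \(\beta_0 - \tau\) enters the probability, so the data cannot separate the two terms. Setting \(\tau = 0\) gives an estimate of \(\beta_0\), and setting \(\beta_0 = 0\) gives an estimate of \(\tau\) with the opposite sign.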
Predicted probabilities
Calculating predicted probabilities uses the same logic as earlier with the latent variable approach. As then, we can simplify the math by stipulating that there are two additional values of \(\tau\):
\(\tau_0\) at \(-\infty\) so that \(\Pr(y > 0) = 1\)
\(\tau_k\) at \(\infty\) so that \(\Pr(y > k) = 0\).
For example, the probability that \(y = 2\) is the probability that \(y > 1\) minus the probability that \(y > 2\).
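More generally, \(\Pr(y = m \mid \mathbf{x}) = \Pr(y > m-1 \mid \mathbf{x}) - \Pr(y > m \mid \mathbf{x})\) for \(m = 1, \dots, k\). A minimal sketch of that calculation in base R follows; the cutpoints in `tau` and the value of `xb` are hypothetical numbers chosen only for illustration, not estimates from any model in this chapter.

```r
# Hypothetical cutpoints tau_1 ... tau_4 for a 5-category outcome (made up)
tau <- c(-2, -1, 0.5, 2)
# Hypothetical value of the linear predictor x * beta
xb <- 0

# Add tau_0 = -Inf and tau_k = Inf so every category uses the same formula
tau_padded <- c(-Inf, tau, Inf)

# Pr(y > m) for m = 0, ..., k; plogis() is the logistic CDF
p_greater <- plogis(xb - tau_padded)

# Pr(y = m) = Pr(y > m - 1) - Pr(y > m) for m = 1, ..., k
p_equal <- -diff(p_greater)
names(p_equal) <- 1:5
p_equal
```

Because \(\Pr(y > 0) = 1\) and \(\Pr(y > k) = 0\), the five probabilities in `p_equal` sum to 1.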
Example
The Wisconsin Longitudinal Study (WLS) is based on a 1/3 sample of Wisconsin 1957 high school graduates, and these respondents have been surveyed periodically since. In 2004, when the respondents were ~65 years old, they were surveyed, and the survey included self-reported health.
Here we are going to estimate relationships between administrative measures from when respondents were in high school and their health at age ~65. We have four measures:
Sex, measured as male/female binary.
Parent socioeconomic status, based on occupation, income, and tax roll information as available (\(\mathtt{z\_ses57}\), measured in standard deviations)
Grades in school (\(\mathtt{z\_classrank}\), standard deviations based on a transformation of class rank)
Score on a high school aptitude test (a so-called IQ test, specifically the “Henmon-Nelson Test of Mental Ability”, \(\mathtt{hn1}\), measured in standard deviations).
Because the means of \(\mathtt{z\_ses57}\), \(\mathtt{z\_classrank}\), and \(\mathtt{hn1}\) are all 0, \(\mathbf{x}\mathbf{\beta}\) for a woman who is average on these characteristics is -.11, and \(\mathbf{x}\mathbf{\beta}\) for a man who is average on these characteristics is 0.
The probability that a man who is average on these other characteristics reports being in either excellent or very good health (\(\mathtt{health04==4}\) or \(\mathtt{health04==5}\)) is:
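Under the model as described in this section, with \(\mathbf{x}\mathbf{\beta} = 0\) for this man, the calculation takes the following form. Here \(\hat{\tau}_3\) is assumed to denote the estimated cutpoint between Good and Very Good; its numeric value comes from the fitted model's output.

\[
\Pr(y > 3 \mid \mathbf{x}) = \frac{\exp(0 - \hat{\tau}_3)}{1 + \exp(0 - \hat{\tau}_3)} = \frac{1}{1 + \exp(\hat{\tau}_3)}
\]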
In Stata, we can compute this predicted probability (and check our calculation above) using \(\mathtt{margins}\). To have \(\mathtt{margins}\) return the predicted probability of a specific outcome, we use the option \(\mathtt{predict(outcome(}\#\mathtt{))}\), where \(\#\) is the \(\#\)th lowest outcome.
```r
library(marginaleffects)

# Create new data for prediction: male (female = "male") with all other vars at 0
newdata <- data.frame(
  female = factor("male", levels = levels(df$female)),
  z_ses57 = 0,
  z_classrank = 0,
  hn1 = 0
)

# Get predicted probabilities for all outcome categories
predictions <- predict(model, newdata = newdata, type = "probs")

# Display probability of 4th outcome (Very Good)
predictions["very good"]
```
Stata: Warning about \(\mathtt{outcome()}\). For ordered models, you can get the predicted probability of a specific outcome using the option \(\mathtt{outcome(\#)}\). But, as noted above, \(\#\) is the \(\#\)th lowest outcome. If your outcome is coded as consecutive integers starting with 1, then the \(\#\)th lowest outcome will be the value of the outcome. If you don’t have it coded that way, \(\#\) may not be the value of the outcome you are interested in. If you have your variable coded as 0, 1, 2, 3, 4, for example, \(\mathtt{outcome(3)}\) would give you \(\Pr(y=2)\), not \(\Pr(y=3)\).
For this reason, I recommend always coding your ordered outcomes in Stata as consecutive integers starting with 1.
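The same recoding idea can be sketched in base R. The vector `y` below is hypothetical, and the approach assumes every category appears at least once in the data, since otherwise the positions would skip the missing category.

```r
# Hypothetical outcome coded 0-4 instead of 1-5 (made-up values)
y <- c(0, 3, 2, 4, 1, 3, 2)

# Position of each value among the sorted distinct values: 1 = lowest
y_recoded <- match(y, sort(unique(y)))

# Cross-tabulate old and new codes to check the mapping
table(original = y, recoded = y_recoded)
```

After recoding, the position of each outcome and its value coincide, so the third lowest outcome is also the outcome with value 3.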