Ordered logit as a cumulative log odds model

When survey respondents are asked about their general health, five response categories are usually used: Excellent, Very Good, Good, Fair, and Poor.

Even though this is an ordered outcome, we could transform it into a binary outcome. In fact, there are four ways we could do so:

Excellent = 1; Very Good, Good, Fair, or Poor = 0
Excellent or Very Good = 1; Good, Fair, or Poor = 0
Excellent, Very Good, or Good = 1; Fair or Poor = 0
Excellent, Very Good, Good, or Fair = 1; Poor = 0
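Assuming the outcome is coded as integers from Poor = 1 to Excellent = 5 (the coding used below), the four dichotomizations can be sketched in R with simple comparisons. The vector \(\mathtt{health}\) is a made-up example, not WLS data, and the column names are hypothetical.

```r
# Toy outcome with hypothetical values, coded Poor = 1, ..., Excellent = 5
health <- c(5, 4, 3, 2, 1, 4, 3)

# One binary indicator per cut: 1 if health is above the cut, 0 otherwise
cuts <- 4:1  # cut above Very Good, Good, Fair, Poor (same order as the list)
binary <- sapply(cuts, function(m) as.integer(health > m))
colnames(binary) <- c("excellent_vs_rest",
                      "vgood_plus_vs_rest",
                      "good_plus_vs_rest",
                      "fair_plus_vs_poor")
binary
```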

We could even fit four separate binary logit models, one for each way of transforming our ordered outcome into a binary outcome.

\[ \begin{align} \ln\frac{\Pr(y=\textrm{excellent})}{\Pr(y \leq \textrm{very good})} & = \mathbf{x}\mathbf{\beta}_\mathrm{model1} \\ \\ \ln\frac{\Pr(y>\textrm{good})}{\Pr(y \leq \textrm{good})} & = \mathbf{x}\mathbf{\beta}_\mathrm{model2} \\ \\ \ln\frac{\Pr(y>\textrm{fair})}{\Pr(y \leq \textrm{fair})} & = \mathbf{x}\mathbf{\beta}_\mathrm{model3} \\ \\ \ln\frac{\Pr(y>\textrm{poor})}{\Pr(y=\textrm{poor})} & = \mathbf{x}\mathbf{\beta}_\mathrm{model4} \\ \end{align} \]

If we did that, we would have four different sets of regression coefficients. This would be cumbersome, and could be needlessly so if a simpler model fits the data well.
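Here is a sketch of what fitting the four separate models looks like, on simulated data rather than WLS data. It assumes one predictor \(\mathtt{x}\), a latent outcome with logistic error and a common slope of 0.5, and cut points chosen arbitrarily for illustration. Each \(\mathtt{glm()}\) call fits one of the four binary logits above. Because the data were generated with a single slope, the four estimated slopes should be close to one another, while the intercepts differ (landing near the negatives of the cut points).

```r
set.seed(2004)
n <- 20000
x <- rnorm(n)

# Hypothetical data-generating process: latent ystar = 0.5 * x + logistic error,
# cut into five ordered categories (1 = Poor, ..., 5 = Excellent)
ystar <- 0.5 * x + rlogis(n)
tau <- c(-2.5, -1.5, 0, 1.2)          # illustrative cut points
y <- findInterval(ystar, tau) + 1     # integers 1..5

# One binary logit per dichotomization: Pr(y > m) for m = 4, 3, 2, 1
fits <- lapply(4:1, function(m) glm(as.integer(y > m) ~ x, family = binomial))
slopes <- sapply(fits, function(f) coef(f)[["x"]])
intercepts <- sapply(fits, function(f) coef(f)[["(Intercept)"]])
round(cbind(cut = 4:1, intercept = intercepts, slope = slopes), 3)
```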

The simplest version of this model constrains the slope coefficients to be the same across all four logit models, so that the only difference across the four logits is the intercept.

A model that imposes this constraint is the ordered logit model, also sometimes called the cumulative logit model.

Let’s think about our ordered outcome as coded in terms of consecutive integers, from worst to best in terms of health: Poor=1, Fair=2, Good=3, Very Good=4, and Excellent=5.

We could write our model as:

\[ \ln\frac{\Pr(y>m)}{\Pr(y \leq m)} = -\tau_m + \mathbf{x}\mathbf{\beta} \]

The outcome here is the cumulative log odds, that is, the log odds of the outcome being greater than a given value, as opposed to the log odds of it being equal to that value.

As written above, our vector of \(\beta\) coefficients does not include a \(\beta_0\) for the intercept term. Instead, each of the \(k-1\) contrasts between \(y>m\) and \(y \leq m\) (where \(k\) is the number of outcome categories) has its own intercept. Rather than denoting this separate term \(\beta_{0m}\), we use \(\tau_m\) to keep the notation consistent with the latent variable approach. For this to work out, we have to flip the sign of \(\tau_m\), which is why the formula contains \(-\tau_m\).
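In R, \(\mathtt{MASS::polr()}\) fits this model. Its documentation writes the model as \(\textrm{logit}\,\Pr(y \leq k) = \zeta_k - \eta\), where \(\eta\) is the linear predictor with no separate intercept. Taking the complement gives \(\ln\frac{\Pr(y>k)}{\Pr(y \leq k)} = -\zeta_k + \eta\), so polr's cut points \(\zeta_k\) play the role of \(\tau_m\) here. The sketch below, on simulated data with an assumed slope of 0.5 and illustrative cut points, fits the model and checks that sign convention against polr's own predicted probabilities.

```r
library(MASS)  # polr()

set.seed(2004)
n <- 20000
x <- rnorm(n)
ystar <- 0.5 * x + rlogis(n)            # hypothetical latent outcome
tau <- c(-2.5, -1.5, 0, 1.2)            # illustrative cut points
y <- factor(findInterval(ystar, tau) + 1, levels = 1:5, ordered = TRUE)

fit <- polr(y ~ x, method = "logistic", Hess = TRUE)
coef(fit)   # a single slope, close to 0.5
fit$zeta    # four cut points, close to tau

# Cumulative probabilities implied by the -tau_m + x*beta form, at x = 1
eta <- coef(fit)[["x"]] * 1
pr_gt <- plogis(eta - fit$zeta)         # Pr(y > m), m = 1..4

# The same quantities recovered from polr's category probabilities at x = 1
probs <- predict(fit, newdata = data.frame(x = 1), type = "probs")
pr_gt_polr <- rev(cumsum(rev(probs)))[-1]   # Pr(y >= 2..5) = Pr(y > 1..4)
all.equal(unname(pr_gt), unname(pr_gt_polr))
```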

Additional detail: \(\beta_0\) vs. \(\tau\). When we wrote the binary logit model, we constrained \(\tau\) to 0 and estimated \(\beta_0\). We could instead have constrained \(\beta_0\) to 0 and estimated \(\tau\); in that case, the estimate of \(\tau\) would have been \(-1 \times\) the estimate of \(\beta_0\) from the model with \(\tau\) constrained to 0.

In other words, the difference \(\beta_0 - \tau\) stays the same, and what is arbitrary is whether we estimate it by constraining \(\beta_0\) to 0 or \(\tau\) to 0. The same principle holds in the ordered logit model, except that now we estimate the \(\tau_m\) terms and effectively constrain \(\beta_0\) to 0 by fitting the model with no intercept.
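For the binary case, assuming the latent-variable setup from earlier (\(y = 1\) when \(y^* = \beta_0 + \mathbf{x}\mathbf{\beta} + \varepsilon\) exceeds \(\tau\)), the point can be written out directly: only the difference \(\beta_0 - \tau\) enters the log odds, so either constraint yields the same fitted probabilities.

\[ \begin{align} \ln\frac{\Pr(y=1)}{\Pr(y=0)} & = (\beta_0 - \tau) + \mathbf{x}\mathbf{\beta} \\ \\ & = \beta_0 + \mathbf{x}\mathbf{\beta} \qquad \textrm{if } \tau = 0 \\ \\ & = -\tau + \mathbf{x}\mathbf{\beta} \qquad \textrm{if } \beta_0 = 0 \end{align} \]

Matching the last two lines term by term gives \(\hat\tau = -\hat\beta_0\).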

Predicted probabilities

Calculating predicted probabilities follows the same logic as in the latent variable approach earlier. As before, we can simplify the math by stipulating that there are two additional values of \(\tau\):

\(\tau_0\) at \(-\infty\) so that \(\Pr(y > 0) = 1\)
\(\tau_k\) at \(\infty\) so that \(\Pr(y > k) = 0\).

The probability that \(y > m\) is then:

\[ \Pr(y>m) = \frac{\exp(\mathbf{x}\mathbf{\beta} - \tau_m)}{\exp(\mathbf{x}\mathbf{\beta} - \tau_m) + 1} \]

Once we can compute the probability that \(y > m\), the probability that \(y = m\) is then just a matter of subtraction:

\[ \begin{align} \Pr(y = m) & = \Pr(y > m-1) - \Pr(y > m) \\ & = \frac{\exp(\mathbf{x}\mathbf{\beta} - \tau_{m-1})}{\exp(\mathbf{x}\mathbf{\beta} - \tau_{m-1}) + 1} - \frac{\exp(\mathbf{x}\mathbf{\beta} - \tau_{m})}{\exp(\mathbf{x}\mathbf{\beta} - \tau_{m}) + 1} \end{align} \]

For example, the probability that \(y = 2\) is the probability that \(y > 1\) minus the probability that \(y > 2\).
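The subtraction can be sketched in R as a small helper. It pads the cut points with \(\tau_0 = -\infty\) and \(\tau_k = \infty\) as described above, computes \(\Pr(y > m)\) for \(m = 0, \ldots, k\) with \(\mathtt{plogis()}\) (R's logistic CDF, \(\exp(z)/(\exp(z)+1)\)), and takes successive differences. The function name \(\mathtt{ologit\_probs}\) and the cut points in the usage line are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: category probabilities for an ordered logit
# xb  : the linear predictor x %*% beta (no intercept)
# tau : the k - 1 cut points, in increasing order
ologit_probs <- function(xb, tau) {
  tau_full <- c(-Inf, tau, Inf)       # tau_0 = -Inf, tau_k = Inf
  pr_gt <- plogis(xb - tau_full)      # Pr(y > m) for m = 0, ..., k
  pr_gt[-length(pr_gt)] - pr_gt[-1]   # Pr(y = m) = Pr(y > m - 1) - Pr(y > m)
}

p <- ologit_probs(xb = 0.3, tau = c(-2, -1, 0.5, 1.5))
round(p, 3)
sum(p)   # the k category probabilities sum to 1
```

Because \(\Pr(y > 0) = 1\) and \(\Pr(y > k) = 0\), the differences telescope, so the category probabilities always sum to 1.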

Example

The Wisconsin Longitudinal Study (WLS) is based on a one-third sample of 1957 Wisconsin high school graduates, who have been surveyed periodically since. The 2004 survey, conducted when respondents were ~65 years old, included self-reported health.

Here we are going to estimate the relationships between administrative measures from when respondents were in high school and their health at age ~65. We have four measures:

Sex, measured as a male/female binary (\(\mathtt{female}\)).
Parents' socioeconomic status, based on occupation, income, and tax roll information where available (\(\mathtt{z\_ses57}\), measured in standard deviations).
Grades in school (\(\mathtt{z\_classrank}\), measured in standard deviations, based on a transformation of class rank).
Score on a high school aptitude test (a so-called IQ test, specifically the “Henmon-Nelson Test of Mental Ability”; \(\mathtt{hn1}\), measured in standard deviations).

library(tidyverse)

library(haven)         # read_dta() for Stata .dta files
library(MASS)          # polr(); note that MASS masks dplyr::select()
library(modelsummary)  # regression tables

# Read data and prepare for analysis
df <- read_dta("../dta/wlshealth.dta") %>%
  filter(!is.na(health04)) %>%
  mutate(health04 = as_factor(health04)) %>%
  mutate(female = as_factor(female))

# Fit ordered logit model; Hess = TRUE stores the Hessian, needed for standard errors
model <- polr(health04 ~ female + z_ses57 + z_classrank + hn1,
              data = df, Hess = TRUE, method = "logistic")

# Display model results
modelsummary(model, gof_map = c("nobs"))

                         (1)
poor|fair               -3.860
                        (0.083)
fair|good               -2.278
                        (0.046)
good|very good          -0.540
                        (0.034)
very good|excellent      1.159
                        (0.037)
femalefemale            -0.109
                        (0.045)
z_ses57                  0.177
                        (0.023)
z_classrank              0.271
                        (0.029)
hn1                      0.053
                        (0.029)
Num.Obs.                 7221


. ologit health04 i.female z_ses57 z_classrank hn1

Iteration 0:   log likelihood = -9756.6779
Iteration 1:   log likelihood = -9605.0918
Iteration 2:   log likelihood = -9604.6428
Iteration 3:   log likelihood = -9604.6427

Ordered logistic regression                             Number of obs =  7,221
                                                        LR chi2(4)    = 304.07
                                                        Prob > chi2   = 0.0000
Log likelihood = -9604.6427                             Pseudo R2     = 0.0156

------------------------------------------------------------------------------
    health04 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      female |
     female  |  -.1092917   .0448539    -2.44   0.015    -.1972037   -.0213796
     z_ses57 |   .1772502   .0226307     7.83   0.000     .1328948    .2216057
 z_classrank |   .2708136   .0291472     9.29   0.000     .2136862    .3279411
         hn1 |   .0526799   .0288361     1.83   0.068    -.0038377    .1091976
-------------+----------------------------------------------------------------
       /cut1 |  -3.860039   .0828935                     -4.022507   -3.697571
       /cut2 |  -2.277788   .0463692                      -2.36867   -2.186906
       /cut3 |  -.5395163   .0344048                     -.6069484   -.4720842
       /cut4 |   1.159183   .0367412                      1.087172    1.231195
------------------------------------------------------------------------------

Because the means of \(\mathtt{z\_ses57}\), \(\mathtt{z\_classrank}\), and \(\mathtt{hn1}\) are all 0, \(\mathbf{x}\mathbf{\beta}\) for a woman who is average on these characteristics is -.11, and \(\mathbf{x}\mathbf{\beta}\) for a man who is average on these characteristics is 0.

The probability that a man who is average on these other characteristics reports being in either excellent or very good health (\(\mathtt{health04==4}\) or \(\mathtt{health04==5}\)) is:

\[ \begin{align} \Pr(y > 3) & = \frac{\exp(0+.54)}{\exp(0+.54)+1} \\ \\ & = \frac{\exp(.54)}{\exp(.54)+1} \\ \\ & = \frac{1.716}{2.716} \\ \\ & = .632 \end{align} \]

The probability of that same man being in very good health is:

\[ \begin{align} \Pr(y = 4) & = \Pr(y > 3) - \Pr(y > 4) \\ \\ & = \frac{\exp(0+.54)}{\exp(0+.54)+1} - \frac{\exp(0-1.16)}{\exp(0-1.16)+1} \\ \\ & = \frac{\exp(.54)}{\exp(.54)+1} - \frac{\exp(-1.16)}{\exp(-1.16)+1} \\ \\ & = \frac{1.716}{2.716} - \frac{.313}{1.313} \\ \\ & = .632 - .238 \\ \\ & = .394 \end{align} \]
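The same arithmetic can be sketched in R using the cut points reported in the output above. Because the hand calculation used rounded values, small differences in the third decimal place are expected.

```r
# Cut points copied from the ordered logit output above (cut1..cut4)
tau <- c(-3.860039, -2.277788, -0.5395163, 1.159183)

# Male, average (0) on z_ses57, z_classrank, and hn1, so x*beta = 0
xb_man <- 0

pr_gt <- function(xb, tau) plogis(xb - tau)   # Pr(y > m), m = 1..4

p_gt_man <- pr_gt(xb_man, tau)
p_gt_man[3]                   # Pr(y > 3): excellent or very good
p_gt_man[3] - p_gt_man[4]     # Pr(y = 4): very good
```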

In R, we can compute this predicted probability (and check our calculation above) using \(\mathtt{predict()}\) with the \(\mathtt{probs}\) type. In Stata, we use \(\mathtt{margins}\) with the option \(\mathtt{predict(outcome(}\#\mathtt{))}\), where \(\#\) is a value of the outcome variable. Stata also accepts the form \(\mathtt{outcome(\#}k\mathtt{)}\), which refers to the \(k\)th lowest category instead of a value.

library(marginaleffects)

# Create new data for prediction: male (female="male") with all other vars at 0
newdata <- data.frame(
  female = factor("male", levels = levels(df$female)),
  z_ses57 = 0,
  z_classrank = 0,
  hn1 = 0
)

# Get predicted probabilities for all outcome categories
predictions <- predict(model, newdata = newdata, type = "probs")

# Display probability of 4th outcome (Very Good)
predictions["very good"]

very good 
0.3928852

. margins, at(female = 0 hn = 0 z_ses57 = 0 z_classrank = 0) predict(outcome(4))

Adjusted predictions                                     Number of obs = 7,221
Model VCE: OIM

Expression: Pr(health04==4), predict(outcome(4))
At: female      = 0
    z_ses57     = 0
    z_classrank = 0
    hn1         = 0

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   .3928842   .0060207    65.26   0.000     .3810838    .4046846
------------------------------------------------------------------------------

Stata: Warning about \(\mathtt{outcome()}\). For ordered models, \(\mathtt{outcome()}\) can refer either to a value of the outcome variable, as in \(\mathtt{outcome(4)}\), or to a category's position, as in \(\mathtt{outcome(\#4)}\), meaning the fourth lowest category. If your outcome is coded as consecutive integers starting with 1, as \(\mathtt{health04}\) is here, the value and the position coincide. If it is coded any other way, they do not. If you have your variable coded as \(0, 1, 2, 3, 4\), for example, \(\mathtt{outcome(3)}\) gives you \(\Pr(y=3)\), but \(\mathtt{outcome(\#3)}\) gives you \(\Pr(y=2)\), and the two are easy to mix up.

For this reason, I recommend always coding your ordered outcomes in Stata as consecutive integers starting with 1.
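R has an analogous trap. \(\mathtt{predict()}\) with the \(\mathtt{probs}\) type on a \(\mathtt{polr}\) fit returns probabilities named by the factor levels, so indexing by position and indexing by name are different things when the levels are not 1, 2, and so on. The sketch below builds a hypothetical probability vector for an outcome coded 0 to 4 (using \(\mathtt{plogis()}\) and made-up cut points) to show the difference. Indexing by level name, as the code above does with \(\mathtt{predictions["very good"]}\), avoids the ambiguity.

```r
# Hypothetical category probabilities for an outcome coded 0, 1, 2, 3, 4
tau <- c(-2, -1, 0.5, 1.5)                   # made-up cut points
pr_gt <- plogis(0 - c(-Inf, tau, Inf))       # Pr(y > m), padded with 1 and 0
probs <- pr_gt[-length(pr_gt)] - pr_gt[-1]   # Pr(y = m) for each category
names(probs) <- as.character(0:4)            # names mimic factor levels

probs[3]      # by position: the 3rd lowest category, i.e. Pr(y = 2)
probs["3"]    # by name: Pr(y = 3)
```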