Probit and extending the latent variable approach to ordered outcomes

We wrote the probit model for binary outcomes as a latent variable model in which the outcome was an unobserved variable \(y^\ast\),

\[ y^*_i = \mathbf{x_i}\mathbf{\beta} + \varepsilon_i \]

and the errors were distributed normally with a mean of 0 and standard deviation of 1.

The unobserved \(y^\ast\) mapped onto our observed binary variable according to whether or not \(y^\ast\) was greater than a threshold \(\tau\).

For ordered responses, we will follow the same principle. The key difference is that, instead of having one threshold \(\tau\), we will have \(k-1\) thresholds, where \(k\) is our number of categories.

As for how this maps onto the ordered variable we observe, imagine a survey item that has as its response categories: (1) strongly disagree, (2) disagree, (3) agree, (4) strongly agree. We would have three thresholds: \(\tau_1\), \(\tau_2\), and \(\tau_3\). We would observe:

  • \(y = 1\) (strongly disagree) if \(y^\ast \leq \tau_1\)
  • \(y = 2\) (disagree) if \(\tau_1 < y^\ast \leq \tau_2\)
  • \(y = 3\) (agree) if \(\tau_2 < y^\ast \leq \tau_3\)
  • \(y = 4\) (strongly agree) if \(y^\ast > \tau_3\)
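This threshold mapping can be sketched in a few lines of R. The `latent_to_category()` helper and the threshold values here are our own illustrative inventions, not anything estimated from data:

```r
# Map a latent y* onto an observed ordered category, given sorted thresholds.
# findInterval() counts how many thresholds lie at or below y*; adding 1
# turns that count into a category number from 1 to k.
latent_to_category <- function(y_star, tau) {
  findInterval(y_star, tau) + 1
}

tau <- c(-1.0, 0.2, 1.1)       # tau_1, tau_2, tau_3 (illustrative values)
latent_to_category(-1.5, tau)  # 1: strongly disagree
latent_to_category(0.0, tau)   # 2: disagree
latent_to_category(2.0, tau)   # 4: strongly agree
```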

For purposes of estimation, all this needs to map onto the number line somehow.

Here is a visualization of the ordered probit model for a three-category outcome: the distribution of \(y^\ast\) split into three regions by the thresholds \(\tau_1\) and \(\tau_2\).

Predicted probability of different outcomes

We will be able to fit our model using maximum likelihood so long as we have a means of computing the predicted probability of observing each of our outcome categories, for whatever value of \(\mathbf{x}\mathbf{\beta}\).

The math is easiest if we consider our \(k\) categories to be numbered as consecutive integers starting with 1.

We proceed again as in binary probit, by remembering that \(y^\ast = \mathbf{x}\mathbf{\beta} + \varepsilon\) and that \(\varepsilon\) is distributed as standard normal. Consequently, for a given value of \(\mathbf{x}\mathbf{\beta}\), the predicted probability will be based on what different random draws of \(\varepsilon\) imply for \(\mathbf{x}\mathbf{\beta} + \varepsilon\) with respect to our thresholds.

For binary outcomes, the predicted probability that \(y = 1\) was \(\mathtt{cdf}_{\mathtt{normal}}(\mathbf{x}\mathbf{\beta} - \tau)\), where \(\mathtt{cdf}_{\mathtt{normal}}()\) is the cumulative distribution function of the standard normal distribution.

For ordered outcomes, we can follow the same reasoning to consider \(\Pr(y > m)\), where \(m\) is one of our categories. Then, \(\Pr(y > m) = \mathtt{cdf}_{\mathtt{normal}}(\mathbf{x}\mathbf{\beta}-\tau_m)\).
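As a quick check of this formula, we can evaluate \(\Pr(y > m)\) in R with \(\mathtt{pnorm()}\), the standard normal CDF. The value of \(\mathbf{x}\mathbf{\beta}\) and the thresholds below are made up purely for illustration:

```r
xb  <- 0.5                 # an illustrative value of x*beta
tau <- c(-0.4, 0.8, 1.5)   # hypothetical thresholds tau_1, tau_2, tau_3
p_gt <- pnorm(xb - tau)    # Pr(y > 1), Pr(y > 2), Pr(y > 3)
round(p_gt, 3)             # 0.816 0.382 0.159
```

Notice that the cumulative probabilities necessarily decrease as \(m\) increases, since each higher threshold is harder for \(y^\ast\) to clear.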

To make the math complete, we will add extra thresholds past the ends of our latent continuum:

  • \(\tau_0\) at \(-\infty\) so that \(\Pr(y > 0) = 1\)
  • \(\tau_k\) at \(\infty\) so that \(\Pr(y > k) = 0\)

Given that we can calculate \(\Pr(y > m)\) for any outcome, we can then calculate \(\Pr(y=m)\) for any outcome using subtraction:

\[ \begin{align} \Pr(y=m) & = \Pr(y^\ast > \tau_{m-1}) - \Pr(y^\ast > \tau_m) \\ \\ & = \mathtt{cdf}_{\mathtt{normal}}( \mathbf{x}\mathbf{\beta} - \tau_{m-1}) - \mathtt{cdf}_{\mathtt{normal}}( \mathbf{x}\mathbf{\beta} - \tau_m) \end{align} \]
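Putting the pieces together, one way to sketch this calculation in R (the function name is ours; padding the estimated thresholds with \(-\infty\) and \(\infty\) implements \(\tau_0\) and \(\tau_k\)):

```r
# Category probabilities for an ordered probit: pad the k-1 estimated
# thresholds with -Inf and Inf, then take differences of normal CDFs.
ordered_probit_probs <- function(xb, tau) {
  tau_ext <- c(-Inf, tau, Inf)       # tau_0, tau_1, ..., tau_k
  pnorm(xb - head(tau_ext, -1)) -    # Pr(y > m - 1)
    pnorm(xb - tail(tau_ext, -1))    # minus Pr(y > m)
}

probs <- ordered_probit_probs(0.3, c(-0.5, 1.0))  # illustrative values
probs        # one probability per category
sum(probs)   # always sums to 1
```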

Example

Let’s demonstrate how predicted probabilities are calculated with an example. The General Social Survey starts with a battery of items asking respondents for their opinion about national spending in several domains. One is spending on “improving and protecting the nation’s health.” The response categories are: (1) too little, (2) about right, and (3) too much. We will consider this categorical variable as the observed manifestation of the underlying latent variable “opposition to government health care spending.”

We fit a simple model in which our explanatory variables are sex (measured as male/female) and political party (measured as Dem/Ind/Rep). We fit the model using the probit model for ordered responses, which in R can be fit with \(\texttt{polr()}\) from the MASS package (with \(\texttt{method = "probit"}\)) and in Stata with the \(\texttt{oprobit}\) command.

library(tidyverse)
library(haven)
library(MASS)

library(modelsummary)

# Read and prepare data
df <- read_dta("../dta/gss_natheal.dta") %>%
  filter(year >= 2010) %>%
  filter(as.integer(natheal) <= 3) %>%
  mutate(natheal = factor(as_factor(natheal),
                          levels = c("too little", "about right", "too much")),
         party = as_factor(party),
         male = as_factor(male)) %>%
  drop_na(party)

# Fit ordered probit model
model <- polr(natheal ~ party + male,
              data = df, Hess = TRUE, method = "probit")

# Display model results
modelsummary(model, gof_map = c("nobs"))
                             (1)
    too little|about right    0.711
                             (0.029)
    about right|too much      1.645
                             (0.034)
    partyindep                0.200
                             (0.046)
    partyrepublican           0.654
                             (0.037)
    malemale                  0.200
                             (0.033)
    Num.Obs.               5564
. keep if year >= 2010
(14,927 observations deleted)

. oprobit natheal i.party i.male, nolog

Ordered probit regression                       Number of obs   =      5,564
                                                LR chi2(3)      =     374.46
                                                Prob > chi2     =     0.0000
Log likelihood = -4668.7381                     Pseudo R2       =     0.0386

------------------------------------------------------------------------------
     natheal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       party |
      indep  |   .1999182   .0459128     4.35   0.000     .1099308    .2899055
 republican  |   .6536772   .0368562    17.74   0.000     .5814403     .725914
             |
        male |
       male  |   .2002578   .0332625     6.02   0.000     .1350644    .2654512
-------------+----------------------------------------------------------------
       /cut1 |   .7109146   .0288818                      .6543073    .7675219
       /cut2 |   1.644838    .034059                      1.578084    1.711593
------------------------------------------------------------------------------

The values of \(\mathtt{/cut}\) in our output are the thresholds: our estimates are .711 for \(\tau_1\) and 1.645 for \(\tau_2\).

If we consider the specific case of a Republican man, the value of \(\mathbf{x}\mathbf{\beta}\) for that case is \(.654 + .200 = .854\).

\[ \begin{align} \Pr(y > 0) & = 1 \\ \\ \Pr(y > 1) & = \Pr(y > \mathrm{too\ little}) = \mathtt{cdf}_{\mathtt{normal}}(.854 - .711) = \mathtt{cdf}_{\mathtt{normal}}(.143) = .557 \\ \\ \Pr(y > 2) & = \Pr(y > \mathrm{about\ right}) = \mathtt{cdf}_{\mathtt{normal}}(.854 - 1.645) = \mathtt{cdf}_{\mathtt{normal}}(-.791) = .214 \\ \\ \Pr(y > 3) & = \Pr(y > \mathrm{too\ much}) = 0 \\ \end{align} \]

From these, we can compute the probabilities for each of our three categories:

\[ \begin{align} \Pr(y = \mathrm{too\ little}) & = \Pr(y > 0) - \Pr(y > 1) \\ & = 1 - .557 \\ & = .443 \\ \\ \Pr(y = \mathrm{about\ right}) & = \Pr(y > 1) - \Pr(y > 2) \\ & = \mathtt{cdf}_{\mathtt{normal}}(\mathbf{x}\mathbf{\beta} - \tau_1) - \mathtt{cdf}_{\mathtt{normal}}(\mathbf{x}\mathbf{\beta} - \tau_2) \\ & = .557 - .214 \\ & = .343 \\ \\ \Pr(y = \mathrm{too\ much}) & = \Pr(y > 2) - \Pr(y > 3) \\ & = \mathtt{cdf}_{\mathtt{normal}}(\mathbf{x}\mathbf{\beta} - \tau_2) - 0 \\ & = .214 - 0 \\ & = .214 \\ \end{align} \]

Note that the predicted probabilities for the three categories sum to 1: \(.443 + .343 + .214 = 1\).
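We can also redo the hand calculation in R without rounding the intermediate cumulative probabilities, using \(\mathtt{pnorm()}\) and the (rounded) estimates above:

```r
xb   <- 0.654 + 0.200     # x*beta for a Republican man (rounded coefficients)
taus <- c(0.711, 1.645)   # estimated thresholds /cut1 and /cut2
p_gt <- pnorm(xb - taus)  # Pr(y > too little), Pr(y > about right)
round(c(too_little  = 1 - p_gt[1],
        about_right = p_gt[1] - p_gt[2],
        too_much    = p_gt[2]), 3)
# 0.443 0.342 0.214
```

The middle value differs from the \(.343\) above only because the hand calculation rounded \(.557\) and \(.214\) before subtracting.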

We can confirm our calculations using \(\mathtt{predict()}\) in R, or \(\mathtt{margins}\) in Stata:

# Create new data for prediction: Republican man
newdata <- data.frame(
  party = factor("republican", levels = levels(df$party)),
  male = factor("male", levels = levels(df$male))
)

# Get predicted probabilities for all three outcome categories
predictions <- predict(model, newdata = newdata, type = "probs")

# Display probabilities
data.frame(
  too_little = predictions["too little"],
  about_right = predictions["about right"],
  too_much = predictions["too much"],
  row.names = NULL
)
  too_little about_right  too_much
1  0.4431372   0.3423621 0.2145007
. margins, at(party = 3 male = 1) noatlegend ///
      predict(outcome(1)) predict(outcome(2)) predict(outcome(3))

Adjusted predictions                            Number of obs   =      5,564
Model VCE    : OIM

1._predict   : Pr(natheal==1), predict(outcome(1))
2._predict   : Pr(natheal==2), predict(outcome(2))
3._predict   : Pr(natheal==3), predict(outcome(3))

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _predict |
          1  |    .443137    .012965    34.18   0.000     .4177261     .468548
          2  |   .3423628   .0080759    42.39   0.000     .3265342    .3581914
          3  |   .2145002   .0100464    21.35   0.000     .1948097    .2341907
------------------------------------------------------------------------------

The ordered logit model

With the binary probit model, we said that we could have derived the binary logit model by following the same reasoning with a couple of twists. Specifically, instead of assuming the errors are normally distributed, we assume they follow the logistic distribution, and instead of assuming the standard deviation of the error term is 1, we assume that it is \(\frac{\pi}{\sqrt{3}}\). We make both assumptions because then, and only then, is the resulting model equivalent to what we developed earlier by writing a model in which the outcome was log odds.

If we make the same changes to the above model, we get what’s known as the ordered logit model instead of the ordered probit model. There’s a different way to get to the ordered logit model that is more directly in the spirit of logit as a log odds model, and so when we talk about it more it will be in that context.
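To make the contrast concrete, here is a sketch of the same category-probability calculation with logistic errors, using \(\mathtt{plogis()}\), the standard logistic CDF. The value of \(\mathbf{x}\mathbf{\beta}\) and the thresholds are illustrative, not refit estimates:

```r
# Ordered logit uses the same subtraction logic as ordered probit,
# swapping the normal CDF for the logistic CDF (mean 0, sd pi/sqrt(3)).
xb      <- 0.85
tau     <- c(0.7, 1.6)       # illustrative thresholds
tau_ext <- c(-Inf, tau, Inf) # tau_0, tau_1, tau_2, tau_3... er, tau_k
p_logit <- plogis(xb - head(tau_ext, -1)) - plogis(xb - tail(tau_ext, -1))
p_logit       # one probability per category
sum(p_logit)  # still sums to 1
```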