Interpreting estimates from the ordered probit model

The problem of direct interpretation

Ordered probit coefficients, on their own, do not have an accessible interpretation. The signs of the coefficients are meaningful and straightforward, and the p-values are also fine (given all the caveats one might make about p-values in any statistical model).

In the previous ordered probit example, the coefficient for Republican (vs. Democrat) was .65, for the outcome of opposition to government health care spending. The positive sign of that coefficient means that Republicans were more likely to be opposed than Democrats, and the fact that \(p < .001\) provides confidence that the true parameter is positively signed, implying that the difference between Republicans and Democrats isn’t simply an artifact of sampling error.

As for the .65 specifically, however, there isn’t really an accessible way of explaining it unless the audience is already comfortable talking in terms of “probits.”

My opinion: If all you care about is the sign of the relationship and the p-value, you might follow up fitting the ordered probit by fitting a linear regression. If the results are the same in terms of coefficient signs and substantively similar in p-values, you might consider simply reporting the OLS results instead, with a footnote saying that results from an ordered probit model are similar.
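As a hedged sketch of that side-by-side check (the data here are simulated and the variable names are hypothetical, not the GSS variables used elsewhere in this document):

```r
# Simulated sketch: compare coefficient signs from an ordered probit and OLS.
# Variable names (x, y, df_sim) are hypothetical, not from the GSS example.
library(MASS)

set.seed(1)
n <- 2000
x <- rnorm(n)
ystar <- 0.5 * x + rnorm(n)                 # latent outcome
y <- cut(ystar, breaks = c(-Inf, -0.5, 0.5, Inf),
         labels = c("low", "mid", "high"), ordered_result = TRUE)
df_sim <- data.frame(x = x, y = y)

# Ordered probit
fit_op <- polr(y ~ x, data = df_sim, method = "probit")

# OLS treating the ordered outcome as numeric
fit_ols <- lm(as.numeric(y) ~ x, data = df_sim)

# If the signs agree (and the p-values are substantively similar), one might
# report the OLS results with a footnote about the ordered probit
sign(coef(fit_op)["x"]) == sign(coef(fit_ols)["x"])
```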

A fundamental issue for interpreting coefficients for a latent outcome is that a latent variable does not have a natural scale; instead, its scale is something provided by the analyst as a matter of convenience. In ordered probit, even if we believed our assumption of a normally distributed error term had a strong substantive justification, the assumption that the standard deviation of the errors was 1 is purely for mathematical convenience: there is nothing about the substantive reality we are modeling that makes 1 any better than 2 (which would double all our coefficients) or any other positive number we might choose.
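This arbitrariness can be illustrated with a small simulation (simulated data, using a binary probit fit with \(\mathtt{glm}\) for simplicity): a probit slope is identified only relative to the error standard deviation, so the same latent coefficient with a doubled error SD yields half the estimate.

```r
# Simulated sketch of the arbitrary latent scale: probit estimates beta/sigma,
# so doubling the error SD (relabeling the latent scale) halves the slope
set.seed(42)
n <- 50000
x <- rnorm(n)

y1 <- as.integer(2 * x + rnorm(n, sd = 1) > 0)  # error SD of 1
y2 <- as.integer(2 * x + rnorm(n, sd = 2) > 0)  # same beta, error SD of 2

b1 <- coef(glm(y1 ~ x, family = binomial(link = "probit")))["x"]
b2 <- coef(glm(y2 ~ x, family = binomial(link = "probit")))["x"]

round(c(b1, b2), 2)   # b1 is about 2, b2 about 1 (= 2 / 2)
```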

Standardization as a solution to the arbitrary scale of latent variables

A common solution when the scale of a variable is arbitrary is to standardize it. A standardized variable is one that has been scaled so that the standard deviation is 1. As a result, the scale of the variable can be meaningfully interpreted in terms of sample standard deviations.

This may be a little confusing, as we just noted that the probit model assumed that the standard deviation of the error is 1. We are not now talking about the standard deviation of the error term, but the standard deviation of the latent variable itself.

Aside - Standardizing vs. centering vs. normalizing: Sometimes when people talk about standardizing a variable they use it in a looser sense to mean simultaneously standardizing the variable (scaling so the standard deviation is 1) and centering it (shifting so the mean is 0). This is fine.

On the other hand, standardizing should not be confused with normalizing a variable, which involves transforming the variable in a non-linear way so that it better approximates a normal distribution. Standardizing a variable does not change the shape of its distribution, only its standard deviation.
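A quick sketch of that last point, using a simulated right-skewed variable:

```r
# Dividing by the standard deviation fixes the SD at 1 but leaves the shape
# of the distribution unchanged (here, a right-skewed variable)
set.seed(7)
x <- rexp(1000)            # right-skewed
x_z <- x / sd(x)           # standardized (not centered)

sd(x_z)                    # exactly 1

# Skewness is scale-invariant, so it is identical before and after
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3
c(skew(x), skew(x_z))
```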

For ordered probit, we can give our coefficients a meaningful scale by transforming them so that our latent outcome variable is standardized. Before we undertake that, however, let’s go over how standardized coefficients for observed variables work in linear regression.

Standardizing coefficients in linear regression

The General Social Survey recodes respondent occupation into an occupational prestige measure that ranges from 16 to 80. It also includes WORDSUM, a 10-item vocabulary test. Let’s fit a linear regression in which the latter is the outcome and the former is an explanatory variable.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(haven)

# Read GSS data and select relevant variables
df <- read_dta("../dta/gss_natheal.dta") %>%
  select(wordsum, prest = prestg10) %>%
  filter(!is.na(wordsum) & !is.na(prest))

# Fit linear regression model
model <- lm(wordsum ~ prest, data = df)
summary(model)

Call:
lm(formula = wordsum ~ prest, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.4265 -1.2044  0.0262  1.2525  5.1988 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 4.077006   0.058761   69.38   <2e-16 ***
prest       0.045263   0.001282   35.31   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.908 on 12452 degrees of freedom
Multiple R-squared:  0.091, Adjusted R-squared:  0.09092 
F-statistic:  1247 on 1 and 12452 DF,  p-value: < 2.2e-16
. regress wordsum prest, noheader
------------------------------------------------------------------------------
     wordsum | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       prest |   .0452631    .001282    35.31   0.000     .0427502    .0477761
       _cons |   4.077006   .0587614    69.38   0.000     3.961824    4.192187
------------------------------------------------------------------------------

We could interpret the result as saying:

  • Each additional unit of occupational prestige is associated with an additional .045 word increase in average performance on the WORDSUM vocabulary test.

Neither units of occupational prestige nor words on a vocabulary test is a very meaningful metric, at least without other information. An alternative, then, would be to standardize both variables by dividing each by its standard deviation. We will call the new variables \(\texttt{prest\_z}\) and \(\texttt{wordsum\_z}\).

# Standardize both variables (divide by standard deviation)
df <- df %>%
  mutate(
    prest_z = prest / sd(prest, na.rm = TRUE),
    wordsum_z = wordsum / sd(wordsum, na.rm = TRUE)
  )

# Fit linear regression with standardized variables
model_z <- lm(wordsum_z ~ prest_z, data = df)
summary(model_z)

Call:
lm(formula = wordsum_z ~ prest_z, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7116 -0.6019  0.0131  0.6260  2.5983 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.037618   0.029368   69.38   <2e-16 ***
prest_z     0.301657   0.008544   35.31   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9535 on 12452 degrees of freedom
Multiple R-squared:  0.091, Adjusted R-squared:  0.09092 
F-statistic:  1247 on 1 and 12452 DF,  p-value: < 2.2e-16
. regress wordsum_z prest_z [pweight=wtssall], noheader
(sum of wgt is 12,435.3818015672)
------------------------------------------------------------------------------
             |               Robust
   wordsum_z | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     prest_z |   .2895587   .0098023    29.54   0.000     .2703446    .3087728
       _cons |   2.098963   .0341901    61.39   0.000     2.031945    2.165981
------------------------------------------------------------------------------

The coefficient may be interpreted as:

  • A standard deviation increase in occupational prestige is associated with a .29 standard deviation increase in WORDSUM vocabulary test score.

When both the explanatory variable and outcome are standardized, the resulting coefficient is called a standardized coefficient, or more descriptively a fully-standardized coefficient.

Instead of standardizing both the explanatory variable and the outcome, we may only want to standardize the outcome variable. When we standardize only the outcome variable, that is known as a y-standardized coefficient.

As an example of when one might use a \(y\)-standardized coefficient, in linear regression it generally makes no sense to standardize a binary or otherwise categorical explanatory variable, since one cannot talk about a “standard deviation change” in those variables. But it still might make sense to talk about the outcome in standard deviations.

Applying this to our running example, we might use respondent degree as an explanatory variable along with (standardized) score on the WORDSUM test as our outcome.

# Re-read data with degree variable and standardize wordsum outcome
df <- read_dta("../dta/gss_natheal.dta") %>%
  filter(!is.na(wordsum) & !is.na(degree)) %>%
  mutate(degree = droplevels(as_factor(degree)),
         wordsum_z = wordsum / sd(wordsum, na.rm = TRUE))

# Fit linear regression with degree as categorical predictor
model_y_std <- lm(wordsum_z ~ relevel(degree, ref = "high school"), data = df)
summary(model_y_std)

Call:
lm(formula = wordsum_z ~ relevel(degree, ref = "high school"), 
    data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7570 -0.4855  0.1073  0.6017  2.8047 

Coefficients:
                                                   Estimate Std. Error t value
(Intercept)                                         2.85894    0.01102 259.483
relevel(degree, ref = "high school")lt high school -0.71990    0.02420 -29.748
relevel(degree, ref = "high school")junior college  0.20373    0.02981   6.835
relevel(degree, ref = "high school")bachelor        0.59287    0.02129  27.851
relevel(degree, ref = "high school")graduate        0.89802    0.02732  32.874
                                                   Pr(>|t|)    
(Intercept)                                         < 2e-16 ***
relevel(degree, ref = "high school")lt high school  < 2e-16 ***
relevel(degree, ref = "high school")junior college 8.57e-12 ***
relevel(degree, ref = "high school")bachelor        < 2e-16 ***
relevel(degree, ref = "high school")graduate        < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8936 on 13019 degrees of freedom
Multiple R-squared:  0.2017,    Adjusted R-squared:  0.2015 
F-statistic: 822.5 on 4 and 13019 DF,  p-value: < 2.2e-16
. regress wordsum_z ib1.degree [pweight=wtssall], noheader allbaselevels
(sum of wgt is 12,427.2138048121)
----------------------------------------------------------------------------------
               |               Robust
     wordsum_z | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------+------------------------------------------------------------------
        degree |
lt high school |  -.6869593   .0322103   -21.33   0.000    -.7500965   -.6238222
   high school |          0  (base)
junior college |   .2041779   .0326799     6.25   0.000     .1401202    .2682356
      bachelor |   .5615617   .0251043    22.37   0.000     .5123534    .6107701
      graduate |   .8809865   .0311413    28.29   0.000     .8199448    .9420282
               |
         _cons |   2.925083   .0128755   227.18   0.000     2.899845     2.95032
----------------------------------------------------------------------------------

We can interpret the \(y\)-standardized coefficient for bachelor’s degree as:

  • Compared to having only a high school diploma, people with a bachelor’s degree score on average .56 standard deviations higher on the WORDSUM vocabulary test.

In this example, we estimated the \(y\)-standardized coefficients by standardizing the outcome variable first and then fitting the model on this standardized variable. However, if we had the unstandardized coefficients, we could convert them to \(y\)-standardized coefficients by dividing each coefficient by the standard deviation of the outcome.

\[ \beta_{YSTD} = \frac{\beta_{UNSTD}}{SD_{y}} \]
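A minimal check of this conversion (simulated data, not the GSS variables): refitting on the standardized outcome and dividing the unstandardized coefficient by the outcome SD give the same answer.

```r
# Simulated check: dividing an unstandardized lm coefficient by the SD of the
# outcome equals the coefficient from refitting on the standardized outcome
set.seed(3)
n <- 500
x <- rnorm(n)
y <- 1 + 0.8 * x + rnorm(n)

b_unstd <- coef(lm(y ~ x))["x"]
b_ystd_refit <- coef(lm(I(y / sd(y)) ~ x))["x"]

all.equal(unname(b_unstd / sd(y)), unname(b_ystd_refit))  # TRUE
```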

\(y\)-standardized coefficients in the probit model

We will fit a model with our opposition to government health care spending measure as the outcome and age, political party, and sex as the explanatory variables.

library(MASS)

Attaching package: 'MASS'
The following object is masked from 'package:dplyr':

    select
# Read and prepare GSS data
df_gss <- read_dta("../dta/gss_natheal.dta") %>%
  filter(as.integer(natheal) <= 3) %>%
  mutate(natheal = factor(as_factor(natheal),
                          levels = c("too little", "about right", "too much")),
         party = as_factor(party),
         male = as_factor(male)) %>%
  drop_na(party, male, age)

# Fit ordered probit model
model_oprobit <- polr(natheal ~ party + male + age,
                      data = df_gss, Hess = TRUE, method = "probit")

summary(model_oprobit)
Call:
polr(formula = natheal ~ party + male + age, data = df_gss, Hess = TRUE, 
    method = "probit")

Coefficients:
                   Value Std. Error t value
partyindep      0.218207  0.0329313   6.626
partyrepublican 0.554578  0.0261570  21.202
malemale        0.197313  0.0236060   8.359
age             0.003402  0.0006712   5.068

Intercepts:
                       Value   Std. Error t value
too little|about right  1.0283  0.0388    26.4782
about right|too much    1.9917  0.0417    47.7814

Residual Deviance: 17845.10 
AIC: 17857.10 
. oprobit natheal i.party i.male age [pweight=wtssall], nolog

Ordered probit regression                        Number of obs = 11,907
                                                 Wald chi2(4)  = 407.03
                                                 Prob > chi2   = 0.0000
Log pseudolikelihood = -9086.4012                Pseudo R2     = 0.0285

------------------------------------------------------------------------------
             |               Robust
     natheal | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       party |
       indep |   .2172332   .0377365     5.76   0.000      .143271    .2911954
  republican |    .537841   .0298012    18.05   0.000     .4794317    .5962503
             |
        male |
        male |   .1807049   .0269325     6.71   0.000     .1279182    .2334916
         age |   .0024933   .0007737     3.22   0.001     .0009768    .0040097
-------------+----------------------------------------------------------------
       /cut1 |   .9560695   .0453463                      .8671924    1.044947
       /cut2 |    1.92185   .0484612                      1.826868    2.016832
------------------------------------------------------------------------------

To standardize coefficients in a latent variable model, we want to rescale our coefficients to what they would be if the standard deviation of our latent variable was 1. As noted, the probit model assumes that the standard deviation of the error term is 1, not the latent variable itself.

How do we calculate the standard deviation of the latent variable? First, we recall that the variance is the standard deviation squared. Then we make use of the handy property of variances that \(\mathrm{Var}(a+b) = \mathrm{Var}(a) + \mathrm{Var}(b)\) when \(a\) and \(b\) are independent.
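A quick simulated check of this additivity property:

```r
# For independent a and b, Var(a + b) is (up to sampling noise) the sum
# Var(a) + Var(b)
set.seed(11)
a <- rnorm(100000, sd = 2)   # Var(a) = 4
b <- rnorm(100000, sd = 1)   # Var(b) = 1

c(var(a + b), var(a) + var(b))   # both close to 5
```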

In our latent variable formulation: \(y^\ast = \mathbf{x}\mathbf{\beta} + \varepsilon\).

Consequently: \(\mathrm{Var}(y^\ast) = \mathrm{Var}(\mathbf{x}\mathbf{\beta}+\varepsilon) = \mathrm{Var}(\mathbf{x}\mathbf{\beta}) + \mathrm{Var}(\varepsilon)\).

Since \(\mathrm{Var}(\varepsilon) = 1\) in the probit model, \(\mathrm{Var}(y^\ast) = \mathrm{Var}(\mathbf{x}\mathbf{\beta}) + 1\). We can call this the “untransformed” variance of \(y^\ast\), meaning the variance of the \(y^\ast\) implied by the coefficients we estimate, in which the error variance is assumed to be 1.

We can obtain the variance of \(\mathbf{x}\mathbf{\beta}\) by generating predicted values of \(\mathbf{x}\mathbf{\beta}\) for each of our observations, then computing the standard deviation of those predictions and squaring the result.

# Generate linear predictions (x*beta) for each observation
# Note: model.matrix includes intercept column, but polr coefficients exclude it
xb <- model.matrix(model_oprobit)[, -1, drop = FALSE] %*% coef(model_oprobit)

# Calculate variance of x*beta
var_xb <- var(as.vector(xb))

# In probit, error variance is fixed at 1, so variance of latent y* is:
# Var(y*) = Var(x*beta) + Var(error) = var_xb + 1
var_y_latent <- var_xb + 1

# Standard deviation of latent outcome
sd_y_latent <- sqrt(var_y_latent)

# Y-standardized coefficients: divide coefficients by sd of latent y*
coef_y_std <- coef(model_oprobit) / sd_y_latent

# Display results
data.frame(
  coefficient = coef(model_oprobit),
  y_standardized = coef_y_std
)
                coefficient y_standardized
partyindep      0.218206636    0.210127255
partyrepublican 0.554577896    0.534043936
malemale        0.197313379    0.190007597
age             0.003401818    0.003275862
# Show the variance and sd values
cat("Variance of x*beta:", var_xb, "\n")
Variance of x*beta: 0.07837829 
cat("Variance of latent y*:", var_y_latent, "\n")
Variance of latent y*: 1.078378 
cat("SD of latent y*:", sd_y_latent, "\n")
SD of latent y*: 1.03845 

Stata: Computing the variance of \(\mathbf{x}\mathbf{\beta}\) with and without weights.

If your model does not use probability weights, then you can compute the variance of \(\mathbf{x}\mathbf{\beta}\) just by generating predicted values of \(\mathbf{x}\mathbf{\beta}\), using \(\mathtt{summarize}\) to get the standard deviation, and squaring the result.

. quietly oprobit natheal i.party i.male age, nolog
. predict xb if e(sample), xb
(14,791 missing values generated)
. sum xb

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          xb |     11,907    .4770271    .2799612   .0612329   1.054654

. scalar sd_xb = r(sd)
. scalar v_xb = sd_xb ^ 2    // square the sd to get the variance
. display v_xb
.07837826

If your model does use probability weights, then you want the standard deviation to be based on those same weights. When data are weighted, the sample standard deviation is no longer an estimate of the population standard deviation unless the weights are taken into account. This makes the computation a bit more complicated, because Stata’s summarize command does not work with probability weights. (Stata’s documentation explains the rationale for this.)

Instead, you have to use Stata’s \(\mathtt{mean}\) command, which does accept weights and will then provide the standard deviation if you subsequently use the \(\mathtt{estat\ sd}\) command.

. quietly oprobit natheal i.party i.male age [pweight=wtssall], nolog
. predict xb if e(sample), xb
(14,791 missing values generated)

. mean xb [pweight=wtssall]

Mean estimation                   Number of obs = 11,907

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          xb |   .4249199   .0027411      .4195469    .4302929
--------------------------------------------------------------

. estat sd

-------------------------------------
             |       Mean   Std. dev.
-------------+-----------------------
          xb |   .4249199   .2656479
-------------------------------------

. scalar sd_xb = sd[1,1]
. scalar v_xb = sd_xb^2
. display v_xb    // square the sd to get the variance
.07066651

For our example, the variance of \(\mathbf{x}\mathbf{\beta}\) is .0707. This means that the variance of \(y^*\) is 1.0707. And that, in turn, means that the standard deviation of \(y^*\) is 1.0347.

To get the \(y\)-standardized coefficients, then, we divide our coefficients by 1.0347. For example, the coefficient for Independent (vs. Democrat) estimated above was .217, so the \(y\)-standardized coefficient is \(.217/1.0347 = .210\). We could interpret this as:

  • Net of sex and age, being a political independent is associated with a .21 standard deviation increase in opposition to government health care spending, compared to being a Democrat.

Likewise, our untransformed coefficient for age (.0025) implies a \(y\)-standardized coefficient of \(.0025/1.0347 = .0024\). We can interpret this as:

  • Net of sex and party identification, each additional year of age is associated with a .0024 standard deviation increase in opposition to government health care spending.

These examples are a little frustrating because the \(y\)-standardized coefficients are not that different from the probit coefficients themselves: we were dividing by 1.0347, which is close to 1. In practice, the \(y\)-standardized coefficients from an ordered probit will deviate more from the untransformed coefficients as the explanatory power of the model increases, because as explanatory power increases, the variance of \(\mathbf{x}\mathbf{\beta}\) (and hence of \(y^\ast\)) increases.
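To get a feel for how much the rescaling can matter, here is a small sketch using hypothetical values of \(\mathrm{Var}(\mathbf{x}\mathbf{\beta})\):

```r
# How the latent SD, sqrt(Var(xb) + 1), grows with explanatory power.
# var_xb values are hypothetical; .08 is close to our fitted model
var_xb <- c(0.08, 0.5, 1, 3)
sd_latent <- sqrt(var_xb + 1)
round(sd_latent, 2)          # 1.04 1.22 1.41 2.00

# What a raw probit coefficient of .5 becomes, y-standardized, in each case:
# the stronger the model, the more the coefficient shrinks
round(0.5 / sd_latent, 2)    # 0.48 0.41 0.35 0.25
```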

Stata: Obtaining \(y\)-standardized coefficients with \(\mathtt{listcoef}\). I have co-authored a Stata add-on command that computes \(y\)-standardized coefficients to make this easier. It is available if you install the \(\mathtt{spost13\_ado}\) package, which you can find online by typing \(\mathtt{findit\ spost13\_ado}\) in Stata.

. listcoef
note: pweights not compatible with summarize; weights treated as aweights.

oprobit (N=11907): Unstandardized and standardized estimates

Observed SD: 0.6193
Latent SD: 1.0347

-------------------------------------------------------------------------------
             |         b        z    P>|z|    bStdX    bStdY   bStdXY     SDofX
-------------+-----------------------------------------------------------------
       party |
       indep |    0.2172    5.757    0.000    0.086    0.210    0.083     0.395
  republican |    0.5378   18.048    0.000    0.257    0.520    0.248     0.478
             |
        male |
        male |    0.1807    6.710    0.000    0.090    0.175    0.087     0.497
         age |    0.0025    3.222    0.001    0.042    0.002    0.041    16.914
-------------------------------------------------------------------------------

The \(y\)-standardized coefficient is provided in the column labeled \(\mathtt{bStdY}\).