Modeling an unordered outcome as a set of binary logits

We will illustrate the idea of modeling an unordered (aka nominal) outcome through the example of ideological self-identification as recoded into three categories: liberal, conservative, and moderate. While we might reasonably consider this an ordered outcome, here we have deliberately decided to treat it as unordered.

One way we might think to approach the unordered data is to compare pairs of categories using the logit model that we already learned for binary outcomes. There are three unique pairs of categories: liberal vs. moderate, conservative vs. moderate, and liberal vs. conservative.

For each pair, we can fit a binary logit model, excluding observations if they are not one of the categories in our pair (that is, when we fit the logit for liberals vs. conservatives, we exclude moderates). In this example, we will fit models with educational attainment and race/ethnicity as our explanatory variables.

library(tidyverse)
library(haven)
library(texreg)

data <- read_dta("../dta/gss_soc383.dta") %>%
  mutate(ed3cat = factor(ed3cat,
                  levels = c(1, 2, 3),
                  labels = c("HS or less", "Some college", "College grad"))) %>%
  mutate(race4cat = factor(race4cat, 
                        levels = c(1, 2, 3, 4),
                        labels = c("White", "Black", "Hispanic", "Other"))) %>%
  mutate(race4cat = relevel(race4cat, ref = "White")) %>%
  mutate(liberal = ifelse(libcon3cat == 1, 1, ifelse(libcon3cat %in% c(2, 3), 0, NA))) %>%
  mutate(conserv = ifelse(libcon3cat == 3, 1, ifelse(libcon3cat %in% c(1, 2), 0, NA)))
# Run logistic regressions
# liberal vs. moderate
lib_v_mod <- glm(liberal ~ ed3cat + race4cat,
                 data = data[data$conserv != 1, ],
                 family = binomial(link = "logit"))

# conservative vs. moderate
con_v_mod <- glm(conserv ~ ed3cat + race4cat, 
                 data = data[data$liberal != 1, ], 
                 family = binomial(link = "logit"))

# liberal vs. conservative
lib_v_con <- glm(liberal ~ ed3cat + race4cat, 
                 data = data[(data$liberal == 1 | data$conserv == 1), ], 
                 family = binomial(link = "logit"))

# Create a table of results using screenreg from texreg
screenreg(list(lib_v_mod, con_v_mod, lib_v_con),
          custom.model.names = c("Lib vs Mod", "Con vs Mod", "Lib vs Con"),
          custom.coef.map = list("(Intercept)" = "Intercept",
                                  "ed3catSome college" = "Some college",
                                  "ed3catCollege grad" = "College grad",
                                  "race4catBlack" = "Black",
                                  "race4catHispanic" = "Hispanic",
                                  "race4catOther" = "Other race/eth"),
          stars = c(0.05, 0.01, 0.001),
          digits = 3,
          include.aic = FALSE,
          include.deviance = FALSE)

==============================================================
                Lib vs Mod      Con vs Mod      Lib vs Con    
--------------------------------------------------------------
Intercept           -0.716 ***      -0.238 ***      -0.505 ***
                    (0.017)         (0.015)         (0.018)   
Some college         0.326 ***       0.198 ***       0.165 ***
                    (0.026)         (0.025)         (0.028)   
College grad         0.935 ***       0.598 ***       0.384 ***
                    (0.027)         (0.026)         (0.026)   
Black                0.338 ***      -0.368 ***       0.736 ***
                    (0.031)         (0.032)         (0.034)   
Hispanic             0.196 ***      -0.286 ***       0.499 ***
                    (0.040)         (0.040)         (0.043)   
Other race/eth       0.083          -0.376 ***       0.475 ***
                    (0.062)         (0.064)         (0.066)   
--------------------------------------------------------------
BIC              48316.296       54630.818       45954.447    
Log Likelihood  -24126.633      -27283.616      -22945.931    
Num. obs.        36505           40040           33889        
==============================================================
*** p < 0.001; ** p < 0.01; * p < 0.05
. quietly logit liberal i.ed3cat i.race4cat if conserv != 1 // liberal vs. moderate

. estimates store lib_v_mod

. quietly logit conserv i.ed3cat i.race4cat if liberal != 1 // conservative vs. moderate

. estimates store con_v_mod

. quietly logit liberal i.ed3cat i.race4cat if (liberal == 1 | conserv == 1) // lib vs. con

. estimates store lib_v_con

. estimates table lib_v_mod con_v_mod lib_v_con, equations(1) star(.05 .01 .001) b(%6.3f) stats(N)

-----------------------------------------------------
    Variable | lib_v_mod    con_v_mod    lib_v_con   
-------------+---------------------------------------
      ed3cat |
   some col  |   0.326***     0.198***     0.165***  
ba or above  |   0.935***     0.598***     0.384***  
             |
    race4cat |
      black  |   0.338***    -0.368***     0.736***  
     latino  |   0.196***    -0.286***     0.499***  
      other  |   0.083       -0.376***     0.475***  
             |
       _cons |  -0.716***    -0.238***    -0.505***  
-------------+---------------------------------------
           N |   36505        40040        33889     
-----------------------------------------------------
                legend: * p<.05; ** p<.01; *** p<.001

We have three sets of coefficients. We will use subscripts to identify which is which. Our three models are:

\[ \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{mod})}\right) = \mathbf{x\beta_{\textrm{LIB vs MOD}}} \]

\[ \ln\left(\frac{\Pr(y = \textrm{con})}{\Pr(y = \textrm{mod})}\right) = \mathbf{x\beta_{\textrm{CON vs MOD}}} \]

\[ \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{con})}\right) = \mathbf{x\beta_{\textrm{LIB vs CON}}} \]

The left sides of these equations can be arranged into an equation in their own right, namely:

\[ \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{mod})}\right) - \ln\left(\frac{\Pr(y = \textrm{con})}{\Pr(y = \textrm{mod})}\right) = \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{con})}\right) \]

Remember that \(\ln(a) - \ln(b) = \ln\left(\frac{a}{b}\right)\), so we can rewrite the left side and simplify:

\[ \begin{align} \ln\left[\frac{\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{mod})}\right)}{\left(\frac{\Pr(y = \textrm{con})}{\Pr(y = \textrm{mod})}\right)}\right] &= \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{con})}\right) \\ \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{con})}\right) &= \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{con})}\right) \end{align} \]

Given the equations for our three models above, this means:

\[ \mathbf{x\beta_{\textrm{LIB vs MOD}}} - \mathbf{x\beta_{\textrm{CON vs MOD}}} = \mathbf{x\beta_{\textrm{LIB vs CON}}} \]

In other words, even though we wrote three models above, the parameters from two of these models are sufficient to derive the parameters of the third.

Then again, if we look at the estimates from these models in the table, we will see that this relationship does not exactly hold in our analysis. For example, plugging in the coefficients on the "some college" variable, the identity implies that the LIB vs CON coefficient should be \(.326 - .198 = .128\), when in fact the estimated coefficient is \(.165\). This isn't even especially close. One way of understanding why the identity fails to hold exactly is to note that the three sets of coefficients are estimated on three overlapping but different samples.
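To see this failure concretely, here is a minimal self-contained sketch with simulated data (the variable names and true coefficients below are invented for illustration, not taken from the GSS analysis above):

```r
set.seed(383)
n <- 5000
x <- rnorm(n)

# true multinomial-logit probabilities, with "mod" as the base category
eta_lib <- -0.5 + 0.8 * x   # true lib-vs-mod slope:  0.8
eta_con <- -0.2 - 0.4 * x   # true con-vs-mod slope: -0.4
p_mod <- 1 / (1 + exp(eta_lib) + exp(eta_con))
p_lib <- exp(eta_lib) * p_mod
p_con <- exp(eta_con) * p_mod

y <- sapply(seq_len(n), function(i)
  sample(c("lib", "mod", "con"), 1, prob = c(p_lib[i], p_mod[i], p_con[i])))
d <- data.frame(x = x, y = y)

# three separate pairwise logits, each dropping the third category
lib_v_mod <- glm(I(y == "lib") ~ x, data = subset(d, y != "con"), family = binomial)
con_v_mod <- glm(I(y == "con") ~ x, data = subset(d, y != "lib"), family = binomial)
lib_v_con <- glm(I(y == "lib") ~ x, data = subset(d, y != "mod"), family = binomial)

implied <- coef(lib_v_mod)["x"] - coef(con_v_mod)["x"]  # derived lib-vs-con slope
direct  <- coef(lib_v_con)["x"]                         # separately estimated slope
c(implied = unname(implied), direct = unname(direct))
# both near the true value of 1.2, but not identical to each other
```

Both quantities estimate the same lib-vs-con slope, yet they disagree because each logit is fit to a different subsample.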

The multinomial logit model that we will introduce can be thought of as estimating the underlying logit for each pair of categories simultaneously, without separately estimating redundant coefficients, so that the arithmetic relationships shown above for our parameters are also a feature of our estimates.
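As a sketch of what "simultaneously" means in practice, here is one common R implementation, `nnet::multinom`, applied to simulated data (the setup below is an assumption for illustration, not the course data used earlier):

```r
library(nnet)  # multinom(); nnet ships with standard R distributions

set.seed(383)
n <- 5000
x <- rnorm(n)
p_mod <- 1 / (1 + exp(-0.5 + 0.8 * x) + exp(-0.2 - 0.4 * x))
p_lib <- exp(-0.5 + 0.8 * x) * p_mod
p_con <- exp(-0.2 - 0.4 * x) * p_mod
y <- factor(sapply(seq_len(n), function(i)
       sample(c("mod", "lib", "con"), 1, prob = c(p_mod[i], p_lib[i], p_con[i]))),
     levels = c("mod", "lib", "con"))  # first level ("mod") is the base category

fit <- multinom(y ~ x, data = data.frame(x, y), trace = FALSE)
b <- coef(fit)  # one row per non-base category: "lib" and "con", each vs. "mod"

# the lib-vs-con comparison is not estimated separately; it is derived exactly:
b["lib", ] - b["con", ]

# refitting with "con" as the base category recovers (numerically) the same thing
y2 <- relevel(y, ref = "con")
fit2 <- multinom(y2 ~ x, data = data.frame(x, y2), trace = FALSE)
max(abs((b["lib", ] - b["con", ]) - coef(fit2)["lib", ]))
```

Because both fits maximize the same likelihood, the derived lib-vs-con coefficients and the refit's directly reported ones agree up to numerical convergence.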

What if there are more than 3 categories?

Our approach involves first recognizing that there is potentially a binary logit for each pair of categories. The number of unique pairs when there are k categories is \(\frac{k(k-1)}{2}\). As we saw with three categories, there were \(\frac{3(2)}{2}=3\) pairs. With four categories, there would be \(\frac{4(3)}{2}=6\) pairs. And with five there would be \(\frac{5(4)}{2}=10\) pairs.

If we have five categories labeled A, B, C, D, and E, the 10 pairs would be: A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, and D-E.
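The pair counting can be checked directly; a quick sketch in base R:

```r
# all unique pairs of five categories, via base R's combn()
k <- 5
pairs <- t(combn(c("A", "B", "C", "D", "E"), 2))
head(pairs, 3)          # first few pairs: A-B, A-C, A-D
nrow(pairs)             # 10
k * (k - 1) / 2         # 10, matching the formula above
```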

In terms of non-redundant pairs, however, this is just \((k-1)\) pairs, or the number of categories minus 1. So in our example with 3 categories, there were 2 non-redundant logits and we could derive the 3rd logit from the results of the first two. With 4 categories, there are 6 possible logits but the results of 3 are enough to derive the rest. And with 5 categories, there are 10 possible logits, but we only need the results of 4 to derive the other 6.

Easiest here will be to select one of our categories to be the base category. In our example with 5 categories, say we choose category E as our base category. We will obtain results for the pairs A-E, B-E, C-E, and D-E, but then from those results we can derive the results for any other comparison.
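For instance, once we have the A-E through D-E results, the B-vs-C comparison follows from the same arithmetic as before:

\[ \mathbf{x\beta_{\textrm{B vs C}}} = \mathbf{x\beta_{\textrm{B vs E}}} - \mathbf{x\beta_{\textrm{C vs E}}} \]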

Summary of key points

  1. One way we might think about models for an unordered outcome is as a set of contrasts between all pairs of categories of the outcome.
  2. If we do that, we can show arithmetically that some sets of parameters are simply sums or differences of other sets of parameters.
  3. Even though these arithmetic relationships hold among parameters, they do not necessarily hold among parameter estimates when we estimate logits for different category-pairs separately.
  4. The multinomial logit model will solve this problem by effectively fitting all the underlying logits simultaneously so that redundant parameters are not separately estimated.
  5. If there are \(k\) categories, there are \(k-1\) non-redundant sets of parameters. The standard technique is to designate one outcome category to be the base category, and to estimate parameters for each other category vs. the base category.