Modeling an unordered outcome as a set of binary logits

We will illustrate the idea of modeling an unordered (aka nominal) outcome through the example of ideological self-identification as recoded into three categories: liberal, conservative, and moderate. While we might consider this an ordered outcome, here we have deliberately decided to treat it as unordered.

One approach is to compare pairs of categories using the logit model that we already learned for binary outcomes. There are three unique pairs of categories:

For each pair, we can fit a binary logit model, excluding observations if they are not one of the categories in our pair (that is, when we fit the logit for liberals vs. conservatives, we exclude moderates). In this example, we will fit models with educational attainment and race/ethnicity as our explanatory variables.

Expand to show dependencies and data recoding
library(tidyverse)
library(haven)
library(texreg)

data <- read_dta("../dta/gss_soc383.dta") %>%
  mutate(ed3cat = factor(ed3cat,
                  levels = c(1, 2, 3),
                  labels = c("HS or less", "Some college", "College grad"))) %>%
  mutate(race4cat = factor(race4cat, 
                        levels = c(1, 2, 3, 4),
                        labels = c("White", "Black", "Hispanic", "Other"))) %>%
  mutate(race4cat = relevel(race4cat, ref = "White")) %>%
  mutate(liberal = ifelse(libcon3cat == 1, 1, ifelse(libcon3cat %in% c(2, 3), 0, NA))) %>%
  mutate(conserv = ifelse(libcon3cat == 3, 1, ifelse(libcon3cat %in% c(1, 2), 0, NA)))
Expand for code that fits models and displays results
# Run logistic regressions
# liberal vs. moderate
lib_v_mod <- glm(liberal ~ ed3cat + race4cat, 
                 data = data[data$conserv != 1, ], 
                 family = "binomial")

# conservative vs. moderate
con_v_mod <- glm(conserv ~ ed3cat + race4cat, 
                 data = data[data$liberal != 1, ], 
                 family = binomial(link = "logit"))

# liberal vs. conservative
lib_v_con <- glm(liberal ~ ed3cat + race4cat, 
                 data = data[(data$liberal == 1 | data$conserv == 1), ], 
                 family = binomial(link = "logit"))

# Create a table of results using screenreg from texreg
screenreg(list(lib_v_mod, con_v_mod, lib_v_con),
          custom.model.names = c("Lib vs Mod", "Con vs Mod", "Lib vs Con"),
          custom.coef.map = list("(Intercept)" = "Intercept",
                                  "ed3catSome college" = "Some college",
                                  "ed3catCollege grad" = "College grad",
                                  "race4catBlack" = "Black",
                                  "race4catHispanic" = "Hispanic",
                                  "race4catOther" = "Other race/eth"),
          stars = c(0.05, 0.01, 0.001),
          digits = 3,
          include.aic = FALSE,
          include.deviance = FALSE)

==============================================================
                Lib vs Mod      Con vs Mod      Lib vs Con    
--------------------------------------------------------------
Intercept           -0.716 ***      -0.238 ***      -0.505 ***
                    (0.017)         (0.015)         (0.018)   
Some college         0.326 ***       0.198 ***       0.165 ***
                    (0.026)         (0.025)         (0.028)   
College grad         0.935 ***       0.598 ***       0.384 ***
                    (0.027)         (0.026)         (0.026)   
Black                0.338 ***      -0.368 ***       0.736 ***
                    (0.031)         (0.032)         (0.034)   
Hispanic             0.196 ***      -0.286 ***       0.499 ***
                    (0.040)         (0.040)         (0.043)   
Other race/eth       0.083          -0.376 ***       0.475 ***
                    (0.062)         (0.064)         (0.066)   
--------------------------------------------------------------
BIC              48316.296       54630.818       45954.447    
Log Likelihood  -24126.633      -27283.616      -22945.931    
Num. obs.        36505           40040           33889        
==============================================================
*** p < 0.001; ** p < 0.01; * p < 0.05
. quietly logit liberal i.ed3cat i.race4cat if conserv != 1 // liberal vs. moderate

. estimates store lib_v_mod

. quietly logit conserv i.ed3cat i.race4cat if liberal != 1 // conservative vs. moderate

. estimates store con_v_mod

. quietly logit liberal i.ed3cat i.race4cat if (liberal == 1 | conserv == 1) // lib vs. con

. estimates store lib_v_con

. estimates table lib_v_mod con_v_mod lib_v_con, equations(1) star(.05 .01 .001) b(%6.3f) stats(N)

-----------------------------------------------------
    Variable | lib_v_mod    con_v_mod    lib_v_con   
-------------+---------------------------------------
      ed3cat |
   some col  |   0.326***     0.198***     0.165***  
ba or above  |   0.935***     0.598***     0.384***  
             |
    race4cat |
      black  |   0.338***    -0.368***     0.736***  
     latino  |   0.196***    -0.286***     0.499***  
      other  |   0.083       -0.376***     0.475***  
             |
       _cons |  -0.716***    -0.238***    -0.505***  
-------------+---------------------------------------
           N |   36505        40040        33889     
-----------------------------------------------------
                legend: * p<.05; ** p<.01; *** p<.001

We have three sets of coefficients, identified by subscripts. Our three models are:

\[ \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{mod})}\right) = \mathbf{x\beta_{\textrm{LIB vs MOD}}} \]

\[ \ln\left(\frac{\Pr(y = \textrm{con})}{\Pr(y = \textrm{mod})}\right) = \mathbf{x\beta_{\textrm{CON vs MOD}}} \]

\[ \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{con})}\right) = \mathbf{x\beta_{\textrm{LIB vs CON}}} \]

The left-hand sides of the first two equations combine to yield the third, namely:

\[ \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{mod})}\right) - \ln\left(\frac{\Pr(y = \textrm{con})}{\Pr(y = \textrm{mod})}\right) = \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{con})}\right) \]

Remember that \(\ln(a) - \ln(b) = \ln(\frac{a}{b})\), allowing us to rewrite the left side as:

\[ \ln\left[\frac{\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{mod})}\right)}{\left(\frac{\Pr(y = \textrm{con})}{\Pr(y = \textrm{mod})}\right)}\right] = \ln\left(\frac{\Pr(y = \textrm{lib})}{\Pr(y = \textrm{con})}\right) \]

Given the equations for our three models above, this means:

\[ \mathbf{x\beta_{\textrm{LIB vs MOD}}} - \mathbf{x\beta_{\textrm{CON vs MOD}}} = \mathbf{x\beta_{\textrm{LIB vs CON}}} \]

In other words, the parameters from any two of these models are sufficient to derive the third.

Then again, if we look at the estimates from these models in the table, we will see that this does not exactly hold in our analysis. For example, if we plug in the coefficients for the “some college” variable, the estimated LIB vs CON coefficient for “some college” is \(.165\), but \(.326 - .198 = .128\), which is not close. One way of understanding why this doesn’t hold exactly is to note that these three sets of estimates are estimated on three overlapping but different samples.

The multinomial logit model can be thought of as estimating all the underlying logits simultaneously, imposing the arithmetic relationships above as constraints on the estimates.

What if there are more than 3 categories?

Note that there are binary logits for each pair of categories. The number of unique pairs when there are k categories is \(\frac{k(k-1)}{2}\). As we saw with three categories, there were \(\frac{3(2)}{2}=3\) pairs. With four categories, there would be \(\frac{4(3)}{2}=6\) pairs. And with five there would be \(\frac{5(4)}{2}=10\) pairs.

If we have five categories labeled A, B, C, D, and E, the 10 pairs would be: A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, and D-E.

In terms of non-redundant pairs, however, this is just \((k-1)\) pairs, or the number of categories minus 1. So in our example with 3 categories, there were 2 non-redundant logits and we could derive the 3rd logit from the results of the first two. With 4 categories, there are 6 possible logits but the results of 3 are enough to derive the rest. And with 5 categories, there are 10 possible logits, but we only need the results of 4 to derive the other 6.

The easiest approach is to select one category as the base category. In our example with 5 categories, say we choose category E as our base category. We will obtain results for the pairs A-E, B-E, C-E, and D-E, from which we can derive any other comparison.

Summary of key points

  1. One way we might think about models for an unordered outcome is as a set of contrasts between all pairs of categories of the outcome.
  2. If we do that, we can show arithmetically that some set(s) of parameters are simply the sum or difference between other set(s) of parameters.
  3. Even though these arithmetic relationships hold among parameters, they do not necessarily hold among parameter estimates when we estimate logits for different category-pairs separately.
  4. The multinomial logit model will solve this problem by effectively fitting all the underlying logits simultaneously so that redundant parameters are not separately estimated.
  5. If there are \(k\) categories, there are \(k-1\) non-redundant sets of parameters. The standard technique is to designate one outcome category to be the base category, and to estimate parameters for each other category vs. the base category.