Modeling an unordered outcome as a set of binary logits
We will illustrate the idea of modeling an unordered (aka nominal) outcome through the example of ideological self-identification as recoded into three categories: liberal, conservative, and moderate. While we might consider this an ordered outcome, here we have deliberately decided not to treat it as unordered.
One way we might think to approach the unordered data is to compare pairs of categories using the logit model that we already learned for binary outcomes. There are three unique pairs of categories:
Liberal vs. moderate
Conservative vs. moderate
Liberal vs. conservative
For each pair, we can fit a binary logit model, excluding observations if they are not one of the categories in our pair (that is, when we fit the logit for liberals vs. conservatives, we exclude moderates). In this example, we will fit models with educational attainment and race/ethnicity as our explanatory variables.
Expand for code that fits models and displays results
# Run logistic regressions# liberal vs. moderatelib_v_mod <-glm(liberal ~ ed3cat + race4cat, data = data[data$conserv !=1, ], family ="binomial")# conservative vs. moderatecon_v_mod <-glm(conserv ~ ed3cat + race4cat, data = data[data$liberal !=1, ], family =binomial(link ="logit"))# liberal vs. conservativelib_v_con <-glm(liberal ~ ed3cat + race4cat, data = data[(data$liberal ==1| data$conserv ==1), ], family =binomial(link ="logit"))# Create a table of results using screenreg from texregscreenreg(list(lib_v_mod, con_v_mod, lib_v_con),custom.model.names =c("Lib vs Mod", "Con vs Mod", "Lib vs Con"),custom.coef.map =list("(Intercept)"="Intercept","ed3catSome college"="Some college","ed3catCollege grad"="College grad","race4catBlack"="Black","race4catHispanic"="Hispanic","race4catOther"="Other race/eth"),stars =c(0.05, 0.01, 0.001),digits =3,include.aic =FALSE,include.deviance =FALSE)
==============================================================
Lib vs Mod Con vs Mod Lib vs Con
--------------------------------------------------------------
Intercept -0.716 *** -0.238 *** -0.505 ***
(0.017) (0.015) (0.018)
Some college 0.326 *** 0.198 *** 0.165 ***
(0.026) (0.025) (0.028)
College grad 0.935 *** 0.598 *** 0.384 ***
(0.027) (0.026) (0.026)
Black 0.338 *** -0.368 *** 0.736 ***
(0.031) (0.032) (0.034)
Hispanic 0.196 *** -0.286 *** 0.499 ***
(0.040) (0.040) (0.043)
Other race/eth 0.083 -0.376 *** 0.475 ***
(0.062) (0.064) (0.066)
--------------------------------------------------------------
BIC 48316.296 54630.818 45954.447
Log Likelihood -24126.633 -27283.616 -22945.931
Num. obs. 36505 40040 33889
==============================================================
*** p < 0.001; ** p < 0.01; * p < 0.05
. quietly logit liberal i.ed3cat i.race4cat if conserv != 1 // liberal vs. moderate
. estimates store lib_v_mod
. quietly logit conserv i.ed3cat i.race4cat if liberal != 1 // conservative vs. moderate
. estimates store con_v_mod
. quietly logit liberal i.ed3cat i.race4cat if (liberal == 1 | conserv == 1) // lib vs. con
. estimates store lib_v_con
. estimates table lib_v_mod con_v_mod lib_v_con, equations(1) star(.05 .01 .001) b(%6.3f) stats(N)
-----------------------------------------------------
Variable | lib_v_mod con_v_mod lib_v_con
-------------+---------------------------------------
ed3cat |
some col | 0.326*** 0.198*** 0.165***
ba or above | 0.935*** 0.598*** 0.384***
|
race4cat |
black | 0.338*** -0.368*** 0.736***
latino | 0.196*** -0.286*** 0.499***
other | 0.083 -0.376*** 0.475***
|
_cons | -0.716*** -0.238*** -0.505***
-------------+---------------------------------------
N | 36505 40040 33889
-----------------------------------------------------
legend: * p<.05; ** p<.01; *** p<.001
We have three sets of coefficients. We will use subscripts to identify which is which. Our three models are:
Given the equations for our three models above, this means:
\[
\mathbf{x\beta_{\textrm{LIB vs MOD}}} - \mathbf{x\beta_{\textrm{CON vs MOD}}} = \mathbf{x\beta_{\textrm{LIB vs CON}}}
\]
In other words, even though we wrote three models above, the parameters from two of these models are sufficient to derive the parameters of the third.
Then again, if we look at the estimates from these models in the table, we will see that this does not exactly hold in our analysis. For example, if we plug in the coefficients for the “some college” variable, we get \(.326 - .198 = .165\), when in fact \(.326 - .198 = .128\). This isn’t even especially close. One way of understanding why this doesn’t hold exactly is to note that these three sets of estimates are estimated three overlapping but different samples.
The multinomial logit model that we will introduce can be thought of as like estimating the underlying logit for each pair of categories simultaneously, without separately estimating redundant coefficients so that the arithmetic relationships shown above for our parameters are also a feature of our estimates.
What if there are more than 3 categories?
Our approach involves first recognizing that there are potentially binary logits for each pair of categories. The number of unique pairs when there are k categories is \(\frac{k(k-1)}{2}\). As we saw with three categories, there were \(\frac{3(2)}{2}=3\) pairs. With four categories, there would be \(\frac{4(3)}{2}=6\) pairs. And with five there would be \(\frac{5(4)}{2}=10\) pairs.
If we have five categories labeled A, B, C, D, and E, the 10 pairs would be: A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, and D-E.
In terms of non-redundant pairs, however, this is just \((k-1)\) pairs, or the number of categories minus 1. So in our example with 3 categories, there were 2 non-redundant logits and we could derive the 3rd logit from the results of the first two. With 4 categories, there are 6 possible logits but the results of 3 are enough to derive the rest. And with 5 categories, there are 10 possible logits, but we only need the results of 4 to derive the other 6.
Easiest here will be to select one of our categories to be the base category. In our example with 5 categories, say we choose category E as our base category. We will obtain results for the pairs A-E, B-E, C-E, and D-E, but then from those results we can derive the results for any other comparison.
Summary of key points
One way we might think about models for an unordered outcome is as a set of contrasts between all pairs of categories of the outcome.
If we do that, we can show arithmetically that some set(s) of parameters are simply the sum or difference between other set(s) of parameters.
Even though these arithmetic relationships hold among parameters, they do not necessarily among parameter estimates when we estimate logits for different category-pairs separately.
The multinomial logit model will solve this problem by effectively fitting all the underlying logits simultaneously so that redundant parameters are not separately estimated.
If there are \(k\) categories, there are \(k-1\) non-redundant sets of parameters. The standard technique is to designate one outcome category to be the base category, and to estimate parameters for each other category vs. the base category.