Modeling an unordered outcome as a set of binary logits
We will illustrate the idea of modeling an unordered (aka nominal) outcome through the example of ideological self-identification as recoded into three categories: liberal, conservative, and moderate. While we might consider this an ordered outcome, here we have deliberately decided to treat it as unordered.
One approach is to compare pairs of categories using the logit model that we already learned for binary outcomes. There are three unique pairs of categories:
Liberal vs. moderate
Conservative vs. moderate
Liberal vs. conservative
For each pair, we can fit a binary logit model, excluding observations if they are not one of the categories in our pair (that is, when we fit the logit for liberals vs. conservatives, we exclude moderates). In this example, we will fit models with educational attainment and race/ethnicity as our explanatory variables.
Expand for code that fits models and displays results
# Run logistic regressions# liberal vs. moderatelib_v_mod <-glm(liberal ~ ed3cat + race4cat, data = data[data$conserv !=1, ], family ="binomial")# conservative vs. moderatecon_v_mod <-glm(conserv ~ ed3cat + race4cat, data = data[data$liberal !=1, ], family =binomial(link ="logit"))# liberal vs. conservativelib_v_con <-glm(liberal ~ ed3cat + race4cat, data = data[(data$liberal ==1| data$conserv ==1), ], family =binomial(link ="logit"))# Create a table of results using screenreg from texregscreenreg(list(lib_v_mod, con_v_mod, lib_v_con),custom.model.names =c("Lib vs Mod", "Con vs Mod", "Lib vs Con"),custom.coef.map =list("(Intercept)"="Intercept","ed3catSome college"="Some college","ed3catCollege grad"="College grad","race4catBlack"="Black","race4catHispanic"="Hispanic","race4catOther"="Other race/eth"),stars =c(0.05, 0.01, 0.001),digits =3,include.aic =FALSE,include.deviance =FALSE)
==============================================================
Lib vs Mod Con vs Mod Lib vs Con
--------------------------------------------------------------
Intercept -0.716 *** -0.238 *** -0.505 ***
(0.017) (0.015) (0.018)
Some college 0.326 *** 0.198 *** 0.165 ***
(0.026) (0.025) (0.028)
College grad 0.935 *** 0.598 *** 0.384 ***
(0.027) (0.026) (0.026)
Black 0.338 *** -0.368 *** 0.736 ***
(0.031) (0.032) (0.034)
Hispanic 0.196 *** -0.286 *** 0.499 ***
(0.040) (0.040) (0.043)
Other race/eth 0.083 -0.376 *** 0.475 ***
(0.062) (0.064) (0.066)
--------------------------------------------------------------
BIC 48316.296 54630.818 45954.447
Log Likelihood -24126.633 -27283.616 -22945.931
Num. obs. 36505 40040 33889
==============================================================
*** p < 0.001; ** p < 0.01; * p < 0.05
. quietly logit liberal i.ed3cat i.race4cat if conserv != 1 // liberal vs. moderate
. estimates store lib_v_mod
. quietly logit conserv i.ed3cat i.race4cat if liberal != 1 // conservative vs. moderate
. estimates store con_v_mod
. quietly logit liberal i.ed3cat i.race4cat if (liberal == 1 | conserv == 1) // lib vs. con
. estimates store lib_v_con
. estimates table lib_v_mod con_v_mod lib_v_con, equations(1) star(.05 .01 .001) b(%6.3f) stats(N)
-----------------------------------------------------
Variable | lib_v_mod con_v_mod lib_v_con
-------------+---------------------------------------
ed3cat |
some col | 0.326*** 0.198*** 0.165***
ba or above | 0.935*** 0.598*** 0.384***
|
race4cat |
black | 0.338*** -0.368*** 0.736***
latino | 0.196*** -0.286*** 0.499***
other | 0.083 -0.376*** 0.475***
|
_cons | -0.716*** -0.238*** -0.505***
-------------+---------------------------------------
N | 36505 40040 33889
-----------------------------------------------------
legend: * p<.05; ** p<.01; *** p<.001
We have three sets of coefficients, identified by subscripts. Our three models are:
Given the equations for our three models above, this means:
\[
\mathbf{x\beta_{\textrm{LIB vs MOD}}} - \mathbf{x\beta_{\textrm{CON vs MOD}}} = \mathbf{x\beta_{\textrm{LIB vs CON}}}
\]
In other words, the parameters from any two of these models are sufficient to derive the third.
Then again, if we look at the estimates from these models in the table, we will see that this does not exactly hold in our analysis. For example, if we plug in the coefficients for the “some college” variable, the estimated LIB vs CON coefficient for “some college” is \(.165\), but \(.326 - .198 = .128\), which is not close. One way of understanding why this doesn’t hold exactly is to note that these three sets of estimates are estimated on three overlapping but different samples.
The multinomial logit model can be thought of as estimating all the underlying logits simultaneously, imposing the arithmetic relationships above as constraints on the estimates.
What if there are more than 3 categories?
Note that there are binary logits for each pair of categories. The number of unique pairs when there are k categories is \(\frac{k(k-1)}{2}\). As we saw with three categories, there were \(\frac{3(2)}{2}=3\) pairs. With four categories, there would be \(\frac{4(3)}{2}=6\) pairs. And with five there would be \(\frac{5(4)}{2}=10\) pairs.
If we have five categories labeled A, B, C, D, and E, the 10 pairs would be: A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, and D-E.
In terms of non-redundant pairs, however, this is just \((k-1)\) pairs, or the number of categories minus 1. So in our example with 3 categories, there were 2 non-redundant logits and we could derive the 3rd logit from the results of the first two. With 4 categories, there are 6 possible logits but the results of 3 are enough to derive the rest. And with 5 categories, there are 10 possible logits, but we only need the results of 4 to derive the other 6.
The easiest approach is to select one category as the base category. In our example with 5 categories, say we choose category E as our base category. We will obtain results for the pairs A-E, B-E, C-E, and D-E, from which we can derive any other comparison.
Summary of key points
One way we might think about models for an unordered outcome is as a set of contrasts between all pairs of categories of the outcome.
If we do that, we can show arithmetically that some set(s) of parameters are simply the sum or difference between other set(s) of parameters.
Even though these arithmetic relationships hold among parameters, they do not necessarily hold among parameter estimates when we estimate logits for different category-pairs separately.
The multinomial logit model will solve this problem by effectively fitting all the underlying logits simultaneously so that redundant parameters are not separately estimated.
If there are \(k\) categories, there are \(k-1\) non-redundant sets of parameters. The standard technique is to designate one outcome category to be the base category, and to estimate parameters for each other category vs. the base category.