The multinomial logit model

The multinomial logit model is the most commonly used model for unordered outcomes.

In the multinomial logit model, one category is chosen as the base category, which is arbitrary — the estimates obtained using one base category can be exactly recovered from estimates obtained using a different one.

Multiple contrasts, multiple sets of coefficients

The multinomial logit model has a different set of \(\mathbf{\beta}\) coefficients for each of the other categories. Each of those \(\mathbf{\beta}\) is like a binary logit of that category vs. the base category.

Consider our example in which the categories of the outcome are liberal, moderate, and conservative. We will use moderate as our base category. Our model has two equations and two sets of beta coefficients.

\[ \begin{align} \ln\frac{\Pr(y_i=\textrm{LIB})}{\Pr(y_i=\textrm{MOD})} & = \mathbf{x}_i \mathbf{\beta}_{\textrm{LIBvMOD}} \\ \\ \ln\frac{\Pr(y_i=\textrm{CON})}{\Pr(y_i=\textrm{MOD})} & = \mathbf{x}_i \mathbf{\beta}_{\textrm{CONvMOD}} \end{align} \]

Before interpreting results, let’s look at the output to clarify the two equations:

Expand to show dependencies and data recoding
library(tidyverse)
library(haven)
library(texreg)
library(modelsummary)
library(nnet)
library(tulaverse)

data <- read_dta("../dta/gss_soc383.dta") %>%
  mutate(ed3cat = factor(ed3cat,
                  levels = c(1, 2, 3),
                  labels = c("HS or less", "Some college", "College grad"))) %>%
  mutate(race4cat = factor(race4cat, 
                        levels = c(1, 2, 3, 4),
                        labels = c("White", "Black", "Hispanic", "Other"))) %>%
  mutate(race4cat = relevel(race4cat, ref = "White")) %>%
  mutate(libcon3cat = factor(libcon3cat, 
                        levels = c(1, 2, 3),
                        labels = c("lib", "mod", "con"))) %>%
  mutate(libcon3cat = relevel(libcon3cat, ref = "mod")) %>% # select base category
  mutate(liberal = ifelse(libcon3cat == "lib", 1, ifelse(libcon3cat %in% c("lib", "con"), 0, NA))) %>%
  mutate(conserv = ifelse(libcon3cat == "con", 1, ifelse(libcon3cat %in% c("lib", "mod"), 0, NA)))

In R, we fit the model using multinom() from the nnet library. Because I find the output from multinom() hard to understand, I will use the tula() function from my tulaverse package to reformat it.

model <- multinom(libcon3cat ~ ed3cat + race4cat + age, data = data, trace = FALSE)
tula(model)
AIC            = 117605.442                              Number of obs =   55052
BIC            = 117730.266                              McFadden R-sq = 0.02798
Log likelihood = -58788.721                                                     
────────────────────────────────────────────────────────────────────────────────
lib            │      Coef  Std. Err.          z     P>|z|  [95% Conf  Interval]
────────────────────────────────────────────────────────────────────────────────
ed3cat         │                                                                
  Some college │     .2881     .02661      10.83    <.0001       .236      .3403
  College grad │     .9223      .0271      34.04    <.0001      .8691      .9754
race4cat       │                                                                
  Black        │      .331     .03078      10.75    <.0001      .2707      .3913
  Hispanic     │     .1434     .04041      3.549     .0004     .06423      .2226
  Other        │    .06598     .06215      1.062     .2883    -.05582      .1878
age            │  -.008382   .0006463     -12.97    <.0001   -.009649   -.007116
(Intercept)    │    -.3212     .03505     -9.162    <.0001     -.3899     -.2524
────────────────────────────────────────────────────────────────────────────────
con            │      Coef  Std. Err.          z     P>|z|  [95% Conf  Interval]
────────────────────────────────────────────────────────────────────────────────
ed3cat         │                                                                
  Some college │     .2332     .02487      9.377    <.0001      .1845       .282
  College grad │     .6161     .02608      23.62    <.0001       .565      .6673
race4cat       │                                                                
  Black        │    -.3289     .03258     -10.09    <.0001     -.3928      -.265
  Hispanic     │    -.2192     .04082      -5.37    <.0001     -.2992     -.1392
  Other        │    -.3165     .06441     -4.914    <.0001     -.4427     -.1903
age            │   .006492   .0005826      11.14    <.0001    .005351    .007634
(Intercept)    │    -.5668     .03309     -17.13    <.0001     -.6317      -.502
────────────────────────────────────────────────────────────────────────────────
Base outcome: mod

I have also implemented an option parallel in tula() that presents coefficients for the different equations of a multinomial logit model side-by-side:

tula(model, parallel=TRUE)
AIC            = 117605.442              Number of obs =   55052
BIC            = 117730.266              McFadden R-sq = 0.02798
Log likelihood = -58788.721                                     
────────────────────────────────────────────────────────────────
libcon3cat     │      lib           con   
────────────────────────────────────────────────────────────────
ed3cat         │                          
  Some college │    .2881***      .2332***
               │  (.02661)      (.02487)  
  College grad │    .9223***      .6161***
               │   (.0271)      (.02608)  
race4cat       │                          
  Black        │     .331***     -.3289***
               │  (.03078)      (.03258)  
  Hispanic     │    .1434***     -.2192***
               │  (.04041)      (.04082)  
  Other        │   .06598        -.3165***
               │  (.06215)      (.06441)  
age            │ -.008382***    .006492***
               │(.0006463)    (.0005826)  
(Intercept)    │   -.3212***     -.5668***
               │  (.03505)      (.03309)  
────────────────────────────────────────────────────────────────
Base outcome: mod
* p<0.05  ** p<0.01  *** p<0.001

In this output, we have two sets of coefficients for the same explanatory variables. One is for the contrast of liberal vs. moderate and one is for conservative vs. moderate.

. mlogit libcon3cat i.ed3cat i.race4cat, base(2)

Iteration 0:   log likelihood = -60137.567  
Iteration 1:   log likelihood = -59225.436  
Iteration 2:   log likelihood =   -59221.8  
Iteration 3:   log likelihood =   -59221.8  

Multinomial logistic regression                 Number of obs     =     55,217
                                                LR chi2(10)       =    1831.53
                                                Prob > chi2       =     0.0000
Log likelihood =   -59221.8                     Pseudo R2         =     0.0152

------------------------------------------------------------------------------
  libcon3cat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
liberal      |
      ed3cat |
   some col  |      .3316   .0263259    12.60   0.000     .2800021    .3831978
ba or above  |   .9421803   .0269711    34.93   0.000     .8893179    .9950427
             |
    race4cat |
      black  |   .3577155   .0305936    11.69   0.000     .2977532    .4176778
     latino  |   .2063542   .0400243     5.16   0.000      .127908    .2848004
      other  |   .1004029   .0618792     1.62   0.105     -.020878    .2216839
             |
       _cons |  -.7230727   .0169032   -42.78   0.000    -.7562023   -.6899431
-------------+----------------------------------------------------------------
moderate     |  (base outcome)
-------------+----------------------------------------------------------------
conservative |
      ed3cat |
   some col  |   .1933512   .0245607     7.87   0.000     .1452131    .2414894
ba or above  |   .5903084   .0259155    22.78   0.000      .539515    .6411018
             |
    race4cat |
      black  |   -.355117   .0323966   -10.96   0.000    -.4186132   -.2916208
     latino  |  -.2738155   .0404162    -6.77   0.000    -.3530298   -.1946012
      other  |  -.3592157   .0640651    -5.61   0.000     -.484781   -.2336504
             |
       _cons |  -.2384181   .0148982   -16.00   0.000     -.267618   -.2092182
------------------------------------------------------------------------------

In this output, we have two sets of coefficients for the same explanatory variables. One is labeled “liberal” and one is labeled “conservative”. The coefficients labeled liberal are our \(\mathbf{\beta}_{\textrm{LIBvMOD}}\), and the coefficients labeled conservative are our \(\mathbf{\beta}_{\textrm{CONvMOD}}\).

As we discussed earlier, we do not estimate the contrast that doesn’t involve the base category (lib vs. con), but we can obtain it as simple subtraction:

\[ \mathbf{x}_i \mathbf{\beta}_{\textrm{LIBvCON}} = \mathbf{x}_i \mathbf{\beta}_{\textrm{LIBvMOD}} - \mathbf{x}_i \mathbf{\beta}_{\textrm{CONvMOD}} \]

For example, the coefficient of some college for the liberal vs. conservative contrast would be the difference between the corresponding lib vs. mod (.332) and con vs. mod (.193) coefficients, or \(.332 - .193 = .139\).

To make the math ultimately simpler, we will also note that if we consider the contrast of the base category with itself, the betas are 0. That is, in our example, if we consider the case in which: \[ \ln\left[\frac{\Pr(y_i=\textrm{MOD})}{\Pr(y_i=\textrm{MOD})}\right] = \mathbf{x}_i \mathbf{\beta}_{\textrm{MODvMOD}} \]

\(\frac{\Pr(y_i=\textrm{MOD})}{\Pr(y_i=\textrm{MOD})}\) is always 1. Therefore, \(\ln \left[\frac{\Pr(y_i=\textrm{MOD})}{\Pr(y_i=\textrm{MOD})}\right]\) is always 0, and all \(\beta_{\textrm{MODvMOD}} = 0\).

Predicted probabilities for the multinomial logit model

To estimate \(\mathbf{\beta}\) via maximum likelihood, we need to calculate the predicted probability of each outcome category given observed \(\mathbf{x}\) and candidate estimates of \(\mathbf{\beta}\).

The above equations provide everything we need. For example, say we want to calculate the probability of being liberal given \(\mathbf{x}_i\) and \(\mathbf{\beta}\).

\[ \begin{align} \Pr(y_i = \textrm{LIB}) & = \frac{\Pr(y_i = \textrm{LIB})}{1} \\ \\ & = \frac{\Pr(y_i = \textrm{LIB})}{\Pr(y_i = \textrm{LIB})+\Pr(y_i = \textrm{CON})+\Pr(y_i = \textrm{MOD})} \end{align} \] We can make the latter substitution above because the sum of the probabilities of all our categories is 1. If we now divide all the terms by \(\Pr(y_i = \textrm{MOD})\), we get:

\[ \Pr(y_i = \textrm{LIB}) = \frac{\frac{\Pr(y_i=\textrm{LIB})}{\Pr(y_i=\textrm{MOD})}}{\frac{\Pr(y_i=\textrm{LIB})}{\Pr(y_i=\textrm{MOD})}+\frac{\Pr(y_i=\textrm{CON})}{\Pr(y_i=\textrm{MOD})}+\frac{\Pr(y_i=\textrm{MOD})}{\Pr(y_i=\textrm{MOD})}} \]

Going back to the equations for the multinomial logit model, we can obtain different terms above by exponentiating the \(\mathbf{x}_i\mathbf{\beta}\). For example: \[ \frac{\Pr(y_i=\textrm{LIB})}{\Pr(y_i=\textrm{MOD})} = \exp\left(\mathbf{x}_i\mathbf{\beta}_\textrm{LIBvMOD}\right) \] If we make this substitution for all of the terms above, we get: \[ \Pr(y_i = \textrm{LIB}) = \frac{\exp\left(\mathbf{x}_i\mathbf{\beta}_\textrm{LIBvMOD}\right)}{\exp\left(\mathbf{x}_i\mathbf{\beta}_\textrm{LIBvMOD}\right)+\exp\left(\mathbf{x}_i\mathbf{\beta}_\textrm{CONvMOD}\right)+\exp\left(\mathbf{x}_i\mathbf{\beta}_\textrm{MODvMOD}\right)} \]

To make clear how this works, let’s take a hypothetical case in which we have \(\mathbf{x}\) and \(\mathbf{\beta}\) such that \(\mathbf{x}_i\mathbf{\beta}_\textrm{LIBvMOD} = 1\) and \(\mathbf{x}_i\mathbf{\beta}_\textrm{CONvMOD} = -1\). (And, as above, \(\mathbf{x}_i\mathbf{\beta}_\textrm{MODvMOD}\) is always 0.) In that case:

\[ \begin{align} \Pr(y_i = \textrm{LIB}) & = \frac{\exp(1)}{\exp(1)+\exp(-1)+\exp(0)} \\ \\ & = \frac{2.718}{2.718+.368+1} \\ \\ & = \frac{2.718}{4.086} \\ \\ & = .665 \end{align} \] That is, given \(\mathbf{x}_i\) and candidate \(\mathbf{\beta}\), the probability of liberal is .665. We can calculate the predicted probabilities for conservative and moderate by just changing which \(\beta\) is included in the numerator: \[ \begin{align} \Pr(y_i = \textrm{CON}) & = \frac{\exp(-1)}{\exp(1)+\exp(-1)+\exp(0)} \\ \\ & = \frac{.368}{2.718+.368+1} \\ \\ & = \frac{.368}{4.086} \\ \\ & = .090 \end{align} \] and: \[ \begin{align} \Pr(y_i = \textrm{MOD}) & = \frac{\exp(0)}{\exp(1)+\exp(-1)+\exp(0)} \\ \\ & = \frac{1}{2.718+.368+1} \\ \\ & = \frac{1}{4.086} \\ \\ & = .245 \end{align} \] Note that from the above, our three predicted probabilities .665 + .090 + .245 sum to 1, as they must.

To generalize what we have shown here, we can talk about the probability that \(y_i\) equals the \(j\)th of \(k\) categories, where \(b\) is used to indicate the base category:

\[\Pr(y_i=j|\mathbf{x}_i) = \frac{\exp(\mathbf{x}_i\mathbf{\beta}_{j\textrm{ vs. }b})}{\sum_{m=1}^k\left[\exp(\mathbf{x}_i\mathbf{\beta}_{m\textrm{ vs. }b})\right]}\]

To move now to an actual example given our output above, let’s consider a Latino individual with a college diploma.

\[ \begin{align} \mathbf{x}\mathbf{\beta}_\textrm{LIBvMOD} & = -.723 + .942 + .206 \\ \\ & = .425 \\ \mathbf{x}\mathbf{\beta}_\textrm{CONvMOD} & = -.238 + .590 + -.274 \\ & = .078 \end{align} \]

Because \(\mathbf{x}\mathbf{\beta}_\textrm{LIBvMOD}\) is greater than \(\mathbf{x}\mathbf{\beta}_\textrm{CONvMOD}\), the predicted probability of liberal will be greater than the predicted probability for conservative. Because both \(\mathbf{x}\mathbf{\beta}\) are greater than 0, the predicted probability of both liberal and conservative will be greater than that of moderate. More precisely:

\[ \begin{align} \Pr(y_i = \textrm{LIB}) & = \frac{\exp(.425)}{\exp(.425)+\exp(.078)+\exp(0)} \\ \\ & = \frac{1.530}{1.530+1.081+1} \\ \\ & = \frac{1.530}{3.611} \\ \\ & = \mathbf{.424} \\ \Pr(y_i = \textrm{CON}) & = \frac{1.081}{1.530+1.081+1} \\ \\ & = \frac{1.081}{3.611} \\ \\ & = \mathbf{.299} \\ \Pr(y_i = \textrm{MOD}) & = \frac{1}{1.530+1.081+1} \\ \\ & = \frac{1}{3.611} \\ \\ & = \mathbf{.277} \end{align} \]

Computing predicted probabilities with software

Here we show how to compute predicted probabilities for a specific set of values of our explanatory variables. In this example, we will figure out the predicted probability of liberal, moderate, or conservative identification in our data for someone who is 40 years old, Hispanic, and has a college degree.

We can use the predictions() function from the marginaleffects package:

library(marginaleffects)
pred <- predictions(
  model,
  newdata = datagrid(
    age = 40,
    ed3cat = "College grad",
    race4cat = "Hispanic"
  ),
  type = "probs"  # For predicted probabilities
) %>%
dplyr::select(-c(statistic, s.value, p.value))


pred

 Group Estimate Std. Error 2.5 % 97.5 %
   mod    0.278    0.00757 0.263  0.293
   lib    0.418    0.00951 0.400  0.437
   con    0.304    0.00845 0.287  0.320

We can use \(\mathtt{margins}\) to verify these calculations:

. margins, at(ed3cat = 3 race4cat = 3)

Adjusted predictions                                    Number of obs = 55,217
Model VCE: OIM

1._predict: Pr(libcon3cat==liberal), predict(pr outcome(1))
2._predict: Pr(libcon3cat==moderate), predict(pr outcome(2))
3._predict: Pr(libcon3cat==conservative), predict(pr outcome(3))

At: ed3cat   = 3
    race4cat = 3

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    _predict |
          1  |   .4237288   .0094967    44.62   0.000     .4051156     .442342
          2  |   .2768932   .0075399    36.72   0.000     .2621152    .2916712
          3  |   .2993779   .0083524    35.84   0.000     .2830076    .3157483
------------------------------------------------------------------------------

Interpreting exponentiated coefficients

As with the binary logit model, we interpret the exponentiated coefficients as multiplicative changes in odds.

In the case of the binary logit model, our interpretation was in terms of the change in the odds of \(y=1\) vs. \(y=0\), e.g., the odds of donating to a Republican or Democrat or the odds of identifying as a religious none (vs. not). In the multinomial logit model, the interpretation will be in terms of the change in odds of one category vs. the base category.

We will obtain the exponentiated coefficients for our models and then interpret them.

tula(model, parallel = TRUE, exp = TRUE)
AIC            = 117605.442              Number of obs =   55052
BIC            = 117730.266              McFadden R-sq = 0.02798
Log likelihood = -58788.721                                     
────────────────────────────────────────────────────────────────
libcon3cat     │      lib           con   
────────────────────────────────────────────────────────────────
ed3cat         │                          
  Some college │    1.334***      1.263***
               │   (.0355)       (.0314)  
  College grad │    2.515***      1.852***
               │  (.06815)       (.0483)  
race4cat       │                          
  Black        │    1.392***      .7197***
               │  (.04286)      (.02345)  
  Hispanic     │    1.154***      .8032***
               │  (.04665)      (.03278)  
  Other        │    1.068         .7287***
               │  (.06639)      (.04693)  
age            │    .9917***      1.007***
               │ (.000641)    (.0005864)  
(Intercept)    │    .7253***      .5673***
               │  (.02542)      (.01877)  
────────────────────────────────────────────────────────────────
Base outcome: mod
* p<0.05  ** p<0.01  *** p<0.001
. mlogit libcon3cat i.ed3cat i.race4cat i.male age, base(2) rrr Iteration 0: log likelihood = -24761.649 Iteration 1: log likelihood = -24244.513 Iteration 2: log likelihood = -24242.889 Iteration 3: log likelihood = -24242.889 Multinomial logistic regression Number of obs = 22,732 LR chi2(14) = 1037.52 Prob > chi2 = 0.0000 Log likelihood = -24242.889 Pseudo R2 = 0.0210 ------------------------------------------------------------------------------ libcon3cat | RRR Std. err. z P>|z| [95% conf. interval] -------------+---------------------------------------------------------------- liberal | ed3cat | some col | 1.271069 .0530096 5.75 0.000 1.171305 1.37933 ba or above | 2.518641 .1031426 22.56 0.000 2.324385 2.729131 | race4cat | black | 1.119527 .0536883 2.35 0.019 1.019094 1.229858 latino | 1.143477 .0585587 2.62 0.009 1.034276 1.264208 other | 1.004676 .0823635 0.06 0.955 .8555479 1.179798 | male | male | 1.02389 .0346696 0.70 0.486 .9581446 1.094147 age | .9961363 .0010056 -3.83 0.000 .9941672 .9981093 _cons | .5760149 .0354104 -8.97 0.000 .51063 .6497721 -------------+---------------------------------------------------------------- moderate | (base outcome) -------------+---------------------------------------------------------------- conservative | ed3cat | some col | 1.137139 .0437095 3.34 0.001 1.054617 1.226117 ba or above | 1.596999 .0625332 11.96 0.000 1.479022 1.724388 | race4cat | black | .61077 .0304712 -9.88 0.000 .5538745 .6735098 latino | .7898457 .0399261 -4.67 0.000 .7153436 .8721071 other | .6259675 .0544497 -5.39 0.000 .5278495 .742324 | male | male | 1.242497 .0394429 6.84 0.000 1.167547 1.32226 age | 1.008699 .0009325 9.37 0.000 1.006873 1.010529 _cons | .4933145 .0286455 -12.17 0.000 .4402473 .5527783 ------------------------------------------------------------------------------

The exponentiated coefficients can be interpreted as the odds of the specified category versus moderate.

  • Compared to those with only high school education or less, having a BA degree increases the odds of being a liberal versus moderate by 151%, net of race, sex, and age. It increases the odds of being a conservative versus moderate by 60%.
  • Compared to Whites, Blacks have 12% higher odds of being liberal versus moderate, and 39% lower odds of being conservative versus moderate, adjusting for education, sex, and age.
  • Each additional year of age increases the odds of being conservative versus moderate by .8%, and decreases the odds of being liberal versus moderate by .4%.

Sources vary in their endorsement of the use of “odds” above. One could object to calling it an odds ratio because odds are typically defined as the probability of an outcome occurring over the probability of it not occurring (i.e., \(\Pr(y=1)/\Pr(y=0)\)).

For this reason, Stata refers to estimates from exponentiating multinomial logit coefficients as “relative risk ratios,” and so someone following this logic could substitute “relative risk” instead of “odds” in the interpretations above. However, this is confusing because “relative risk” usually refers to probabilities relative to one category of an explanatory variable vs. another, not (as in this case) one category of the outcome versus another. Accordingly I will just stick with “odds,” but the important point is that we are talking about the odds of one outcome category versus the base category (i.e., liberal vs. moderate) and not one category vs. everything else (i.e., liberal vs. not-liberal).