Conditional logit to fit multinomial logit with alternative-specific variables

In the multinomial logit model, each case in the data has the same set of possible unordered outcomes. If the outcome is a choice, one can think about the multinomial logit model as one in which each individual has the same set of alternatives, and chooses one among them.

A conditional logit model provides a different way of fitting a multinomial logit model. The multinomial logit approach we have already covered is simpler, and there is no reason to use conditional logit for an ordinary multinomial logit model.
The conditional logit model is more flexible, however. In particular, it provides a way of adding information that varies over the possible outcomes.

Here are some examples that illustrate what we mean by alternative-specific information:

The outcome is the mode of transportation an individual takes to work. The alternatives are car, bus, and train. Our key explanatory variable is time: we expect that people prefer transportation that gets them to work in less time. We want to estimate the effect of time, but the time for each alternative varies for each individual.
The outcome is the high school chosen by a student in a district that has eight public high schools and open enrollment among them. Our key explanatory variable is distance: we think that students might prefer high schools that are closer. We want to estimate the effect of distance, but the distance to each of the eight high schools varies for each student.
The dependent variable is which candidate a respondent votes for in a multiparty election with a parliamentary system. For example, Canadian elections (as of 2025) have five major parties: the Liberal Party, the Conservative Party, Bloc Québécois, the New Democratic Party, and the Green Party. Our key explanatory variable is agreement with a party’s issue positions. We can ask a voter’s position on many issues, and combine them with each party’s position on those issues. We want to estimate how much overall agreement across many issues affects vote choice, but agreement with each party varies for each respondent.

When a conditional logit model is used to fit the equivalent of a multinomial logit model with alternative-specific variables, it is sometimes called McFadden’s choice model.

Description of example

The example we will use involves a survey conducted in 1989 among people who had engaged in a fishing trip in Southern California. For each trip, there were four possible options:

Fishing from a beach
Fishing off a pier
Using a commercial charter boat
Using a private boat

For each trip, each of these options has an associated price (in dollars) and an associated quality (in terms of the expected number of target fish caught). These are alternative-specific variables. In addition, the data include a measure of monthly income for the respondent (measured in thousands of dollars), which is a case-specific variable.

Data set-up for conditional logit model

When we have alternative-specific data, we want the data arranged so that each alternative is on a different row, with some id variable identifying which rows belong to the same case. A binary variable should indicate which of the alternatives is the selected outcome.

For the fishing data, there are four alternatives, so each case will have four rows. We will list the first few cases here.


library(tidyverse)
library(haven)
library(marginaleffects)

data <- read_dta("../dta/fishing.dta") %>%
  mutate(alternative = as_factor(alternative, levels="label"),
         alternative = relevel(alternative, ref = "pier"),
         choice = as_factor(choice, levels="label")) %>%
  rename(alt = alternative) %>%
  arrange(id, alt, choice, chosen)

head(data %>% select(id, alt, chosen, price, quality, income), n = 12)

# A tibble: 12 × 6
      id alt          chosen price quality income
   <dbl> <fct>         <dbl> <dbl>   <dbl>  <dbl>
 1     1 pier              0 158.   0.0503   7.08
 2     1 beach             0 158.   0.0678   7.08
 3     1 private boat      0 158.   0.260    7.08
 4     1 charter boat      1 183.   0.539    7.08
 5     2 pier              0  15.1  0.0451   1.25
 6     2 beach             0  15.1  0.105    1.25
 7     2 private boat      0  10.5  0.157    1.25
 8     2 charter boat      1  34.5  0.467    1.25
 9     3 pier              0 162.   0.452    3.75
10     3 beach             0 162.   0.533    3.75
11     3 private boat      1  24.3  0.241    3.75
12     3 charter boat      0  59.3  1.03     3.75


. use ../dta/fishing.dta, clear
(Data from fishing survey by Thomson and Crooke (1991))

. list id alternative chosen price quality income in 1/12, sep(4)

     +-----------------------------------------------------------+
     | id    alternative   chosen     price   quality     income |
     |-----------------------------------------------------------|
  1. |  1          beach        0    157.93     .0678   7.083332 |
  2. |  1   charter boat        1    182.93     .5391   7.083332 |
  3. |  1           pier        0    157.93     .0503   7.083332 |
  4. |  1   private boat        0    157.93     .2601   7.083332 |
     |-----------------------------------------------------------|
  5. |  2          beach        0    15.114     .1049       1.25 |
  6. |  2   charter boat        1    34.534     .4671       1.25 |
  7. |  2           pier        0    15.114     .0451       1.25 |
  8. |  2   private boat        0    10.534     .1574       1.25 |
     |-----------------------------------------------------------|
  9. |  3          beach        0   161.874     .5333       3.75 |
 10. |  3   charter boat        0    59.334    1.0266       3.75 |
 11. |  3           pier        0   161.874     .4522       3.75 |
 12. |  3   private boat        1    24.334     .2413       3.75 |
     +-----------------------------------------------------------+

Above, the variable \(\texttt{id}\) identifies the individual cases. The variable \(\texttt{alternative}\) indicates the alternative corresponding to each row. \(\texttt{chosen}\) indicates which alternative was selected; for example, for the first case, the trip was by charter boat.

\(\texttt{price}\) and \(\texttt{quality}\) are alternative-specific variables and vary across alternatives. The variable \(\texttt{income}\) is case-specific and we can see that it is the same within each case.

Stata: rearranging data with alternative-specific values. You may have data with alternative-specific variables in which the information is all in a single row, with different variables containing the different alternative-specific values for a given measure. The \(\texttt{reshape long}\) command is what you use to re-arrange this data so that each row represents a different alternative.
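For R users, the same rearrangement can be sketched with the base \(\texttt{reshape()}\) function. This is a minimal illustration, not the method used to prepare the fishing data: the wide layout and its column names (price_beach, quality_pier, and so on) are hypothetical, and the values are copied from cases 1 and 2 in the listing above.

```r
# Hypothetical wide layout: one row per case, one column per
# alternative-specific value. Values are copied from cases 1 and 2 above.
wide <- data.frame(
  id = c(1, 2),
  choice = c("charter boat", "charter boat"),
  income = c(7.083332, 1.25),
  price_beach = c(157.93, 15.114),
  price_pier = c(157.93, 15.114),
  price_private = c(157.93, 10.534),
  price_charter = c(182.93, 34.534),
  quality_beach = c(0.0678, 0.1049),
  quality_pier = c(0.0503, 0.0451),
  quality_private = c(0.2601, 0.1574),
  quality_charter = c(0.5391, 0.4671)
)

alts <- c("beach", "pier", "private boat", "charter boat")

# Stack the four price columns and the four quality columns,
# producing one row per case-alternative pair.
long <- reshape(
  wide,
  direction = "long",
  varying = list(
    c("price_beach", "price_pier", "price_private", "price_charter"),
    c("quality_beach", "quality_pier", "quality_private", "quality_charter")
  ),
  v.names = c("price", "quality"),
  timevar = "alt",
  times = alts,
  idvar = "id"
)

# Binary indicator for the selected alternative, then sort by case.
long$chosen <- as.integer(long$alt == long$choice)
long <- long[order(long$id, match(long$alt, alts)), ]
rownames(long) <- NULL
long[, c("id", "alt", "chosen", "price", "quality", "income")]
```

Each case now occupies four rows, with income repeated within the case and exactly one row flagged as chosen.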

Writing the conditional logit model

The conditional logit model can be written as:

\[ \begin{equation} u_{ij}=\mathbf{x}_{ij}^{A}\mathbf{\beta }^{A}+\mathbf{x}_{i}^{C}\mathbf{\beta }_{j \mathrm{\ vs\ } base}^{C}+\varepsilon _{ij} \end{equation} \]

where:

\(u_{ij}\) is the utility of alternative \(j\) for individual \(i\).
The alternative-specific variables are indicated by \(\mathbf{x}_{ij}^{A}\), and, in the simple formulation, each alternative-specific variable will have a \(\mathbf{\beta }\) that does not vary over the alternatives.
The case-specific variables are indicated by \(\mathbf{x}_{i}^{C}\). As in multinomial logit, there will be a base category; each case-specific variable will have a coefficient for each other category, defined in terms of that category's contrast with the base category.
Because the model is written in terms of a latent utility, there is an error term for each alternative for each individual. These are assumed to be independent of one another and to follow a type I extreme value distribution, which is what yields the logit form of the probabilities below.

The predicted probability of category \(m\) being the chosen category is then calculated as:

\[ \begin{equation} \Pr \left( y=m\mid \mathbf{x}\right) =\frac{\exp \left( \mathbf{x}_{m}^{A}\mathbf{\beta }^{A}+\mathbf{x}^{C}\mathbf{\beta }_{m \mathrm{\ vs\ } base}^{C}\right) }{\sum_{j=1}^{k}\exp \left( \mathbf{x}_{j}^{A}\mathbf{\beta }^{A}+\mathbf{x}^{C}\mathbf{\beta }_{j \mathrm{\ vs\ } base}^{C}\right) } \end{equation} \]

As in multinomial logit, for each observation there is a separate \(\mathbf{x\beta}\) that can be computed for each alternative. Summing \(\exp(\mathbf{x\beta})\) over the alternatives gives the denominator of the predicted probability, and the \(\exp(\mathbf{x\beta})\) for alternative \(m\) is the numerator for calculating \(\Pr(y=m)\).
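To make the calculation concrete, here is a small sketch in R that applies the formula by hand to the first case in the fishing data. The coefficients are copied from the fitted model reported later on this page, and the variable names are chosen here for illustration. Pier is the base category, so its intercept and income coefficient are set to zero.

```r
# Estimates copied from the fitted model reported on this page.
b_price   <- -0.025117
b_quality <-  0.357782

alts      <- c("pier", "beach", "private boat", "charter boat")
intercept <- c(0, -0.777959, -0.250681, 0.916406)  # pier is the base (0)
b_income  <- c(0,  0.127577,  0.217017, 0.094285)  # pier is the base (0)

# Data for case 1 (id == 1), in the same order as alts.
price   <- c(157.93, 157.93, 157.93, 182.93)
quality <- c(0.0503, 0.0678, 0.2601, 0.5391)
income  <- 7.083332

# One linear predictor (x * beta) per alternative.
xb <- b_price * price + b_quality * quality + intercept + b_income * income

# Numerator: exp(xb) for each alternative; denominator: their sum.
prob <- exp(xb) / sum(exp(xb))
names(prob) <- alts
round(prob, 3)
```

The four probabilities sum to 1 by construction, because every alternative shares the same denominator.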

Fitting the conditional logit model

In R, we will fit the model using the clogit() function from the survival package.

The outcome variable is chosen: the binary variable that is 1 for the option that was selected
The alternative-specific variables price and quality are included in the model in the familiar way
The case-specific variable income is included as an interaction with the variable that identifies each of the alternatives (alt)
The variable that identifies the cases (id) is included in the model as strata(id).

library(survival)
model <- clogit(chosen ~ price + quality + (alt * income) + strata(id), data = data)
summary(model)

Call:
coxph(formula = Surv(rep(1, 4728L), chosen) ~ price + quality + 
    (alt * income) + strata(id), data = data, method = "exact")

  n= 4728, number of events= 1182 

                            coef exp(coef)  se(coef)       z Pr(>|z|)    
price                  -0.025117  0.975196  0.001732 -14.504  < 2e-16 ***
quality                 0.357782  1.430154  0.109773   3.259 0.001117 ** 
altbeach               -0.777959  0.459342  0.220494  -3.528 0.000418 ***
altprivate boat        -0.250681  0.778271  0.203940  -1.229 0.219000    
altcharter boat         0.916406  2.500289  0.207265   4.421 9.81e-06 ***
income                        NA        NA  0.000000      NA       NA    
altbeach:income         0.127577  1.136073  0.050640   2.519 0.011758 *  
altprivate boat:income  0.217017  1.242365  0.050058   4.335 1.46e-05 ***
altcharter boat:income  0.094285  1.098873  0.050060   1.883 0.059640 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                       exp(coef) exp(-coef) lower .95 upper .95
price                     0.9752     1.0254    0.9719    0.9785
quality                   1.4302     0.6992    1.1533    1.7735
altbeach                  0.4593     2.1770    0.2982    0.7077
altprivate boat           0.7783     1.2849    0.5218    1.1607
altcharter boat           2.5003     0.4000    1.6656    3.7533
income                        NA         NA        NA        NA
altbeach:income           1.1361     0.8802    1.0287    1.2546
altprivate boat:income    1.2424     0.8049    1.1263    1.3704
altcharter boat:income    1.0989     0.9100    0.9962    1.2122

Concordance= 0.757  (se = 0.011 )
Likelihood ratio test= 846.9  on 8 df,   p=<2e-16
Wald test            = 322.7  on 8 df,   p=<2e-16
Score (logrank) test = 596  on 8 df,   p=<2e-16

In the output above, you’ll notice the coefficient for income is indicated as “NA.” However, there are different coefficients for income for each of the three alternatives (vs. the base category), just like in multinomial logit.

In the conditional logit model, coefficients are only estimated for terms that vary within a case (in this case, within the same value of id). Income does not vary within a person. However, when we include the interaction terms, we implicitly refer to a term that does vary within a person: income \(\times\) beach, for example, equals the person’s income for their beach row and is 0 for all other rows.
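The point can be checked directly by building the design matrix for a couple of cases. The sketch below uses a toy data frame (constructed here for illustration, with the incomes of cases 1 and 2) and base R's \(\texttt{model.matrix()}\); it is not part of the original analysis.

```r
alts <- c("pier", "beach", "private boat", "charter boat")

# Toy long-format data: two cases, four alternatives each.
d <- data.frame(
  id     = rep(1:2, each = 4),
  alt    = factor(rep(alts, times = 2), levels = alts),
  income = rep(c(7.083332, 1.25), each = 4)
)

# Same right-hand side as the alt * income part of the model formula.
X <- model.matrix(~ alt * income, data = d)

# Within-case range of a column: zero means no variation within a case.
within_range <- function(v) tapply(v, d$id, function(z) diff(range(z)))

within_range(X[, "income"])           # income itself: constant within each id
within_range(X[, "altbeach:income"])  # income x beach: varies within each id
```

The income column has no within-case variation, so its coefficient cannot be estimated, while each interaction column takes the value of income on one row and zero on the others.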

We will reformat the output to be easier to read:


library(modelsummary)
coef_rename <- c(
  "price" = "Price of option",
  "quality" = "Expected catch",
  "altbeach:income" = "Income: Beach",
  "altprivate boat:income" = "Income: Private boat",
  "altcharter boat:income" = "Income: Charter boat",
  "altbeach" = "Intercept: Beach",
  "altprivate boat" = "Intercept: Private boat",
  "altcharter boat" = "Intercept: Charter boat")

# Create the formatted table with renamed coefficients
modelsummary(model, 
             stars = TRUE,
             estimate = "{estimate} ({std.error}){stars}",
             statistic = NULL,
             coef_map = coef_rename,  # Apply the coefficient renaming
             include.LogLike = TRUE)

Model matrix is rank deficient. Parameters `income` were not estimable.

	(1)
Price of option	-0.025 (0.002)***
Expected catch	0.358 (0.110)**
Income: Beach	0.128 (0.051)*
Income: Private boat	0.217 (0.050)***
Income: Charter boat	0.094 (0.050)+
Intercept: Beach	-0.778 (0.220)***
Intercept: Private boat	-0.251 (0.204)
Intercept: Charter boat	0.916 (0.207)***
Num.Obs.	4728
AIC	2446.3
BIC	2498.0
RMSE	0.39

In Stata, fitting a conditional logit model for simple choice data is done in two parts. First, the \(\texttt{cmset}\) command is used to specify two things: (1) what variable identifies each case and (2) what variable identifies each alternative within each case. In our example these variables are conveniently named \(\mathtt{id}\) and \(\mathtt{alternative}\).


. cmset id alternative

     Case ID variable: id
Alternatives variable: alternative

Then, we estimate the model using the command cmclogit, using “pier” as our base category:


. cmclogit chosen price quality, casevar(income) base(2)

Iteration 0:   log likelihood = -1270.0164  
Iteration 1:   log likelihood = -1217.7258  
Iteration 2:   log likelihood = -1215.1499  
Iteration 3:   log likelihood = -1215.1376  
Iteration 4:   log likelihood = -1215.1376  

Conditional logit choice model                 Number of obs      =      4,728
Case ID variable: id                           Number of cases    =       1182

Alternatives variable: alternative             Alts per case: min =          4
                                                              avg =        4.0
                                                              max =          4

                                                  Wald chi2(5)    =     252.98
Log likelihood = -1215.1376                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
      chosen | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
alternative  |
       price |  -.0251166   .0017317   -14.50   0.000    -.0285106   -.0217225
     quality |    .357782   .1097733     3.26   0.001     .1426302    .5729337
-------------+----------------------------------------------------------------
beach        |
      income |   .1275771   .0506395     2.52   0.012     .0283255    .2268288
       _cons |  -.7779594   .2204939    -3.53   0.000     -1.21012   -.3457992
-------------+----------------------------------------------------------------
pier         |  (base alternative)
-------------+----------------------------------------------------------------
private_boat |
      income |    .217017   .0500582     4.34   0.000     .1189047    .3151293
       _cons |  -.2506806   .2039395    -1.23   0.219    -.6503948    .1490336
-------------+----------------------------------------------------------------
charter_boat |
      income |   .0942854     .05006     1.88   0.060    -.0038303    .1924012
       _cons |   .9164063   .2072648     4.42   0.000     .5101748    1.322638
------------------------------------------------------------------------------

The signs of these results mean that the price of an option is negatively associated with it being chosen, while an option’s quality is positively associated with being chosen. Unsurprisingly, customers are attracted to lower prices and higher quality.

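Income is positively associated with choosing each of the other options versus fishing off a pier, most strongly for fishing by private boat. The coefficient for charter boat is not statistically significant at the .05 level (p = .06).

The intercepts indicate that, when price and quality are equal across options and income is zero, fishing by charter boat is the most popular choice (it has the largest intercept), while fishing from the beach is the least popular (it has the most negative intercept).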

Interpreting conditional logit coefficients

Because this is a logit model, we can exponentiate the conditional logit coefficients and interpret the results.


library(modelsummary)
coef_rename <- c(
  "price" = "Price of option",
  "quality" = "Expected catch",
  "altbeach:income" = "Income: Beach",
  "altprivate boat:income" = "Income: Private boat",
  "altcharter boat:income" = "Income: Charter boat",
  "altbeach" = "Intercept: Beach",
  "altprivate boat" = "Intercept: Private boat",
  "altcharter boat" = "Intercept: Charter boat")

# Create the formatted table with renamed coefficients
modelsummary(model, 
             exponentiate=TRUE,
             stars = TRUE,
             estimate = "{estimate} ({std.error}){stars}",
             statistic = NULL,
             coef_map = coef_rename,  # Apply the coefficient renaming
             include.LogLike = TRUE)

Model matrix is rank deficient. Parameters `income` were not estimable.

	(1)
Price of option	0.975 (0.002)***
Expected catch	1.430 (0.157)**
Income: Beach	1.136 (0.058)*
Income: Private boat	1.242 (0.062)***
Income: Charter boat	1.099 (0.055)+
Intercept: Beach	0.459 (0.101)***
Intercept: Private boat	0.778 (0.159)
Intercept: Charter boat	2.500 (0.518)***
Num.Obs.	4728
AIC	2446.3
BIC	2498.0
RMSE	0.39

These can be obtained in Stata using the \(\texttt{or}\) option.


. cmclogit chosen price quality, casevar(income) or base(2)

Iteration 0:   log likelihood = -1270.0164  
Iteration 1:   log likelihood = -1217.7258  
Iteration 2:   log likelihood = -1215.1499  
Iteration 3:   log likelihood = -1215.1376  
Iteration 4:   log likelihood = -1215.1376  

Conditional logit choice model                 Number of obs      =      4,728
Case ID variable: id                           Number of cases    =       1182

Alternatives variable: alternative             Alts per case: min =          4
                                                              avg =        4.0
                                                              max =          4

                                                  Wald chi2(5)    =     252.98
Log likelihood = -1215.1376                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
      chosen | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
alternative  |
       price |   .9751962   .0016887   -14.50   0.000      .971892    .9785117
     quality |   1.430154   .1569927     3.26   0.001     1.153303    1.773462
-------------+----------------------------------------------------------------
beach        |
      income |   1.136073   .0575302     2.52   0.012      1.02873    1.254615
       _cons |   .4593424   .1012822    -3.53   0.000     .2981616    .7076546
-------------+----------------------------------------------------------------
pier         |  (base alternative)
-------------+----------------------------------------------------------------
private_boat |
      income |   1.242365   .0621906     4.34   0.000     1.126263    1.370436
       _cons |   .7782709   .1587202    -1.23   0.219     .5218397    1.160712
-------------+----------------------------------------------------------------
charter_boat |
      income |   1.098873   .0550096     1.88   0.060      .996177    1.212157
       _cons |   2.500289    .518222     4.42   0.000     1.665582    3.753309
------------------------------------------------------------------------------
Note: Exponentiated coefficients represent odds ratios for alternative-specific variables (first equation) and
      relative-risk ratios for case-specific variables.
Note: _cons estimates baseline relative risk for each outcome.

As the note at the bottom of the Stata output indicates, the results are an unusual hybrid of odds ratios (for the alternative-specific variables) and relative risk ratios (for the case-specific variables).

The coefficients for the alternative-specific variables can be interpreted as:

Net of personal income and the catch rate, each additional dollar of price is associated with a 2.5% decrease in the odds of an option being chosen.
Net of personal income and price, each unit increase in the catch rate is associated with a 43% increase in the odds of an option being chosen.

The coefficients for the case-specific variables can be interpreted as:

Net of price and quality, a one-thousand dollar increase in monthly income increases the odds of using a private boat versus fishing from a pier by 24%.
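The percentages in these interpretations come from converting each coefficient to a percent change in the odds, \(100 \times (\exp(b) - 1)\). A short sketch, using coefficients copied from the output above (the helper function name is made up for this example):

```r
# Coefficients copied from the model output above.
b <- c(price = -0.025117,
       quality = 0.357782,
       income_private_boat = 0.217017)

# Percent change in the odds for a one-unit increase in the predictor.
pct_change <- function(coef, units = 1) 100 * (exp(coef * units) - 1)

round(pct_change(b), 1)

# The same function handles larger changes, e.g. a 10-dollar price increase.
pct_change(b["price"], units = 10)
```

Because the model is multiplicative in the odds, the effect of a 10-dollar increase is obtained by exponentiating ten times the coefficient, not by multiplying the one-dollar percentage by ten.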

Consistent with its terminology for multinomial logit, Stata refers to the exponentiated coefficients for case-specific variables as relative risk ratios. As before, we do not do this because it invites confusion with the more usual use of “relative risk.”

Interpreting conditional logit results via average marginal change

Presently, I have only implemented this for Stata, and have not figured out how to do it in R.

For the case-specific variables, the average marginal change can be obtained in Stata using \(\texttt{margins}\) with the \(\texttt{dydx()}\) option:


. margins, dydx(income)

Average marginal effects                                 Number of obs = 4,728
Model VCE: OIM

Expression: Pr(alternative|1 selected), predict()
dy/dx wrt:  income

-------------------------------------------------------------------------------
              |            Delta-method
              |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
income        |
     _outcome |
       beach  |   .0034878    .003541     0.98   0.325    -.0034524     .010428
        pier  |  -.0144069    .004369    -3.30   0.001    -.0229701   -.0058437
private boat  |   .0266822     .00515     5.18   0.000     .0165885    .0367759
charter boat  |  -.0157631     .00559    -2.82   0.005    -.0267193   -.0048069
-------------------------------------------------------------------------------

From these results, we can see that, on average, a marginal increase in income increases the probability of fishing by private boat by .027. This increase is offset by decreases in the probability of fishing by charter boat (.016) and of fishing from a pier (.014). The change for fishing from the beach is small (.003) and not statistically significant.
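Because the predicted probabilities must always sum to 1, the marginal changes across the four alternatives must sum to 0. A quick check in R, with the values copied from the \(\texttt{margins}\) output above:

```r
# Average marginal effects of income, copied from the Stata output above.
ame_income <- c(beach = 0.0034878,
                pier = -0.0144069,
                `private boat` = 0.0266822,
                `charter boat` = -0.0157631)

# Gains and losses in probability should cancel (up to rounding).
sum(ame_income)

# Total probability gained by the alternatives with positive effects.
sum(ame_income[ame_income > 0])
```

This is a useful sanity check when reading any table of marginal effects for a categorical outcome: whatever probability one alternative gains, the others must lose.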

For the alternative-specific variables, the Stata output is more complex to interpret. Here are the average marginal effects for price:


. margins, dydx(price)

Average marginal effects                                 Number of obs = 4,728
Model VCE: OIM

Expression: Pr(alternative|1 selected), predict()
dy/dx wrt:  price

--------------------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
---------------------------+----------------------------------------------------------------
price                      |
      _outcome#alternative |
              beach#beach  |  -.0021102   .0001727   -12.22   0.000    -.0024487   -.0017717
               beach#pier  |   .0009075   .0000997     9.10   0.000     .0007121    .0011029
       beach#private boat  |   .0005558   .0000458    12.14   0.000      .000466    .0006455
       beach#charter boat  |   .0006469   .0000563    11.50   0.000     .0005366    .0007572
               pier#beach  |   .0009075   .0000997     9.10   0.000     .0007121    .0011029
                pier#pier  |  -.0025593    .000177   -14.46   0.000    -.0029062   -.0022123
        pier#private boat  |   .0007361   .0000543    13.55   0.000     .0006296    .0008425
        pier#charter boat  |   .0009157    .000067    13.67   0.000     .0007844     .001047
       private boat#beach  |   .0005558   .0000458    12.14   0.000      .000466    .0006455
        private boat#pier  |   .0007361   .0000543    13.55   0.000     .0006296    .0008425
private boat#private boat  |  -.0050457   .0003323   -15.18   0.000     -.005697   -.0043944
private boat#charter boat  |   .0037539   .0002957    12.70   0.000     .0031743    .0043334
       charter boat#beach  |   .0006469   .0000563    11.50   0.000     .0005366    .0007572
        charter boat#pier  |   .0009157    .000067    13.67   0.000     .0007844     .001047
charter boat#private boat  |   .0037539   .0002957    12.70   0.000     .0031743    .0043334
charter boat#charter boat  |  -.0053165   .0003461   -15.36   0.000    -.0059948   -.0046381
--------------------------------------------------------------------------------------------

In this output, a row is labeled by a pair of alternatives separated by \(\texttt{\#}\). The alternative before the \(\texttt{\#}\) is the alternative for which the price is being increased by the marginal amount. The alternative after the hash is the alternative whose change in predicted probability is being evaluated.

The last four rows of the output (those beginning with charter boat#) give the change in each probability for a marginal increase in the price of using a charter boat.

On average, the predicted probability of using a charter boat decreases by .0053, or a little more than half a percentage point. Most of that decrease is offset by an increase of .0038 in the probability of using a private boat instead. The probabilities of using the beach or the pier increase by smaller amounts (.0006 and .0009).
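The pattern in this table follows from the form of the predicted probability. As a sketch, write \(P_m\) for \(\Pr(y=m\mid\mathbf{x})\) for a single case and \(\beta_{price}\) for the price coefficient (shorthand introduced here for illustration). Differentiating the probability formula given earlier with respect to the price of alternative \(j\) gives:

\[ \begin{equation} \frac{\partial P_{m}}{\partial \, price_{j}}=\begin{cases} \beta_{price}\,P_{m}\left( 1-P_{m}\right) & \text{if } m=j \\ -\beta_{price}\,P_{m}P_{j} & \text{if } m\neq j \end{cases} \end{equation} \]

Since \(\beta_{price}\) is negative, the own-price effect is negative and every cross-price effect is positive. The cross-price expression is symmetric in \(m\) and \(j\), and summing over \(m\) for a fixed \(j\) gives \(\beta_{price}P_{j}\left[ (1-P_{j})-\sum_{m\neq j}P_{m}\right] =0\). Assuming the \(\texttt{margins}\) output is the average of these case-level derivatives, this would explain why the beach#pier and pier#beach rows are identical (.0009075) and why the four rows for each priced alternative sum to zero.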