Sequential logit

With subjective assessment items, such as self-assessed health or opinion on an issue, a set of ordered categories may be easy to conceptualize as a continuum.

With other outcomes that may also be considered ordered, the outcome is better thought of as a sequence of transitions. The example here concerns educational attainment, which is a good case for thinking about the matter. College graduates were earlier high school graduates; people with post-baccalaureate degrees were earlier college graduates. The outcome is effectively a set of stages, where higher levels of the outcome are successive stages.

Example

The data are from the National Longitudinal Survey of Youth 1979 (NLSY79). The outcome is educational attainment, coded into five categories:

  1. Less than a high school diploma
  2. High school diploma but not college
  3. Some college but not a bachelor’s level degree
  4. Bachelor’s-level degree but no further
  5. Post-bachelor’s level education

We will model this as a sequence of four transitions:

  1. Does one earn a high school diploma?
  2. If one has a high school diploma, does one go to college?
  3. If one attended college, does one earn a bachelor’s degree?
  4. If one earned a bachelor’s degree, does one receive post-bachelor’s education?

Our explanatory variables in this example will be sex (binary), race (three categories only: White, Black, Hispanic), and mother’s education (same five categories as used for own education).

We can fit four separate logits for these four transitions. For the latter three transitions, we restrict the sample to those who have made the prior transition.

library(tidyverse)
library(haven)
library(modelsummary)

# Load data
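dat <- read_dta("../dta/nlsy79_cda.dta") %>%
  drop_na(edu30, female, race, momysch) %>%
  # Keep only non-negative years of schooling; any negative edu30 would
  # otherwise be coded hsdip = 0 below and enter the first logit
  filter(edu30 >= 0, momysch >= 0)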

# Create variables
dat <- dat %>%
  mutate(
    male = 1 - female,
    male = factor(male, levels = c(0, 1), labels = c("Woman", "Man"))
  ) %>%
  mutate(
    race = factor(race, levels = c(1, 2, 3),
                  labels = c("White", "Black", "Hispanic"))
  ) %>%
  mutate(
    # Education level categorization
    ed_level = case_when(
      edu30 >= 0 & edu30 <= 11 ~ 1,
      edu30 == 12 ~ 2,
      edu30 >= 13 & edu30 <= 15 ~ 3,
      edu30 == 16 ~ 4,
      edu30 >= 17 & edu30 <= 20 ~ 5
    ),
    ed_level = factor(ed_level,
                      levels = 1:5,
                      labels = c("No HS diploma", "HS diploma only",
                                 "Some college", "BA-level college",
                                 "Post-BA college"))
  ) %>%
  mutate(
    # Mother's education
    mom_ed = case_when(
      momysch >= 0 & momysch <= 11 ~ 1,
      momysch == 12 ~ 2,
      momysch >= 13 & momysch <= 15 ~ 3,
      momysch == 16 ~ 4,
      momysch >= 17 & momysch <= 20 ~ 5
    ),
    mom_ed = factor(mom_ed,
                    levels = 1:5,
                    labels = c("No HS diploma", "HS diploma only",
                               "Some college", "BA-level college",
                               "Post-BA college"))
  ) %>%
  mutate(
    # Sequential outcomes
    hsdip = ifelse(edu30 >= 12, 1, 0),
    somecol = ifelse(edu30 >= 13, 1, 0),
    coldeg = ifelse(edu30 >= 16, 1, 0),
    postgrad = ifelse(edu30 >= 17, 1, 0)
  )

# Transition 1: High school diploma
mod1 <- glm(hsdip ~ male + race + mom_ed, family = binomial(), data = dat)

# Transition 2: Some college (conditional on HS diploma)
mod2 <- glm(somecol ~ male + race + mom_ed, family = binomial(),
            data = dat %>% filter(hsdip == 1))

# Transition 3: College degree (conditional on some college)
mod3 <- glm(coldeg ~ male + race + mom_ed, family = binomial(),
            data = dat %>% filter(somecol == 1))

# Transition 4: Post-graduate (conditional on college degree)
mod4 <- glm(postgrad ~ male + race + mom_ed, family = binomial(),
            data = dat %>% filter(coldeg == 1))

# Display results
models <- list(
  "HS dip" = mod1,
  "Some col" = mod2,
  "BA" = mod3,
  "Post-BA" = mod4
)

modelsummary(models, gof_map = c("nobs"))
                          HS dip     Some col   BA         Post-BA
(Intercept)               1.369      -0.702     -0.552     -0.916
                          (0.064)    (0.056)    (0.093)    (0.161)
maleMan                   -0.290     -0.208     0.135      0.169
                          (0.062)    (0.048)    (0.068)    (0.099)
raceBlack                 -0.029     0.038      -0.787     -0.255
                          (0.075)    (0.058)    (0.085)    (0.142)
raceHispanic              -0.394     0.260      -0.702     0.256
                          (0.079)    (0.073)    (0.109)    (0.177)
mom_edHS diploma only     1.234      0.823      0.555      0.065
                          (0.074)    (0.056)    (0.092)    (0.163)
mom_edSome college        2.063      1.859      0.932      0.367
                          (0.183)    (0.091)    (0.113)    (0.182)
mom_edBA-level college    2.484      2.558      1.507      0.658
                          (0.285)    (0.134)    (0.133)    (0.187)
mom_edPost-BA college     4.072      2.654      1.757      0.776
                          (1.004)    (0.220)    (0.197)    (0.233)
Num.Obs.                  9192       7858       3927       1836

The estimates for the four transitions, with z statistics in parentheses, are:

------------------------------------------------------------------------------------
                              (1)             (2)             (3)             (4)
                           HS dip        Some col              BA         Post-BA
------------------------------------------------------------------------------------
Man                          -.29***        -.208***         .135*           .169
                          (-4.70)         (-4.31)          (1.99)          (1.71)

Black                       -.029            .038           -.787***        -.255
                          (-0.39)          (0.66)         (-9.30)         (-1.79)

Hispanic                    -.394***          .26***        -.702***         .256
                          (-4.99)          (3.59)         (-6.46)          (1.45)

HS diploma only              1.23***         .823***         .555***         .065
                          (16.78)         (14.75)          (6.01)          (0.40)

Some college                 2.06***         1.86***         .932***         .367*
                          (11.28)         (20.39)          (8.22)          (2.01)

BA-level college             2.48***         2.56***         1.51***         .658***
                           (8.72)         (19.07)         (11.37)          (3.52)

Post-BA college              4.07***         2.65***         1.76***         .776***
                           (4.06)         (12.08)          (8.91)          (3.34)

Constant                     1.37***        -.702***        -.552***        -.916***
                          (21.38)        (-12.62)         (-5.96)         (-5.70)
------------------------------------------------------------------------------------
N                            9192            7858            3927            1836
------------------------------------------------------------------------------------

From this, we can see, for example, that net of other variables men are less likely to finish high school and less likely to go to college, but if they go to college, they are more likely to finish (this is the pattern in these data; I do not know whether it holds more generally for college completion).

We can also see that, net of mother’s education, there are no significant Black-White differences in high school completion or college attendance, but Black respondents who do start college are significantly less likely to finish than their White counterparts.
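To put the Black-White gap in college completion on the odds scale, one can exponentiate the coefficient. The sketch below uses the Black coefficient and standard error for the BA transition as reported in the seqlogit output in this section; the Wald interval (normal approximation) is an assumption of the sketch, and the object names are illustrative.

```r
# Black coefficient and standard error for the BA transition,
# copied from the reported seqlogit output
b  <- -0.7868327
se <- 0.0845712

# Odds ratio and a 95% Wald interval (normal approximation assumed)
or <- exp(b)
ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)

round(c(odds_ratio = or, lower = ci[1], upper = ci[2]), 3)
```

An odds ratio below 1 here means lower odds of finishing a bachelor's degree among those who started college, net of the other variables in the model.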

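If we wanted to calculate the predicted probabilities of each education level, we would multiply the probabilities from the individual logits, where each transition probability is conditional on having made the prior transition, like so: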

\[
\begin{align}
\Pr(\texttt{ed_level}=1) & = \Pr(\texttt{hsdip} = 0) \\ \\
\Pr(\texttt{ed_level}=2) & = \Pr(\texttt{hsdip} = 1)\times\Pr(\texttt{somecol} = 0) \\ \\
\Pr(\texttt{ed_level}=3) & = \Pr(\texttt{hsdip} = 1)\times\Pr(\texttt{somecol} = 1)\times\Pr(\texttt{coldeg} = 0) \\ \\
\Pr(\texttt{ed_level}=4) & = \Pr(\texttt{hsdip} = 1)\times\Pr(\texttt{somecol} = 1)\times\Pr(\texttt{coldeg} = 1)\times\Pr(\texttt{postgrad}=0) \\ \\
\Pr(\texttt{ed_level}=5) & = \Pr(\texttt{hsdip} = 1)\times\Pr(\texttt{somecol} = 1)\times\Pr(\texttt{coldeg} = 1)\times\Pr(\texttt{postgrad}=1) \\
\end{align}
\]
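As a rough illustration of this calculation, the sketch below combines the transition probabilities for one hypothetical profile: a White woman (the reference categories) whose mother has a high school diploma only. The coefficients are copied from the seqlogit output reported later in this section; the profile and the object names are assumptions made for the example.

```r
# Intercepts and "HS diploma only" mother's-education coefficients for each
# transition, copied from the reported seqlogit output
b_cons  <- c(hsdip = 1.368916, somecol = -0.7024935,
             coldeg = -0.5515491, postgrad = -0.9155704)
b_momhs <- c(hsdip = 1.234234, somecol = 0.8232155,
             coldeg = 0.5545198, postgrad = 0.0652974)

# Conditional probability of making each transition for the chosen profile
p <- plogis(b_cons + b_momhs)

# Combine the conditional probabilities into the five education levels
pr_level <- c(
  "No HS diploma"    = 1 - p[["hsdip"]],
  "HS diploma only"  = p[["hsdip"]] * (1 - p[["somecol"]]),
  "Some college"     = p[["hsdip"]] * p[["somecol"]] * (1 - p[["coldeg"]]),
  "BA-level college" = p[["hsdip"]] * p[["somecol"]] * p[["coldeg"]] *
                       (1 - p[["postgrad"]]),
  "Post-BA college"  = p[["hsdip"]] * p[["somecol"]] * p[["coldeg"]] *
                       p[["postgrad"]]
)

round(pr_level, 3)
sum(pr_level)  # the five probabilities sum to 1 by construction
```

With fitted glm objects in hand, the same products could be formed from `predict(..., type = "response")` applied to each model instead of hand-entered coefficients.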

The Stata add-on package \(\mathtt{seqlogit}\) allows one to estimate the separate logits of the sequential logit model as a single model. The coefficients do not change, but it is easier to do some tests with everything estimated in one model.

. seqlogit ed_level i.male i.race i.mom_ed, tree(1: 2 3 4 5, 2: 3 4 5, 3: 4 5, 4:5)

Transition tree:

Transition 1: 1 : 2 3 4 5
Transition 2: 2 : 3 4 5
Transition 3: 3 : 4 5
Transition 4: 4 : 5

Computing starting values for:

Transition 1
Transition 2
Transition 3
Transition 4

Iteration 0:   log likelihood = -12073.169
Iteration 1:   log likelihood = -12073.169

                                                       Number of obs =   9,192
                                                       LR chi2(28)   = 2184.43
Log likelihood = -12073.169                            Prob > chi2   =  0.0000

-----------------------------------------------------------------------------------
         ed_level | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
------------------+----------------------------------------------------------------
_2_3_4_5v1        |
             male |
             Man  |  -.2900644   .0617716    -4.70   0.000    -.4111346   -.1689942
                  |
             race |
           Black  |  -.0291676     .07529    -0.39   0.698    -.1767334    .1183981
        Hispanic  |  -.3944315   .0790509    -4.99   0.000    -.5493685   -.2394946
                  |
           mom_ed |
 HS diploma only  |   1.234234    .073545    16.78   0.000     1.090089     1.37838
    Some college  |   2.062536   .1827978    11.28   0.000     1.704259    2.420814
BA-level college  |   2.483555   .2847489     8.72   0.000     1.925457    3.041653
 Post-BA college  |   4.071563   1.003743     4.06   0.000     2.104263    6.038863
                  |
            _cons |   1.368916   .0640302    21.38   0.000     1.243419    1.494413
------------------+----------------------------------------------------------------
_3_4_5v2          |
             male |
             Man  |  -.2076764   .0482383    -4.31   0.000    -.3022218    -.113131
                  |
             race |
           Black  |   .0383556   .0581714     0.66   0.510    -.0756582    .1523694
        Hispanic  |   .2604843   .0726127     3.59   0.000      .118166    .4028026
                  |
           mom_ed |
 HS diploma only  |   .8232155    .055807    14.75   0.000     .7138358    .9325951
    Some college  |    1.85898   .0911781    20.39   0.000     1.680275    2.037686
BA-level college  |   2.557903   .1341388    19.07   0.000     2.294996    2.820811
 Post-BA college  |   2.653818   .2197279    12.08   0.000      2.22316    3.084477
                  |
            _cons |  -.7024935   .0556453   -12.62   0.000    -.8115564   -.5934307
------------------+----------------------------------------------------------------
_4_5v3            |
             male |
             Man  |   .1350765   .0678545     1.99   0.047     .0020841    .2680689
                  |
             race |
           Black  |  -.7868327   .0845712    -9.30   0.000    -.9525892   -.6210761
        Hispanic  |  -.7019694   .1087466    -6.46   0.000    -.9151088     -.48883
                  |
           mom_ed |
 HS diploma only  |   .5545198    .092234     6.01   0.000     .3737444    .7352952
    Some college  |   .9320838   .1133756     8.22   0.000     .7098717    1.154296
BA-level college  |    1.50749   .1326021    11.37   0.000     1.247595    1.767386
 Post-BA college  |   1.757065   .1972926     8.91   0.000     1.370378    2.143751
                  |
            _cons |  -.5515491   .0925064    -5.96   0.000    -.7328584   -.3702399
------------------+----------------------------------------------------------------
_5v4              |
             male |
             Man  |    .168599   .0986189     1.71   0.087    -.0246905    .3618885
                  |
             race |
           Black  |   -.254861   .1422387    -1.79   0.073    -.5336437    .0239216
        Hispanic  |   .2563586   .1770876     1.45   0.148    -.0907267     .603444
                  |
           mom_ed |
 HS diploma only  |   .0652974   .1633457     0.40   0.689    -.2548543    .3854491
    Some college  |    .367334   .1824193     2.01   0.044     .0097986    .7248693
BA-level college  |   .6584584     .18701     3.52   0.000     .2919255    1.024991
 Post-BA college  |   .7762903   .2325468     3.34   0.001      .320507    1.232074
                  |
            _cons |  -.9155704   .1607624    -5.70   0.000    -1.230659   -.6004819
-----------------------------------------------------------------------------------

The coefficients are the same as what we estimated earlier.
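One kind of test that a combined model makes convenient is whether a coefficient differs across transitions. The sketch below works through a Wald test by hand for the Man coefficient at the college-attendance and BA transitions, using the reported estimates and standard errors. It assumes zero covariance between estimates from different transitions, which is what the factorized likelihood of the sequential logit implies; the object names are illustrative.

```r
# Man coefficient and standard error at two transitions,
# copied from the reported seqlogit output
b_somecol <- -0.2076764; se_somecol <- 0.0482383  # HS diploma -> some college
b_coldeg  <-  0.1350765; se_coldeg  <- 0.0678545  # some college -> BA

# Wald z statistic for equality of the two coefficients,
# assuming the estimates are uncorrelated across transitions
z <- (b_somecol - b_coldeg) / sqrt(se_somecol^2 + se_coldeg^2)
p_value <- 2 * pnorm(-abs(z))

c(z = z, p = p_value)
```

A large absolute z here would indicate that the gender gap differs between going to college and finishing it, in line with the sign change noted above.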

The log-likelihood of the combined model is just the sum of the log-likelihoods of the four logits estimated separately.
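This additivity can be checked on simulated data. The sketch below is not based on the NLSY data: it simulates a three-level outcome with two transitions and one made-up predictor, fits the two conditional logits, and compares the sum of their log-likelihoods with the log-likelihood of the observed categories under the combined probabilities. All names and parameter values are assumptions for illustration.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)

# Simulate two transitions; the second is only observed if the first is made
pass1 <- rbinom(n, 1, plogis(0.5 + 0.8 * x))
pass2 <- ifelse(pass1 == 1, rbinom(n, 1, plogis(-0.2 + 0.5 * x)), NA)
y <- 1 + pass1 + ifelse(is.na(pass2), 0, pass2)  # outcome levels 1, 2, 3

d <- data.frame(x = x, y = y,
                t1 = as.integer(y >= 2),
                t2 = as.integer(y >= 3))

# Separate logits; the second uses only those who made the first transition
m1 <- glm(t1 ~ x, family = binomial(), data = d)
m2 <- glm(t2 ~ x, family = binomial(), data = subset(d, t1 == 1))
ll_sum <- as.numeric(logLik(m1)) + as.numeric(logLik(m2))

# Log-likelihood of the observed level under the combined probabilities
p1 <- predict(m1, newdata = d, type = "response")
p2 <- predict(m2, newdata = d, type = "response")
pr <- cbind(1 - p1, p1 * (1 - p2), p1 * p2)
ll_joint <- sum(log(pr[cbind(seq_len(n), d$y)]))

all.equal(ll_sum, ll_joint)
```

The two quantities agree because the likelihood of each observed level is a product of conditional transition probabilities, so its log splits into one term per logit.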