Pseudo-R² measures of model fit

\(R^2\) is the most commonly used measure of overall model fit in OLS regression, providing a single-number summary that might be used to talk about:

  1. whether the overall fit is better or worse than one might expect from the literature.
  2. whether it fits better in one sample versus another.
  3. whether it fits better for one group versus another.
  4. how much fit improves when new variable(s) are added.

For models that are fit using maximum likelihood and not OLS, \(R^2\) no longer applies. Yet the appetite for a 0-to-1 summary of overall model fit has produced many “pseudo-\(R^2\)” measures for maximum-likelihood models.

The biggest point about pseudo-\(R^2\) measures is that there are various different kinds and in practice their values are NOT similar to one another.

For this reason, simply having a row at the bottom of your table labeled “Pseudo-\(R^2\)” is not good practice, but instead you want to say which pseudo-\(R^2\) you are referring to.

McFadden’s pseudo-\(R^2\)

Stata will provide McFadden’s pseudo-\(R^2\) as part of the main output for models fit using maximum likelihood. I assume it is because it is the simplest pseudo-\(R^2\) to calculate, which is not necessarily the same as endorsing it as the best.

The formula is: \[ \textrm{McFadden pseudo-}R^2 = 1 - \frac{\textrm{log-likelihood of model}}{\textrm{log-likelihood of intercept-only model}} \]

Among different pseudo-\(R^2\) measures, McFadden’s tends to be on the low side—often lower than what one would observe as the \(R^2\) of a similarly motivated OLS model (or, for example, a linear probability model with the same predictors), especially when fit is not great. The Nagelkerke (pseudo-)\(R^2\), which one sees often in SPSS and R, is typically (much) higher. One cannot simply apply intuitions about \(R^2\) when interpreting pseudo-\(R^2\), and with McFadden’s in particular one will sometimes see values that would seem like very low \(R^2\).

Supplemental: Formula for the Nagelkerke pseudo-\(R^2\). The Nagelkerke pseudo-\(R^2\) is related to the simpler Cox & Snell pseudo-\(R^2\). Both may be written using the deviance of a model, which is \(-2\) times its log-likelihood.

\[ \begin{align} \textrm{Cox \& Snell pseudo-}R^2 & = 1 - \exp({\frac{\textrm{deviance of model} - \textrm{deviance of intercept-only model}}{N}}) \\ \\ \textrm{Nagelkerke pseudo-}R^2 & = \frac{1 - \exp({\frac{\textrm{deviance of model} - \textrm{deviance of intercept-only model}}{N}})}{1 - \exp({\frac{-\textrm{deviance of intercept-only model}}{N}})} \\ \end{align} \]

Aside: There are critiques of \(R^2\) that usually boil down to the idea that its usefulness for the above purposes may be often exaggerated. We are setting that matter aside here.

Examples

R: Alternative measures of model fit. The DescTools package, which works at least with models fit using glm, polr, or multinom, includes various pseudo-\(R^2\) measures obtained via PseudoR2(modelname, which = "all").

Stata: Alternative measures of model fit. Various other pseudo-\(R^2\) measures are available in Stata using the add-on command \(\texttt{fitstat}\), co-authored once upon a time by yours truly.

Below are goodness-of-fit statistics for models fit on other pages for this course, allowing you to see how pseudo-\(R^2\) values vary among three measures: McFadden’s, Nagelkerke, and McKelvey/Zavoina.

Example of religious nones in General Social Survey

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(haven)
library(tulaverse)
library(DescTools)

df <- read_dta("../dta/gss_norelig_thru2018.dta")

model <- glm(norelig ~ male + cohort, data = df, family = "binomial")

tula(model)
Family: binomial / Link: logit
AIC            =  43919.582                         Number of obs   =   64319
BIC            =  43946.797                         McFadden R-sq   = 0.07391
Log likelihood = -21956.791                         Nagelkerke R-sq =  0.1017
─────────────────────────────────────────────────────────────────────────────
norelig     │      Coef  Std. Err.          z     P>|z|  [95% Conf  Interval]
─────────────────────────────────────────────────────────────────────────────
male        │     .5712     .02503      22.82    <.0001      .5221      .6203
cohort      │     .0341   .0006714       50.8    <.0001     .03279     .03542
(Intercept) │    -68.92      1.316     -52.38    <.0001     -71.49     -66.34
─────────────────────────────────────────────────────────────────────────────
options(scipen = 9) # needed to keep it from printing everything with scientific notation
PseudoR2(model, which = "all")
       McFadden     McFaddenAdj        CoxSnell      Nagelkerke   AldrichNelson 
     0.07390692      0.07378038      0.05302884      0.10167270      0.05167125 
VeallZimmermann           Efron McKelveyZavoina            Tjur             AIC 
     0.12175930      0.05594884      0.15816660      0.05654573  43919.58234532 
            BIC          logLik         logLik0              G2 
 43946.79717639 -21956.79117266 -23709.05431651   3504.52628771 
. use ../dta/gss_norelig_thru2018, clear
(Extract from GSS cumulative file through 2018)

. quietly logit norelig male cohort, nolog

. fitstat

                         |       logit
-------------------------+-------------
Log-likelihood           |
                   Model |  -21956.791
          Intercept-only |  -23709.054
-------------------------+-------------
Chi-square               |
      Deviance(df=64316) |   43913.582
                LR(df=2) |    3504.526
                 p-value |       0.000
-------------------------+-------------
R2                       |
                McFadden |       0.074
      McFadden(adjusted) |       0.074
      McKelvey & Zavoina |       0.158
            Cox-Snell/ML |       0.053
  Cragg-Uhler/Nagelkerke |       0.102
                   Efron |       0.056
                Tjur's D |       0.057
                   Count |       0.879
         Count(adjusted) |       0.000
-------------------------+-------------
IC                       |
                     AIC |   43919.582
        AIC divided by N |       0.683
               BIC(df=3) |   43946.797
-------------------------+-------------

Example of surviving the sinking of the Titanic

df <- read_dta("../dta/titanic_passengers.dta")

# fit model
df <- df %>%
  mutate(female = as_factor(female),
         child = as_factor(child),
         pclass = relevel(as_factor(pclass), ref = "3rd"))

model <- glm(survived ~ female + child + pclass, data = df, family = "binomial")

tula(model)
Family: binomial / Link: logit
AIC            = 1244.587                            Number of obs   =   1309
BIC            = 1270.472                            McFadden R-sq   = 0.2909
Log likelihood = -617.293                            Nagelkerke R-sq = 0.4362
─────────────────────────────────────────────────────────────────────────────
survived    │      Coef  Std. Err.          z     P>|z|  [95% Conf  Interval]
─────────────────────────────────────────────────────────────────────────────
female      │     2.499      .1484      16.84    <.0001      2.208       2.79
child       │     1.085      .2292      4.735    <.0001       .636      1.535
pclass      │                                                                
  1st       │     1.862      .1761      10.57    <.0001      1.517      2.207
  2nd       │     .8977      .1802      4.981    <.0001      .5445      1.251
(Intercept) │    -2.295      .1353     -16.96    <.0001      -2.56     -2.029
─────────────────────────────────────────────────────────────────────────────
options(scipen = 9) # needed to keep it from printing everything with scientific notation
PseudoR2(model, which = "all")
       McFadden     McFaddenAdj        CoxSnell      Nagelkerke   AldrichNelson 
      0.2908848       0.2851410       0.3208334       0.4361913       0.2789617 
VeallZimmermann           Efron McKelveyZavoina            Tjur             AIC 
      0.4887008       0.3633538       0.4195630       0.3592400    1244.5868636 
            BIC          logLik         logLik0              G2 
   1270.4719574    -617.2934318    -870.5121915     506.4375193 
. use ../dta/titanic_passengers, clear
(Titanic survival data for passengers from OpenML)

. quietly logit survived i.female i.child ib3.pclass

. fitstat

                         |       logit
-------------------------+-------------
Log-likelihood           |
                   Model |    -617.293
          Intercept-only |    -870.512
-------------------------+-------------
Chi-square               |
       Deviance(df=1304) |    1234.587
                LR(df=4) |     506.438
                 p-value |       0.000
-------------------------+-------------
R2                       |
                McFadden |       0.291
      McFadden(adjusted) |       0.285
      McKelvey & Zavoina |       0.420
            Cox-Snell/ML |       0.321
  Cragg-Uhler/Nagelkerke |       0.436
                   Efron |       0.363
                Tjur's D |       0.359
                   Count |       0.784
         Count(adjusted) |       0.434
-------------------------+-------------
IC                       |
                     AIC |    1244.587
        AIC divided by N |       0.951
               BIC(df=5) |    1270.472
-------------------------+-------------
Variance of              |
                       e |       3.290
                  y-star |       5.670

Example of class identification predicted by education and income

df <- read_dta("../dta/gss_soc383.dta")

df <- df %>%
  mutate(middleupper = as.integer(class %in% c(3, 4))) %>%
  filter(!is.na(middleupper), !is.na(educ), !is.na(realinc)) # listwise deletion for missing data

model <- glm(middleupper ~ educ + realinc, data = df, family = "binomial")

tula(model)
Family: binomial / Link: logit
AIC            =  72300.693                         Number of obs   =   58213
BIC            =  72327.609                         McFadden R-sq   = 0.09798
Log likelihood = -36147.347                         Nagelkerke R-sq =  0.1688
─────────────────────────────────────────────────────────────────────────────
middleupper │      Coef  Std. Err.          z     P>|z|  [95% Conf  Interval]
─────────────────────────────────────────────────────────────────────────────
educ        │     .1316    .003243      40.57    <.0001      .1252      .1379
realinc     │ 2.043e-05  3.905e-07      52.33    <.0001  1.967e-05   2.12e-05
(Intercept) │    -2.539     .04169     -60.91    <.0001     -2.621     -2.457
─────────────────────────────────────────────────────────────────────────────
options(scipen = 9) # needed to keep it from printing everything with scientific notation
PseudoR2(model, which = "all")
       McFadden     McFaddenAdj        CoxSnell      Nagelkerke   AldrichNelson 
     0.09798281      0.09790795      0.12619931      0.16880237      0.11886740 
VeallZimmermann           Efron McKelveyZavoina            Tjur             AIC 
     0.20520325      0.13184187      0.17956686      0.12907715  72300.69345503 
            BIC          logLik         logLik0              G2 
 72327.60904696 -36147.34672752 -40073.89998323   7853.10651143 
. use ../dta/gss_soc383.dta, clear
(GSS 1972-2018: selected variables)

:: several recode and filtering operations omitted ::

. quietly logit middleupper educ inc2010

. fitstat

                         |       logit
-------------------------+-------------
Log-likelihood           |
                   Model |  -13277.236
          Intercept-only |  -15377.992
-------------------------+-------------
Chi-square               |
      Deviance(df=22264) |   26554.471
                LR(df=2) |    4201.512
                 p-value |       0.000
-------------------------+-------------
R2                       |
                McFadden |       0.137
      McFadden(adjusted) |       0.136
      McKelvey & Zavoina |       0.263
            Cox-Snell/ML |       0.172
  Cragg-Uhler/Nagelkerke |       0.230
                   Efron |       0.179
                Tjur's D |       0.175
                   Count |       0.696
         Count(adjusted) |       0.346
-------------------------+-------------
IC                       |
                     AIC |   26560.471
        AIC divided by N |       1.193
               BIC(df=3) |   26584.504
-------------------------+-------------
Variance of              |
                       e |       3.290
                  y-star |       4.466