Predicted Change for a Continuous Explanatory Variable

In the logit model, the change in the predicted probability associated with a change in an explanatory variable depends on that variable’s baseline value.

In the earlier example of fourth-down decisions in football, the profile plot (reproduced below) shows that the difference each additional yard needed makes in the probability of going for it shrinks as the distance increases. That is, the difference in predicted probabilities for a one-yard increase from 4th-and-1 to 4th-and-2 is larger than the difference for a one-yard increase from 4th-and-9 to 4th-and-10.
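We can see this shrinking difference directly with R's inverse logit function, plogis(). The intercept and slope below are made-up illustrative values, not the fitted coefficients:

```r
# Hypothetical logit: Pr(go for it) = plogis(b0 + b1 * yards_needed)
b0 <- 0.2    # illustrative intercept (not from the fitted model)
b1 <- -0.25  # illustrative slope (not from the fitted model)

p <- function(yds) plogis(b0 + b1 * yds)

p(2) - p(1)    # one-yard change from 4th-and-1 to 4th-and-2: about -.06
p(10) - p(9)   # one-yard change from 4th-and-9 to 4th-and-10: about -.02
```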

Code to open the data, recode variables, and fit the logit model:
options(scipen = 9)

# dependencies
library(tidyverse)
library(haven)
library(tulaverse)
library(marginaleffects)
library(gridExtra) # for combining plots below

df <- read_dta("../dta/cfb_fourthdown.dta")

df <- df %>%
  mutate(score_is = as_factor(score_is)) %>%
  mutate(score_is = relevel(score_is, ref="tied")) %>%
  mutate(gametime = as_factor(gametime)) %>%
  mutate(gametime = fct_recode(gametime,
                      "Q1" = "1st Q",
                      "Q2" = "2nd Q",
                      "Q3" = "3rd Q",
                      "Q4" = "4th Q",
                      "OT" = "Overtime")) 

model <- glm(goforit ~ yds_needed + yds_to_td + score_is + gametime, 
             data=df, 
             family="binomial")

tula(model)
Code to draw the plot:
toplot <- avg_predictions(model,
                variables = list("yds_needed" = seq(1, 20, 1)))

ggplot(toplot, aes(x = yds_needed, y = estimate)) +
  geom_line(color = "red3") +
  geom_point(color = "red3", size = 2, shape = 21, fill = "red3") + 
  scale_y_continuous(limits = c(0, 0.4)) +
  scale_x_continuous(limits = c(1, 20), 
                     breaks = seq(1, 20, 1),
                     minor_breaks = NULL) +
  labs(
    x = "Yards Needed for First Down",
    y = "Predicted Probability"
  ) +
  theme_tula()

Moreover, the change in the predicted probability depends not just on the baseline value of our explanatory variable, but also on the values of the other explanatory variables. The profile below shows how the relationship between yards-to-go and the probability of going for it on fourth down also depends on whether the team is ahead or behind.

Code to draw the plot:
toplot <- avg_predictions(model,
                variables = list("yds_needed" = seq(1, 20, 1),
                                 "score_is" = c("ahead", "behind")))

ggplot(toplot, aes(x = yds_needed, y = estimate, color = score_is, fill = score_is, linetype = score_is)) +
  geom_line() +
  geom_point(size = 2, shape = 21, stroke = 1) + 
  scale_color_manual(values = c("red3", "blue3"), name = NULL) +
  scale_fill_manual(values = c("red3", "blue3"), name = NULL) +
  scale_linetype_manual(values = c("solid", "dashed"), name = NULL) +  
  scale_y_continuous(limits = c(0, 0.6)) +
  scale_x_continuous(limits = c(1, 20), 
                     breaks = seq(1, 20, 1),
                     minor_breaks = NULL) +
  labs(
    x = "Yards Needed for First Down",
    y = "Predicted Probability"
  ) +
  theme_tula() +
  theme(
    legend.position = c(0.95, 0.95),
    legend.justification = c(1, 1),
    legend.background = element_rect(fill = "white", 
                                   color = "gray80",
                                   linewidth = 0.5, 
                                   linetype = "solid"),
    legend.margin = margin(6, 6, 6, 6),
    panel.grid.minor = element_blank()
  )

If we look now at the difference in predicted probabilities between 4th-and-1 and 4th-and-2, we can see that it is bigger when the team is behind than it is when the team is ahead.

Marginal change

For continuous explanatory variables, changes in predicted probability are often presented in terms of marginal changes (or “marginal effects”). If you have had some calculus, the marginal change is the derivative evaluated at specific values of explanatory variable(s) \(\mathbf{x}\).

If you haven’t had calculus, look at the first profile plot above. The slope of the curve representing the predicted probabilities decreases as distance increases. At any point along that curve, we can evaluate the slope at that specific point. (Imagine magnifying the curve at a specific spot: if you magnify enough, any curve examined at a single, specific point will look like a straight line, and we are asking what the slope of that straight line is.)

You will be reasonably close if you think of the marginal effect as the change in \(y\) for a unit change in \(x\) evaluated at some particular point.


Stata: Average change for one-unit change in explanatory variable. I have co-authored a Stata command, \(\mathtt{mchange}\), that computes the average change for a one-unit change instead of the marginal change. It is part of the \(\mathtt{spost13\_ado}\) package, which you can install using Stata's \(\mathtt{findit}\) command.

Again, I think for practical purposes the marginal change approximates this closely enough except when the scale of the explanatory variable is very small (i.e., when your explanatory variable is a proportion or something else that only varies from 0 to 1).

Marginal change in logit

The marginal change for an explanatory variable is calculated given specific values of all the explanatory variables.

The formula for the marginal change in the logit model is simple. For explanatory variable \(x\), the change in \(Pr(y=1)\) for a marginal increase in \(x\) is a function of two things:

  1. the logit coefficient of \(x\) (\(\beta_x\))
  2. the predicted probability of y=1 given specific value(s) of the explanatory variable(s) (that is, \(\Pr(y=1|\mathbf{x})\)).

Specifically, the formula is: \[ \mathrm{Marginal\ change\ for\ }x = \beta_x\left[\Pr(y=1|\mathbf{x})\right]\left[1-\Pr(y=1|\mathbf{x})\right] \]

We can see from this formula why the marginal change depends on the values of all the explanatory variables: it depends on \(\Pr(y=1|\mathbf{x})\), and \(\Pr(y=1|\mathbf{x})\) depends on the values of all the explanatory variables.
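As a quick check that the formula matches the calculus, we can compare it to a numerical derivative of the inverse logit at an arbitrary point (the coefficient and linear-predictor values here are made up):

```r
b  <- -0.25       # hypothetical coefficient on x
xb <- 0.7         # hypothetical value of the linear predictor
p  <- plogis(xb)  # Pr(y = 1 | x)

# The formula: beta * Pr(y=1|x) * (1 - Pr(y=1|x))
me_formula <- b * p * (1 - p)

# A numerical derivative: change in Pr(y = 1) for a tiny increase in x
h <- 1e-6
me_numeric <- (plogis(xb + b * h) - plogis(xb)) / h

c(me_formula, me_numeric)  # agree to several decimal places
```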

Also, notice that, for a given value of \(\mathbf{x}\) – that is, a given value of \(\Pr(y=1|\mathbf{x})\) – the marginal changes for all the explanatory variables will differ from one another exactly in proportion to their coefficients.

Using this formula, we can plot, over the whole range of possible predicted probabilities, the marginal change when \(\beta=1\).

Code to generate the plot:
# Create sample data
prob <- seq(0, 1, length.out = 1000)
marginal_change <- prob * (1 - prob)

# Create data frame
toplot <- data.frame(probability = prob, marginal_change = marginal_change)

ggplot(toplot, aes(x = probability, y = marginal_change)) +
  geom_line(color = "darkred", linewidth = 1) +
  scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.2)) +
  scale_y_continuous(limits = c(0, 0.25), breaks = seq(0, 0.25, 0.05)) +
  labs(x = "Estimated Probability(y=1)",
       y = "Marginal Change") +
  theme_tula() +
  theme(
    panel.grid.minor = element_blank(),
    axis.title.y = element_text(margin = margin(r = 10)),
    axis.title.x = element_text(margin = margin(t = 10))
  )

We can see here that the marginal change is biggest when \(\Pr(y=1)\) is .5, and it is smallest as \(\Pr(y=1)\) approaches either 0 or 1.

Moreover, when \(\Pr(y=1)\) is .5, not only is the marginal effect the biggest, but it is equal to \(\beta_x\) times .25. This means that if you divide a logit coefficient by 4, you get the maximum (in magnitude) marginal effect. This is sometimes called the “divide by 4 rule”.
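A quick check of the divide-by-4 rule, using the yds_needed coefficient from the fitted model (about -.214):

```r
b <- -0.2136         # logit coefficient on yds_needed from the model
b * 0.5 * (1 - 0.5)  # marginal change at Pr(y = 1) = .5
b / 4                # the divide-by-4 shortcut gives the same value
```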

Two other observations from the curve above:

  1. The marginal change when \(\Pr(y=1) = p\) has the same magnitude as when \(\Pr(y=1) = 1-p\).
  2. The curve above gets steeper as the predicted probability approaches 0 or 1. As a result, the marginal effects when \(\Pr(y=1)\) is between .4 and .6 vary much less than, say, when the predicted probability varies between 0 and .2.
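Both observations are easy to verify from the formula, with \(\beta\) set to 1 for simplicity:

```r
me <- function(p, b = 1) b * p * (1 - p)

# 1. Same magnitude at p and 1 - p
me(0.2)            # 0.16
me(0.8)            # 0.16

# 2. Flatter in the middle, steeper near the extremes
me(0.5) - me(0.4)  # 0.01: marginal changes vary little between .4 and .6
me(0.2) - me(0.1)  # 0.07: they vary much more between 0 and .2
```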

Average marginal change

The marginal change for an explanatory variable could be different for every observation in our dataset. For our college football example of going for it on fourth down, we can calculate the marginal change for the \(\mathtt{yds\_needed}\) variable for every observation and make a histogram to show this variation. We can look at it alongside a histogram of the predicted probabilities in these data.

Code to draw the histograms:
compute_me <- predictions(model, type = "response") %>%
  rename(pr_1 = estimate) %>%
  mutate(pr_0 = 1 - pr_1) %>%
  mutate(marg_eff = coef(model)["yds_needed"] * pr_1 * pr_0)

p1 <- ggplot(compute_me, aes(x = marg_eff)) +
  geom_histogram(fill = "deepskyblue", color = "black", bins = 50) +
  labs(
    x = "Marginal change for yds_needed",
    y = "Frequency"
  ) +
  theme_tula()

p2 <- ggplot(compute_me, aes(x = pr_1)) +
  geom_histogram(fill = "gold", color = "black", bins = 50) +
  labs(
    x = "Predicted probability of going for it",
    y = "Frequency"
  ) +
  theme_tula()

grid.arrange(p1, p2, ncol = 2)

Things to notice about these graphs:

  1. Looking at the left graph, you’ll notice that the marginal changes are all negative. This is because the coefficient of \(\mathtt{yds\_needed}\) is negative. In a binary outcome model, the marginal change for a variable is always the same sign as the coefficient of that variable.
  2. The distribution of marginal changes in the left graph has two peaks (on each end), whereas the distribution of predicted probabilities in the right graph has only one.
  3. The peak on the right side of the marginal change plot reflects the large number of cases with predicted probabilities near 0. Because the predicted probability is near zero, the marginal change is near zero as well.
  4. The peak on the left side of the marginal change plot, on the other hand, reflects cases that have predicted probabilities around .5. Because a relatively wide interval of predicted probabilities around .5 has similar marginal changes, these observations add up and create a peak even though relatively few cases sit at any specific predicted probability in that interval.

The average marginal change (also called the average marginal effect and often abbreviated as AME) is the mean of the marginal changes for each observation.

Here we will calculate the average marginal change ourselves by computing the marginal change for each observation and then calculating the mean.

Code to fit the model:
model <- glm(goforit ~ yds_needed + yds_to_td + score_is + gametime, 
             data=df, 
             family="binomial")

tula(model)
Family: binomial / Link: logit
AIC            =  58787.222                          Number of obs   =  80887
BIC            =  58870.929                          McFadden R-sq   = 0.2452
Log likelihood = -29384.611                          Nagelkerke R-sq = 0.3402
─────────────────────────────────────────────────────────────────────────────
goforit     │      Coef  Std. Err.          z     P>|z|  [95% Conf  Interval]
─────────────────────────────────────────────────────────────────────────────
yds_needed  │    -.2136    .002935     -72.78    <.0001     -.2194     -.2079
yds_to_td   │    -.0326   .0004698     -69.39    <.0001    -.03352    -.03168
score_is    │                                                                
  ahead     │    -.7973     .03756     -21.23    <.0001     -.8709     -.7236
  behind    │     .4748     .03484      13.63    <.0001      .4065      .5431
gametime    │                                                                
  Q2        │     .1733     .03314      5.228    <.0001      .1083      .2382
  Q3        │     .3235     .03477      9.305    <.0001      .2554      .3916
  Q4        │     1.412     .03214      43.94    <.0001      1.349      1.475
  OT        │     .6326      .1468       4.31    <.0001      .3449      .9203
(Intercept) │     .7901     .03986      19.82    <.0001      .7119      .8682
─────────────────────────────────────────────────────────────────────────────
compute_ame <- predictions(model, type="response") %>%
  rename(pr_1 = estimate) %>%
  mutate(pr_0 = 1-pr_1) %>%
  mutate(marg_eff = coef(model)["yds_needed"] * pr_1 * pr_0)

mean(compute_ame$marg_eff)
[1] -0.02443796

We can have R compute this using avg_comparisons() from the marginaleffects package.

avg_comparisons(
  model,
  variables = "yds_needed",  
  comparison = "dydx",          
  vcov = FALSE  # set to TRUE to compute standard errors
)

 Estimate
  -0.0244

Term: yds_needed
Type: response
Comparison: dY/dX
In Stata, the same by-hand calculation looks like this:

. predict pr_1 if e(sample)
(option pr assumed; Pr(goforit))
(6,399 missing values generated)

. gen me = _b[yds_needed] * pr_1 * (1-pr_1)
(6,399 missing values generated)

. sum me // average marginal change

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          me |     80,887    -.024438    .0176084  -.0534086   -.000043

In practice, we would instead have Stata compute the average marginal change using \(\texttt{margins, dydx()}\), as follows:

. margins, dydx(yds_needed)

Average marginal effects                                Number of obs = 80,887
Model VCE: OIM

Expression: Pr(goforit), predict()
dy/dx wrt:  yds_needed

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  yds_needed |   -.024438   .0003041   -80.36   0.000     -.025034   -.0238419
------------------------------------------------------------------------------

We can interpret this result as:

  • Averaging over all observed game situations, the change in the probability of going for it on fourth down for a marginal increase in yards needed is -.024, net of field position, quarter, and which team is leading.

If we want to fudge to make it a bit more accessible:

  • Averaging over all observed game situations, an additional yard is associated with about a .024 decrease in the probability of going for it on fourth down, net of field position, quarter, and which team is leading.

Aside: Why do I say “marginal change” when one more often sees “marginal effect”? In social science, it is very common that people will use “effect” when describing coefficients while still being adamant that they are not actually saying anything about causality. I do this, too, but have been endeavoring to wean myself off of it.


Stata: What does the \(\texttt{dydx()}\) option for \(\texttt{margins}\) do? The \(\texttt{dydx()}\) option can be confusing because it effectively does two different things:

  1. For a continuous variable – and Stata will assume the variable is continuous unless you specify \(\texttt{i.}varname\) when fitting the model – \(\texttt{margins, dydx(}varname\texttt{)}\) will provide the marginal change for \(varname\).
  2. For a factor variable, Stata will instead provide the change in the outcome associated with a change in the factor variable from its reference category to the specified category (so, for a binary explanatory variable, the change in \(\Pr(y=1)\) as \(x\) changes from 0 to 1).

Again, whether a variable is treated as continuous or as a factor variable depends on whether you specified \(\texttt{i.}\) when fitting the model. You do not specify \(\texttt{i.}\) again when using \(\texttt{margins}\); it “remembers” based on what you specified when you fit the model.

The \(\texttt{dydx}\) is a calculus reference: the derivative \(dy/dx\) is the rate of change in \(y\) evaluated at a particular value of \(x\).

Average marginal change as approximation of the average change for a unit increase

The marginal change will be very close to the change associated with a unit increase in \(x\) when the scale of a variable is large enough that a unit increase is a relatively small change in the variable.

The marginal change will be a worse approximation of the change associated with a unit increase when the scale is small, so that a unit increase is a big change in the explanatory variable. Worst of all is when the scale is so small that a unit change in an explanatory variable is bigger than its entire range (like, for example, if your explanatory variable is a proportion).
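The two situations can be contrasted directly with plogis(); the coefficients below are made up for illustration. When a unit increase is a small move along the curve, the two quantities nearly coincide; when the variable is a 0-1 proportion, they diverge badly:

```r
# Exact change in Pr(y = 1) for a one-unit increase, starting from linear predictor xb
unit_change <- function(xb, b) plogis(xb + b) - plogis(xb)
# Marginal change evaluated at the starting point
marg_change <- function(xb, b) b * plogis(xb) * (1 - plogis(xb))

# Large-scale variable: a unit step is a small move (small coefficient per unit)
c(unit = unit_change(0.5, 0.05), marginal = marg_change(0.5, 0.05))  # nearly identical

# Proportion-scale variable: one unit spans the whole range
c(unit = unit_change(0.5, 3), marginal = marg_change(0.5, 3))  # marginal overshoots badly
```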

In R, you can obtain the average change for a unit increase instead of the average marginal change by adding eps = 1 as an argument to avg_comparisons.

avg_comparisons(
  model,
  variables = "yds_needed",  
  comparison = "dydx",
  eps = 1,
  vcov = FALSE  # set to TRUE to compute standard errors
)

 Estimate
  -0.0244

Term: yds_needed
Type: response
Comparison: dY/dX

The average change is -.0244, or the same at this level of precision as the average marginal effect. So when I described the above interpretation in terms of the change in probability associated with an additional yard as being a “fudge,” it turns out that in this case it was not a fudge at all.

Average marginal change and the coefficient of the linear probability model

In the linear probability model, the \(\beta\) coefficient is the marginal change in the probability, and, unless we are using some kind of nonlinear specification of the explanatory variables, the marginal change for an explanatory variable is the same across all values of the explanatory variables.

Since the marginal change in the linear probability model is estimated to be the same across all observations, we can refer to the coefficient as the linear probability model’s estimate of the average marginal change.

model_lpm <- lm(goforit ~ yds_needed + yds_to_td + score_is + gametime, data = df)

tula(model_lpm, robust=TRUE)
AIC = 59406.469                                        Number of obs =  80887
BIC = 59499.477                                        R-squared     = 0.1963
                                                       Adj R-squared = 0.1962
                                                       Root MSE      = 0.3493
─────────────────────────────────────────────────────────────────────────────
goforit     │      Coef  Robust SE          t     P>|t|  [95% Conf  Interval]
─────────────────────────────────────────────────────────────────────────────
yds_needed  │   -.01711   .0002212     -77.32    <.0001    -.01754    -.01667
yds_to_td   │   -.00402  5.275e-05      -76.2    <.0001   -.004123   -.003916
score_is    │                                                                
  ahead     │    -.0918     .00382     -24.03    <.0001    -.09929    -.08431
  behind    │    .05444    .003666      14.85    <.0001     .04726     .06163
gametime    │                                                                
  Q2        │    .02324    .003277      7.092    <.0001     .01682     .02967
  Q3        │    .03897    .003472      11.22    <.0001     .03217     .04578
  Q4        │     .1854    .003852      48.13    <.0001      .1778      .1929
  OT        │    .09153     .02667      3.431     .0006     .03925      .1438
(Intercept) │     .4734    .004961      95.44    <.0001      .4637      .4831
─────────────────────────────────────────────────────────────────────────────
The same model fit in Stata:

. regress goforit yds_needed yds_to_td ib2.score_is i.gametime, allbaselevels

      Source |       SS           df       MS      Number of obs   =    80,887
-------------+----------------------------------   F(8, 80878)     =   2468.81
       Model |  2409.93782         8  301.242227   Prob > F        =    0.0000
    Residual |  9868.68466    80,878  .122019395   R-squared       =    0.1963
-------------+----------------------------------   Adj R-squared   =    0.1962
       Total |  12278.6225    80,886  .151801578   Root MSE        =    .34931

------------------------------------------------------------------------------
     goforit | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  yds_needed |  -.0171053   .0002271   -75.30   0.000    -.0175505   -.0166601
   yds_to_td |  -.0040196   .0000535   -75.09   0.000    -.0041246   -.0039147
             |
    score_is |
      ahead  |  -.0918024   .0041274   -22.24   0.000     -.099892   -.0837128
       tied  |          0  (base)
     behind  |   .0544415   .0039691    13.72   0.000      .046662    .0622209
             |
    gametime |
      1st Q  |          0  (base)
      2nd Q  |   .0232422   .0036328     6.40   0.000     .0161219    .0303626
      3rd Q  |   .0389717   .0038448    10.14   0.000     .0314359    .0465075
      4th Q  |   .1853665   .0037781    49.06   0.000     .1779614    .1927717
   Overtime  |   .0915251   .0211848     4.32   0.000      .050003    .1330473
             |
       _cons |   .4734163   .0045816   103.33   0.000     .4644364    .4823962
------------------------------------------------------------------------------

The average marginal change as estimated by the linear probability model is \(-.017\), as opposed to \(-.024\) from the logit model. This is about a 30% difference, which I would not consider substantively trivial, especially given that the large number of cases means our standard errors are very small. But, many times, the estimates will be closer than this, especially for models where the predicted probabilities are all in the middle of the distribution and so the marginal effects from logit will not vary that much anyway.

Earlier in the class we noted that when predicted probabilities are all in the middle of the distribution (say, .3 to .7), a linear change in the probabilities looks a lot like the predictions of the logit model. In those cases, the coefficient from the linear probability model will also be close to the average marginal effect from the logit model, strengthening the argument that just using the linear probability model in that case will probably not lead to any substantive mischaracterization of results.

Marginal change differences by groups

Above we showed a graph in which the relationship between yds_needed and going for it was stronger when a team was behind than when it was ahead.

In R, we can compute the average marginal change of a continuous variable by group by adding a by argument to avg_comparisons().

avg_comparisons(
  model,
  variables = "yds_needed",
  by = "score_is",
  comparison = "dydx", 
  vcov = FALSE          
)

 score_is Estimate
   tied    -0.0210
   ahead   -0.0208
   behind  -0.0284

Term: yds_needed
Type: response
Comparison: dY/dX

In these results, we can see the marginal effect for teams that are ahead is -.021, whereas for teams that are behind it is -.028.