Key points re: comparing logit and probit

Here I will describe some major points regarding the relationship between logit and probit.

Big picture: If you are looking to stay out of the weeds and want a bottom line recommendation, mine is to be familiar with probit but have your default be logit.

Key points:

  1. When there are logit and probit “versions” of a model, the logit coefficients will generally be larger in magnitude than the probit coefficients, typically by a factor of about 1.5 to 2.

    • The magnitude difference in coefficients is a matter of “scaling” between the two models, without any substantive implications.
    • Accordingly, you never want to draw a substantive conclusion by comparing logit and probit coefficients directly.
    • Despite the logit and probit coefficients being different in magnitude, logit and probit predictions will be highly correlated (often r > .99).
  2. When there are logit and probit “versions” of a model, you will probably not have much of a theoretical reason to prefer one to the other.

    • When you have an interaction term, I would confirm that you get the same conclusion using logit and probit if you are going to draw a substantive conclusion about the interaction.
      • Other than this, I don’t think it is more broadly useful to check whether you get the same results from one model as you do from the other.
    • If you do fit both and get some sort of difference that makes one model easier to publish than the other, like a result that is statistically significant with one model but not the other, it would be dishonest to report the results from the model with the “better” result on that basis.
  3. Logit and probit models are sufficiently close to one another that one doesn’t need to worry about which one fits the data better.

    • Empirically, the question of whether the logit or probit version of the same model fits the data better is easy to evaluate: which model has the higher log-likelihood?
      • But: the difference usually isn’t big enough to have confidence that one model fits better than the other in the population.
    • With moderately sized samples and models that do not fit the data especially well – e.g., the usual case when analyzing survey data with several thousand or fewer respondents – one is unlikely to have a clear empirical rationale for picking one model over the other.
    • With very large samples – as in Big Data of one sort or another – it is possible to test whether a logit or probit fits better and reach a clear conclusion.
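
To make the “scaling” point in item 1 concrete, here is a minimal sketch that compares the two link functions directly rather than fitting any models. The rescaling factor of 1.7 is an assumption chosen for illustration from within the 1.5 to 2 range mentioned above; it is not an estimate from data, and the function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, the inverse link used by probit."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_cdf(x):
    """Standard logistic CDF, the inverse link used by logit."""
    return 1.0 / (1.0 + math.exp(-x))

SCALE = 1.7  # assumed rescaling factor, for illustration only

# Evaluate both curves on a grid of linear-predictor values from -4 to 4.
grid = [i / 100.0 for i in range(-400, 401)]

# Largest gap in predicted probability when the logit index is rescaled.
max_gap = max(abs(normal_cdf(x) - logistic_cdf(SCALE * x)) for x in grid)

# Largest gap when the same index is plugged into both links unchanged.
unscaled_gap = max(abs(normal_cdf(x) - logistic_cdf(x)) for x in grid)

print(f"max gap with rescaling:    {max_gap:.4f}")
print(f"max gap without rescaling: {unscaled_gap:.4f}")
```

Once the index is rescaled, the two curves differ by roughly a percentage point at most, which is why the predictions track each other so closely even though the coefficients do not.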

Advantages of using the logit model

  1. Exponentiated coefficients can be interpreted as an odds ratio.
  2. The odds ratio interpretation provides an (in principle) justification for generalizing parameters estimated in case-control studies to the population.
    • Case-control studies are very common in medicine. In such a study, the data are a sample of cases (i.e., people with an uncommon disease) combined with other observations drawn from an unaffected population that are used as controls. Often the data are 50% cases (\(y=1\)) and 50% controls (\(y=0\)). But, in the actual population, the disease may be rare (affecting 1% of people or less), and it may even be that the investigators do not know what that percentage is. In case-control studies, interpretations based on predicted probabilities do not make much sense, since they do not correspond to the actual population percentages. But, under assumptions, the odds ratio interpretation can still be used.
  3. It is computationally easier, both for the computer and for teaching/understanding.
  4. For the aforementioned reasons, logit is more common, but then the fact that it is more common provides an advantage in its own right for interpretability.
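
The case-control point in item 2 can be checked with simple arithmetic. The sketch below uses made-up population counts (hypothetical numbers, not from any study) and assumes the key condition that controls are retained at a rate that does not depend on exposure. It works with expected counts rather than a random draw, so the "sample" cells are not whole numbers.

```python
def odds_ratio(table):
    """Odds ratio from a 2x2 exposure-by-outcome table."""
    return (table["exp_case"] * table["unexp_ctrl"]) / (
        table["exp_ctrl"] * table["unexp_case"]
    )

def risk_if_exposed(table):
    """Share of exposed observations with y = 1."""
    return table["exp_case"] / (table["exp_case"] + table["exp_ctrl"])

# Hypothetical population: 1,000 cases among 100,000 people (1% prevalence).
population = {
    "exp_case": 300, "exp_ctrl": 19_700,
    "unexp_case": 700, "unexp_ctrl": 79_300,
}

# Case-control design: keep every case, and keep controls at one fixed rate
# (independent of exposure) so that cases and controls are equal in number.
n_cases = population["exp_case"] + population["unexp_case"]
n_ctrls = population["exp_ctrl"] + population["unexp_ctrl"]
keep_rate = n_cases / n_ctrls

sample = dict(population)
sample["exp_ctrl"] = population["exp_ctrl"] * keep_rate      # expected count
sample["unexp_ctrl"] = population["unexp_ctrl"] * keep_rate  # expected count

print("odds ratio, population:", round(odds_ratio(population), 4))
print("odds ratio, sample:    ", round(odds_ratio(sample), 4))
print("risk if exposed, population:", round(risk_if_exposed(population), 4))
print("risk if exposed, sample:    ", round(risk_if_exposed(sample), 4))
```

The retention rate cancels out of the odds ratio, so it is the same in the population and in the case-control data, while the share of exposed people with the disease jumps from 1.5% to roughly 60%. That is the sense in which predicted probabilities from case-control data do not describe the population but the odds ratio still can.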

Aside: A reasonable rule of thumb for social scientists is that the farther you get from economics, the less likely you are to see probit. But when I have searched on this, logit models were still more common than probit models even in economics.

Advantages of using the probit model

  1. The link to the normal distribution makes an obvious connection to the linear regression model.
  2. The latent variables approach to categorical outcomes is more straightforward when presented in terms of probit, even though you can also use a logit.

In addition, for some models you will see a probit version implemented when there is not a corresponding logit version, and sometimes you will see a logit version implemented when there is not a corresponding probit version. For example, Stata has a heteroskedastic probit command (\(\mathtt{hetprobit}\)) but not a logit counterpart, whereas Stata has a nested logit command (\(\mathtt{nlogit}\)) but not nested probit. This can be for a statistical reason – that is, it is much easier to write the model one way vs. the other; or a computational reason – that is, it is easier to estimate a model written one way vs. the other; or a pragmatic reason – that is, somebody happened to implement it one way instead of the other.