Problem set: Ordered outcomes

Concept Comprehension

This first set of questions does not require you to do any data analysis yourself.

Here is an excerpt from Stata output after fitting an ordered logit model:


-------------+----------------------------------------------------------------
       /cut1 |    -1.4819    .016573                     -1.514382   -1.449417
       /cut2 |  -.1022728   .0128912                      -.127539   -.0770066
       /cut3 |    .629538   .0135174                      .6030444    .6560316
       /cut4 |   1.822751   .0186015                      1.786292    1.859209
       /cut5 |   2.593911   .0253077                      2.544308    2.643513
       /cut6 |   3.847654   .0450161                      3.759424    3.935884
------------------------------------------------------------------------------

[1]How many categories did the outcome variable used above have? How can you tell?

[1]What is the difference between a standardized coefficient in a regression model and a y-standardized coefficient?

In the General Social Survey, respondents are asked to rate their general happiness (recoded here to 1=not too happy, 2=pretty happy, 3=very happy). The following are estimates obtained (using \(\texttt{listcoef}\)) after fitting an ordered logit model in which this happiness variable is the outcome, and the independent variable is highest degree attained, with the omitted category being those with no high school diploma.


-----------------------------------------
                |     bStdY    Odds ratio
----------------+------------------------
         degree |
   high school  |     0.124       1.255   
junior college  |     0.162       1.344   
      bachelor  |     0.292       1.702   
      graduate  |     0.333       1.834   
-----------------------------------------

[1]The column \(\texttt{bStdY}\) contains \(y\)-standardized coefficients. In a sentence, interpret the 0.333 \(y\)-standardized coefficient for the category of respondents who have earned a graduate degree.

[1]The column \(\texttt{Odds ratio}\) contains odds ratios. In a sentence, interpret the odds ratio 1.255 for having a high school diploma (but no college diploma). (Note: various interpretations are possible; use one.)

[1]The General Social Survey asks whether people recycle. The response categories are “always,” “often,” “sometimes,” and “never.” As described in class, you can think about the ordered logit model as having a set of implied binary logits. What are the implied binary logits in this case?

[1]At least as Stata fits the ordered logit model, there is no estimate for the intercept (\(\texttt{\_cons}\)). What is estimated instead?

Below is output from the 2004 Wisconsin Longitudinal Study, when respondents were about 65 years old. The variables z_classrank, hn1, and z_ses57 reflect class rank, test score, and socioeconomic status, all from high school and all standardized.

An ordered logit model was fit to the data, and then \(\texttt{margins}\) was applied to those model estimates, yielding:


. margins, dydx(male) at(z_classrank = -1 hn1 = -1 z_ses57 = -1) noatlegend

Conditional marginal effects                             Number of obs = 7,221
Model VCE: OIM

dy/dx wrt: 1.male

1._predict: Pr(health04==1), predict(pr outcome(1))
2._predict: Pr(health04==2), predict(pr outcome(2))
3._predict: Pr(health04==3), predict(pr outcome(3))
4._predict: Pr(health04==4), predict(pr outcome(4))
5._predict: Pr(health04==5), predict(pr outcome(5))

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
0.male       |  (base outcome)
-------------+----------------------------------------------------------------
1.male       |
    _predict |
          1  |  -.0037347   .0015652    -2.39   0.017    -.0068024   -.0006669
          2  |  -.0103209   .0042637    -2.42   0.015    -.0186775   -.0019643
          3  |  -.0132588   .0054404    -2.44   0.015    -.0239219   -.0025958
          4  |   .0131821   .0054423     2.42   0.015     .0025154    .0238488
          5  |   .0141323   .0057939     2.44   0.015     .0027765     .025488
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

[1]The margins result in row 3 is for the 3rd outcome category, which is that self-rated health is “good”. In a sentence, interpret the result \(-.0133\).

[1]With the ordered logit model, your instructor suggested that graphs of cumulative probabilities were often more useful than graphs of the probabilities of the individual categories. What is a cumulative probability in this context?

[2]Say our outcome categories are “excellent,” “very good,” “good,” “fair,” and “poor.” If we fit an ordered regression model, the direction of the marginal effect for some of these outcome categories might vary depending on the values of the independent variables. For which categories is this the case, and why?

[2]The General Social Survey asks a bunch of questions about government spending in different areas (e.g., space exploration) where the outcome categories are “too much,” “about right,” and “too little.” Say you estimated an ordered logit model where this was your outcome and one of your covariates was age. What would it mean to say the parallel regression assumption was violated with respect to age?

Practice

The remaining items are premised on you doing some data analysis yourself.

In this exercise, you will use data of your choosing to fit models for an ordered outcome and interpret estimates from them. You will need:

An outcome variable with at least 3 categories that you are willing to treat as ordered. (If you do not have other data you’d like to use, General Social Survey has many, many items that can be used as ordered outcomes.)
At least three explanatory variables (a categorical variable counts as 1 variable).
At least one of these explanatory variables should be dichotomous, and at least one should be a variable that you are treating as continuous (i.e., interval-level).
One variable will be considered your key explanatory variable, and others are covariates in estimating the relationship between that key explanatory variable and the outcome.

[3]Fit an ordinary least squares regression model of your outcome with your key explanatory variable and covariates, and then fit an ordered probit model for the same set of explanatory variables. Paste your output into your assignment. Then, in 1-3 sentences, characterize how the results of the OLS and ordered probit model seem to differ from one another in terms of the direction and relative magnitude of the coefficients (that is, their difference from one another) and their \(p\)-values.

[2]Using your ordered probit results, compute and interpret the \(y\)-standardized coefficient for your key explanatory variable [\(\texttt{listcoef}\)].

[1]Fit an ordered logit model of your outcome, using your explanatory variables. Paste your output into your assignment. Compute and interpret the odds ratio for your key explanatory variable.

[2]Using your ordered logit results, for a binary or categorical explanatory variable, compute and interpret the average discrete change in the predicted probabilities of the outcome categories.

[3]Using \(\texttt{margins}\), compute the predicted probabilities of at least one outcome for two substantively meaningful values (hypothetical cases) of your data. In 2-3 sentences, interpret your results contrasting these hypothetical cases.

[4]Create an attractive graph that shows how the cumulative probabilities change as the continuous explanatory variable changes. The graph should have appropriately labeled axes, a legend, a title, and whatever else is needed so that I could tape your graph to the door of my office and passing sociologists would be able to apprehend the gist of it without anyone having to explain it to them.

[2]Compute a Brant test of whether the parallel regressions assumption holds for your model. If it is violated, describe how it is violated and how severe the violation looks to you substantively.
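
If it helps to get started, the Practice items could be approached with a do-file along the lines of the minimal sketch below. It is a starting point under stated assumptions, not a required workflow. All variable names (outcome, keyvar, female, region) are hypothetical placeholders for your own data. The sketch assumes that \(\texttt{listcoef}\) and \(\texttt{brant}\) come from the user-written SPost13 package, and that you are running a recent version of Stata in which \(\texttt{margins}\) after an ordered model reports every outcome category by default. It leaves out the cumulative-probability graph and all interpretation.

```stata
* Hypothetical variable names; replace them with your own:
*   outcome : ordered outcome with 3+ categories
*   keyvar  : key explanatory variable, treated as continuous
*   female  : dichotomous covariate
*   region  : categorical covariate

* OLS and ordered probit with the same explanatory variables
regress outcome c.keyvar i.female i.region
oprobit outcome c.keyvar i.female i.region

* y-standardized coefficients for the ordered probit
* (assumes SPost13 is installed; try: search spost13_ado)
listcoef, std help

* Ordered logit, reported as odds ratios
ologit outcome c.keyvar i.female i.region, or

* Average discrete change for the dichotomous covariate,
* one row per outcome category
margins, dydx(female)

* Predicted probability of one outcome category for two hypothetical
* cases (assumes a category of the outcome is coded 1; region is left
* at its observed values)
margins, at(keyvar = 0 female = 0) at(keyvar = 1 female = 1) ///
    predict(outcome(1))

* Brant test of the parallel regressions assumption; it uses the
* most recent ologit fit (also assumes SPost13)
brant, detail
```

The values of keyvar used in \(\texttt{at()}\) are arbitrary here; choose values that are substantively meaningful for your own variables.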