For the model fit below, \(\texttt{birwtoz}\) is the birthweight of an infant (measured in ounces), \(\texttt{smoke}\) is whether or not the mother smokes (1=yes, 0=no), and \(\texttt{gestat}\) is the gestational period (the amount of time the mother was pregnant before giving birth, measured in weeks).
lm_fit <- lm(birwtoz ~ smoke + gestat, data = df)
tula(lm_fit)
The estimate for \(\texttt{gestat}\) is 3.3. In one sentence, describe the relationship between gestational period and birthweight using this coefficient. [1]
The estimate for \(\texttt{smoke}\) is -11.2. In one sentence, describe the relationship between smoking and birthweight using this coefficient. [1]
Given the above estimates, what is the predicted birthweight of a baby born at 37 weeks whose mother smoked during her pregnancy? Explain or show how you calculated this. [2]
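As a hedged illustration of the mechanics only: the sketch below fits the same kind of model to simulated stand-in data (the data frame `sim`, its intercept, and its noise level are assumptions made up for this example, not the exam's `df`) and shows that a prediction computed by hand from the coefficients matches what `predict()` returns.

```r
# Simulated stand-in data; the exam's `df` is not reproduced here.
set.seed(1)
n <- 200
sim <- data.frame(
  smoke  = rbinom(n, 1, 0.3),                   # 1 = mother smokes, 0 = does not
  gestat = round(rnorm(n, mean = 39, sd = 2))   # gestational period in weeks
)
# The intercept (-10) and the noise sd (15) are arbitrary choices for illustration.
sim$birwtoz <- -10 + 3.3 * sim$gestat - 11.2 * sim$smoke + rnorm(n, sd = 15)

fit <- lm(birwtoz ~ smoke + gestat, data = sim)

# Prediction by hand: intercept + b_smoke * 1 + b_gestat * 37
b <- coef(fit)
by_hand <- unname(b["(Intercept)"] + b["smoke"] * 1 + b["gestat"] * 37)

# The same prediction from predict() with a one-row data frame
via_predict <- unname(predict(fit, newdata = data.frame(smoke = 1, gestat = 37)))

all.equal(by_hand, via_predict)
```

The point of the sketch is that a prediction is the intercept plus each coefficient multiplied by the corresponding covariate value, so the intercept from the model output is needed along with the two slopes.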
From the 2018 General Social Survey, here are estimates from a regression of the number of hours of television watched per day on marital status. The reference category is married.
model <- lm(tvhours ~ marstatus, data = gss18)
tula(model)
The coefficient for \(\texttt{widowed}\) is 1.45. In one sentence, interpret this coefficient. [1]
Ignoring the question of whether the difference is large enough to be statistically significant, who watches more TV per day according to these results: divorced people, or people who have never married? By how much? [1]
Say that, instead of using married as the reference category, we had used never married instead. What would the coefficient for \(\texttt{widowed}\) have been? [1]
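A minimal sketch of how the choice of reference category relates two fits of the same model. It uses simulated data: the level names and the `tvhours` values below are assumptions for illustration and are not the GSS estimates. With a categorical predictor, each coefficient is a difference from the reference group, so changing the reference only re-expresses the same group means.

```r
# Simulated stand-in data; level names are assumed, values are arbitrary.
set.seed(2)
levels_ms <- c("married", "widowed", "divorced", "separated", "never married")
n <- 500
sim <- data.frame(
  marstatus = factor(sample(levels_ms, n, replace = TRUE), levels = levels_ms)
)
sim$tvhours <- rnorm(n, mean = 3, sd = 2)

# Fit 1: married is the reference (first factor level)
fit_married <- lm(tvhours ~ marstatus, data = sim)

# Fit 2: same model with never married as the reference
sim$marstatus_nm <- relevel(sim$marstatus, ref = "never married")
fit_never <- lm(tvhours ~ marstatus_nm, data = sim)

# The widowed coefficient in fit 2 equals the difference between two
# coefficients from fit 1.
b <- coef(fit_married)
diff_from_first <- unname(b["marstatuswidowed"] - b["marstatusnever married"])
direct <- unname(coef(fit_never)["marstatus_nmwidowed"])

all.equal(diff_from_first, direct)
```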
When a researcher conducts an experiment in which participants have been randomly assigned to a treatment or a control group, there is no need to include any control variables (aka covariates) in a regression model of the outcome in order to estimate an unbiased treatment effect. Why not? [1]
For questions 8–10, imagine a researcher was interested in estimating the effect of earnings on happiness using observational data (e.g., data from a survey).
What is a potential confounding variable that a researcher might want to adjust for in estimating this effect? (Present your answer in a way that makes clear you understand what makes something a confounding variable.) [1]
What is a potential mediating variable that, depending on one’s purpose, one might want to include as a covariate? (Present your answer in a way that makes clear you understand what a mediating variable is.) [1]
Including a potential confounding variable as a covariate in a regression analysis serves a different purpose than fitting a model in which one includes a potential mediating variable. What are these different purposes? [2]
For questions 11–12, here are estimates from a model using the NHANES height data (persons aged 3–18):
library(haven)  # read_dta()
library(dplyr)  # %>%, filter(), mutate()

nhanes <- read_dta("../dta/nhanes_bodymeasures.dta") %>%
  filter(age > 2 & age <= 18) %>%
  mutate(raceeth = factor(raceeth5cat,
                          levels = c(1, 2, 3, 4, 5),
                          labels = c("mex-am", "oth-hisp", "nonh-white", "nonh-black", "nonh-other")))
model <- lm(height ~ male + age + raceeth, data = nhanes)
tula(model)
From these data, what is the predicted height of a 14-year-old non-Hispanic White boy? Explain/show how you arrive at this prediction. [1]
In OLS, the linear predictor is often referred to as \(\hat{y}\). Why might it be preferable to refer to the linear predictor as \(\mathbf{x}\boldsymbol{\beta}\) rather than \(\hat{y}\), especially in a course about models beyond OLS? [2]
The following estimates are from a model of the logged number of hours a day the respondent watches television, with marital status as the explanatory variable.
Unlike the earlier example, the reference category is never married.
Respondents who report watching no TV are dropped, so the population here is adults who report watching at least some TV in a typical day.
In one sentence, interpret the exponentiated coefficient of \(\texttt{married}\). The interpretation should be in terms of hours, not logged hours. [1]
In one sentence, interpret the exponentiated coefficient of \(\texttt{widowed}\). The interpretation should be in terms of hours, not logged hours. [1]
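A hedged sketch of what exponentiating a coefficient from a logged-outcome model does. It uses simulated data: the three status levels and the `tvhours` values are assumptions for illustration, not the GSS estimates. With a single categorical predictor, the exponentiated coefficient equals the ratio of the group's geometric mean of hours to the reference group's geometric mean.

```r
# Simulated stand-in data; strictly positive hours so the log is defined.
set.seed(3)
n <- 400
lv <- c("never married", "married", "widowed")
sim <- data.frame(
  status = factor(sample(lv, n, replace = TRUE), levels = lv)
)
sim$tvhours <- exp(rnorm(n, mean = 1, sd = 0.6))  # arbitrary values

# Model of logged hours; never married is the reference (first level)
fit_log <- lm(log(tvhours) ~ status, data = sim)

# Exponentiated coefficients are multiplicative factors relative to the reference
ratios <- exp(coef(fit_log))[-1]

# Check against geometric means computed directly for each group
geo_mean <- tapply(sim$tvhours, sim$status, function(x) exp(mean(log(x))))
ratio_married <- unname(ratios["statusmarried"])
by_hand <- unname(geo_mean["married"] / geo_mean["never married"])

all.equal(ratio_married, by_hand)
```

A ratio of, say, 1.10 would read as about 10% more hours than the reference group, and a ratio of 0.90 as about 10% fewer.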