This first set of questions does not require you to do any data analysis yourself.
*2. Even though software packages obtain OLS estimates via the analytic solution, the same estimates can also be obtained iteratively. What does it mean to obtain estimates iteratively? [1]
*3. Say I gave you a list of how many minutes it took me to bike to campus for each of the past few days, and I asked you to calculate the mean. You would add the times together and then divide that total by the number of days. But now instead: describe how one could calculate the mean iteratively (in words is fine; does not require explicit math or walking through an explicit example). [1]
*4. When we fit a linear regression model using OLS, the resulting \(\mathbf{x}\mathbf{\beta}\) for a given observation may be described as a conditional mean. The conditional mean of what, and conditional on what? [2]
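As a point of reference for the idea of analytic versus iterative estimation raised above, here is a minimal sketch in R. It is an illustration only, not part of the assignment: the data are simulated, the model is a one-parameter regression through the origin, and the step size, tolerance, and variable names are arbitrary assumptions. The iterative part uses simple gradient descent, which is one of many possible iterative schemes.

```r
# Simulated data (hypothetical; not the GSS data used in this assignment)
set.seed(1)
x <- rnorm(200)
y <- 2 * x + rnorm(200)

# Analytic OLS solution for a regression through the origin: b = sum(xy) / sum(x^2)
b_analytic <- sum(x * y) / sum(x^2)

# Iterative search: start from a guess and repeatedly step downhill on the
# mean squared residual until the steps become negligibly small
b <- 0        # starting guess
step <- 0.1   # step size (an arbitrary choice)
for (i in 1:10000) {
  gradient <- -2 * sum(x * (y - b * x)) / length(x)  # slope of the mean squared residual at b
  b_new <- b - step * gradient
  if (abs(b_new - b) < 1e-10) break
  b <- b_new
}

# The two approaches should agree to many decimal places
c(analytic = b_analytic, iterative = b)
```

The loop never uses the closed-form formula. It only needs a way to tell whether a candidate value fits better or worse than the current one, and a rule for when to stop.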
Below is output from a regression of tvhours (hours of television watched per day) on no explanatory variables, along with a summary of the variable:
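# Load packages
library(tidyverse)
library(haven)
library(tulaverse)
# Read the data and drop observations with missing tvhours
df <- read_dta("../dta/gss_tvhours_only.dta") %>%
  filter(!is.na(tvhours))
# Summarize tvhours
tula(tvhours, data=df)
# Regress tvhours on a constant only (no explanatory variables)
model <- lm(tvhours ~ 1, data = df)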
tula(model)

──────────────────────────────────────────────────────────────────
 Variable │      Obs      Mean   Std. dev.       Min       Max
──────────────────────────────────────────────────────────────────
 tvhours  │    46149     3.036       Z.ZZZ         0        24
──────────────────────────────────────────────────────────────────

AIC = 217785.998                         Number of obs =      46149
BIC = 217803.477                         R-squared     =          0
                                         Adj R-squared =          0
                                         Root MSE      =      2.562
─────────────────────────────────────────────────────────────────────────────
     tvhours │     Coef   Std. Err.        t    P>|t|   [95% Conf Interval]
─────────────────────────────────────────────────────────────────────────────
 (Intercept) │    X.XXX      .01192    254.6   <.0001     3.012      3.059
─────────────────────────────────────────────────────────────────────────────
*5. In the output above, I have replaced the regression coefficient with X.XXX. What is it? Explain how you know. [1]
*6. In the output above, I have also replaced the standard deviation of tvhours with Z.ZZZ. What is it? Explain how you know. (N.B. The lecture notes re: this are not on the same page as the notes re: the previous question.) [1]
*7. What happens to the RMSE when you add explanatory variables to a model? [1]
*9. Say I gave you a list of how many minutes it took me to bike to campus for each of the past few days, and I asked you to calculate the median. You would sort the times in order from smallest to largest and then take the middle value. But now instead: describe how someone could find the median iteratively (in words is fine; does not require explicit math or walking through an explicit example). [2]