Problem Set: Least Squares

Concept Comprehension

This first set of questions does not require you to do any data analysis yourself.

  1. If \(\hat{y} = 7\) and \(y_i = 4\), what is the squared residual (aka squared error)? [1]

*2. Even though software packages obtain OLS estimates via its analytic solution, the same estimates can also be obtained iteratively. What does it mean to obtain estimates iteratively? [1]

*3. Say I gave you a list of how many minutes it took me to bike to campus for each of the past few days, and I asked you to calculate the mean. You would add the times together and then divide that total by the number of days. But now instead: describe how one could calculate the mean iteratively (in words is fine; does not require explicit math or walking through an explicit example). [1]

*4. When we fit a linear regression model using OLS, the resulting \(\mathbf{x}\mathbf{\beta}\) for a given observation may be described as a conditional mean. The conditional mean of what, and conditional on what? [2]

Below is output from a regression of tvhours (hours of television watched per day) on no explanatory variables, along with a summary of the variable:

Expand for code that produces results below
library(tidyverse)
library(haven)
library(tulaverse)

df <- read_dta("../dta/gss_tvhours_only.dta") %>%
  filter(!is.na(tvhours))

tula(tvhours, data=df)

model <- lm(tvhours ~ 1, data = df)

tula(model)
──────────────────────────────────────────────────────────────────
Variable │       Obs       Mean    Std. dev.        Min        Max
──────────────────────────────────────────────────────────────────
tvhours  │     46149      3.036        Z.ZZZ          0         24
──────────────────────────────────────────────────────────────────


AIC = 217785.998                                        Number of obs = 46149
BIC = 217803.477                                        R-squared     =     0
                                                        Adj R-squared =     0
                                                        Root MSE      = 2.562
─────────────────────────────────────────────────────────────────────────────
tvhours     │      Coef  Std. Err.          t     P>|t|  [95% Conf  Interval]
─────────────────────────────────────────────────────────────────────────────
(Intercept) │     X.XXX     .01192      254.6    <.0001      3.012      3.059
─────────────────────────────────────────────────────────────────────────────

*5. In the output above, I have replaced the regression coefficient with X.XXX. What is it? Explain how you know. [1]

  1. In the output above, I have replaced the standard deviation with Z.ZZZ. What is it? Explain how you know. (N.B. The lecture notes re: this are not the same page as the notes re: the previous question.) [1]

*7. What happens to the RMSE when you add explanatory variables to a model? [1]

  1. How is the variance of a variable related to its standard deviation? [1]

*9. Say I gave you a list of how many minutes it took me to bike to campus for each of the past few days, and I asked you to calculate the median. You would sort the times in order from smallest to largest and then take the middle value. But now instead: describe how someone could find the median iteratively (in words is fine; does not require explicit math or walking through an explicit example). [2]