Here we will review how to compute the predicted value of the outcome in OLS given a set of estimates for our coefficients.
Using the NHANES data from a sample of US children and adolescents (ages 3-18), we fit a linear regression model in which our outcome is height (in inches) and our explanatory variables are sex (as binary), age (in years), and race/ethnicity (operationalized here as five mutually exclusive categories).
We can use these estimates to compute a predicted height for any combination of values of the explanatory variables, whether or not that combination appears in our data. A hat over a quantity in statistics always means "estimated" (or, in the context of an outcome, "predicted"):

\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1\,\text{male}_i + \hat{\beta}_2\,\text{age}_i + \hat{\beta}_3\,\text{Black}_i + \cdots\]

where the remaining terms are the indicators for the other race/ethnicity categories.
(Note that when we predict the outcome, the error term drops out of the model: OLS assumes that the expected value of \(\varepsilon_i\) is 0 for every observation.)
Now, say that we want to predict the height of a 15-year-old non-Hispanic Black girl. Because the observation is female, the indicator for male is 0 and that term drops out; and because the observation is non-Hispanic Black, the indicators for the other race/ethnicity categories are 0, so those terms drop out as well.
Our predicted height is 64.57 inches (i.e., five-foot-four-and-a-half).
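The plug-in calculation can be sketched in a few lines of Python. The coefficient values below are made up for illustration (the actual NHANES estimates are not reproduced here), so the resulting prediction will not match the 64.57 inches above; the point is only to show how the zero-valued indicators drop out of the sum.

```python
# Hypothetical coefficient estimates, for illustration only -- these are
# NOT the actual NHANES regression results. Non-Hispanic white is taken
# as the (assumed) reference category for race/ethnicity.
coefs = {
    "intercept": 40.0,
    "male": 1.5,
    "age": 1.6,
    "black": 0.8,
    "hispanic": -0.3,
    "asian": 0.2,
    "other": 0.1,
}

# Observation: a 15-year-old non-Hispanic Black girl. All indicators
# except "black" are 0, so their terms contribute nothing to the sum.
x = {"male": 0, "age": 15, "black": 1, "hispanic": 0, "asian": 0, "other": 0}

# y-hat = intercept + sum over variables of (coefficient * value).
y_hat = coefs["intercept"] + sum(coefs[k] * v for k, v in x.items())

print(round(y_hat, 2))
```

With these made-up coefficients, the sum reduces to intercept + age term + Black indicator term, exactly mirroring the drop-out logic described above.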
Aside: As it turns out, if we are actually trying to predict an outcome from linear regression estimates, in the sense of making the best guess about an outcome that has not yet been observed (or simply is not part of the data on which we fit our model), the best prediction is often not \(\hat{y}\).
The problem is that, in practice, regression models overfit: they fit the data on which they were estimated better than they fit new data. One can then often predict better by computing \(\hat{y}\) as if all the regression coefficients were a bit closer to zero than the model estimates. If you read papers in which authors talk about using the "lasso" or "elastic net," some version of this shrinkage is what they are doing.
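The shrinkage idea can be illustrated with ridge regression, a close relative of the lasso with a convenient closed-form solution. The sketch below (on synthetic data, assuming a small sample so overfitting matters) compares ordinary OLS coefficients with ridge coefficients; the ridge penalty \(\lambda\) pulls every coefficient toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, illustration only: small n relative to p, so the
# OLS fit picks up some noise.
n, p = 30, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 0.5, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# OLS: solve (X'X) beta = X'y.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: add lambda to the diagonal, which shrinks every coefficient
# toward zero. (The lasso penalizes differently and can set
# coefficients exactly to zero.)
lam = 5.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print("OLS:  ", np.round(beta_ols, 3))
print("Ridge:", np.round(beta_ridge, 3))
```

Predictions from the ridge coefficients are exactly "computing \(\hat{y}\) as if the coefficients were a bit closer to zero," and on new data they often beat the unshrunk OLS predictions.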