1. What does the symbol y-hat represent in linear regression?
2. In the equation y-hat = theta_0 + theta_1*x, what does theta_0 represent?
3. Why does the MSE cost function square the residuals rather than sum them directly?
4. Which of the following best describes the OLS closed-form solution for simple linear regression?
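As a study aid for questions 3 and 4, here is a minimal NumPy sketch of the OLS closed-form solution theta = (X^T X)^{-1} X^T y. The dataset, seed, and variable names are illustrative, not part of the quiz.

```python
import numpy as np

# Toy data: y = 2 + 3x plus Gaussian noise (illustrative values only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, size=50)

# Design matrix with a bias column of ones, so theta[0] is the intercept
X = np.column_stack([np.ones_like(x), x])

# Closed-form OLS: solve (X^T X) theta = X^T y rather than inverting explicitly
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # close to [2.0, 3.0]
```

Using `np.linalg.solve` instead of `np.linalg.inv` is the standard numerically safer way to evaluate the closed form.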
5. In gradient descent, what is the role of the learning rate alpha?
6. Why is gradient descent for linear regression with MSE guaranteed (given a suitably small learning rate) to find the global minimum?
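For questions 5 and 6, a minimal gradient-descent sketch on the MSE cost. The data, learning rate, and iteration count are illustrative; the point is that the MSE surface for linear regression is convex, so with a small enough alpha the iterates approach the same global minimum the closed form gives.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=100)
y = 2.0 + 3.0 * x + rng.normal(0, 0.1, size=100)
X = np.column_stack([np.ones_like(x), x])

theta = np.zeros(2)
alpha = 0.1  # learning rate: scales each step along the negative gradient
m = len(y)
for _ in range(5000):
    grad = (2 / m) * X.T @ (X @ theta - y)  # gradient of the MSE cost
    theta -= alpha * grad
print(theta)  # close to [2.0, 3.0]
```

Too large an alpha makes the updates overshoot and diverge; too small an alpha converges but slowly.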
7. When is the Normal Equation not computable?
8. You plan to use gradient descent to fit a linear regression model. Which preprocessing step is strongly recommended?
9. Which metric measures the proportion of variance in the target variable explained by the model?
10. A model achieves very low training error but high test error. What is this phenomenon called?
11. Which OLS assumption states that the spread of residuals should remain constant across all fitted values?
12. What is the key difference between Ridge regression and Lasso regression?
13. In the Ridge Normal Equation, what practical problem does adding lambda*I to X-transpose-X solve?
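For questions 7 and 13, a small sketch (illustrative data) showing that with perfectly collinear features X^T X is singular, so the plain Normal Equation is not computable, while adding lambda*I makes the Ridge system solvable.

```python
import numpy as np

# Two perfectly collinear features: the second column is 2x the first
x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([x1, 2 * x1])
y = np.array([2.1, 3.9, 6.2, 8.0])

# X^T X has rank 1, so its inverse does not exist
print(np.linalg.matrix_rank(X.T @ X))  # 1

# Ridge closed form: theta = (X^T X + lambda*I)^{-1} X^T y
lam = 1.0  # regularization strength (illustrative value)
A = X.T @ X + lam * np.eye(X.shape[1])
theta = np.linalg.solve(A, X.T @ y)
print(theta)
```

Adding lambda*I shifts every eigenvalue of X^T X up by lambda, so the matrix becomes invertible even when features are collinear or there are fewer examples than features.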
14. You have a dataset with 50 training examples and 45 features. Which approach is most appropriate?
15. The VIF (Variance Inflation Factor) is used to detect which assumption violation?
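For question 15, a sketch of computing VIF from scratch: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on the remaining features. The `vif` helper and the synthetic data are illustrative, not a library API.

```python
import numpy as np

def vif(X):
    """VIF for each column: regress it on the other columns via OLS."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        ss_tot = (target - target.mean()) @ (target - target.mean())
        r2 = 1 - (resid @ resid) / ss_tot
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(2)
a = rng.normal(size=200)
b = a + 0.05 * rng.normal(size=200)  # nearly collinear with a -> large VIF
c = rng.normal(size=200)             # independent feature -> VIF near 1
print(vif(np.column_stack([a, b, c])))
```

A common rule of thumb flags VIF values above roughly 5 to 10 as evidence of problematic multicollinearity.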