The dataset auto-mpg.csv is used, where
- mpg is the dependent variable
- 'cylinders', 'displacement', 'horsepower', 'weight',
'acceleration', 'year', 'origin' are the predictor variables.
y = β0 + β1x1 + ... + βnxn + ε
We are looking for the coefficients βi that minimize the model error (for ordinary least squares, the sum of squared residuals).
https://www.scribbr.com/statistics/multiple-linear-regression/
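
A minimal sketch of how such coefficients can be estimated, assuming ordinary least squares as the error criterion. The data below is synthetic and only stands in for the predictors; the "true" coefficients, sample size, and variable names are hypothetical and are not taken from auto-mpg.csv.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the predictor matrix (NOT the real auto-mpg data).
n_samples, n_features = 200, 3
X = rng.normal(size=(n_samples, n_features))

# Hypothetical "true" coefficients, used only to generate the toy target.
true_intercept = 2.0
true_coefs = np.array([1.5, -0.7, 0.3])
noise = rng.normal(scale=0.1, size=n_samples)
y = true_intercept + X @ true_coefs + noise

# Prepend a column of ones so that β0 is estimated together with β1..βn.
X_design = np.column_stack([np.ones(n_samples), X])

# Least-squares solution: the β that minimizes the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

residuals = y - X_design @ beta
sse = float(residuals @ residuals)
print("estimated coefficients (β0, β1, ..., βn):", beta)
print("sum of squared errors:", sse)
```

With the real dataset, X would instead hold the seven predictor columns and y the mpg column. scikit-learn's LinearRegression fits the same kind of model.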
For cross-validation, scikit-learn offers KFold and StratifiedKFold.
- use KFold
for regression problems
- use StratifiedKFold
for classification problems
because in classification problems, KFold can produce folds whose class balance differs from that of the full dataset;
StratifiedKFold preserves the class proportions in every fold and so prevents this unintended shift.
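
The difference can be illustrated with a small sketch. It assumes a toy, imbalanced label vector (90 samples of class 0, 10 of class 1) that is made up for this example; the helper function name is hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy imbalanced labels (illustrative only): 90 x class 0, 10 x class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # feature values do not matter for the split itself


def minority_share_per_fold(splitter):
    """Return the fraction of class-1 samples in each test fold."""
    return [float(y[test_idx].mean()) for _, test_idx in splitter.split(X, y)]


kfold_shares = minority_share_per_fold(
    KFold(n_splits=5, shuffle=True, random_state=0)
)
stratified_shares = minority_share_per_fold(
    StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
)

# KFold ignores y, so the class-1 share may vary from fold to fold.
print("KFold:          ", kfold_shares)
# StratifiedKFold uses y to keep the share at the overall 10% in each fold.
print("StratifiedKFold:", stratified_shares)
```

For a regression target such as mpg there are no classes to balance, so plain KFold is the usual choice.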