The file bestsubset.py defines a function called bestsubset that takes two arguments: a data frame X and an array y.


Description

The file bestsubset.py defines a function called bestsubset that takes two arguments: a data frame X and an array y. If X has k columns, then there are k candidate variables to include in our model y = f(X). We only consider linear models in this assignment, and bestsubset uses 5-fold cross-validation to compare every possible linear model that includes one or more of the predictors in X. The function returns a list with two elements: a list of the column indices included in the best model, and an array of coefficient estimates for that model. You should assume that X does NOT have a column for the intercept, so you should have LinearRegression fit the intercept (i.e., use its default setting).
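One possible shape for such a function is sketched below. It is an illustration under stated assumptions, not a reference solution: the description does not say which score to compare models by, so the sketch assumes mean squared error averaged over the five folds (via scikit-learn's `cross_val_score`), and it assumes the returned coefficient array holds the slope estimates only, without the intercept.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def bestsubset(X, y):
    """Return [column indices of the best model, coefficient estimates]."""
    k = X.shape[1]
    best_score = -np.inf
    best_cols = None

    # Enumerate every non-empty subset of the k column positions.
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            # Assumption: models are ranked by 5-fold cross-validated MSE.
            # scikit-learn reports it negated, so a larger value is better.
            scores = cross_val_score(
                LinearRegression(),  # default settings fit the intercept
                X.iloc[:, list(cols)],
                y,
                cv=5,
                scoring="neg_mean_squared_error",
            )
            mean_score = scores.mean()
            if mean_score > best_score:
                best_score = mean_score
                best_cols = list(cols)

    # Refit the winning model on all of the data to get its coefficients.
    # Assumption: only the slopes are returned, not the intercept.
    model = LinearRegression().fit(X.iloc[:, best_cols], y)
    return [best_cols, model.coef_]
```

Because every non-empty subset is tried, the loop fits 2^k - 1 candidate models (each five times), so this exhaustive approach is only practical when k is small.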

