The file bestsubset.py defines a function called bestsubset that takes two arguments, a dataframe X and an array y. If X has k columns, then there are k variables under consideration to include in our model of y=f(X). We only consider linear models in this assignment, and the function bestsubset uses 5-fold cross-validation to compare every possible linear model that includes one or more of the predictors in X. The function returns a list with two elements: a list of the column indices included in the best model, and an array of coefficient estimates for that model. You should assume that X does NOT have a column for the intercept, so you should have LinearRegression fit the intercept (i.e. use the default setting).
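To make the description above concrete, here is a minimal sketch of what such a function might look like. It is not the code in bestsubset.py: the name bestsubset_sketch, the use of mean squared error as the cross-validation score, and the choice to return only the slope coefficients (without the intercept) are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def bestsubset_sketch(X, y):
    """Exhaustive best-subset search scored by 5-fold CV mean squared error.

    Hypothetical stand-in for bestsubset; the scoring metric and the exact
    contents of the returned coefficient array are assumptions.
    """
    X_arr = np.asarray(X, dtype=float)  # accepts a DataFrame or a 2-D array
    y_arr = np.asarray(y, dtype=float)
    k = X_arr.shape[1]

    best_cols, best_mse = None, np.inf
    for size in range(1, k + 1):                    # outer loop: subset size
        for cols in combinations(range(k), size):   # inner loop: one subset
            scores = cross_val_score(
                LinearRegression(),                 # default: fits the intercept
                X_arr[:, list(cols)],
                y_arr,
                cv=5,
                scoring="neg_mean_squared_error",
            )
            mse = -scores.mean()                    # scores are negated MSE
            if mse < best_mse:
                best_cols, best_mse = list(cols), mse

    # Refit the winning model on all rows to get its coefficient estimates.
    model = LinearRegression().fit(X_arr[:, best_cols], y_arr)
    return [best_cols, model.coef_]
```

Calling bestsubset_sketch(X, y) on a dataframe with k columns fits one cross-validated model per non-empty subset of columns, so it is only practical for small k.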
The function has one serious problem: when the number of predictors gets large, it becomes very
slow. The inner loop runs once for each non-empty combination of predictors, for a total of
2^k - 1 iterations. Thus, this function is O(2^k), considerably slower than any algorithm we
discussed in our unit on algorithmic efficiency.
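The growth can be checked with a few lines of arithmetic. The helper names n_models and n_fits below are hypothetical; the sketch simply counts the non-empty subsets of k predictors and multiplies by the 5 folds to estimate how many regressions an exhaustive search would fit.

```python
from math import comb


def n_models(k):
    """Number of non-empty subsets of k predictors (equals 2**k - 1)."""
    return sum(comb(k, size) for size in range(1, k + 1))


def n_fits(k, folds=5):
    """Regressions fitted when every subset is scored with `folds`-fold CV."""
    return folds * n_models(k)


for k in (5, 10, 20, 30):
    print(k, n_models(k), n_fits(k))
# 5 31 155
# 10 1023 5115
# 20 1048575 5242875
# 30 1073741823 5368709115
```

Each additional predictor roughly doubles the work, so going from 20 to 30 predictors multiplies the number of fits by about a thousand.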