
Problem 1


The baseball dataset consists of the statistics of 263 players in Major League Baseball during the 1986 season. The dataset (hitters.csv) consists of 20 variables.

In this problem, we use Salary as the response variable and the remaining 19 variables as predictors/covariates, which measure each player's performance in the 1986 season and over his whole career. Write R functions to perform variable selection using best subset selection paired with the BIC (Bayesian Information Criterion):


1) Starting from the null model, apply the forward stepwise selection algorithm to produce a sequence of sub-models iteratively, and select a single best model using the BIC. Plot the “BIC vs Number of Variables” curve. Present the selected model with the corresponding BIC. 
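A minimal sketch of what part 1) could look like, assuming the leaps package is available and that hitters.csv contains exactly the 20 variables described above (the assignment expects your own implementation of the algorithm; regsubsets() with method = "forward" is shown here only as a way to cross-check results):

# Forward stepwise selection scored by BIC -- a cross-checking sketch,
# assuming hitters.csv holds the 20 variables described above.
library(leaps)

hitters <- na.omit(read.csv("hitters.csv"))  # drop any incomplete rows (263 remain)

fwd <- regsubsets(Salary ~ ., data = hitters, nvmax = 19, method = "forward")
fwd.sum <- summary(fwd)

# "BIC vs Number of Variables" curve
plot(fwd.sum$bic, type = "b", xlab = "Number of Variables", ylab = "BIC")

k <- which.min(fwd.sum$bic)     # model size minimizing BIC
points(k, fwd.sum$bic[k], col = "red", pch = 19)
coef(fwd, id = k)               # the selected model's coefficients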


2) Starting from the full model (that is, the one obtained from minimizing the MSE/RSS using all the predictors), apply the backward stepwise selection algorithm to produce a sequence of sub-models iteratively, and select a single best model using the BIC. Plot the “BIC vs Number of Variables” curve. Present the selected model with the corresponding BIC. 
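Analogously, a hedged sketch for part 2), again leaning on leaps::regsubsets (which the assignment would have you reimplement) and the same assumptions about hitters.csv:

# Backward stepwise selection scored by BIC -- a cross-checking sketch.
library(leaps)

hitters <- na.omit(read.csv("hitters.csv"))

bwd <- regsubsets(Salary ~ ., data = hitters, nvmax = 19, method = "backward")
bwd.sum <- summary(bwd)

plot(bwd.sum$bic, type = "b", xlab = "Number of Variables", ylab = "BIC")
k <- which.min(bwd.sum$bic)
coef(bwd, id = k)    # compare with the forward-selected model in part 1)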


3) Are the selected models from 1) and 2) the same? 


Problem 2 

In this problem, we fit ridge regression on the same dataset as in Problem 1. First, standardize the variables so that they are on the same scale. Next, choose a grid of λ values ranging from λ = 10^10 to λ = 10^-2, essentially covering the full range of scenarios from the null model containing only the intercept to the least squares fit. For example:

> grid = 10^seq(10, -2, length=100)


1) Write an R function to do the following: for each value of λ, compute the vector of ridge regression coefficients (including the intercept), and store the results in a 20 × 100 matrix, with 20 rows (one for each of the 19 predictors, plus one for the intercept) and 100 columns (one for each value of λ).
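One possible shape for such a function, sketched under the assumption that the predictors have already been expanded to a numeric matrix x (e.g., via model.matrix) with response y; ridge_path is an illustrative name, and the reported coefficients are on the standardized scale:

# Sketch of a ridge coefficient path; ridge_path is a hypothetical name.
# x: n x 19 numeric predictor matrix, y: response vector, lambdas: grid of λ values.
ridge_path <- function(x, y, lambdas) {
  xs <- scale(x)                              # standardize each predictor
  yc <- y - mean(y)                           # center the response
  p  <- ncol(xs)
  coefs <- sapply(lambdas, function(l) {
    b <- solve(crossprod(xs) + l * diag(p), crossprod(xs, yc))
    c(mean(y), b)                             # intercept first, then 19 slopes
  })
  rownames(coefs) <- c("(Intercept)", colnames(x))
  coefs                                       # 20 x length(lambdas) matrix
}

# e.g. with x <- model.matrix(Salary ~ ., hitters)[, -1]; y <- hitters$Salary:
# B <- ridge_path(x, y, grid); dim(B)   # 20 x 100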


2) To find the “best” λ, use ten-fold cross-validation to choose the tuning parameter from the previous grid of values. Set a random seed first, set.seed(1), so that your results are reproducible, since the choice of the cross-validation folds is random. Plot the “Cross-Validation Error versus λ” curve, and report the selected λ.
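If the glmnet package is acceptable as a cross-check (an assumption; alpha = 0 requests the ridge penalty), the cross-validation step could look like:

library(glmnet)

hitters <- na.omit(read.csv("hitters.csv"))
x <- model.matrix(Salary ~ ., hitters)[, -1]   # drop the intercept column
y <- hitters$Salary
grid <- 10^seq(10, -2, length = 100)

set.seed(1)                      # fix the random fold assignment
cv.out <- cv.glmnet(x, y, alpha = 0, lambda = grid, nfolds = 10)
plot(cv.out)                     # CV error versus log(lambda)
bestlam <- cv.out$lambda.min     # the selected lambda
bestlam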


3) Finally, refit the ridge regression model on the full dataset, using the value of λ chosen by cross-validation, and report the coefficient estimates.
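Continuing the cross-validation sketch from part 2) (reusing x, y, grid, and bestlam defined there), the final refit might be:

# Reuses x, y, grid, and bestlam from the part 2) sketch above.
out <- glmnet(x, y, alpha = 0, lambda = grid)
predict(out, type = "coefficients", s = bestlam)   # all 20 estimates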


Remark: You should expect that none of the coefficients are exactly zero; ridge regression shrinks coefficients toward zero but does not perform variable selection.

