Problem Set 6

J. Weldemariam

May 18, 2020

Question 1: Train-Validate versus Cross-Validation, no association

This question compares the performance of the “train-validate” model selection process with that of the cross-validation model selection process in the task of identifying the best explanatory variable in a linear regression from among a large collection, none of which is structurally related to the outcome variable. Success in this context requires that the method provide evidence that the selected predictor is a poor predictor of the outcome.
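
To make the setting concrete, the sketch below simulates one data set in which no predictor is related to the outcome and computes the in-sample R-squared of every single-predictor regression. It is an illustration only, assuming base R; the seed and the name r2 are choices made for this example and are not part of the problem set.

```r
# Simulate an outcome and p candidate predictors that are all unrelated to it.
set.seed(1)   # arbitrary seed, chosen for this illustration only
n <- 50
p <- 30
y <- rnorm(n)
xs <- matrix(rnorm(n * p), nrow = n)

# For a simple regression with an intercept, R-squared is the squared
# correlation between the outcome and the predictor.
r2 <- as.vector(cor(y, xs))^2

# Distribution of in-sample R-squared across the p null predictors.
print(summary(r2))

# Index and R-squared of the predictor that looks best in-sample.
print(which.max(r2))
print(max(r2))
```

Every true coefficient here is zero, yet one predictor must come out on top. Because it is chosen for having the largest in-sample fit, that fit overstates how well it would predict new data, which is why the selected predictor needs to be assessed on data not used to fit it.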

Anticipating that replication will be required, the following function generates data with a train-validate split and a cross-validation index. Because the rows are independent draws generated in random order, the split and the fold labels can be assigned by row position rather than by random sampling.

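library(stringr)  # provides str_c(), used in dat.make to name the predictors
n<-50       # number of observations
p<-30       # number of candidate predictors
k.fold<-10  # number of cross-validation folds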

dat.make<-function(n,p,k.fold){
  y<-rnorm(n)
  xs<-matrix(rnorm(n*p),nrow=n)
  # Create train-validate identifiers for a 3/4 to 1/4 split
  train.ind<-rep(c(1,0),times=c(round(n*3/4),n-round(n*3/4)))
  # Create fold identifiers for k-fold cross-validation.
  xv.ind<-rep(1:k.fold,ceiling(n/k.fold))
  xv.ind<-xv.ind[1:n]
  dat.this<-data.frame(cbind(y,xs,train.ind,xv.ind))
  names(dat.this)<-c("y",str_c("x",1:p),"train.ind","xv.ind")
  return(dat.this)
}
set.seed(2345)
dat.this<-dat.make(n,p,k.fold)
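
To see what the two index columns look like, the following sketch rebuilds them outside the function with the same expressions and tabulates them. It assumes only base R, and n.train is a helper name introduced here for illustration.

```r
# Rebuild the two index vectors with the same expressions used in dat.make.
n <- 50
k.fold <- 10

# n.train is a helper name introduced for this sketch only.
n.train <- round(n * 3 / 4)
train.ind <- rep(c(1, 0), times = c(n.train, n - n.train))

# Fold labels cycle through 1..k.fold and are truncated to length n.
xv.ind <- rep(1:k.fold, ceiling(n / k.fold))[1:n]

# Sizes of the training (1) and validation (0) sets.
print(table(train.ind))

# Number of observations per fold; the folds are equal here because
# k.fold divides n.
print(table(xv.ind))

# Cross-tabulation showing how the folds overlap the train-validate split.
print(table(train.ind, xv.ind))
```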

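Given a data set in the form returned by “dat.make”, the function “sse.best.tv” identifies the best-performing predictor, that is, the one with the smallest validation error, when a model fit on the training set is used to predict the outcome on the validation set. It relies on the helper “sse.tv”, which returns the validation error for a single predictor.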

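# Function to return the validation error of predicting "y" by the jth "x" variable
sse.tv<-function(j,dat.this){
    dat.train<-dat.this[dat.this$train.ind==1,]
    dat.valid<-dat.this[dat.this$train.ind==0,]
    # The source listing ends here. The lines below are an assumed completion
    # that takes "validation error" to mean the sum of squared prediction errors.
    m<-lm(reformulate(paste0("x",j),response="y"),data=dat.train)
    return(sum((dat.valid$y-predict(m,newdata=dat.valid))^2))
}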
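
The selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the names valid.sse and best.tv are hypothetical, the data are regenerated inline so that the block runs by itself, and validation error is taken to be the sum of squared prediction errors on the validation rows.

```r
# Regenerate a small data set with a 3/4 to 1/4 train-validate split.
set.seed(1)   # arbitrary seed, chosen for this illustration only
n <- 50
p <- 30
y <- rnorm(n)
xs <- matrix(rnorm(n * p), nrow = n)
dat <- data.frame(y, xs)
names(dat) <- c("y", paste0("x", 1:p))
n.train <- round(n * 3 / 4)
dat$train.ind <- rep(c(1, 0), times = c(n.train, n - n.train))

# Hypothetical helper: fit y ~ x_j on the training rows and return the
# sum of squared prediction errors on the validation rows.
valid.sse <- function(j, dat) {
  dat.train <- dat[dat$train.ind == 1, ]
  dat.valid <- dat[dat$train.ind == 0, ]
  fit <- lm(reformulate(paste0("x", j), response = "y"), data = dat.train)
  pred <- predict(fit, newdata = dat.valid)
  sum((dat.valid$y - pred)^2)
}

# Hypothetical selector: evaluate every predictor and keep the one with
# the smallest validation error.
best.tv <- function(dat, p) {
  sse <- sapply(1:p, valid.sse, dat = dat)
  c(best.j = which.min(sse), best.sse = min(sse))
}

res <- best.tv(dat, p)
print(res)
```

Returning both the index and its validation error keeps the selected predictor and the evidence about its quality together, which is what the comparison between the two procedures needs.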