Part 2: Ionosphere data

For this part of the assignment, use the Ionosphere dataset. It is part of the mlbench package and can be loaded with:

data(Ionosphere, package='mlbench')

After loading the data, remove the first two columns from the dataset and create a training/validation split so that each model you build can be evaluated on the validation set.
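One possible way to carry out this preparation step is sketched below. The helper name split_train_valid, the 70/30 ratio, and the seed are illustrative assumptions, not requirements of the assignment; any reproducible random split would serve the same purpose.

```r
# Hypothetical helper (not part of the assignment): random train/validation split.
split_train_valid <- function(df, train_frac = 0.7, seed = 1) {
  set.seed(seed)                                   # make the split reproducible
  n <- nrow(df)
  train_idx <- sample.int(n, size = floor(train_frac * n))
  list(train = df[train_idx, , drop = FALSE],
       valid = df[-train_idx, , drop = FALSE])
}

# Run on the real data only if the mlbench package is installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(Ionosphere, package = "mlbench")
  ion <- Ionosphere[, -(1:2)]                      # drop the first two columns
  parts <- split_train_valid(ion, train_frac = 0.7)  # 70/30 is an assumed ratio
  cat("training rows:", nrow(parts$train),
      " validation rows:", nrow(parts$valid), "\n")
}
```

Fixing the seed keeps the same split across all models, so their validation accuracies are computed on identical observations.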

First, use overall accuracy as the metric to compare logistic regression, LDA, QDA, and kNN, with k taking every value from 1 to the maximum you determine is appropriate for the dataset, inclusive. Use the formula from this week's announcement for the confidence interval of the accuracy rate, so that each accuracy estimate is reported with its confidence interval. Create a table showing the accuracy for each model, and answer the following questions in the text of your document: Which model has the best overall accuracy? Is it significantly different from the accuracy of any of the other models?
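The sketch below shows one way the accuracy table might be assembled. The formula from the announcement is not reproduced here, so the code assumes the common normal-approximation (Wald) interval, accuracy ± z * sqrt(accuracy * (1 - accuracy) / n); substitute the announced formula if it differs. The helper name accuracy_ci, the 70/30 split, the 0.5 cutoff, and k = 5 are likewise assumptions for illustration. QDA would follow the same pattern as LDA using MASS::qda, and the kNN call would be repeated for each k in the chosen range.

```r
# Hypothetical helper: accuracy with an ASSUMED Wald (normal-approximation) interval.
accuracy_ci <- function(pred, truth, level = 0.95) {
  n <- length(truth)
  acc <- mean(as.character(pred) == as.character(truth))
  z <- qnorm(1 - (1 - level) / 2)                  # about 1.96 for a 95% interval
  half <- z * sqrt(acc * (1 - acc) / n)
  c(accuracy = acc, lower = max(0, acc - half), upper = min(1, acc + half))
}

# Run on the real data only if the required packages are installed.
if (requireNamespace("mlbench", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE) &&
    requireNamespace("class", quietly = TRUE)) {
  data(Ionosphere, package = "mlbench")
  ion <- Ionosphere[, -(1:2)]
  set.seed(1)
  idx <- sample.int(nrow(ion), floor(0.7 * nrow(ion)))  # assumed 70/30 split
  train <- ion[idx, ]
  valid <- ion[-idx, ]
  x_cols <- setdiff(names(train), "Class")

  # Logistic regression: predicted probability refers to the second factor level.
  fit_glm <- glm(Class ~ ., data = train, family = binomial)
  p <- predict(fit_glm, newdata = valid, type = "response")
  pred_glm <- ifelse(p > 0.5, levels(train$Class)[2], levels(train$Class)[1])

  # LDA (MASS::qda can be swapped in the same way for QDA).
  fit_lda <- MASS::lda(Class ~ ., data = train)
  pred_lda <- predict(fit_lda, newdata = valid)$class

  # kNN for a single, arbitrarily chosen k.
  pred_knn <- class::knn(train[, x_cols], valid[, x_cols], cl = train$Class, k = 5)

  results <- rbind(logreg = accuracy_ci(pred_glm, valid$Class),
                   LDA    = accuracy_ci(pred_lda, valid$Class),
                   kNN_k5 = accuracy_ci(pred_knn, valid$Class))
  print(round(results, 3))
}
```

Comparing whether two models' intervals overlap gives a rough, informal indication of whether their accuracies differ by more than sampling noise on a validation set of this size.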

Based on what you know about each classifier's strengths and weaknesses (Section 4.5), what might the different models be indicating about the dataset?

Based on what you can tell from exploratory visualization of the dataset, do you see any indications that one model should be more suitable than another? Include visualizations to back up your argument. 
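As one possible starting point for the exploratory visualization, the sketch below projects the predictors onto their first two principal components and colours the points by class. The helper names pca_scores and plot_pca are hypothetical, and PCA is only one of several reasonable choices; per-variable boxplots or pairwise scatterplots by class would be equally valid.

```r
# Hypothetical helper: first two principal-component scores plus the class label.
# Assumes every predictor is numeric and none is constant.
pca_scores <- function(df, class_col = "Class") {
  x <- df[, setdiff(names(df), class_col), drop = FALSE]
  pca <- prcomp(x, center = TRUE, scale. = TRUE)   # standardize before PCA
  data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], Class = df[[class_col]])
}

# Hypothetical helper: scatterplot of the scores, one colour per class.
plot_pca <- function(scores) {
  cls <- factor(scores$Class)
  plot(scores$PC1, scores$PC2, col = as.integer(cls) + 1, pch = 19,
       xlab = "PC1", ylab = "PC2", main = "First two principal components")
  legend("topright", legend = levels(cls),
         col = seq_along(levels(cls)) + 1, pch = 19)
}

# Run on the real data only if the mlbench package is installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(Ionosphere, package = "mlbench")
  ion <- Ionosphere[, -(1:2)]                      # drop the first two columns
  plot_pca(pca_scores(ion))
}
```

A plot like this can hint at whether the classes look linearly separable or whether one class is much more spread out than the other, which bears on how well a linear boundary, a quadratic boundary, or a local method such as kNN might fit.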

