Download a CSV file from given URL and store it as the dataset with name iris_new.


Description

Q1.

Download a CSV file from the given URL and store it as a dataset named iris_new.

URL: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

Assign the headers “Sepal.Length”, “Sepal.Width”, “Petal.Length”, “Petal.Width” and “Species” to the dataset’s columns, in that order.

Find the mean of the first 5 rows of “Sepal.Length” in your dataset iris_new.

In the “Species” variable, remove the "Iris-" prefix so that the column values become “setosa”, “virginica” and “versicolor”. Then capitalize the first letter of each “Species” value.
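The Q1 steps can be sketched as follows. This is only an illustration in Python's standard library; the mention of ggplot2 in Q4 suggests the assignment expects R, so treat the language as an assumption. The helper names parse_iris and download_iris and the SAMPLE constant are hypothetical. SAMPLE holds the first five records of the file so the sketch runs without a network connection.

```python
import csv
import io
import urllib.request
from statistics import mean

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
HEADERS = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]


def parse_iris(text):
    """Parse headerless iris CSV text into a list of dicts keyed by HEADERS."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) != len(HEADERS):
            continue  # skip blank or malformed lines
        values = [float(v) for v in record[:4]]
        # Drop the "Iris-" prefix, then capitalize the first letter.
        species = record[4].replace("Iris-", "").capitalize()
        rows.append(dict(zip(HEADERS, values + [species])))
    return rows


def download_iris(url=URL):
    """Hypothetical helper: fetch the file and parse it (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return parse_iris(response.read().decode("utf-8"))


# First five records of the file, inlined so the example runs offline.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
"""

iris_new = parse_iris(SAMPLE)  # use download_iris() for the full dataset
first_five_mean = mean(row["Sepal.Length"] for row in iris_new[:5])
print(round(first_five_mean, 2))
print(iris_new[0]["Species"])
```

Calling download_iris() instead of parse_iris(SAMPLE) would load the whole file; the rest of the sketch stays the same.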

10 marks

 

Q2.

Create a new dataset called “iris_sub” containing the observations from “iris_new” where the “Sepal.Length” value is less than 6.4 and the “Petal.Length” value is greater than 5.1.

How many rows are there in iris_sub? What are the mean values of “Sepal.Width” and “Petal.Length” in iris_sub?
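A minimal sketch of the Q2 filter, again in plain Python as an assumption (the assignment likely expects R). The rows below are a few hand-typed records in the iris format, used for illustration only, so the printed count and means are not the answer for the full dataset.

```python
from statistics import mean

# Illustrative rows only; the real iris_new has 150 observations.
iris_new = [
    {"Sepal.Length": 5.1, "Sepal.Width": 3.5, "Petal.Length": 1.4, "Petal.Width": 0.2, "Species": "Setosa"},
    {"Sepal.Length": 6.3, "Sepal.Width": 3.3, "Petal.Length": 6.0, "Petal.Width": 2.5, "Species": "Virginica"},
    {"Sepal.Length": 5.8, "Sepal.Width": 2.7, "Petal.Length": 5.1, "Petal.Width": 1.9, "Species": "Virginica"},
    {"Sepal.Length": 7.1, "Sepal.Width": 3.0, "Petal.Length": 5.9, "Petal.Width": 2.1, "Species": "Virginica"},
    {"Sepal.Length": 6.2, "Sepal.Width": 3.4, "Petal.Length": 5.4, "Petal.Width": 2.3, "Species": "Virginica"},
]

# Both comparisons are strict: 6.4 and 5.1 themselves do not qualify.
iris_sub = [
    row for row in iris_new
    if row["Sepal.Length"] < 6.4 and row["Petal.Length"] > 5.1
]

n_rows = len(iris_sub)
mean_sepal_width = mean(row["Sepal.Width"] for row in iris_sub)
mean_petal_length = mean(row["Petal.Length"] for row in iris_sub)
print(n_rows, round(mean_sepal_width, 2), round(mean_petal_length, 2))
```

Note that the row with Petal.Length equal to 5.1 is excluded, because the condition is "greater than", not "greater than or equal to".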

20 marks

 

Q3.

Using a boxplot, find the outliers in the iris_new dataset. For each variable that has outliers, report the variable name, the number of outliers and their values. Store the cases (rows) that have an outlier in any of the variables in a new dataset. Find the mean of “Petal.Width” in this new dataset.

30 marks
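A boxplot usually flags points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule numerically in plain Python (an assumption; the assignment likely expects R). The quartiles come from statistics.quantiles with method="inclusive", which may differ slightly from the hinges a particular boxplot function uses, so borderline points can be classified differently. The function name find_outliers and the toy rows are hypothetical and are not taken from the iris file.

```python
from statistics import mean, quantiles

# Made-up rows for illustration; only two numeric variables are shown.
rows = [
    {"Sepal.Width": 3.0, "Petal.Width": 0.2},
    {"Sepal.Width": 3.1, "Petal.Width": 0.3},
    {"Sepal.Width": 2.9, "Petal.Width": 0.2},
    {"Sepal.Width": 3.2, "Petal.Width": 1.3},
    {"Sepal.Width": 3.0, "Petal.Width": 1.4},
    {"Sepal.Width": 2.8, "Petal.Width": 1.5},
    {"Sepal.Width": 3.1, "Petal.Width": 1.8},
    {"Sepal.Width": 3.0, "Petal.Width": 2.0},
    {"Sepal.Width": 4.4, "Petal.Width": 0.4},
    {"Sepal.Width": 2.9, "Petal.Width": 2.1},
]
NUMERIC = ["Sepal.Width", "Petal.Width"]


def find_outliers(data, names, k=1.5):
    """Return {variable: [outlier values]} using the k * IQR fence rule."""
    result = {}
    for name in names:
        column = [row[name] for row in data]
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        fence = k * (q3 - q1)
        result[name] = [v for v in column if v < q1 - fence or v > q3 + fence]
    return result


outliers = find_outliers(rows, NUMERIC)
for name, values in outliers.items():
    print(name, len(values), values)

# Keep every row that has an outlier in at least one variable.
iris_out = [
    row for row in rows
    if any(row[name] in outliers[name] for name in NUMERIC)
]
mean_petal_width = mean(row["Petal.Width"] for row in iris_out)
print(mean_petal_width)
```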

 

Q4.

Use ggplot2 to visualize the variables in the iris dataset. Use an appropriate chart type to plot “Sepal.Width” vs. “Sepal.Length”, and “Petal.Length” vs. “Species”.

30 marks

 

Q5.

From the iris_new dataset, subset Sepal.Length, Petal.Length and Species. Find the Species of the observation where Sepal.Length = 5.7 and Petal.Length = 4.1.
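The subset-and-lookup step in Q5 can be sketched like this, in plain Python as an assumption (the assignment likely expects R). The rows are hand-typed for illustration, so the printed result should not be read as the answer for the real dataset. The function name species_for is hypothetical.

```python
import math

# Illustrative rows only, already in the cleaned-up iris_new layout.
iris_new = [
    {"Sepal.Length": 5.1, "Sepal.Width": 3.5, "Petal.Length": 1.4, "Petal.Width": 0.2, "Species": "Setosa"},
    {"Sepal.Length": 5.7, "Sepal.Width": 2.8, "Petal.Length": 4.1, "Petal.Width": 1.3, "Species": "Versicolor"},
    {"Sepal.Length": 5.7, "Sepal.Width": 2.9, "Petal.Length": 4.2, "Petal.Width": 1.3, "Species": "Versicolor"},
    {"Sepal.Length": 6.3, "Sepal.Width": 3.3, "Petal.Length": 6.0, "Petal.Width": 2.5, "Species": "Virginica"},
]

# Keep only the three requested columns.
KEEP = ("Sepal.Length", "Petal.Length", "Species")
subset = [{key: row[key] for key in KEEP} for row in iris_new]


def species_for(rows, sepal_length, petal_length):
    """Return the sorted, de-duplicated species whose measurements match."""
    return sorted({
        row["Species"] for row in rows
        # isclose guards against floating-point representation differences.
        if math.isclose(row["Sepal.Length"], sepal_length)
        and math.isclose(row["Petal.Length"], petal_length)
    })


matches = species_for(subset, 5.7, 4.1)
print(matches)
```

Returning a list rather than a single value leaves room for the case where several observations, possibly of different species, share the same pair of measurements.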

10 marks 

