data mining

Description

This is file19.txt, the data file we need for our problem.

HARTIGAN is a dataset directory that contains test data for clustering algorithms. The data files are all simple text files, and the format of the data files is explained on the web page at https://people.sc.fsu.edu/~jburkardt/datasets/hartigan/hartigan.html

Perform K-means clustering on file19.txt from the above web page.

# file19.txt

# Reference:

# John Hartigan,

# Clustering Algorithms,

# Wiley, 1975.

# ISBN 0-471-35645-X

# LC: QA278.H36

# Dewey: 519.5'3

# "Name" is the name of the animal.

# "I", "i", "C", "c", "P", "p", "M", "m", is the tooth pattern, the

# number of top incisors, bottom incisors, top canines, bottom canines,

# top premolars, bottom premolars, top molars, and bottom molars.

"Dentition of Mammals, Hartigan page 170"

9 columns

66 rows

"Name" "I" "i" "C" "c" "P" "p" "M" "m"

"Opossum" 5 4 1 1 3 3 4 4

"Hairy tail mole" 3 3 1 1 4 4 3 3

"Common mole" 3 2 1 0 3 3 3 3

"Star nose mole" 3 3 1 1 4 4 3 3

"Brown bat" 2 3 1 1 3 3 3 3

"Silver hair bat" 2 3 1 1 2 3 3 3

"Pigmy bat" 2 3 1 1 2 2 3 3

"House bat" 2 3 1 1 1 2 3 3

"Red bat" 1 3 1 1 2 2 3 3

"Hoary bat" 1 3 1 1 2 2 3 3

"Lump nose bat" 2 3 1 1 2 3 3 3

"Armadillo" 0 0 0 0 0 0 8 8

"Pika" 2 1 0 0 2 2 3 3

"Snowshoe rabbit" 2 1 0 0 3 2 3 3

"Beaver" 1 1 0 0 2 1 3 3

"Marmot" 1 1 0 0 2 1 3 3

"Groundhog" 1 1 0 0 2 1 3 3

"Prairie Dog" 1 1 0 0 2 1 3 3

"Ground Squirrel" 1 1 0 0 2 1 3 3

"Chipmunk" 1 1 0 0 2 1 3 3

"Gray squirrel" 1 1 0 0 1 1 3 3

"Fox squirrel" 1 1 0 0 1 1 3 3

"Pocket gopher" 1 1 0 0 1 1 3 3

"Kangaroo rat" 1 1 0 0 1 1 3 3

"Pack rat" 1 1 0 0 0 0 3 3

"Field mouse" 1 1 0 0 0 0 3 3

"Muskrat" 1 1 0 0 0 0 3 3

"Black rat" 1 1 0 0 0 0 3 3

"House mouse" 1 1 0 0 0 0 3 3

"Porcupine" 1 1 0 0 1 1 3 3

"Guinea pig" 1 1 0 0 1 1 3 3

"Coyote" 1 3 1 1 4 4 3 3

"Wolf" 3 3 1 1 4 4 2 3

"Fox" 3 3 1 1 4 4 2 3

"Bear" 3 3 1 1 4 4 2 3

"Civet cat" 3 3 1 1 4 4 2 2

"Raccoon" 3 3 1 1 4 4 3 2

"Marten" 3 3 1 1 4 4 1 2

"Fisher" 3 3 1 1 4 4 1 2

"Weasel" 3 3 1 1 3 3 1 2

"Mink" 3 3 1 1 3 3 1 2

"Ferrer" 3 3 1 1 3 3 1 2

"Wolverine" 3 3 1 1 4 4 1 2

"Badger" 3 3 1 1 3 3 1 2

"Skunk" 3 3 1 1 3 3 1 2

"River otter" 3 3 1 1 4 3 1 2

"Sea otter" 3 2 1 1 3 3 1 2

"Jaguar" 3 3 1 1 3 2 1 1

"Ocelot" 3 3 1 1 3 2 1 1

"Cougar" 3 3 1 1 3 2 1 1

"Lynx" 3 3 1 1 3 2 1 1

"Fur seal" 3 2 1 1 4 4 1 1

"Sea lion" 3 2 1 1 4 4 1 1

"Walrus" 1 0 1 1 3 3 0 0

"Grey seal" 3 2 1 1 3 3 2 2

"Elephant seal" 2 1 1 1 4 4 1 1

"Peccary" 2 3 1 1 3 3 3 3

"Elk" 0 4 1 0 3 3 3 3

"Deer" 0 4 0 0 3 3 3 3

"Moose" 0 4 0 0 3 3 3 3

"Reindeer" 0 4 1 0 3 3 3 3

"Antelope" 0 4 0 0 3 3 3 3

"Bison" 0 4 0 0 3 3 3 3

"Mountain goat" 0 4 0 0 3 3 3 3

"Musk ox" 0 4 0 0 3 3 3 3

"Mountain sheep" 0 4 0 0 3 3 3 3

2.2 K-means clustering (2.5 points divided evenly among the components)

Perform K-means clustering on file19.txt from the above web page.

This file contains a multivariate mammals dataset with 9 columns and 66 rows.

(a) Data cleanup (1 point divided evenly by components below)

(i) Think of what attributes, if any, you may want to omit from the dataset when you do the clustering. Indicate all of the attributes you removed before doing the clustering.

(ii) Does the data need to be standardized?

(iii) You will have to clean the data to remove multiple spaces and make the comma character the delimiter. Please make sure you include your cleaned dataset in the archive file you upload.
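One possible way to approach the cleanup steps in part (a), sketched in base R. To stay self-contained it works on six rows copied from the listing above, held in a string named `raw`; that string, the variable names, and the output file name `file19_clean.csv` are assumptions made for illustration, not part of the assignment.

```r
# Six rows copied from the listing above. For the real task, read the whole
# file19.txt instead, after dropping the "#" comment lines and the
# title/size lines that precede the header row.
raw <- '"Name" "I" "i" "C" "c" "P" "p" "M" "m"
"Opossum" 5 4 1 1 3 3 4 4
"Hairy tail mole" 3 3 1 1 4 4 3 3
"Armadillo" 0 0 0 0 0 0 8 8
"Beaver" 1 1 0 0 2 1 3 3
"Wolf" 3 3 1 1 4 4 2 3
"Deer" 0 4 0 0 3 3 3 3'

# read.table() splits on any run of whitespace and keeps quoted names
# such as "Hairy tail mole" together as a single field.
mammals <- read.table(text = raw, header = TRUE, quote = "\"",
                      stringsAsFactors = FALSE, check.names = FALSE)

# (iii) Write a comma-delimited copy of the data.
out_file <- file.path(tempdir(), "file19_clean.csv")  # hypothetical name
write.csv(mammals, out_file, row.names = FALSE)

# (i) "Name" is a label rather than a measurement, so keep it only as
# row names and cluster on the eight tooth-count columns.
teeth <- mammals[, -1]
rownames(teeth) <- mammals$Name

# (ii) Every column is a tooth count, but the spreads differ; compare the
# standard deviations before deciding whether to cluster on scale(teeth).
print(sapply(teeth, sd))
teeth_scaled <- scale(teeth)
```

Whether to use `teeth` or `teeth_scaled` afterwards is a judgment call that the answer to (ii) should justify; the sketch only makes both available.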

(b) Clustering (2 points divided evenly by components below)

(i) Determine how many clusters are needed by running the WSS or Silhouette graph. Plot the graph using fviz_nbclust().

(ii) Once you have determined the number of clusters, run k-means clustering on the dataset to create that many clusters. Plot the clusters using fviz_cluster().

(iii) How many observations are in each cluster?

(iv) What is the total SSE of the clusters?

(v) What is the SSE of each cluster?

(vi) Analyze each cluster to determine how the mammals are grouped and whether that grouping makes sense. Act as the domain expert here; clustering has produced what you asked it to. Examine the results based on your knowledge of the animal kingdom and see whether they meet expectations. Provide a summary of your observations.
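The steps in part (b) could be sketched roughly as follows. This is an illustration under stated assumptions, not a model answer: it runs on twelve rows copied from the listing above rather than the full file, `k_max` and `k_chosen` are placeholder values, and the two factoextra plots are drawn only if that package is installed. In the `kmeans()` result, `size` holds the observations per cluster, `withinss` the SSE of each cluster, and `tot.withinss` their sum.

```r
# Twelve rows copied from the listing above (a stand-in for the full file).
raw <- '"Name" "I" "i" "C" "c" "P" "p" "M" "m"
"Opossum" 5 4 1 1 3 3 4 4
"Brown bat" 2 3 1 1 3 3 3 3
"Red bat" 1 3 1 1 2 2 3 3
"Armadillo" 0 0 0 0 0 0 8 8
"Beaver" 1 1 0 0 2 1 3 3
"Gray squirrel" 1 1 0 0 1 1 3 3
"Pack rat" 1 1 0 0 0 0 3 3
"Wolf" 3 3 1 1 4 4 2 3
"Weasel" 3 3 1 1 3 3 1 2
"Jaguar" 3 3 1 1 3 2 1 1
"Elk" 0 4 1 0 3 3 3 3
"Deer" 0 4 0 0 3 3 3 3'
mammals <- read.table(text = raw, header = TRUE, quote = "\"",
                      check.names = FALSE)
teeth <- mammals[, -1]
rownames(teeth) <- mammals$Name

set.seed(42)  # k-means starts from random centers, so fix the seed

# (i) Total within-cluster sum of squares (WSS) for k = 1..k_max.
# For k = 1 the WSS is the squared distance of every row to the column means.
k_max <- 6  # placeholder upper bound for this small sample
wss <- c(sum(scale(teeth, scale = FALSE)^2),
         sapply(2:k_max, function(n) {
           kmeans(teeth, centers = n, nstart = 25)$tot.withinss
         }))
print(round(wss, 2))

# (ii) Fit with the number of clusters read off the elbow in the WSS curve.
k_chosen <- 3  # placeholder; choose it from the plot
k <- kmeans(teeth, centers = k_chosen, nstart = 25)

print(k$size)          # (iii) observations in each cluster
print(k$tot.withinss)  # (iv) total SSE
print(k$withinss)      # (v) SSE of each cluster

# The plots the assignment asks for, if factoextra is available.
if (requireNamespace("factoextra", quietly = TRUE)) {
  print(factoextra::fviz_nbclust(teeth, kmeans, method = "wss", k.max = k_max))
  print(factoextra::fviz_cluster(k, data = teeth))
}
```

Using `nstart = 25` reruns the algorithm from several random starts and keeps the best fit, which makes the reported SSE less sensitive to an unlucky initialization.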

Hint: to get the indices of all animals in cluster 1, you would execute: > which(k$cluster == 1) assuming k is the variable that holds the output of the kmeans() function call.
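The hint can be extended to list animal names rather than bare indices. The sketch below assumes a small matrix named `teeth` built from six rows of the listing above, with the animal names as row names, and an arbitrary choice of two clusters; both are illustrative assumptions.

```r
# Six rows copied from the listing above, with animal names as row names.
teeth <- rbind(
  "Beaver"        = c(1, 1, 0, 0, 2, 1, 3, 3),
  "Gray squirrel" = c(1, 1, 0, 0, 1, 1, 3, 3),
  "Pack rat"      = c(1, 1, 0, 0, 0, 0, 3, 3),
  "Wolf"          = c(3, 3, 1, 1, 4, 4, 2, 3),
  "Weasel"        = c(3, 3, 1, 1, 3, 3, 1, 2),
  "Jaguar"        = c(3, 3, 1, 1, 3, 2, 1, 1)
)
colnames(teeth) <- c("I", "i", "C", "c", "P", "p", "M", "m")

set.seed(1)
k <- kmeans(teeth, centers = 2, nstart = 25)

# The call from the hint: because the rows are named, the indices it
# returns carry the animal names along.
idx <- which(k$cluster == 1)
print(names(idx))

# All clusters at once: a list of animal names keyed by cluster number.
members <- split(rownames(teeth), k$cluster)
print(members)

# Cluster centers (mean tooth counts) help explain each grouping.
print(round(k$centers, 2))
```

Reading `members` next to `k$centers` is one way to start the domain-expert review: the centers show which tooth counts distinguish a cluster, and the names show which animals share them.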
