Determine the ideal k (where k is the number of clusters). First apply k-means clustering setting k from 1 to 10. Using Excel, generate a line graph and plot k values on the x-axis and SSE for each k value on the y-axis.

data mining

Description

1. Determine the ideal k (where k is the number of clusters). First, apply k-means clustering with k set to each value from 1 to 10. Using Excel, generate a line graph with the k values on the x-axis and the SSE (sum of squared errors) for each k on the y-axis. Based on the line graph, select the k value at which the SSE stabilizes, that is, where the line flattens out. Submit a copy of the line graph and state which k you selected.

Note: k-means only applies to numeric data, and DATE is NOT numeric.
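One possible way to produce the k and SSE pairs for the line graph is sketched below. It is only an illustration under stated assumptions: the points are synthetic 2-D toy data rather than the assignment's dataset, the function names `squared_distance` and `kmeans_sse` are hypothetical, and centroids are initialized by plain random sampling. Only numeric columns should be passed in, in line with the note above.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_sse(points, k, iterations=100, seed=0):
    """Run basic k-means (Lloyd's algorithm) and return the final SSE."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # SSE: squared distance from every point to its nearest centroid.
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)

# Synthetic toy data (NOT the assignment data): three noisy 2-D groups.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in centers for _ in range(30)]

# SSE for k = 1..10, printed as CSV rows that can be pasted into Excel.
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 11)}
print("k,SSE")
for k, sse in sse_by_k.items():
    print(f"{k},{sse:.2f}")
```

Because the initial centroids are random, the SSE values can vary slightly between seeds; the overall shape of the curve is what matters when choosing k.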

2. Split the dataset records into 3 equal bins and generate 3 separate CSV files, one for each bin. (For simplicity, this can be done using Excel.) Submit all binned files.
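If a script is preferred over Excel, the split could look roughly like the sketch below. The column names, the toy records, the `bin_1.csv` style file names, and the helper names `split_into_bins` and `write_bins` are all assumptions made for illustration. The records are split in their existing order, and when the record count is not divisible by 3 the earlier bins receive one extra record each.

```python
import csv
import os
import tempfile

def split_into_bins(rows, n_bins=3):
    """Split rows into n_bins consecutive bins whose sizes differ by at most one."""
    base, extra = divmod(len(rows), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)  # earlier bins absorb the remainder
        bins.append(rows[start:start + size])
        start += size
    return bins

def write_bins(header, bins, out_dir, prefix="bin"):
    """Write one CSV file per bin, each with the header row, and return the paths."""
    paths = []
    for i, bin_rows in enumerate(bins, start=1):
        path = os.path.join(out_dir, f"{prefix}_{i}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(bin_rows)
        paths.append(path)
    return paths

# Toy records (NOT the assignment data): 10 rows with hypothetical columns.
header = ["id", "value"]
rows = [[i, i * 10] for i in range(1, 11)]

bins = split_into_bins(rows)
bin_sizes = [len(b) for b in bins]

# A temporary directory is used here so the sketch leaves no files behind;
# a real run would point out_dir at the folder to be submitted.
with tempfile.TemporaryDirectory() as out_dir:
    paths = write_bins(header, bins, out_dir)
    file_names = [os.path.basename(p) for p in paths]

print(bin_sizes, file_names)
```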

3. For each bin, apply k-means clustering with k set to the value you selected in question 1. Describe your clustering outcomes. Provide screenshots.
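As a rough sketch of how the per-bin outcomes might be summarized, the code below clusters each bin separately and reports cluster sizes, centroids, and SSE. The three toy bins, the value `k = 2`, and the names `kmeans` and `summarize` are placeholders assumed for illustration; the real run would use the binned files and the k chosen in question 1.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Basic k-means (Lloyd's algorithm); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i]))
                  for p in points]
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: labels match the final centroids
        centroids = new_centroids
    return labels, centroids

def summarize(points, k):
    """Cluster one bin and collect the figures needed to describe the outcome."""
    labels, centroids = kmeans(points, k)
    sizes = [labels.count(i) for i in range(k)]
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return {"sizes": sizes, "centroids": centroids, "sse": sse}

# Toy bins (NOT the assignment data); each holds numeric 2-D records.
toy_bins = [
    [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)],
    [(2.0, 1.5), (2.1, 1.4), (1.9, 1.6), (9.0, 7.5), (9.1, 7.4), (8.9, 7.6)],
    [(0.5, 2.0), (0.6, 2.1), (0.4, 1.9), (7.5, 9.0), (7.6, 9.1), (7.4, 8.9)],
]
k = 2  # placeholder for the k selected in question 1

summaries = [summarize(b, k) for b in toy_bins]
for number, summary in enumerate(summaries, start=1):
    print(f"bin {number}: sizes={summary['sizes']} sse={summary['sse']:.3f}")
```

Comparing cluster sizes and centroids across the three bins is one way to judge whether the bins yield similar groupings.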

4. What alternative analysis can you perform on this data and why? 

