A. Objective
Apply and evaluate the k-means clustering algorithm on a given data set, using the RapidMiner data mining tool.

B. Data Set
One of the best-known data sets referenced in data mining is the “Iris data set”. It contains five attributes:
1. Class Label: type of Iris plant (Iris Setosa, Iris Versicolour, Iris Virginica)
2. A1: sepal length in cm
3. A2: sepal width in cm
4. A3: petal length in cm
5. A4: petal width in cm

Each class of Iris plant has 50 instances (tuples/examples). The data set has traditionally been used for classification. One class is linearly separable from the other two; the latter are NOT linearly separable from each other.

Knowing these facts, we will use the same data set for clustering. You will use the k-means clustering algorithm, which will cluster the data based on the four measurement attributes A1 to A4 (items 2 to 5 above). In k-means clustering you must set the number of clusters to create in advance; in this case, it will be three. After applying the clustering model, you will compare the results with the facts you already know. For example, you will check how many instances were assigned to each cluster against the fact that there are actually 50 instances of each Iris plant type.
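The two steps described above (clustering on numeric attributes with a fixed number of clusters, then comparing cluster membership against known class labels) can be sketched outside RapidMiner in plain Python. This is only an illustration of the idea, not RapidMiner's implementation: the function names `kmeans` and `cross_tab`, the fixed initial centroids, and the six toy 2-D points with labels "x", "y", "z" are all hypothetical and are not Iris measurements.

```python
import math
from collections import Counter


def kmeans(points, initial_centroids, max_iter=100):
    """Basic Lloyd-style k-means. Returns (assignments, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (Euclidean distance).
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result is stable
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignments, centroids


def cross_tab(assignments, labels):
    """Count how many instances of each known label fall in each cluster."""
    return Counter(zip(assignments, labels))


# Hypothetical toy data (NOT Iris measurements): three well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (9.0, 1.0), (9.2, 1.2)]
labels = ["x", "x", "y", "y", "z", "z"]

# Assumption: one starting centroid is taken from each group so the run
# is deterministic; real tools usually pick starting points at random.
assignments, centroids = kmeans(points, [points[0], points[2], points[4]])

sizes = Counter(assignments)            # instances per cluster
table = cross_tab(assignments, labels)  # (cluster, label) -> count

for cluster in sorted(sizes):
    print(f"cluster {cluster}: {sizes[cluster]} instances")
for (cluster, label), count in sorted(table.items()):
    print(f"cluster {cluster} / label {label}: {count}")
```

For the Iris exercise, the same comparison would use the four attributes A1 to A4 as the points and the class label as `labels`; cluster sizes that differ from 50 would indicate instances grouped with a different plant type.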