CS 484: Introduction to Machine Learning
Spring 2021 Assignment 2
Question 1 (5 points)
Suppose the itemset {A, B, C, D, E} has a Support value of 1. What is the Lift value of the association rule {B, D} ==> {A, C, E}?
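For reference, the standard definition of Lift in terms of Support is sketched below. The symbols X, Y, and Z are generic placeholders for itemsets and do not appear in the assignment text.

```latex
\[
\operatorname{Lift}(X \Rightarrow Y)
  = \frac{\operatorname{Confidence}(X \Rightarrow Y)}{\operatorname{Support}(Y)}
  = \frac{\operatorname{Support}(X \cup Y)}
         {\operatorname{Support}(X)\,\operatorname{Support}(Y)}
\]
\[
Z \subseteq X \cup Y
  \;\Longrightarrow\;
  \operatorname{Support}(X \cup Y) \le \operatorname{Support}(Z) \le 1
\]
```

The second line states that every subset of an itemset appears in at least as many baskets as the itemset itself, which is why the Support of the full itemset constrains the Support of the antecedent and of the consequent.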
Question 2 (5 points)
You invited your six friends to your home to watch a basketball game. Your friends brought snacks and beverages along. The following table lists the items your friends brought.
Friend | Items
Andrew | Cheese, Cracker, Soda, Wings
Betty | Cheese, Soda, Tortilla
Carl | Cheese, Ice Cream, Soda, Wings
Danny | Cheese, Ice Cream, Salsa, Tortilla
Emily | Salsa, Tortilla, Wings
Frank | Cheese, Cracker, Ice Cream, Soda, Wings
You noticed that many of your friends brought Cheese, Soda, and Wings together. Since you would rather spend your money on food than on Soda, you want to study how likely your friends are to bring Soda if they bring Cheese and Wings. What is the Lift of the association rule {Cheese, Wings} ==> {Soda}?
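One way to check a hand calculation is a minimal Python sketch like the one below. It assumes the usual definitions (Support as the fraction of baskets containing an itemset, Lift as Confidence divided by the Support of the consequent). The helper names `support` and `lift` are illustrative and are not part of the assignment.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def lift(baskets, antecedent, consequent):
    """Lift = Confidence(antecedent ==> consequent) / Support(consequent).

    Assumes the antecedent and the consequent each occur in at least
    one basket; otherwise a division by zero is raised.
    """
    joint = support(baskets, set(antecedent) | set(consequent))
    confidence = joint / support(baskets, antecedent)
    return confidence / support(baskets, consequent)


# Baskets transcribed from the table above, one set per friend.
baskets = [
    {"Cheese", "Cracker", "Soda", "Wings"},               # Andrew
    {"Cheese", "Soda", "Tortilla"},                       # Betty
    {"Cheese", "Ice Cream", "Soda", "Wings"},             # Carl
    {"Cheese", "Ice Cream", "Salsa", "Tortilla"},         # Danny
    {"Salsa", "Tortilla", "Wings"},                       # Emily
    {"Cheese", "Cracker", "Ice Cream", "Soda", "Wings"},  # Frank
]

rule_lift = lift(baskets, {"Cheese", "Wings"}, {"Soda"})
print(f"Lift of {{Cheese, Wings}} ==> {{Soda}}: {rule_lift:.4f}")
```

Representing each basket as a set makes the "contains every item" test a single subset comparison.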
Question 3 (5 points)
You are provided with the following scatterplot of two interval variables. Without accessing the data, what do you think the Silhouette value will be for the 3-cluster K-means solution? (A) Close to negative one, (B) About zero, (C) Close to one, (D) Close to three, or (E) Cannot be determined
Question 4 (15 points)
Suppose Cluster 0 contains observations {-2, -1, 1, 2, 3} and Cluster 1 contains observations {4, 5, 7, 8}.
(a) (5 points) Calculate the Silhouette Width of observation 2 (i.e., the value -1) in Cluster 0.
(b) (5 points) Calculate the cluster-wise Davies-Bouldin value of Cluster 0 and of Cluster 1.
(c) (5 points) What is the Davies-Bouldin Index of this two-cluster solution?
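The quantities in this question can be sketched in plain Python as follows. The sketch assumes the commonly used definitions: the Silhouette Width is (b - a) / max(a, b), where a is the mean distance to the other members of the observation's own cluster and b is the smallest mean distance to any other cluster; the cluster-wise Davies-Bouldin value is the largest ratio of summed within-cluster scatter to centroid separation. If the course uses a different convention, the functions would need adjusting. All function names are illustrative.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def silhouette_width(x, own, others):
    """Silhouette Width of observation x, a member of cluster `own`.

    `own` includes x itself and must have at least two members;
    `others` is a list of the remaining clusters.
    """
    same = list(own)
    same.remove(x)  # drop one occurrence of the observation itself
    a = mean(abs(x - y) for y in same)
    b = min(mean(abs(x - y) for y in cluster) for cluster in others)
    return (b - a) / max(a, b)


def davies_bouldin(clusters):
    """Return (list of cluster-wise values, overall index) for 1-D clusters."""
    centroids = [mean(c) for c in clusters]
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = [mean(abs(y - m) for y in c) for c, m in zip(clusters, centroids)]
    cluster_wise = []
    for k in range(len(clusters)):
        ratios = [
            (scatter[k] + scatter[j]) / abs(centroids[k] - centroids[j])
            for j in range(len(clusters))
            if j != k
        ]
        cluster_wise.append(max(ratios))
    return cluster_wise, mean(cluster_wise)


cluster_0 = [-2, -1, 1, 2, 3]
cluster_1 = [4, 5, 7, 8]

s = silhouette_width(-1, cluster_0, [cluster_1])
r_values, db_index = davies_bouldin([cluster_0, cluster_1])
print(f"Silhouette Width of -1: {s:.4f}")
print(f"Cluster-wise Davies-Bouldin values: {[round(r, 4) for r in r_values]}")
print(f"Davies-Bouldin Index: {db_index:.4f}")
```

With only two clusters, each cluster has a single competitor, so both cluster-wise values coincide with the overall index.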
Question 5 (30 points)
The file Groceries.csv contains market basket data. The variables are:
⦁ Customer: Customer Identifier
⦁ Item: Name of Product Purchased
After you have imported the CSV file, please discover association rules using this dataset. For your information, the observations have been sorted in ascending order by Customer and then by Item. Also, duplicated items for each customer have been removed.
(a) (10 points) We are only interested in the k-itemsets that can be found in the market baskets of at least seventy-five (75) customers. How many itemsets in total can we find? Also, what is the largest k value among our itemsets?
(b) (5 points) Using the largest k value you found in (a), find the association rules whose Confidence metrics are greater than or equal to 1%. How many association rules can we find? Remember that a rule must have a non-empty antecedent and a non-empty consequent. Please do not display the rules in your answer.
(c) (10 points) Plot the Support metrics on the vertical axis against the Confidence metrics on the horizontal axis for the rules you found in (b). Please use the Lift metrics to indicate the size of the marker. You must add a color gradient legend to the chart for the Lift metrics.
(d) (5 points) Among the rules that you found in (b), list the rules whose Confidence metrics are greater than or equal to 60%. Please show the rules, including the Support, the Confidence, and the Lift metrics, in a table.
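The workflow in this question (group rows into baskets, find frequent itemsets by a minimum basket count, then derive rules by a minimum Confidence) can be sketched with the standard library alone. This is a minimal Apriori-style illustration, not the required implementation. The rows below are made-up toy data rather than the contents of Groceries.csv, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_baskets(rows):
    """Group (customer, item) rows into one item set per customer."""
    baskets = defaultdict(set)
    for customer, item in rows:
        baskets[customer].add(item)
    return list(baskets.values())


def frequent_itemsets(baskets, min_count):
    """Level-wise search; returns {frozenset: number of baskets containing it}."""
    def count(itemset):
        return sum(itemset <= basket for basket in baskets)

    items = sorted({item for basket in baskets for item in basket})
    level = {}
    for item in items:
        single = frozenset([item])
        c = count(single)
        if c >= min_count:
            level[single] = c

    frequent = {}
    while level:
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = {}
        for cand in candidates:
            # Prune: every k-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(cand, len(cand) - 1)):
                c = count(cand)
                if c >= min_count:
                    level[cand] = c
    return frequent


def rules(frequent, n_baskets, min_confidence):
    """List (antecedent, consequent, support, confidence, lift) tuples."""
    found = []
    for itemset, cnt in frequent.items():
        for r in range(1, len(itemset)):  # non-empty antecedent and consequent
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                cons = itemset - ante
                confidence = cnt / frequent[ante]
                if confidence >= min_confidence:
                    lift = confidence / (frequent[cons] / n_baskets)
                    found.append((ante, cons, cnt / n_baskets, confidence, lift))
    return found


# Toy (Customer, Item) rows, invented for illustration only.
rows = [(1, "bread"), (1, "milk"), (2, "bread"), (2, "eggs"), (2, "milk"),
        (3, "bread"), (4, "eggs"), (4, "milk")]
baskets = build_baskets(rows)
frequent = frequent_itemsets(baskets, min_count=2)
found = rules(frequent, len(baskets), min_confidence=0.01)
print(len(frequent), max(len(s) for s in frequent), len(found))
```

For the real dataset, the same structure applies with a minimum count of 75 and the rows read from the CSV file. A dedicated library is likely to be much faster than this brute-force counting.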
Question 6 (40 points)
You are asked to discover the optimal clusters in cars.csv. Here are the specifications.
⦁ The input interval variables are Weight, Wheelbase, and Length
⦁ Scale each input interval variable such that the resulting variable has a range of 0 to 10
⦁ The distance metric is Manhattan
⦁ The minimum number of clusters is 2
⦁ The maximum number of clusters is 10
⦁ Specify random_state = 60616 when calling the KMeans function in the scikit-learn library
Please answer the following questions.
(a) (20 points) List the Elbow values, the Silhouette values, the Calinski-Harabasz Scores, and the Davies-Bouldin Indices for your 2-cluster to 10-cluster solutions.
(b) (10 points) Based on the values in (a), what is your suggested number of clusters?
(c) (10 points) What are the cluster centroids of your suggested cluster solution? Please show the centroids in their original scales.
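Two of the specifications above, scaling each variable to the range 0 to 10 and reporting centroids in the original scales, come down to a linear map and its inverse. The sketch below illustrates this together with the Manhattan distance. The weights are invented numbers, not values from cars.csv, and the function names are hypothetical. How the Manhattan metric is combined with the clustering routine is left open here, since scikit-learn's KMeans does not take a distance-metric argument.

```python
def scale_0_10(value, lo, hi):
    """Map a value from [lo, hi] linearly onto [0, 10]."""
    return 10.0 * (value - lo) / (hi - lo)


def unscale_0_10(scaled, lo, hi):
    """Inverse of scale_0_10: map a value in [0, 10] back to [lo, hi]."""
    return lo + scaled * (hi - lo) / 10.0


def manhattan(p, q):
    """Manhattan (city-block) distance between two equal-length points."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Made-up weights for illustration; the real values come from cars.csv.
weights = [2000.0, 3000.0, 4500.0]
lo, hi = min(weights), max(weights)

scaled = [scale_0_10(w, lo, hi) for w in weights]
restored = [unscale_0_10(s, lo, hi) for s in scaled]
distance = manhattan((0.0, 2.5, 10.0), (4.0, 2.5, 7.0))

print(scaled)
print(restored)
print(distance)
```

Keeping the minimum and maximum of each variable from the scaling step is what makes it possible to convert the fitted centroids back to the original units afterwards.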