Suppose the itemset {A, B, C, D, E} has a Support value of 1. What is the Lift value of the association rule {B, D} ==> {A, C, E}?


Description

CS 484: Introduction to Machine Learning

Spring 2021 Assignment 2

Question 1 (5 points)

Suppose the itemset {A, B, C, D, E} has a Support value of 1. What is the Lift value of the association rule {B, D} ==> {A, C, E}?
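As a hedged worked note, assuming the usual definition of Lift (the symbols X and Y are introduced here for illustration and are not part of the assignment): a Support of 1 for the full itemset means every basket contains all five items, so every subset also has a Support of 1.

```latex
\mathrm{Lift}(X \Rightarrow Y)
  = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)\,\mathrm{Support}(Y)},
\qquad
\mathrm{Lift}(\{B, D\} \Rightarrow \{A, C, E\})
  = \frac{1}{1 \times 1} = 1 .
```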

Question 2 (5 points)

You invited your six friends to your home to watch a basketball game.  Your friends brought snacks and beverages along.  The following table lists the items your friends brought.

Friend    Items
Andrew    Cheese, Cracker, Soda, Wings
Betty     Cheese, Soda, Tortilla
Carl      Cheese, Ice Cream, Soda, Wings
Danny     Cheese, Ice Cream, Salsa, Tortilla
Emily     Salsa, Tortilla, Wings
Frank     Cheese, Cracker, Ice Cream, Soda, Wings

You noticed that many of your friends brought Cheese, Soda, and Wings together.  Since you would rather spend your money on food than on Soda, you want to study how likely your friends are to bring Soda as well if they bring Cheese and Wings.  Therefore, please find the Lift of the association rule {Cheese, Wings} ==> {Soda}.
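The calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definitions (Confidence = Support of antecedent and consequent together divided by Support of the antecedent, Lift = Confidence divided by Support of the consequent); the helper name `support` is hypothetical.

```python
# Baskets transcribed from the table in Question 2.
baskets = {
    "Andrew": {"Cheese", "Cracker", "Soda", "Wings"},
    "Betty": {"Cheese", "Soda", "Tortilla"},
    "Carl": {"Cheese", "Ice Cream", "Soda", "Wings"},
    "Danny": {"Cheese", "Ice Cream", "Salsa", "Tortilla"},
    "Emily": {"Salsa", "Tortilla", "Wings"},
    "Frank": {"Cheese", "Cracker", "Ice Cream", "Soda", "Wings"},
}


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= items for items in baskets.values()) / len(baskets)


antecedent = {"Cheese", "Wings"}
consequent = {"Soda"}

# Confidence: how often Soda appears among baskets with Cheese and Wings.
confidence = support(antecedent | consequent) / support(antecedent)
# Lift: Confidence relative to the baseline Support of the consequent.
lift = confidence / support(consequent)

print(f"confidence = {confidence:.4f}, lift = {lift:.4f}")
```

A Lift above 1 would indicate that Cheese and Wings make Soda more likely than its baseline frequency.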

Question 3 (5 points)

You are provided with the following scatterplot of two interval variables.  Without accessing the data, what do you think the Silhouette value will be for the 3-cluster K-means solution? (A) Close to negative one, (B) About zero, (C) Close to one, (D) Close to three, or (E) Cannot be determined

 


Question 4 (15 points)

Suppose Cluster 0 contains observations {-2, -1, 1, 2, 3} and Cluster 1 contains observations {4, 5, 7, 8}.  

(a) (5 points) Calculate the Silhouette Width of observation 2 (i.e., the value -1) in Cluster 0.

(b) (5 points) Calculate the cluster-wise Davies-Bouldin values of Cluster 0 and Cluster 1.

(c) (5 points) What is the Davies-Bouldin Index of this two-cluster solution?
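The three quantities above can be checked with a short script. This is a sketch under stated assumptions: distances are absolute differences (the data are one-dimensional), the Silhouette Width is (b - a) / max(a, b), and the Davies-Bouldin scatter of a cluster is the mean distance of its members to the centroid. The course may use a different convention; all function names are hypothetical.

```python
cluster0 = [-2, -1, 1, 2, 3]
cluster1 = [4, 5, 7, 8]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def silhouette_width(x, own, other):
    """(b - a) / max(a, b) for observation x; assumes values in `own` are distinct."""
    a = mean(abs(x - y) for y in own if y != x)  # mean distance within own cluster
    b = mean(abs(x - y) for y in other)          # mean distance to the other cluster
    return (b - a) / max(a, b)


def scatter(cluster):
    """Mean absolute distance of the members to the cluster centroid."""
    centroid = mean(cluster)
    return mean(abs(x - centroid) for x in cluster)


s_width = silhouette_width(-1, cluster0, cluster1)

s0, s1 = scatter(cluster0), scatter(cluster1)
separation = abs(mean(cluster0) - mean(cluster1))

# With only two clusters, each cluster's worst-case ratio is against the other one.
r0 = r1 = (s0 + s1) / separation
db_index = mean([r0, r1])

print(f"silhouette width of -1: {s_width:.4f}")
print(f"R0 = {r0:.4f}, R1 = {r1:.4f}, Davies-Bouldin Index = {db_index:.4f}")
```

Because there are only two clusters, the two cluster-wise values coincide and equal the overall index under these definitions.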

Question 5 (30 points)

The file Groceries.csv contains market basket data. The variables are:

Customer: Customer Identifier

Item: Name of Product Purchased

After you have imported the CSV file, please discover association rules using this dataset.  For your information, the observations have been sorted in ascending order by Customer and then by Item.  Also, duplicated items for each customer have been removed.


(a) (10 points) We are only interested in the k-itemsets that can be found in the market baskets of at least seventy-five (75) customers.  How many itemsets in total can we find?  Also, what is the largest k value among our itemsets?


(b) (5 points) Using the largest k value you found in (a), find the association rules whose Confidence metrics are greater than or equal to 1%.  How many association rules can we find?  Please be reminded that a rule must have a non-empty antecedent and a non-empty consequent.  Please do not display those rules in your answer.


(c) (10 points) Plot the Support metrics on the vertical axis against the Confidence metrics on the horizontal axis for the rules you found in (b).  Please use the Lift metrics to indicate the size of the marker.  You must add a color gradient legend to the chart for the Lift metrics.


(d) (5 points) Among the rules that you found in (b), list the rules whose Confidence metrics are greater than or equal to 60%.  Please show the rules, including the Support, the Confidence, and the Lift metrics, in a table.
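The workflow in this question can be illustrated on a toy example. The sketch below is not the course's method: it uses hypothetical baskets in place of Groceries.csv, a minimum count of 2 in place of 75, and a plain level-wise search written with the standard library. It also assumes rules are drawn from every frequent itemset of up to the largest k items. The function names are made up for illustration.

```python
from itertools import combinations

# Hypothetical toy baskets standing in for Groceries.csv (Customer -> Items).
baskets = {
    1: {"bread", "milk"},
    2: {"bread", "butter", "milk"},
    3: {"butter", "milk"},
    4: {"bread", "butter"},
}
min_count = 2  # stand-in for the 75-customer threshold


def frequent_itemsets(baskets, min_count):
    """Level-wise search: keep k-itemsets found in at least min_count baskets."""
    n = len(baskets)
    items = sorted(set().union(*baskets.values()))
    support = {}
    k = 1
    candidates = [frozenset([item]) for item in items]
    while candidates:
        kept = {}
        for cand in candidates:
            count = sum(cand <= basket for basket in baskets.values())
            if count >= min_count:
                kept[cand] = count / n
        support.update(kept)
        k += 1
        # Only items still present in a frequent itemset can extend to size k.
        alive = sorted(set().union(*kept))
        candidates = [
            frozenset(combo) for combo in combinations(alive, k)
            if all(frozenset(sub) in kept for sub in combinations(combo, k - 1))
        ]
    return support


def enumerate_rules(support, min_confidence):
    """Split each frequent itemset into non-empty antecedent and consequent."""
    rules = []
    for itemset, supp in support.items():
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                confidence = supp / support[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / support[consequent]
                    rules.append((antecedent, consequent, supp, confidence, lift))
    return rules


itemsets = frequent_itemsets(baskets, min_count)
largest_k = max(len(s) for s in itemsets)
rules = enumerate_rules(itemsets, min_confidence=0.01)
strong_rules = [rule for rule in rules if rule[3] >= 0.60]

print(f"{len(itemsets)} itemsets, largest k = {largest_k}, {len(rules)} rules")
```

For the real file, the baskets would first be built by grouping Item by Customer, and a dedicated Apriori implementation (for example, the one in the mlxtend package) would likely be more practical than this search. The Support, Confidence, and Lift values in `rules` are the quantities that parts (c) and (d) ask to plot and tabulate.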

Question 6 (40 points)

You are asked to discover the optimal clusters in cars.csv.  Here are the specifications.

The input interval variables are Weight, Wheelbase, and Length

Scale each input interval variable such that the resulting variable has a range of 0 to 10

The distance metric is Manhattan

The minimum number of clusters is 2

The maximum number of clusters is 10

Specify random_state = 60616 when calling the KMeans function in the scikit-learn library


Please answer the following questions.

(a) (20 points) List the Elbow values, the Silhouette values, the Calinski-Harabasz Scores, and the Davies-Bouldin Indices for your 2-cluster to 10-cluster solutions.

(b) (10 points) Based on the values in (a), what is your suggested number of clusters?

(c) (10 points) What are the cluster centroids of your suggested cluster solution?  Please show the centroids in their original scales.
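The overall workflow can be sketched with scikit-learn as follows. Several things here are assumptions rather than the assignment's required method: the data are synthetic stand-ins for cars.csv, the model's inertia is used as a rough proxy for the Elbow value (the course may define it differently), and scikit-learn's KMeans minimizes Euclidean distances, so the Manhattan requirement is only reflected in the Silhouette calculation. The choice of 3 clusters at the end is purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the Weight, Wheelbase, and Length columns (hypothetical values).
rng = np.random.default_rng(0)
X_raw = np.vstack([
    rng.normal(loc=[2500, 100, 175], scale=[150, 3, 5], size=(40, 3)),
    rng.normal(loc=[3500, 110, 190], scale=[150, 3, 5], size=(40, 3)),
    rng.normal(loc=[4500, 120, 205], scale=[150, 3, 5], size=(40, 3)),
])

# Rescale every input so that it ranges from 0 to 10.
scaler = MinMaxScaler(feature_range=(0, 10))
X = scaler.fit_transform(X_raw)

results = []
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=60616).fit(X)
    labels = model.labels_
    results.append({
        "k": k,
        "inertia": model.inertia_,  # proxy for the Elbow value (assumption)
        "silhouette": silhouette_score(X, labels, metric="manhattan"),
        "calinski_harabasz": calinski_harabasz_score(X, labels),
        "davies_bouldin": davies_bouldin_score(X, labels),
    })

for row in results:
    print(row)

# Illustrative choice of k; map the centroids back to the original scales.
best = KMeans(n_clusters=3, n_init=10, random_state=60616).fit(X)
centroids_original = scaler.inverse_transform(best.cluster_centers_)
print(centroids_original)
```

A K-means variant that truly uses the Manhattan distance would need a custom implementation, since the KMeans class does not expose a distance-metric parameter.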

