The costs of printing and distributing catalogs are significant.

data mining

Description

 

Learning Objectives:

1.      Become familiar with market basket analysis using XLMiner.

2.      Understand how to interpret the results from classification for decision-making.

 

Task 1: Implement market basket analysis

 

Note: Please do not just mechanically run XLMiner, but make a sincere effort to understand what the results mean. The grading of this exercise will depend more on your discussion of the results, rather than the results themselves.

 

Case: Catalog Cross Selling at Exeter, Inc.

 

Exeter, Inc. is a catalog firm that sells products through a number of different catalogs.  The catalogs fall into eight basic categories: (1) Clothing, (2) Housewares, (3) Automotive, (4) Personal Electronics, (5) Computers, (6) Garden, (7) Novelty Gift, (8) Jewelry.

 

The costs of printing and distributing catalogs are significant. A serious problem is the cost of promoting products to people who may not wish to buy those products. Therefore, Exeter wants to take every opportunity to use them effectively. One such opportunity is in cross selling – once a customer has “taken the bait” and purchased a product, try to sell him/her another while you have his/her attention. 

 

Such cross promotion might take the form of enclosing one or more catalogs in the shipment of a purchased product—or mailing them separately to someone who has purchased a product recently—in order to induce a new purchase from those catalogs. However, which catalogs should be sent? Exeter would like it to be an informed choice—sending a catalog that has a higher probability of inducing a purchase than simply choosing a catalog at random.

 

Your task:

Step 1. Download CatalogCrossSell.xlsx from Canvas. Each row of the “Data” worksheet represents a customer’s recent purchase activity—“1” indicates a purchase of the specific catalog and “0” indicates no purchase on the catalog.

 

Step 2. Go to Association Rules in XLMiner (Data Mining Tab)

 

 

Step 3: Specify the appropriate data range (choose all the rows and the appropriate columns).  Check “First row contains headers,” and select “Data in binary matrix format.” Set the parameters as 20 transactions for minimum support and 50 for minimum confidence.  Click OK. 

 

Step 4: Based on the output, answer the following questions in AnswerSheet.docx:

 

(1) For Rule #1, explain how “Conf%” is calculated. What is the interpretation of value? (5 pts)

 

(2) For Rule #1, explain how “Lift Ratio” is calculated. What is the interpretation of this value? (5 pts)

 

(3) From the association rules that you have generated, identify the top rule you’d like to follow for your cross-selling campaign. Provide the rationale and interpret the rules in plain English. (10 pts)

 

(4) If you can only promote your sales on one catalog: (10 pts)

a)      Which catalog would you like to promote and why, if a customer has already bought personal electronics?

b)      Which catalog would you like to promote and why, if a customer has already bought novelty gift?

 

 

(5) Extra Credit: Suppose you want to promote your sales on housewares. If you can only send the catalog to one group of customers, which of the following would you send, and why? (10 pts)

a)      Customers who have just bought jewelry and personal electronics;

b)      Customers who have just bought novelty gift and personal electronics;

c)      Customers who have just bought novelty gift.

 

 

Task 2: Understand the output of classification tree

Case: Boston Housing

Download Boston_Housing.xlsx from canvas.

 

Suppose you have collected data on 11 factors as listed below for 506 housing units in Boston’s neighborhoods. For each of these 506 housing units, you already knew whether it belongs to high value or low value category (as indicated in column “CAT.MEDV” below).

 

CRIM:     per capita crime rate by town 

ZN:         proportion of residential land zoned for lots over 25,000 sq. ft. 

CHAS:     Charles River dummy variable (1 if tract bounds river; 0 otherwise) 

NOX:       nitric oxides concentration (parts per 10 million) 

AGE:       proportion of owner-occupied units built prior to 1940 

DIS:       weighted distances to five Boston employment centers 

RAD:      index of accessibility to radial highways 

TAX:      full-value property-tax rate per $10,000 

PTRATIO: pupil-teacher ratio by town 

B:          1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town 

CAT. MEDV:    If the housing unit belongs to high value (=1) or low value (=0) category

 

(1) Use XLminer to produce a 3-level tree that predicts whether a housing unit is high-value or low-value. (10 points)

a)      Include a picture of this tree in your answer sheet

b)      Based on this tree, what properties of a house make it most likely that it will be high-value?


Related Questions in data mining category


Disclaimer
The ready solutions purchased from Library are already used solutions. Please do not submit them directly as it may lead to plagiarism. Once paid, the solution file download link will be sent to your provided email. Please either use them for learning purpose or re-write them in your own language. In case if you haven't get the email, do let us know via chat support.