ITEC-320
Predict Diamond Prices
The Diamonds file contains data about 9900 diamonds, including their price.
Your task is to use any approach you want to predict the price of diamonds, and
then tell me your approach. I will then apply your approach to another dataset
of 2000 diamonds that I have, and see how well it predicts those diamonds’
prices. That’s all!
Okay, it’s not THAT simple. But it’s still pretty simple.
You can try out whatever operators from class you want in RapidMiner, with
whatever parameters you want. I strongly recommend that you use a Cross
Validation operator, and try several approaches to see what gets you the lowest
RMSE. You can also use operators like Filter Examples, Select Attributes, or
Nominal to Numerical (three of the attributes are qualitative), or make any
other changes you’d like to the dataset. However, here’s the crucial part:
Whatever you do, you must be able to show or tell me clearly enough so that I
can replicate it exactly!
It’s not enough to say “We used the Fortune Teller operator after removing
three attributes.” You need to tell me which three attributes you removed, and
what parameters you changed in the Fortune Teller operator. (You could also
include a screenshot of the Fortune Teller parameters instead of writing out
the individual changes.)
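To make the recommended workflow concrete, here is a minimal Python sketch of what a cross-validated RMSE measures. It is only an illustration: RapidMiner's Cross Validation operator does this work for you, and nothing here is part of the assignment. The one-variable line fit is a stand-in model chosen for brevity, the function names are hypothetical, and the carat and price values are made up, not taken from the Diamonds file.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (a stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def cross_validated_rmse(xs, ys, k=5, seed=0):
    """Average RMSE over k folds, each fold held out once for testing."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    fold_scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Fit only on the training rows, then score on the held-out rows.
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        predictions = [slope * xs[i] + intercept for i in held_out]
        fold_scores.append(rmse([ys[i] for i in held_out], predictions))
    return sum(fold_scores) / k

# Made-up toy data (NOT from the Diamonds file): carat weights and prices.
carats = [0.3, 0.4, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5, 1.7, 2.0]
prices = [500, 700, 1100, 2200, 3600, 4500, 6200, 9500, 12000, 15500]
print(cross_validated_rmse(carats, prices, k=5))
```

Because every fold is scored on rows the model did not see during fitting, the averaged RMSE is a rough preview of how an approach might behave on the 2000 diamonds you do not have. Comparing this number across several approaches is the point of the recommendation above.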
That explanation of your approach is due via Blackboard on Sunday, 48 hours
after the start of our normal Friday class period. If you’re working
individually, that’s all you need to submit. If you’re working in a group, two
important things:
1. Only one group member needs to submit the explanation on Blackboard, but all
group members’ full names must be listed in the submission.
2. Each group member must complete the peer assessment survey on Blackboard.
It’s short. If you are in a group and you do not complete this survey, you
will not get credit for the activity.
Grading:
40% for submitting a clear explanation of your approach that I can understand
and replicate.
30% for your approach outperforming a bad naïve approach that I created. This
is simply a check to make sure you’re doing something reasonable; your
predictions don’t have to be great to get full credit here.
30% for your prediction results. This will be based on the RMSEs of the whole
class’s predictions for the prices of the 2000 other diamonds (which you do NOT
have). The score will be determined as follows:
The most accurate set of predictions in the class (lowest RMSE on my data) will
get 30/30.
The 2nd most accurate will get 29/30.
The 3rd most accurate will get 28/30.
Everyone else’s score will be calculated based on the following formula using
your RMSE:
30 – 3*(RMSE/X),
where X is the RMSE of a good approach that I created.
If your RMSE is equal to mine, this formula works out to 27. If your RMSE is
twice mine, it works out to 24. If your RMSE is more than twice mine, that’s a
bad sign, and you probably didn’t get a 30/30 on the previous part (beating the
naïve approach).
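The scoring rule above can be sketched in a few lines of Python as a quick way to check the arithmetic. The function name, the rank argument, and the RMSE values are hypothetical, and the sketch assumes no rounding or lower bound on the score, since the handout does not mention either.

```python
def prediction_points(rmse, x, rank=None):
    """Points out of 30 for the prediction-results portion.

    rmse: your RMSE on the held-back diamonds
    x:    the RMSE of the instructor's good approach
    rank: your place in the class by RMSE (1 = lowest), if known
    """
    top_three = {1: 30, 2: 29, 3: 28}
    if rank in top_three:
        return top_three[rank]
    # Everyone else: 30 - 3 * (RMSE / X), exactly as stated in the handout.
    return 30 - 3 * (rmse / x)

# Hypothetical RMSE values, chosen only to reproduce the two cases above.
print(prediction_points(rmse=800, x=800))    # RMSE equal to X
print(prediction_points(rmse=1600, x=800))   # RMSE twice X
```

With these made-up values, the first call reproduces the "equal to mine" case (27) and the second the "twice mine" case (24).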