DataRobot Machine Learning Challenge!

For this bonus project, you will need a license for the DataRobot application. You can purchase the license here:


…and select from one of the four options available for student accounts:

·  1 week (7 days) for $15

·  2 weeks (14 days) for $30

·  3 weeks (21 days) for $45

·  9 months (275 days) for $60

You will also need to select a dataset from: https://www.kaggle.com/datasets and/or a current challenge from: https://www.kaggle.com/competitions

Note: you do not have to submit your work to the online competitions in order to receive credit for the assignment, though you are certainly welcome to do so, and cash prizes are available.

Your assignment is to create and explain a predictive model, using DataRobot to apply multiple machine learning techniques to your dataset. You will deliver a Word document that describes:

·         The dataset and the prediction challenge you are tackling from Kaggle

o   Be sure to describe the data in reasonable detail here.

o   If you are not using a defined challenge, make sure your selection of target (dependent) variables makes sense in the context of your data. Ask if you have questions here!

·         The results of your first “naïve” model (without transforming any data or adding any new data to your set, just running Autopilot in DataRobot).

o   Which model was best? Report the results. Which variables were important? Is the result what you would expect?

·         Attempt to create a “blender” by combining the predictions of multiple models. Does this improve your results from the naïve model?

o    Explain and report.

o   Are the results becoming clearer (easier to interpret) or more opaque?

·         Try “feature engineering” by transforming/adjusting/combining data in your dataset. Does this improve your results?

o   Report the best results you can achieve so far.

·         Full credit will only be given if you have improved on the naïve model via blending and feature engineering. Include screenshots/reports for verification. 
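
If the terms “blender” and “feature engineering” are new to you, the following small Python sketch may help. It does not use DataRobot or its API; it is only a toy illustration under stated assumptions. The data (a hypothetical price that depends on width times height), the column names, and the helper functions fit_line, predict, and rmse are all made up for this example. Blending is shown here as a plain average of two models' predictions, which is only one of several ways to combine models.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with a single feature; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict(model, xs):
    """Apply a fitted (intercept, slope) pair to a list of feature values."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def rmse(actual, predicted):
    """Root mean squared error; lower is better."""
    return sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


# Hypothetical toy data: the target depends on the product width * height.
rows = [(w, h) for w in range(1, 6) for h in range(1, 6)]
width = [w for w, _ in rows]
height = [h for _, h in rows]
price = [10 + 2 * w * h for w, h in rows]

# Two "naive" single-feature models on the raw columns.
pred_width = predict(fit_line(width, price), width)
pred_height = predict(fit_line(height, price), height)

# A simple blend: average the two models' predictions row by row.
pred_blend = [(a + b) / 2 for a, b in zip(pred_width, pred_height)]

# Feature engineering: combine the raw columns into a new feature (area).
area = [w * h for w, h in rows]
pred_engineered = predict(fit_line(area, price), area)

rmse_width = rmse(price, pred_width)
rmse_height = rmse(price, pred_height)
rmse_blend = rmse(price, pred_blend)
rmse_engineered = rmse(price, pred_engineered)

print("width only :", round(rmse_width, 3))
print("height only:", round(rmse_height, 3))
print("blend      :", round(rmse_blend, 3))
print("engineered :", round(rmse_engineered, 3))
```

In this toy case the blend has a lower error than either single-feature model, and the engineered feature lowers it further because it captures the interaction the raw columns miss. For brevity the errors are computed on the same rows used for fitting; in your report, rely on the validation and holdout scores that DataRobot gives you rather than in-sample error.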
