In this assignment, you will need to implement a simple recommender system using a book rating data set DBbook_train_ratings

data mining

Description

Should be done in an IPython notebook. Should use scikit-learn.

In this assignment, you will need to implement a simple recommender system using a book rating data set, DBbook_train_ratings.tsv (reference: https://lists.w3.org/Archives/Public/public-rww/2013Dec/0002.html). The first column of this data set contains user IDs. The second column contains item IDs (i.e., book IDs). The third column contains the rating scores (1 to 5). The purpose of studying this data set is to create a data mining model that recommends books to users. The data set (DBbook_train_ratings.tsv) can be downloaded from D2L.

Please submit an IPython notebook. You do not need to submit the data set; if you want to change the original data set, you need to write code to do that. Please use “Run All” to run your code before you submit so that your notebook shows the outputs of your code. You will lose 1 point if you do not use “Run All”. You will probably want to copy and modify the code from the IPython notebook ml_100k posted on D2L.

Please develop an IPython notebook titled 770_21_a2_yourlastname and finish the tasks below, where (C) indicates that you need to write code for the task, (O) indicates that you need to show output, and (A) indicates that you need to type your answers using Markdown text. In your notebook, at the beginning of each cell, you need to indicate which task the cell is about. For example, in the cell related to task 1, you should first type “# Task 1: Import data”. If you do not clearly label the cells, you will lose 1-2 points (out of 18 points). Whenever you see “print” in the questions, you need to write a print statement to print the intended output.

1. Please write code to print the number of unique users and the number of unique books in this data set. (C)(O)

2. Please write code to create the utility matrix. Each row of this matrix represents a user, and each column represents an item. Print the first 10 rows of the matrix. Please write code to print the number and the percentage of cells in the utility matrix that are not populated. Please write code to fill these empty cells with 0s. (C)(O)
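Tasks 1 and 2 can be sketched on a tiny stand-in data frame. This is a minimal illustration, not the reference solution: the column name `rating` and the toy values are assumptions (the source only names `userID` and `itemID`), and `pivot_table` is one of several ways to build the matrix.

```python
import pandas as pd

# Toy stand-in for DBbook_train_ratings.tsv.
# Assumed column names: userID, itemID, rating ("rating" is not named in the source).
df_data1 = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3],
    "itemID": [10, 20, 10, 30, 20],
    "rating": [5, 3, 4, 1, 2],
})

# Task 1: number of unique users and unique books.
n_users = df_data1["userID"].nunique()
n_items = df_data1["itemID"].nunique()
print("users:", n_users, "books:", n_items)

# Task 2: utility matrix with one row per user and one column per item.
utility = df_data1.pivot_table(index="userID", columns="itemID", values="rating")
print(utility.head(10))

# Cells that no user/item pair populated show up as NaN.
n_empty = int(utility.isna().sum().sum())
pct_empty = 100 * n_empty / utility.size
print(f"empty cells: {n_empty} ({pct_empty:.1f}%)")

# Fill the empty cells with 0s.
dense_matrix = utility.fillna(0)
```

For the real file, something like `pd.read_csv(path, sep="\t")` would replace the toy frame; whether the file has a header row is not stated in the source, so check that before naming the columns.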

3. Please write code to print the top 5 similar users to userID 2 based on Euclidean distance. (C)(O)
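One possible way to rank users by Euclidean distance uses scikit-learn's `euclidean_distances`. The helper name `top_similar_users` and the toy matrix are hypothetical; with the real data the same call would be made with `k=5`.

```python
import pandas as pd
from sklearn.metrics.pairwise import euclidean_distances

# Toy utility matrix (rows: users, columns: items), already filled with 0s.
dense_matrix = pd.DataFrame(
    [[5, 3, 0], [4, 0, 1], [5, 3, 1], [0, 0, 5]],
    index=pd.Index([1, 2, 3, 4], name="userID"),
    columns=pd.Index([10, 20, 30], name="itemID"),
)

def top_similar_users(matrix, user_id, k=5):
    """Return the k users with the smallest Euclidean distance to user_id."""
    target = matrix.loc[[user_id]].to_numpy(dtype=float)
    dist = euclidean_distances(target, matrix.to_numpy(dtype=float))[0]
    # Drop the user itself, whose distance is always 0.
    ranked = pd.Series(dist, index=matrix.index).drop(user_id)
    return ranked.sort_values().head(k)

nearest = top_similar_users(dense_matrix, user_id=2, k=2)
print(nearest)
```

A smaller distance means a more similar user, so the series is sorted in ascending order.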

4. Please write code to print the Euclidean distance between itemID 18 and itemID 1. Please write code to print the Euclidean distance between itemID 36 and itemID 1. Write a print statement that tells me which of itemID 36 and itemID 18 is more similar to itemID 1, and why. For example, you can write a print statement like print("itemID 36 is more similar to itemID 1 because some reason…"). (C)(O)
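Because items are columns of the utility matrix, item-to-item distance compares two column vectors. The sketch below uses invented toy ratings and a hypothetical helper `item_distance`; the outcome on the real data may differ.

```python
import numpy as np
import pandas as pd

# Toy utility matrix: columns are itemIDs 1, 18 and 36; rows are users.
dense_matrix = pd.DataFrame(
    {1: [5, 0, 3], 18: [4, 0, 3], 36: [0, 5, 0]},
    index=pd.Index([101, 102, 103], name="userID"),
)

def item_distance(matrix, item_a, item_b):
    """Euclidean distance between the rating columns of two items."""
    diff = matrix[item_a].to_numpy(dtype=float) - matrix[item_b].to_numpy(dtype=float)
    return float(np.linalg.norm(diff))

d18 = item_distance(dense_matrix, 18, 1)
d36 = item_distance(dense_matrix, 36, 1)
print("distance(18, 1):", d18)
print("distance(36, 1):", d36)

# The item with the smaller distance is the more similar one.
closer = 18 if d18 < d36 else 36
print(f"itemID {closer} is more similar to itemID 1 because its Euclidean distance is smaller")
```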

5. Please write code to print the top 5 similar items to itemID 8010. (C)(O)

6. Write code to remove books and users with fewer than 20 rating scores from the utility matrix by copying and, if needed, modifying the following code. Write code to print the shape of the dataset. (C)(O)

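# Count how many ratings each item and each user has.
# df_data1 is the raw ratings table; dense_matrix is the filled utility matrix from task 2.
df_item_fre = df_data1.groupby("itemID").count()
df_user_fre = df_data1.groupby("userID").count()

# Keep only the columns (items) with enough ratings.
# Note: > 20 keeps items with more than 20 ratings; use >= 20 to also keep items with exactly 20.
selected_items = df_item_fre[df_item_fre["userID"] > 20].index
dense_matrix = dense_matrix[selected_items]

# Keep only the rows (users) with enough ratings.
selected_users = df_user_fre[df_user_fre["itemID"] > 20].index
dense_matrix = dense_matrix.loc[selected_users]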
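To see the filtering step run end to end, here is a self-contained variant on toy data. The helper `filter_sparse`, the `rating` column name, and the threshold of 2 (instead of 20) are assumptions made so the example is small enough to check by hand.

```python
import pandas as pd

# Toy ratings table; "rating" is an assumed column name.
df_data1 = pd.DataFrame({
    "userID": [1, 1, 1, 2, 2, 3],
    "itemID": [10, 20, 30, 10, 20, 10],
    "rating": [5, 3, 4, 4, 2, 1],
})
dense_matrix = df_data1.pivot_table(
    index="userID", columns="itemID", values="rating"
).fillna(0)

def filter_sparse(df, matrix, min_ratings):
    """Keep only users and items that have at least min_ratings ratings in df."""
    item_counts = df.groupby("itemID")["rating"].count()
    user_counts = df.groupby("userID")["rating"].count()
    keep_items = item_counts[item_counts >= min_ratings].index
    keep_users = user_counts[user_counts >= min_ratings].index
    return matrix.loc[keep_users, keep_items]

filtered = filter_sparse(df_data1, dense_matrix, min_ratings=2)
print(filtered.shape)
```

As in the snippet above, the counts come from the original ratings table, so a user's count is not recomputed after sparse items are dropped.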

7. Please use the dataset you obtained from task 6 and write code to remove users that haven’t rated itemID 8010, and then write code to print the counts of the different rating scores of this item (hint: use the function value_counts()). Print the shape of the dataset. (C)(O)

8. Write code to partition the data set you obtained from task 7 for validating the performance of predicting the rating of itemID 8010. Randomly select 25% of the users as the testing set and the others as the training set. Please print the dimensions of the training set and the testing set. Please write code to print the mean rating of itemID 8010 in the training set and its mean rating in the testing set. (Hint: use dense_matrix[8010].mean() to calculate the means.) (C)(O)
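Tasks 7 and 8 might look like the following on toy data. It assumes that a 0 in the filled utility matrix means "not rated", and `random_state=42` is an arbitrary choice made only so the split is reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy utility matrix with the target item 8010 and one other item (itemID 7 is invented).
dense_matrix = pd.DataFrame(
    {8010: [5, 0, 4, 3, 0, 5, 4, 2, 1, 5], 7: [1, 2, 0, 4, 5, 0, 3, 2, 1, 0]},
    index=pd.Index(range(1, 11), name="userID"),
)

# Task 7: drop users who have not rated itemID 8010 (0 means "not rated" after filling).
rated = dense_matrix[dense_matrix[8010] > 0]
print(rated[8010].value_counts())
print(rated.shape)

# Task 8: hold out 25% of the users as the testing set.
train, test = train_test_split(rated, test_size=0.25, random_state=42)
print("train:", train.shape, "test:", test.shape)
print("mean rating of 8010 in train:", train[8010].mean())
print("mean rating of 8010 in test:", test[8010].mean())
```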

9. Use the training and test datasets obtained in task 8 and write code to 1) print the userID of the user in the 5th row (not userID 5) of the test dataset, and 2) predict this user’s rating of itemID 8010 based on the top 5 similar users in the training dataset, and print the user’s predicted rating and the actual rating of the book. (C)(O)
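A rough sketch of task 9 follows. Two choices here are assumptions rather than requirements of the brief: the target column is left out when measuring similarity (so the actual rating does not leak into the neighbour search), and the prediction is the plain mean of the five neighbours' ratings. The helper `predict_rating` and all toy values are hypothetical.

```python
import pandas as pd
from sklearn.metrics.pairwise import euclidean_distances

TARGET = 8010

# Toy training and testing sets (rows: users; columns: two other items plus the target).
train = pd.DataFrame(
    {1: [5, 4, 1, 0, 5, 2], 2: [3, 3, 0, 1, 4, 5], TARGET: [5, 4, 2, 1, 5, 3]},
    index=pd.Index([11, 12, 13, 14, 15, 16], name="userID"),
)
test = pd.DataFrame(
    {1: [1, 0, 4, 2, 5], 2: [1, 2, 4, 5, 3], TARGET: [2, 1, 5, 3, 4]},
    index=pd.Index([21, 22, 23, 24, 25], name="userID"),
)

def predict_rating(train, user_row, target, k=5):
    """Average the target rating of the k training users closest to user_row."""
    features = train.columns.drop(target)  # assumption: compare on all other items
    dist = euclidean_distances(
        user_row.loc[features].to_numpy(dtype=float).reshape(1, -1),
        train[features].to_numpy(dtype=float),
    )[0]
    neighbours = pd.Series(dist, index=train.index).nsmallest(k).index
    return float(train.loc[neighbours, target].mean())

# The 5th row is position 4, which is not the same thing as userID 5.
user_id = test.index[4]
predicted = predict_rating(train, test.iloc[4], TARGET, k=5)
actual = test.iloc[4].loc[TARGET]
print("userID:", user_id)
print("predicted rating:", predicted, "actual rating:", actual)
```

A distance-weighted average would be a reasonable alternative to the plain mean if closer neighbours should count for more.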

Instruction Files

DBbooktrainratings1.tsv (959.1 KB)

myworkbook2.docx (20.3 KB)
