
For this problem you will experiment with various classifiers provided as part of the scikit-learn (sklearn) machine learning module, as well as with some of its preprocessing and model evaluation capabilities.

data mining

Description

Please number the questions in the Python notebook.

[Dataset: magic04.csv]

https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope

For this problem you will experiment with various classifiers provided by the scikit-learn (sklearn) machine learning module, as well as with some of its preprocessing and model evaluation capabilities. The data is provided as a CSV file whose first row contains the attribute names. To download the dataset, click “Data Folder” on the page above, right-click the magic04.data link, and select “Save link as”. The fields in the data are described at http://archive.ics.uci.edu/ml/machine-learning-databases/magic/magic04.names . Please read that document to understand the case and the dataset.
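As a minimal sketch of reading a CSV file whose first row holds the attribute names, the snippet below uses pandas on a tiny in-memory stand-in. The three rows and the two feature columns are made up for illustration and are not taken from magic04.csv; only the `class` column with its g/h labels comes from the assignment text.

```python
import io

import pandas as pd

# Made-up stand-in for magic04.csv: a header row followed by data rows.
# The values below are illustrative only, not real MAGIC measurements.
csv_text = """fLength,fWidth,class
28.8,16.0,g
31.6,11.7,g
162.1,136.0,h
"""

# read_csv treats the first row as column names by default (header=0).
# With the real file, pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())          # attribute names taken from the first row
print(df["class"].value_counts())   # number of rows per category
```

With the real file, the same calls apply once the path to the downloaded CSV replaces the in-memory text.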

In this assignment, you need to use scikit-learn, the main machine learning package in Python, to develop an IPython notebook. Please take a look at the scikit-learn home page (http://scikit-learn.org/stable/index.html) to get an overview of the package.

Make sure the scikit-learn package you are using is version 0.20 or later. If you installed Anaconda recently, you should have version 0.23.2, which is fine, although the latest version of sklearn is 0.24.1.
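One way to check the installed version from inside the notebook is sketched below. It assumes the requirement refers to scikit-learn release 0.20; the helper `version_tuple` is a hypothetical name introduced only for this example.

```python
import re

import sklearn


def version_tuple(version):
    """Return the leading (major, minor) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


installed = version_tuple(sklearn.__version__)
print("scikit-learn version:", sklearn.__version__)

# Assumed minimum: release 0.20.
if installed < (0, 20):
    print("This version is older than 0.20; please upgrade scikit-learn.")
else:
    print("This version meets the 0.20 minimum.")
```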

Please develop an IPython notebook titled 770_21_a1_yourlastname to finish the following tasks. You will probably want to start from the German credit notebook I used in the week 3 lecture and modify it.

You are required to create an IPython notebook cell for each of the following tasks, where (C) indicates that you need to write code for the task, (O) indicates that you need to show output, and (A) indicates that you need to type your answers using Markdown text.

At the beginning of each cell, you need to indicate which task the cell is about. For example, in the cell related to task 1, you should first type “# Task 1: Import data”. If you do not clearly label the cells, you will lose 1-2 points (out of 18 points).

1. You need to import data. (C) - completed

2. In this dataset, the dependent variable is class. It includes two categories: g and h. g represents gamma (signal), and h represents hadron (background). Please insert a cell and print the value count of each category. (C)(O) - completed

3. All the other variables are independent variables. Please insert a cell and print the histograms of the independent variables. (C)(O) - completed

4. Insert a cell and print the basic stats of each independent variable using the describe() method. (C)(O) - completed

5. Insert a cell and write code to split the dataset into training and validation sets (please use a 60%-40% split). (C)

6. Insert a cell and describe the uses of validation (at least 3 uses). (A). I will complete this portion.

7. Insert a cell. In this cell, use scikit-learn’s logistic regression classifier (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression) to fit a model on the training dataset (C). Then run the classifier on the validation set (C). Print the classification report and the Area Under the Receiver Operating Characteristic Curve (ROC AUC) for the validation set (please google to find out how to get AUC using scikit-learn). (C)(O)

8. Insert a cell and use your own language to describe the SVM algorithm (with at most 8 sentences) (A). I will complete this portion.

9. Insert a new cell. In this cell, use the same training and validation datasets you obtained in task 5 to fit SVM classifiers (please use the SVC function in scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html). You need to tune the SVM hyperparameter C (default = 1.0), the regularization parameter. Try each C in the list [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]; you must use a FOR loop. In each iteration, please print the validation set classification report and AUC. (C)(O)

10. Insert a new cell. In this cell, please first tell me which C in the list [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] gives you the optimal SVM classifier with respect to AUC (A). Then, please use your own language (with at most 4 sentences) to discuss what this hyperparameter C means (A).

11. Insert a cell and write code to fit a random forest classifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) using the same training and validation dataset obtained in task 5 and print classification report and AUC. When you fit the random forest model, you can just use the default hyperparameters (C)(O).

12. Insert a new cell and use your own language (with at most 8 sentences) to describe the random forest algorithm (A).
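The coding tasks above (the 60%-40% split, logistic regression, the SVC loop over C, and the random forest) could be sketched roughly as follows. This is an illustration under stated assumptions, not a reference solution: it runs on synthetic data from `make_classification` rather than magic04.csv, the helper `report` is a hypothetical name, the `stratify` and `random_state` arguments are optional extras not required by the assignment, and using `decision_function` scores to compute AUC for SVC is one of several possible choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the real features and labels (assumption):
# label 1 plays the role of "g" and label 0 the role of "h".
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

# Task 5: 60% training, 40% validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)


def report(name, model, scores):
    """Print the validation classification report and ROC AUC; return the AUC."""
    print(name)
    print(classification_report(y_valid, model.predict(X_valid)))
    auc = roc_auc_score(y_valid, scores)
    print(f"ROC AUC: {auc:.3f}\n")
    return auc


# Task 7: logistic regression, scored with the positive-class probability.
logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
logit_auc = report(
    "Logistic regression", logit, logit.predict_proba(X_valid)[:, 1]
)

# Task 9: one SVC per value of C, scored with the signed distance
# to the decision boundary.
svm_auc = {}
for C in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
    svm = SVC(C=C).fit(X_train, y_train)
    svm_auc[C] = report(f"SVC (C={C})", svm, svm.decision_function(X_valid))

# Task 10: the C with the highest validation AUC.
best_C = max(svm_auc, key=svm_auc.get)
print("Best C by validation AUC:", best_C)

# Task 11: random forest with default hyperparameters
# (random_state is set only to make the run repeatable).
forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)
forest_auc = report(
    "Random forest", forest, forest.predict_proba(X_valid)[:, 1]
)
```

On the real dataset, `X` would hold the independent variables and `y` the class column. When the labels are the strings g and h, `roc_auc_score` treats the label that sorts last (h) as the positive class, which matches the column order of `predict_proba`.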

Instruction Files

myworkbook1.ipynb

52.3 KB

myworkbook.docx

16.9 KB
