What predictors do you think contribute to the churn of (i) only telephone customers, (ii) only Internet service customers, and (iii) customers who subscribe to both phone and Internet services?


Description

The professor expects advanced models, including classification models, for this assignment.

Download the file "TelcoChurn.xlsx". 

This file contains data from a sample of 7043 subscribers of telephone and/or Internet services for a large telco. We want to create three separate models to understand the predictors of churn of (i) subscribers of telephone services, (ii) subscribers of Internet services, and (iii) people who subscribe to both services. Analyze the data carefully (data definitions are provided in the second worksheet of the Excel file). Submit your results in a nicely formatted Word or PDF file, along with your R code file. The assignment needs to follow the lines of the sample SDM_A8 Solution document.

It should include histograms, box plots, Q-Q plots, etc., with analysis and explanations.

The R code file needs to be properly and thoroughly commented as well.


1. Clean, process, and partition data as necessary, using appropriate R code.

1.5 Create a table of all variables that shows the corresponding alternative hypothesis for each variable, with a one-sentence rationale for each hypothesis. Be sure to include the right sign (positive or negative) for each hypothesis. (Please see the hypothesis and rationale table in the sample assignment.)

2. What predictors do you think contribute to the churn of (i) only telephone customers, (ii) only Internet service customers, and (iii) customers who subscribe to both phone and Internet services? List the reasoning for your answer. No points without reasoning.

3. Create training and test data sets with a 75:25 split using a random seed of 1024. Train logit models on the training data with the variables you identified in Question 2. Combine the three model outputs using stargazer.
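
The 75:25 split in Question 3 can be illustrated with a small sketch. The assignment itself calls for R, where the seed would be set with set.seed(1024); the Python below only shows the mechanics, and the function name train_test_split and the stand-in rows are assumptions, not part of the assignment. A seed of 1024 in Python will not reproduce the same rows as the same seed in R.

```python
import random

def train_test_split(rows, train_fraction=0.75, seed=1024):
    """Shuffle row indices with a fixed seed, then cut them 75:25."""
    rng = random.Random(seed)            # private generator keeps the split reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = int(round(train_fraction * len(rows)))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

# Stand-in for the 7043 subscriber records (real rows would come from TelcoChurn.xlsx).
rows = list(range(7043))
train, test = train_test_split(rows)
print(len(train), len(test))
```

In the assignment, the same split would be applied separately to each of the three customer segments before training the logit models.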

4. What are the top three predictors of churn of (i) only telephone customers, (ii) only Internet service customers, and (iii) customers who subscribe to both phone and Internet services? Explain using marginal effects how much each predictor contributes to churn probability. (3 points)
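
As a rough illustration of what "marginal effects" means for a logit model: for a continuous predictor j, the marginal effect at one observation is beta_j * p * (1 - p), where p is the predicted churn probability, and the average marginal effect is the mean of that quantity over all observations. The sketch below assumes this average-marginal-effect definition; the coefficients and data are hypothetical values chosen only to show the arithmetic, and the deliverable itself would be written in R.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def average_marginal_effects(coefs, intercept, X):
    """Average over observations of beta_j * p_i * (1 - p_i) for each predictor j."""
    totals = [0.0] * len(coefs)
    for x in X:
        p = sigmoid(intercept + sum(b * v for b, v in zip(coefs, x)))
        weight = p * (1.0 - p)           # derivative of the logistic function at this row
        for j, b in enumerate(coefs):
            totals[j] += b * weight
    return [t / len(X) for t in totals]

# Hypothetical coefficients and two hypothetical rows, for illustration only.
coefs = [2.0, -4.0]
X = [[0.0, 0.0], [1.0, 0.5]]
ame = average_marginal_effects(coefs, 0.0, X)
print(ame)
```

For a binary predictor, the difference in predicted probability between the two levels is usually the more precise measure; the derivative form above is only an approximation in that case.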

5. Use TWO metrics to indicate which of the three models in Question 4 fits the training data set best and which fits it worst. How do you know?
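
The assignment does not say which two metrics to use. One possible pair, assumed here purely for illustration, is AIC and McFadden's pseudo R-squared, both computed from the model's log-likelihood on the training data. The labels and fitted probabilities below are hypothetical, and the deliverable itself would be written in R.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 labels y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def aic(ll, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * ll

def mcfadden_r2(y, p):
    """1 - LL(model) / LL(intercept-only); requires both classes to be present in y."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null

# Hypothetical training labels and fitted probabilities, for illustration only.
y = [1, 0, 1, 0]
p = [0.8, 0.3, 0.6, 0.1]
ll = log_likelihood(y, p)
print(aic(ll, n_params=2), mcfadden_r2(y, p))
```

Because the three models are trained on different customer segments with different sample sizes, AIC values are not strictly comparable across them; a scale-free measure such as the pseudo R-squared is easier to compare.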


6. Apply your models to the test data, and compute recall, precision, F1-score, and AUC values for each of your three models. Which model worked best for your classification analysis?
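
The four metrics in Question 6 can be computed directly from the test labels and the predicted churn probabilities. The sketch below assumes a 0.5 classification threshold (the assignment does not specify one) and computes AUC as the share of positive/negative pairs that the model ranks correctly; the labels and scores are hypothetical, and the deliverable itself would be written in R.

```python
def precision_recall_f1(y_true, y_score, threshold=0.5):
    """Precision, recall, and F1 for the positive (churn) class at a given threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def auc(y_true, y_score):
    """Probability that a random churner scores above a random non-churner (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test labels and predicted probabilities, for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
precision, recall, f1 = precision_recall_f1(y_true, y_score)
print(precision, recall, f1, auc(y_true, y_score))
```

Precision, recall, and F1 change with the threshold, while AUC does not, so AUC is often the more stable basis for comparing the three models.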

