Describe the data structure and identify integer and binary variables.


Description

Use R (in an RStudio Notebook) to analyze the 'FacebookFriends' dataset.

Questions:

(1) Describe the data structure and identify the integer and binary variables.

(2) Get the mean, median, sample standard deviation, and coefficient of variation for all variables. What do these statistics tell you about the distributions? 

(3) Select all non-binary variables and create a frequency tabulation, boxplot, and histogram for each. Describe the distribution of each variable.

(4) Propose a regression model to predict the number of Friends by selecting significant non-binary variables. Estimate the model and describe the fit (R-squared, etc.). Which of your proposed predictors are significant? (Note: use the correlation map/heatmap to provide quantitative values for significance.)
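
As a starting point for the four questions above, here is a minimal sketch in base R. The actual column names are defined in the dataset and its Excel codebook, which are not reproduced here, so the data frame `fb` and its columns (`Age`, `HoursOnline`, `Female`, `Friends`) are hypothetical stand-ins filled with simulated values. Replace them with the real data once it is loaded.

```r
# Hypothetical stand-in for the FacebookFriends data (simulated values only).
# With the real file, load it instead, e.g. with read.csv().
set.seed(1)
n  <- 100
fb <- data.frame(
  Age         = sample(18:60, n, replace = TRUE),  # integer
  HoursOnline = round(runif(n, 0, 8), 1),          # numeric (non-integer)
  Female      = rbinom(n, 1, 0.5)                  # binary 0/1
)
fb$Friends <- as.integer(pmax(
  0, round(300 - 3 * fb$Age + 40 * fb$HoursOnline + rnorm(n, sd = 50))
))

# (1) Structure, integer columns, and binary (0/1 only) columns
str(fb)
is_int    <- vapply(fb, is.integer, logical(1))
is_binary <- vapply(fb, function(x) all(x %in% c(0, 1)), logical(1))

# (2) Mean, median, sample standard deviation, coefficient of variation
describe <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), cv = sd(x) / mean(x))
}
stats <- t(sapply(fb, describe))  # one row per variable
print(round(stats, 3))

# (3) Frequency tables, boxplots, and histograms for non-binary variables
non_binary <- names(fb)[!is_binary]
for (v in non_binary) {
  print(table(fb[[v]]))
  boxplot(fb[[v]], main = v)
  hist(fb[[v]], main = v, xlab = v)
}

# (4) Correlation matrix, heatmap, and a linear regression for Friends
cor_mat <- cor(fb[non_binary])
print(round(cor_mat, 3))
heatmap(cor_mat, symm = TRUE)
fit <- lm(Friends ~ Age + HoursOnline, data = fb)
print(summary(fit))  # R-squared, coefficient estimates, p-values
```

The correlation matrix supplies the pairwise values displayed in the heatmap, while the coefficient table from `summary(fit)` reports the t-statistics and p-values for each predictor. Which predictors enter the model is a choice to make from the real data, not from this simulated example.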

At the end of each section, comment briefly (one paragraph at most) on your observations of the results, and add final concluding remarks after section (4). Use this opportunity to practice report writing.

All answers should be provided inside the same notebook as your code and output. No additional documents should be used, and pasted screenshots will not be accepted.

Note: Explanations of the variables are provided in the 2nd Excel file. MidWest is omitted from the four regions to prevent perfect multicollinearity. 
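
To see why one region is left out, the sketch below simulates a four-level region variable and fits a model with all four dummies plus an intercept. Only `MidWest` is named in the assignment; the other region names, the response `y`, and all values are assumptions made for illustration.

```r
# Simulated illustration of the dummy-variable trap (hypothetical data).
set.seed(2)
n      <- 80
region <- factor(sample(c("MidWest", "NorthEast", "South", "West"),
                        n, replace = TRUE))

# One 0/1 column per region, with no column dropped
dummies <- as.data.frame(model.matrix(~ region - 1))
names(dummies) <- levels(region)
dummies$y <- rnorm(n, mean = 200, sd = 30)

# Every row has exactly one region, so the four dummies sum to 1,
# which is identical to the intercept column.
row_totals <- rowSums(dummies[levels(region)])

# All four dummies plus intercept: perfectly collinear, lm() cannot
# estimate every coefficient and reports NA for the aliased one.
fit_all <- lm(y ~ MidWest + NorthEast + South + West, data = dummies)
print(coef(fit_all))

# Omitting MidWest makes it the baseline; all coefficients are estimable.
fit_drop <- lm(y ~ NorthEast + South + West, data = dummies)
print(coef(fit_drop))
```

With `MidWest` omitted, each remaining region coefficient is read as the difference from the MidWest baseline.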

