Analytical Methods I (ANLY 502)
Course Project: Tell Your Data Story!
We are surrounded by data. As future data analysts, you will be called upon to tell a story, answer questions, and make predictions from a data set. That is what you will do in this project.
Tasks
Data set: Find a data set with the following characteristics:
● 1 response variable
● At least 10 explanatory variables
● At least 100 observations
EDA: Do some exploratory data analysis to tell an “interesting” story about your data. Instead of limiting yourself to relationships between just two variables, broaden the scope of your analysis and use creative approaches that evaluate the relationship between two variables while controlling for a third.
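The following is a minimal R sketch of one way to examine a two-variable relationship while controlling for a third. It uses R's built-in `mtcars` data purely as a stand-in. That data set has only 32 rows, so it would not meet the project's 100-observation requirement. The choice of `mpg`, `wt`, and `cyl` is illustrative. The sketch compares the marginal weight-vs-mpg relationship with the same relationship within each cylinder group and after adjusting for cylinder group.

```r
# Stand-in data: R's built-in mtcars (32 cars), used for illustration only.
data(mtcars)
mt <- transform(mtcars, cyl_f = factor(cyl))  # cylinder count as the "control" variable

# Marginal relationship: fuel economy vs. weight, ignoring cylinders
marginal_cor <- cor(mt$wt, mt$mpg)

# Conditional relationship: the same correlation computed within each cylinder group
by_cyl_cor <- sapply(split(mt, mt$cyl_f), function(d) cor(d$wt, d$mpg))

# Weight slope from a simple regression vs. one that adjusts for cylinder group
slopes <- c(
  marginal = unname(coef(lm(mpg ~ wt, data = mt))["wt"]),
  adjusted = unname(coef(lm(mpg ~ wt + cyl_f, data = mt))["wt"])
)

print(marginal_cor)
print(by_cyl_cor)
print(slopes)

# Conditioning plot: one mpg-vs-weight panel per cylinder group
coplot(mpg ~ wt | cyl_f, data = mt)
```

If the adjusted slope differs noticeably from the marginal one, part of the apparent two-variable relationship is explained by the control variable. That kind of finding makes a good EDA story.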
Inference: Come up with a research question that can be answered with a hypothesis test or a confidence interval. Your question could be used to shed some light on your choice of the “best” linear model. Carry out the appropriate inference task to answer your question.
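Below is a minimal R sketch of an inference task that also informs model choice. The research question is hypothetical: "After accounting for horsepower, is car weight associated with fuel economy?" The sketch uses the built-in `mtcars` data as a stand-in. It runs a partial F-test comparing nested models with `anova()` and then computes a 95% confidence interval for the weight slope with `confint()`.

```r
# Stand-in data: R's built-in mtcars (32 cars), used for illustration only.
data(mtcars)

# Hypothetical research question: after accounting for horsepower (hp),
# is weight (wt) associated with fuel economy (mpg)?
reduced <- lm(mpg ~ hp, data = mtcars)
full    <- lm(mpg ~ hp + wt, data = mtcars)

# Partial F-test for nested models; H0: the wt slope is 0 given hp
nested_test <- anova(reduced, full)
print(nested_test)

# 95% confidence interval for the wt slope in the full model
wt_ci <- confint(full, "wt", level = 0.95)
print(wt_ci)
```

A small p-value, or an interval that excludes 0, supports keeping the added variable. This is one way a research question can inform the choice of model.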
Modeling: Develop the “best” multiple linear regression model to explain the values of your response variable.
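The next block is a minimal R sketch of comparing candidate models, again using the built-in `mtcars` data as a stand-in. The three hand-picked formulas are hypothetical examples. The sketch also shows backward selection by AIC with `step()`. This is one common automated approach, not necessarily the method this course expects, and stepwise results should still be checked for interpretability.

```r
# Stand-in data: R's built-in mtcars (32 cars), used for illustration only.
data(mtcars)

# A few hand-picked candidate models (hypothetical choices)
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + qsec + am, data = mtcars)
candidates <- list(m1, m2, m3)

# Compare on adjusted R-squared (higher is better) and AIC (lower is better)
comparison <- data.frame(
  model  = c("m1", "m2", "m3"),
  adj_r2 = sapply(candidates, function(m) summary(m)$adj.r.squared),
  aic    = sapply(candidates, AIC)
)
print(comparison)

# Automated alternative: backward selection by AIC, starting from all predictors
full <- lm(mpg ~ ., data = mtcars)
best <- step(full, direction = "backward", trace = 0)
print(formula(best))

# Percent of the variation in mpg explained by the selected model
print(100 * summary(best)$r.squared)
```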
Prediction: Based on your model, make a prediction using the predict() function in R. Also quantify the uncertainty around this prediction.
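The following minimal R sketch shows `predict()` with both kinds of interval, using the built-in `mtcars` data as a stand-in. The model and the "new car" values are hypothetical. A confidence interval describes uncertainty about the mean response for cars like this one. A prediction interval describes uncertainty about a single new car, so it is always wider.

```r
# Stand-in data: R's built-in mtcars (32 cars), used for illustration only.
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical new car: 3,000 lb (wt is in 1000s of lb) and 150 hp
new_car <- data.frame(wt = 3.0, hp = 150)

# 95% interval for the MEAN mpg of all cars with these values
conf_int <- predict(fit, newdata = new_car, interval = "confidence", level = 0.95)

# 95% interval for the mpg of ONE individual new car (wider)
pred_int <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)

print(conf_int)  # columns: fit, lwr, upr
print(pred_int)
```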
Deliverables
A. Data set description (submitted to Moodle)
○ General description of data
○ Link to data set
○ List of explanatory variables
○ Size of data set
B. Report
○ 4-6 pages, 12 pt Arial or Times New Roman font, double-spaced
○ Your report should be organized with the following parts included and clearly labeled:
1. Introduction: a summary of the data set and your goal.
2. EDA: any univariate or bivariate summaries worth reporting.
3. Inference: Answer the research question you have posed using a hypothesis test or a confidence interval.
4. The “Best” Model: What is the “best” linear model for predicting the response variable? You do not need to explain every step you took to arrive at this model, but you should give some indication of why you chose it. If you tried several different models, how did you settle on one?
● How well does your model do? What percentage of the variation in the response variable does it explain (R²)?
● What does your model tell you about the relationships between your explanatory variables and your response variable?
● What conditions must hold for your analysis to be valid? What are the implications if some of those conditions are violated?
5. Prediction: Using your best model, predict the value of your response variable for a future event. Include a description of the uncertainty of your prediction.
6. Conclusion:
● What is the bottom line from your analysis?
● How well can you predict your response variable?
● What are the caveats to your analysis?
● Does this data set lack information that you would have liked to use?
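For the question in part 4 about which conditions must hold, here is a minimal R sketch of common residual checks for a linear model. It uses the built-in `mtcars` data and a hypothetical two-predictor model as stand-ins. The four default `plot()` panels address linearity, constant variance, normality of residuals, and influential points. The Shapiro-Wilk test is one formal normality check and is best read alongside the Q-Q plot.

```r
# Stand-in data and model: R's built-in mtcars, used for illustration only.
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Default diagnostic panels: residuals vs. fitted (linearity, constant variance),
# normal Q-Q (normality), scale-location, residuals vs. leverage (influence)
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# Formal check of residual normality
normality <- shapiro.test(residuals(fit))
print(normality)

# Quick look at collinearity between the two predictors
print(cor(mtcars$wt, mtcars$hp))
```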
C. Code
○ Additional details will be provided later.
D. Presentation
○ 15 minutes max
○ Live synchronous delivery on Adobe Connect with ALL team members present
○ Scheduling instructions will be provided later.
Tips for your report and presentation
This project is an opportunity to apply what you have learned about descriptive statistics, graphical methods, correlation and regression, and hypothesis testing and confidence intervals.
The goal is not an exhaustive data analysis (i.e., do not calculate every statistic and run every procedure you have learned for every variable). Rather, it is to show that you are proficient at using R at a basic level and at interpreting and presenting the results.
You might consider critiquing your own methods, for example by discussing the reliability of the data and the appropriateness of the statistical analyses you used for this specific data set.
Grading
Grading of the project will take into account:
● Correctness: Are the procedures and explanations correct?
● Presentation: What was the quality of the presentation and poster?
● Content/Critical thought: Did you think carefully about the problem?
Your grade will be roughly based on the following components:
● 30% report
● 30% presentation
● 25% code
● 10% team peer evaluations
● 5% data set description submitted on Moodle
Peer feedback: You will be asked to fill out a questionnaire during the last executive session.