Advanced Business Data Analysis
Replacement for Exam (50%)
Deadline for Submission – 9th April, 2020 (12:55 hours)
In this assignment you will use statistical tests for non-normal data.
You may use methods (non-parametric statistical tests) and tools (R, Excel, or
SPSS) of your own choice. Please don't rely on a single tool or method; variety is
expected. It is not necessary to replicate any test you carry out, i.e. if you
perform a test in R you do not need to repeat it in SPSS and/or Excel. A
data file (from the 2016 Census of Ireland) is suggested, though students are
permitted to choose a different file if they wish (subject to approval by Dr
O'Loughlin). Your task is to prepare a statistical report based on the data in
the file.
LINK:
The Central Statistics Office provides data on "Small Area
Population Statistics" from the 2016 census of Ireland – see:
https://www.cso.ie/en/census/census2016reports/census2016smallareapopulationstatistics/
For this assignment you will need two CSV files:
1. Small Areas (18,641)
https://www.cso.ie/en/media/csoie/census/census2016/census2016boundaryfiles/SAPS2016_SA2017.csv
2. Small areas OSI Boundaries
https://data.gov.ie/dataset/small-areas-generalised-100m-osi-national-statistical-boundaries-2015
The first file contains raw data based on the 2016 Census of Ireland.
The second file contains information such as location names and IDs. You should
be able to combine both data sets into one using the GUID field. The Glossary
file at the above site will also be useful:
(https://www.cso.ie/en/media/csoie/census/census2016/census2016boundaryfiles/SAPS_2016_Glossary.xlsx)
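As a rough illustration of combining the two files on the GUID field, here is a minimal Python sketch using only the standard library. The two small in-memory tables stand in for the real CSV files, and every column name other than GUID (TOTAL_COMMUTERS, COUNTYNAME) is a hypothetical placeholder; check the Glossary for the actual names.

```python
import csv
import io

# Tiny stand-ins for the two CSV files.
# Only the GUID column name comes from the brief; the others are hypothetical.
saps_csv = """GUID,TOTAL_COMMUTERS
A001,120
A002,85
A003,240
"""
boundaries_csv = """GUID,COUNTYNAME
A001,Kerry
A002,Cork
A004,Galway
"""

def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per record."""
    return list(csv.DictReader(io.StringIO(text)))

def join_on_guid(left_rows, right_rows):
    """Inner join: keep only small areas whose GUID appears in both files."""
    lookup = {row["GUID"]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = lookup.get(row["GUID"])
        if match is not None:
            merged.append({**row, **match})
    return merged

merged = join_on_guid(read_rows(saps_csv), read_rows(boundaries_csv))
for row in merged:
    print(row)
```

It is worth counting how many records fail to match after a join like this, since unmatched GUIDs silently drop out of an inner join. If you work in pandas instead, `DataFrame.merge` with `on="GUID"` performs the same operation.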
The Small Areas CSV file has 18,641 records based on 68 columns of data.
You are not expected to use all the data in the file, and you may reduce the file
to eliminate unused data if you wish. As there are a lot of data in this file,
please be careful about what you decide to report on - it is up to you to choose.
Some suggested reports:
· a comparison of methods of transport to work by County/Planning Region
· difference between different methods of transport in urban vs rural areas
· a comparison of journey times to work by County/Planning Region
· a comparison of time leaving home to travel to work by County/Planning Region
· Correlations may also be tested
Suggested statistical tests:
· Descriptive statistics for all data used
· Tests for normality such as Q-Q plots, Kolmogorov-Smirnov (please note - the Shapiro-Wilk test does not work for sample sizes over 5,000)
· Mann-Whitney U Test/Wilcoxon Rank-Sum Test to compare two samples (e.g. travel times for Kerry vs Cork)
· Kruskal-Wallis H Test to compare three or more samples
· Post-hoc tests where appropriate
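To show what the rank-based tests in this list actually compute, here is a minimal Python sketch of the Mann-Whitney U and Kruskal-Wallis H statistics using only the standard library. The travel-time figures are invented for illustration and are not census values. The sketch uses the normal approximation for U with no tie or continuity correction, so treat it as a teaching aid; for real analysis a library routine such as `scipy.stats.mannwhitneyu` or `scipy.stats.kruskal` (or the equivalents in R and SPSS) is the safer choice.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U via the normal approximation.

    No tie correction and no continuity correction are applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u          # z <= 0 because u is the smaller statistic
    p = min(1.0, 2 * NormalDist().cdf(z))
    return u, p

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical journey times in minutes (NOT real census data).
kerry = [12, 15, 14, 10, 18]
cork = [22, 25, 19, 30, 27]
galway = [16, 21, 13, 20, 17]

u, p_u = mann_whitney_u(kerry, cork)
h = kruskal_wallis_h(kerry, cork, galway)
# With exactly three groups, H is compared to chi-square with 2 degrees of
# freedom, whose upper-tail probability is exp(-h / 2).
p_h = math.exp(-h / 2)

print(f"Mann-Whitney U = {u}, p = {p_u:.4f}")
print(f"Kruskal-Wallis H = {h:.2f}, p = {p_h:.4f}")
```

If the Kruskal-Wallis result is significant, one common post-hoc approach is to run pairwise Mann-Whitney tests and compare each p-value against a Bonferroni-adjusted threshold (the significance level divided by the number of comparisons); Dunn's test is another option.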
Suggested visual representations of data:
· Q-Q/P-P plots
· Residuals
· Box plots
· Frequency Distributions/Histograms
· Scatter plots
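For readers unsure what a Q-Q plot contains, the following minimal Python sketch computes the points of a normal Q-Q plot with the standard library only. The sample values are hypothetical journey times chosen to be right-skewed, and the plotting position (i - 0.5) / n is one common convention among several.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    # Normal distribution fitted with the sample mean and standard deviation.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    return [(fitted.inv_cdf((i - 0.5) / n), obs)
            for i, obs in enumerate(ordered, start=1)]

# Hypothetical, right-skewed journey times in minutes (NOT real census data).
journey_minutes = [5, 7, 8, 9, 10, 12, 15, 22, 35, 60]

points = normal_qq_points(journey_minutes)
for theoretical, observed in points:
    print(f"{theoretical:8.2f} {observed:6.1f}")
```

Plotting observed against theoretical values gives the Q-Q plot. Points that fall close to the line y = x are consistent with normality, while systematic departures in the tails (as with the largest values here) suggest skewed, non-normal data.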
Be aware that this is a statistical report and that Null/Alternate
hypotheses, justification of levels of significance, correct reporting of
results, and explanations of results are expected (see 8 Simple Rules document
in Moodle). Please also explain and justify any statistical test used. State
clearly any assumptions made.