Description
Introduction:
The Environmental Protection Agency (EPA) is responsible for regulating pollutant emissions from all automobiles driven on American roads. You are asked to analyze data released by the EPA spanning more than a decade, specifically three time periods: 2010–12, 2014–16, and 2018–20. This case analysis has several objectives, one of which is to test for and learn about possible changes over time in the amount of pollution emitted by vehicles. You are also asked to analyze similarities between vehicles across the three time periods and to determine empirically whether certain vehicles became more (or less) polluting over the study period.
You will analyze various aspects of vehicle-induced pollution using R programming. You are expected to submit your findings in report format. The report must be at least 20 pages long, with written descriptions and explanations of your findings for the questions asked below. Make sure to run all code in R Markdown and produce a formal report with your remarks, comments, or explanations embedded within the document.
You are given nine years of individual EPA data in CSV format, one file per year. The data files are not very large (each file is approximately 1 MB). Each yearly file contains thousands of vehicles, along with their key information and pollution test records. Each file contains 42 columns, which are described in the Data Dictionary document. Please note that the original data had more columns, some of which were removed for consistency. The deleted columns still appear in the Data Dictionary; ignore them when referring to it.
This case study has three sections: Merging and Cleaning (20 points), Data Analysis (60 points), and Visualization (20 points), for a total of 100 points.