The Centers for Disease Control and Prevention (CDC) uses the Social Vulnerability Index (SVI) to evaluate the impact of disasters on communities, weighting the damage with social factors; this analysis covers the states of Washington and Idaho.


Description

Topic:

The Centers for Disease Control and Prevention (CDC) uses the Social Vulnerability Index (SVI) to evaluate the impact of disasters on communities, weighting the damage with social factors; this analysis covers the states of Washington and Idaho.

Problem:

The data consolidated by the CDC is used to determine the most vulnerable areas should a disaster occur. In a perfect world, the indicators of vulnerability would represent the people correctly. Currently, this far-from-perfect method is the best that has been developed. There may be indicators that are not adequately predictive of social vulnerability.

 

Question 1: What relationships exist in the states of Washington and Idaho between the socioeconomic indicators, the household composition indicators, the disability indicators, and social vulnerability when using the data consolidated by the CDC (2018a)?

 

Question 2: Which of the socioeconomic indicators, household composition indicators, and disability indicators in the states of Washington and Idaho have the most influence in predicting social vulnerability when using the data consolidated by the CDC (2018a)?

Data:  

      The data and data dictionaries are online. 

o   Centers for Disease Control and Prevention. (2018a). Social vulnerability index [data set]. https://svi.cdc.gov/Documents/Data/2018_SVI_Data/CSV/SVI2018_US.csv

o   Centers for Disease Control and Prevention. (2018b). Social vulnerability index [code book]. https://svi.cdc.gov/Documents/Data/2018_SVI_Data/SVI2018Documentation.pdf

o   Note: The raw data must be in its original form when it enters the R script file. Use the data dictionary to understand the data.

      Create a subset of the data to represent the sample of secondary data in this analysis.


o   The SVI variable name is RPL_THEMES, in column 99

o   Socioeconomic

      Persons below the poverty estimate

      Civilian unemployed estimate

      Per capita income estimate

      Persons with no high school diploma

o   Household composition and disability

      Ages 65 and older

      Ages 17 and under

      Persons with a disability, over the age of 5

      Single-parent households

o   The state field


Note: Do not use more than one indicator for each measure defined in this section.

Variable names preceded with “E_” are actual measures, while “M_” represents the margin of error estimates. 

Other prefixes are follow-on calculations or qualitative information. Do not include variables that are not identified in the research questions, as listed in the data section.

Do not include the margin of error estimates at this time. 

Considering the research questions, after subsetting, there will be 10 variables used in this analysis.
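
The subsetting described in this section might be sketched in R as follows. The column names (E_POV, E_UNEMP, E_PCI, E_NOHSDP, E_AGE65, E_AGE17, E_DISABL, E_SNGPNT, STATE) are assumptions based on a reading of the 2018 data dictionary and should be confirmed there, as should the assumption that STATE holds full state names. The names keep_cols and subset_svi are hypothetical.

```r
# Ten variables: the SVI, eight "E_" indicators, and the state field.
# Names are assumed from the 2018 data dictionary; confirm before use.
keep_cols <- c("STATE", "RPL_THEMES",
               "E_POV", "E_UNEMP", "E_PCI", "E_NOHSDP",
               "E_AGE65", "E_AGE17", "E_DISABL", "E_SNGPNT")

# Hypothetical helper: keep only the assigned states and the ten variables.
subset_svi <- function(raw, states = c("WASHINGTON", "IDAHO")) {
  # Fail early if the data dictionary names do not match the file.
  missing_cols <- setdiff(keep_cols, names(raw))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  # Compare state names case-insensitively.
  rows <- toupper(as.character(raw$STATE)) %in% toupper(states)
  out <- raw[rows, keep_cols, drop = FALSE]
  rownames(out) <- NULL
  out
}

# Usage (needs network access; the raw file is read unmodified):
# svi_raw <- read.csv(
#   "https://svi.cdc.gov/Documents/Data/2018_SVI_Data/CSV/SVI2018_US.csv",
#   stringsAsFactors = FALSE
# )
# svi <- subset_svi(svi_raw)
# stopifnot(ncol(svi) == 10)  # validate the expected ten variables
```

Selecting columns by name rather than by position keeps the subset correct even if the column order in the file differs from what is expected.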

 

 

Data Cleaning:

      Do not remove missing values during cleaning. If missing values need to be removed for an analysis method, do it during the preparation for analysis. A code represents missing values. Use the data dictionary to understand the data sample and how missing values are represented.

      When changing an object or part of an object, validate that the change occurred as expected.   

      The steps that are taken in cleaning are not discussed in the research paper.  

      There is a code that represents missing values; ensure this is found in the data dictionary! These values will have to be recoded as NA.
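
A minimal sketch of the recoding step, with the validation the cleaning notes ask for, might look like this. The default code of -999 is an assumption that must be checked against the data dictionary, and recode_missing is a hypothetical helper name.

```r
# Hypothetical helper: recode the missing-value code to NA in numeric columns
# and validate that exactly the coded values changed.
# The default of -999 is an assumption; confirm it in the data dictionary.
recode_missing <- function(df, code = -999) {
  num_cols <- vapply(df, is.numeric, logical(1))

  # Counts taken before the change, used for validation afterwards.
  n_code <- sum(vapply(df[num_cols],
                       function(x) sum(x == code, na.rm = TRUE),
                       integer(1)))
  n_na_before <- sum(is.na(df[num_cols]))

  # Replace the code with NA; rows are kept, nothing is removed.
  df[num_cols] <- lapply(df[num_cols], function(x) {
    x[!is.na(x) & x == code] <- NA
    x
  })

  # Validate: NA count grew by the number of coded values, none remain.
  stopifnot(sum(is.na(df[num_cols])) == n_na_before + n_code)
  stopifnot(!any(df[num_cols] == code, na.rm = TRUE))
  df
}

# Usage on a small made-up data frame (values are illustrative only):
toy <- data.frame(STATE = c("IDAHO", "IDAHO"),
                  RPL_THEMES = c(0.4, -999),
                  E_POV = c(-999, 120),
                  stringsAsFactors = FALSE)
toy_clean <- recode_missing(toy)
colSums(is.na(toy_clean))  # NA count per column after recoding
```

Rows with NA stay in the data at this stage; dropping them, if a method requires it, belongs to the preparation step of the analysis.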

Analyze: 

      Conduct two types of analysis: visual analysis to identify relationships and a random forest model to identify influential indicators in predicting the social vulnerability. 

      The sub-stages of Analyze (profile, prepare, and apply) are necessary at least two times. This method is for programming, not documenting research. 

      During the visual analysis, only present visuals that are meaningful for understanding what relationships exist between the indicators and the social vulnerability index. 

      Ensure you establish that the model is valid and reliable before discussing the influential indicators.

      Also, create a random forest model for each state that is assigned. Ensure that this analysis is within the scope of the research. 
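
One possible shape for the profile, prepare, and apply stages is sketched below. It assumes the randomForest package is installed and that the cleaned subset has a STATE column, an RPL_THEMES column, and numeric indicators. The helper names, the 70/30 split, the 500 trees, and the seed are illustrative choices, and hold-out R-squared is only one way to check the model before reading the importance scores.

```r
library(randomForest)  # assumes the randomForest package is installed

# Profile: correlation of each numeric indicator with the SVI,
# using pairwise complete observations so NA rows are not dropped globally.
profile_correlations <- function(df, target = "RPL_THEMES") {
  num <- df[vapply(df, is.numeric, logical(1))]
  cors <- cor(num, use = "pairwise.complete.obs")[, target]
  sort(cors[names(cors) != target], decreasing = TRUE)
}

# Prepare and apply: fit one regression forest for a single state on a
# training split, then report hold-out and out-of-bag R-squared together
# with permutation importance (%IncMSE).
fit_state_forest <- function(df, state, target = "RPL_THEMES",
                             train_frac = 0.7, ntree = 500, seed = 42) {
  d <- df[df$STATE == state, names(df) != "STATE", drop = FALSE]
  d <- d[complete.cases(d), , drop = FALSE]  # NA rows removed here, not in cleaning
  predictors <- setdiff(names(d), target)

  set.seed(seed)  # reproducible split and forest
  train_idx <- sample(nrow(d), size = floor(train_frac * nrow(d)))
  train <- d[train_idx, , drop = FALSE]
  test  <- d[-train_idx, , drop = FALSE]

  model <- randomForest(x = train[, predictors, drop = FALSE],
                        y = train[[target]],
                        ntree = ntree, importance = TRUE)

  pred <- predict(model, newdata = test[, predictors, drop = FALSE])
  ss_res <- sum((test[[target]] - pred)^2)
  ss_tot <- sum((test[[target]] - mean(test[[target]]))^2)

  list(model = model,
       holdout_r2 = 1 - ss_res / ss_tot,
       oob_r2 = tail(model$rsq, 1),
       importance = importance(model, type = 1))
}

# Usage, where svi_clean is a hypothetical name for the cleaned subset:
# profile_correlations(svi_clean)
# pairs(svi_clean[vapply(svi_clean, is.numeric, logical(1))])  # scatterplot matrix
# fits <- lapply(c("WASHINGTON", "IDAHO"),
#                function(s) fit_state_forest(svi_clean, s))
```

Comparing the hold-out and out-of-bag R-squared values gives a rough sense of whether the model is stable before any indicator is called influential; a large gap between them would be a reason to revisit the preparation step.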

Documenting research:

Results, Impact of the Results:

      Ensure that assertions and assessments in the results and discussion sections are derived from the analysis in R. 

      Do not speculate. Use evidence. When documenting the results, consider the generalizability.

Future Recommendations:

      Include recommendations for future analysis, based on the research in R. 

      An example might look something like this:

o   An opportunity for further research, based on gaps found in the random forest modeling, is to look at the ability to tune the parameters further, to improve the performance in predicting the   

o   Additionally, an opportunity for future research is exploratory modeling to determine which other variables, when eliminated, have little or no impact on the ability to predict the SVI based on the supporting characteristics in the data.

Please provide code with comments.

