5. Some variables have almost the same value in every row. If a single value accounts for 97% or more of a variable's observations, don't include that variable.
6. Delete the values -99, 99, 88, -88, and null. Delete only the value, not the entire row. However, the professor said that in some cases these values shouldn't be deleted. For example, if a variable has values like 77, 76, 55, 66, 88, 99..., then 88 and 99 are real values and shouldn't be deleted. In a categorical variable, they should be deleted. There may be additional cases like this.
Everything above is data cleaning. This part is important.
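The two cleaning steps above could be sketched in Python with pandas roughly as follows. This is only an illustration under assumptions: the function names, the `MISSING_CODES` list, the `keep_codes_in` parameter, and the tiny `survey` table are all hypothetical, and nulls are assumed to already be read in as NaN. Which columns keep 88 and 99 as real values has to be decided by hand from the codebook.

```python
import pandas as pd

# Assumed list of codes that stand for "missing"; adjust to the real codebook.
MISSING_CODES = [-99, 99, 88, -88]


def drop_near_constant(df, threshold=0.97):
    """Step 5: drop columns where one value covers `threshold` or more of the non-missing cells."""
    keep = []
    for col in df.columns:
        # value_counts(normalize=True) returns the share of each distinct value.
        shares = df[col].value_counts(normalize=True)
        if not shares.empty and shares.max() < threshold:
            keep.append(col)
    return df[keep]


def blank_missing_codes(df, keep_codes_in=()):
    """Step 6: turn missing codes into NaN cell by cell, without dropping rows."""
    out = df.copy()
    for col in out.columns:
        if col in keep_codes_in:
            continue  # in these columns 88 and 99 are real values
        # mask() blanks only the cells where the condition is True.
        out[col] = out[col].mask(out[col].isin(MISSING_CODES))
    return out


# Made-up data: "score" is numeric (88 and 99 are real), "group" is categorical.
survey = pd.DataFrame({
    "score": [77, 76, 55, 66, 88, 99],
    "group": [1, 2, 99, 1, 88, 2],
    "flag": [1, 1, 1, 1, 1, 1],
})
cleaned = blank_missing_codes(drop_near_constant(survey), keep_codes_in={"score"})
print(cleaned)
```

Here `flag` is dropped because one value fills 100% of it, `score` is left untouched, and the 99 and 88 in `group` become NaN while all six rows stay in the table.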
8. In each univariate logistic regression, delete the rows where y or that variable is missing.
9. Create the dummy variables separately when doing the logistic regression.
Choose only the significant variables.
10. As an additional part, after choosing the most significant variables, run a multiple logistic regression to choose the best model.
Please don't use overly complicated algorithms, and add an explanation of why you use each function and method.