Description

Introduction

The dataset represents 10 years (1999-2008) of clinical care data from 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes. Information was extracted from the database for encounters that satisfied the following criteria.


(1) It is an inpatient encounter (a hospital admission).
(2) It is a diabetic encounter, that is, one during which any kind of diabetes was entered into the system as a diagnosis.
(3) The length of stay was at least 1 day and at most 14 days.
(4) Laboratory tests were performed during the encounter.
(5) Medications were administered during the encounter.
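The five criteria above amount to a simple filter over encounters. The following is a minimal sketch of that filter, assuming a hypothetical `Encounter` record whose field names are illustrative only and do not reflect the source database schema.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    # Hypothetical record layout; field names are assumptions for illustration.
    is_inpatient: bool
    has_diabetes_diagnosis: bool
    length_of_stay_days: int
    num_lab_tests: int
    num_medications: int


def meets_criteria(e: Encounter) -> bool:
    """Return True if the encounter satisfies criteria (1) through (5)."""
    return (
        e.is_inpatient                          # (1) inpatient encounter
        and e.has_diabetes_diagnosis            # (2) diabetic encounter
        and 1 <= e.length_of_stay_days <= 14    # (3) stay of 1 to 14 days
        and e.num_lab_tests > 0                 # (4) lab tests performed
        and e.num_medications > 0               # (5) medications administered
    )


# Made-up example encounters, not taken from the dataset.
encounters = [
    Encounter(True, True, 3, 41, 12),    # satisfies all five criteria
    Encounter(True, True, 20, 60, 15),   # stay longer than 14 days
    Encounter(False, True, 1, 5, 2),     # not an inpatient encounter
    Encounter(True, True, 2, 0, 4),      # no laboratory tests
]
selected = [e for e in encounters if meets_criteria(e)]
print(len(selected))
```

Only the first example encounter passes, because each of the others violates exactly one criterion.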

Dataset

The data contains such attributes as patient number, race, gender, age, admission type, time in hospital, medical specialty of admitting physician, number of lab tests performed, HbA1c test result, diagnosis, number of medications, diabetic medications, number of outpatient, inpatient, and emergency visits in the year before the hospitalization, etc.

| Feature name | Type | Description and values |
| --- | --- | --- |
| Encounter ID | Numeric | Unique identifier of an encounter |
| Patient number | Numeric | Unique identifier of a patient |
| Race | Nominal | Values: Caucasian, Asian, African American, Hispanic, and other |
| Gender | Nominal | Values: male, female, and unknown/invalid |
| Age | Nominal | Grouped in 10-year intervals: [0, 10), [10, 20), …, [90, 100) |
| Weight | Numeric | Weight in pounds |
| Admission type | Nominal | Integer identifier corresponding to 9 distinct values, for example, emergency, urgent, elective, newborn, and not available |
| Discharge disposition | Nominal | Integer identifier corresponding to 29 distinct values, for example, discharged to home, expired, and not available |
| Admission source | Nominal | Integer identifier corresponding to 21 distinct values, for example, physician referral, emergency room, and transfer from a hospital |
| Time in hospital | Numeric | Integer number of days between admission and discharge |
| Payer code | Nominal | Integer identifier corresponding to 23 distinct values, for example, Blue Cross/Blue Shield, Medicare, and self-pay |
| Medical specialty | Nominal | Integer identifier of a specialty of the admitting physician, corresponding to 84 distinct values, for example, cardiology, internal medicine, family/general practice, and surgeon |
| Number of lab procedures | Numeric | Number of lab tests performed during the encounter |
| Number of procedures | Numeric | Number of procedures (other than lab tests) performed during the encounter |
| Number of medications | Numeric | Number of distinct generic names administered during the encounter |
| Number of outpatient visits | Numeric | Number of outpatient visits of the patient in the year preceding the encounter |
| Number of emergency visits | Numeric | Number of emergency visits of the patient in the year preceding the encounter |
| Number of inpatient visits | Numeric | Number of inpatient visits of the patient in the year preceding the encounter |
| Diagnosis 1 | Nominal | The primary diagnosis (coded as first three digits of ICD9); 848 distinct values |
| Diagnosis 2 | Nominal | Secondary diagnosis (coded as first three digits of ICD9); 923 distinct values |
| Diagnosis 3 | Nominal | Additional secondary diagnosis (coded as first three digits of ICD9); 954 distinct values |
| Number of diagnoses | Numeric | Number of diagnoses entered into the system |
| Glucose serum test result | Nominal | Indicates the range of the result or if the test was not taken. Values: “>200,” “>300,” “normal,” and “none” if not measured |
| A1c test result | Nominal | Indicates the range of the result or if the test was not taken. Values: “>8” if the result was greater than 8%, “>7” if the result was greater than 7% but less than 8%, “normal” if the result was less than 7%, and “none” if not measured |
| Change of medications | Nominal | Indicates if there was a change in diabetic medications (either dosage or generic name). Values: “change” and “no change” |
| Diabetes medications | Nominal | Indicates if there was any diabetic medication prescribed. Values: “yes” and “no” |
| 24 features for medications | Nominal | For the generic names: metformin, repaglinide, nateglinide, chlorpropamide, glimepiride, acetohexamide, glipizide, glyburide, tolbutamide, pioglitazone, rosiglitazone, acarbose, miglitol, troglitazone, tolazamide, examide, sitagliptin, insulin, glyburide-metformin, glipizide-metformin, glimepiride-pioglitazone, metformin-rosiglitazone, and metformin-pioglitazone, the feature indicates whether the drug was prescribed or there was a change in the dosage. Values: “up” if the dosage was increased during the encounter, “down” if the dosage was decreased, “steady” if the dosage did not change, and “no” if the drug was not prescribed |
| Readmitted | Nominal | Days to inpatient readmission. Values: “<30” if the patient was readmitted in less than 30 days, “>30” if the patient was readmitted in more than 30 days, and “No” for no record of readmission |
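The Age and Readmitted encodings described above are nominal labels, so they usually need decoding before any numeric analysis. The following is a rough sketch of two hypothetical helpers, `parse_age_interval` and `readmission_flags`; neither name comes from the dataset. The exact spelling of the labels in the data file is an assumption, so the parser accepts either a comma or a hyphen between the interval bounds.

```python
import re
from typing import Tuple


def parse_age_interval(label: str) -> Tuple[int, int]:
    """Parse a half-open interval label such as '[10, 20)' into (10, 20)."""
    # Assumption: bounds are separated by a comma or a hyphen.
    match = re.fullmatch(r"\[(\d+)\s*[,-]\s*(\d+)\)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized age label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def readmission_flags(value: str) -> Tuple[bool, bool]:
    """Return (readmitted_at_all, readmitted_within_30_days)."""
    v = value.strip().upper()
    if v not in {"<30", ">30", "NO"}:
        raise ValueError(f"unrecognized readmission value: {value!r}")
    return v != "NO", v == "<30"


print(parse_age_interval("[10, 20)"))
print(readmission_flags(">30"))
```

Keeping the two flags separate makes it possible to count readmissions within 30 days and readmissions of any kind from the same column.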

Task I: Exploratory data analysis

Questions: Using RapidMiner, answer the following questions (provide screenshots).

1. How many non-readmitted cases are in the dataset?

2. How many cases of readmission within 30 days are in the dataset?

3. How many cases of readmission are in the dataset (regardless of whether readmission occurred within 30 days or later)?

4. Which age group has the highest ratio of readmitted to total (readmitted + non-readmitted) cases?

5. Is there an age group in which no patients were readmitted?
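The task asks for RapidMiner, but the same counts can be cross-checked outside it. The following is a minimal sketch in plain Python rather than a RapidMiner process; the `rows` list is invented toy data, not values from the dataset, and the label spellings are assumptions.

```python
from collections import Counter, defaultdict

# Toy (age group, readmitted) pairs, invented for illustration only.
rows = [
    ("[50, 60)", "No"),
    ("[50, 60)", "<30"),
    ("[60, 70)", ">30"),
    ("[60, 70)", "No"),
    ("[60, 70)", "No"),
    ("[70, 80)", "<30"),
    ("[70, 80)", ">30"),
    ("[0, 10)", "No"),
]

counts = Counter(readmitted for _, readmitted in rows)
not_readmitted = counts["No"]                      # question 1
within_30 = counts["<30"]                          # question 2
any_readmission = counts["<30"] + counts[">30"]    # question 3

# Per age group: [number readmitted, total encounters].
per_age = defaultdict(lambda: [0, 0])
for age, readmitted in rows:
    per_age[age][1] += 1
    if readmitted != "No":
        per_age[age][0] += 1

ratios = {age: r / t for age, (r, t) in per_age.items()}
highest_ratio_group = max(ratios, key=ratios.get)                   # question 4
groups_without_readmission = [a for a, (r, _) in per_age.items() if r == 0]  # question 5

print(not_readmitted, within_30, any_readmission)
print(highest_ratio_group, groups_without_readmission)
```

In RapidMiner, the equivalent would presumably be an aggregation grouped by the readmitted and age attributes; the numbers printed here describe only the toy rows.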

