GitHub

Exploratory Data Analysis on a heart disease dataset

This project is a EDA on heart disease dataset, the dataset is located in the data folder. The data contains information on heart health indicators, including demographic, clinical, and lifestyle variables.

The EDA is done in a jupyter notebook called eda.ipynb

Task 1

Data Loading and Inspection

In this task, the python libraries were imported as alias. The data was inspected with the pandas.

Task 2

Handle Misrepresented Data in columns

Found the ? in the ca column, replace it the mode of the ca column and casted the data type to a float Found the ? in the thal column, replace it the mode of the thal column and casted the data type to a float

Duplicates

There were no duplicates in the data

Task 3

In Task 3, the heart_disease column was edited to be in 0 and 1 Where 0 = no heart disease and 1 = heart disease

This will make heart_disease column easy to analyze and draw conclusions.

Task 4

The conclusion drawn from this correlation matrix is that, there are low correlation between the age, sex and the heart disease. But there are slightly higher correlation between cp, exang, oldpeak, ca and thal to the heart disease on this a conclusion can be drawn that heart disease is more dependent on the lifestyle instead of the age and sex.

Pandas CrossTab

The conclusion draw from the crosstab shown that there is a chance of a male getting a heart disease than a female

Task 5

In this pair plot, we are looking for patterns between the two color groups. Looking at the density plots along the diagonal, there are no features that cleanly separate the groups (age has the most separation). However, looking at the scatterplot for age and thalach (maximum heart rate from an exercise test), there is more clear separation. It appears that patients who are old and have low thalach are more likely to be diagnosed with heart disease than patients who are young and have high thalach. This suggests that we want to make sure both of these features are included in our model.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
Data		Data
Health_EDA.docx		Health_EDA.docx
Health_EDA.pdf		Health_EDA.pdf
README.md		README.md
eda.ipynb		eda.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Exploratory Data Analysis on a heart disease dataset

Task 1

Data Loading and Inspection

Task 2

Handle Misrepresented Data in columns

Duplicates

Task 3

Task 4

Pandas CrossTab

Task 5

About

Releases

Packages

Contributors 2

Languages

amoako420/lab7

Folders and files

Latest commit

History

Repository files navigation

Exploratory Data Analysis on a heart disease dataset

Task 1

Data Loading and Inspection

Task 2

Handle Misrepresented Data in columns

Duplicates

Task 3

Task 4

Pandas CrossTab

Task 5

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages