Breast Cancer Classification Using Machine Learning with Streamlit

Please click the Streamlit Link here to interact with the App.

Description

This project focuses on the classification of breast tumors as either Malignant or Benign using various machine-learning algorithms such as KNN, Logistic Regression, and Random Forest. Leveraging a dataset containing comprehensive tumor metrics such as radius, texture, perimeter, area, smoothness, and more, we aim to build robust models for accurate tumor classification.

Dataset

Breast Cancer Wisconsin (Diagnostic) Data Set

The dataset comprises tumor characteristics such as radius, texture, perimeter, and other measurable features extracted from breast cancer images. These attributes serve as crucial inputs for training and testing the classification models.

Final project for the WQD7003 | Data Analytics course at the University of Malaya

Team members:

Huang Han Wang
Shuangdan Ni
Xinrui Guo
Ding Jie
Naxin Dong

Instructor: Dr. Hema Subramaniam

Project objectives

To perform preprocessing and exploratory analysis on the dataset.
To build the classification model to judge whether the tumor type is malignant or benign.
To compare the advantages and disadvantages of different classification algorithms based on evaluation metrics.
To fine-tune and optimize the hyperparameters and improve the accuracy, stability, and execution speed of the tumor-type judgment model.

Infographics of this Project

Streamlit Breast Cancer Detection App Explanation (Link)

Important Note: Please refer to this file breast_cancer_app.py.

Section 1: Importing Libraries

The initial part involves importing the necessary libraries and modules required for the application. These include libraries for data manipulation (pandas, numpy), visualization (seaborn, matplotlib, plotly), machine learning models (sklearn), and the Streamlit framework (streamlit).

Section 2: Define Main Function

The main() function sets up the Streamlit app's layout, title, and initial configurations like page icon, title, and sidebar title.

Section 3: Data Loading and Processing Functions

load_data(): Loads the Breast Cancer dataset using load_breast_cancer() from sklearn, processes it into a DataFrame, and performs label encoding.
split(df): Splits the dataset into training and testing sets using train_test_split() from sklearn.

Section 4: Data Analysis Section

This part handles the visualizations and displays for the data analysis section of the app. It includes:

Displaying raw data and features
Generating different types of plots based on user selection (scatter matrix, counts of malignant and benign cases, heatmap, scatter plots)

Section 5: Prediction Section

This section allows users to select different classifiers (Logistic Regression, Random Forest, KNN) and their hyperparameters, then displays metrics such as accuracy, precision, recall, and confusion matrices for the chosen classifier.

The workflow includes:

Choosing a classifier from the sidebar
Selecting hyperparameters for the chosen classifier
Displaying classification results and metrics based on user selections

Section 6: About Section

At the sidebar's bottom, information about the app creator and a reference to the app's GitHub repository are provided.

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
data		data
images		images
README.md		README.md
breast_cancer_app.py		breast_cancer_app.py
requirements.txt		requirements.txt
tumor_icon.png		tumor_icon.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Breast Cancer Classification Using Machine Learning with Streamlit

Description

Dataset

Project objectives

Infographics of this Project

Streamlit Breast Cancer Detection App Explanation (Link)

Section 1: Importing Libraries

Section 2: Define Main Function

Section 3: Data Loading and Processing Functions

Section 4: Data Analysis Section

Section 5: Prediction Section

Section 6: About Section

About

Releases

Packages

Languages

WangHuangHan/Breast-Cancer-Classification-with-Streamlit

Folders and files

Latest commit

History

Repository files navigation

Breast Cancer Classification Using Machine Learning with Streamlit

Description

Dataset

Project objectives

Infographics of this Project

Streamlit Breast Cancer Detection App Explanation (Link)

Section 1: Importing Libraries

Section 2: Define Main Function

Section 3: Data Loading and Processing Functions

Section 4: Data Analysis Section

Section 5: Prediction Section

Section 6: About Section

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages