GitHub - ShreyaKulkarnii/Machine-Learning-Project: This project classifies a speaker's accent as US or non-US using binary classification. It applies several machine learning classification methods: Logistic Regression, KNN, Decision Tree, and Random Forest. Baseline models were evaluated for each method. To improve accuracy and to guard against over-fitting and under-fitting, the project applies feature extraction and regularization techniques such as Lasso and Ridge. To improve testing accuracy, the hyperparameters of the classification models were fine-tuned.

Repository files:

ProjectCode.ipynb
README.md
accent-mfcc-data-1.csv

#PROJECT TITLE

"Classification algorithm for Speaker Accent Recognition Data Set (2020)"

#Project Objective

The purpose of this project is to classify a speaker's accent as US or non-US, for speakers from six different language groups, using various classification algorithms.

#PROJECT DESCRIPTION

In this project, we trained supervised machine learning classification algorithms and evaluated their testing performance to find the best classification model.
The models trained are:

Logistic Regression
K-Nearest Neighbors
Decision Tree
Random Forest
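As a rough illustration of how these four models could be fitted side by side, here is a minimal scikit-learn sketch. It is not the notebook's code: the helper name `fit_baselines`, the 75/25 split, the default hyperparameters, and the synthetic 12-feature data are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def fit_baselines(X, y, seed=0):
    """Fit the four classifiers and return {model name: test accuracy}."""
    # Hold out 25% of the rows for testing (assumed split, not from the README).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "K-Nearest Neighbors": KNeighborsClassifier(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


# Synthetic stand-in data (assumption); replace with the real features and labels.
X, y = make_classification(n_samples=300, n_features=12, random_state=0)
scores = fit_baselines(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Fixing `random_state` keeps the comparison repeatable between runs; the scores on synthetic data say nothing about the results on the accent data set.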

In this project we performed the following operations:

Dataset Visualization
Dataset Cleaning
Feature Extraction
Model Development
Fine tuning
Performance Evaluation
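The fine-tuning step can be pictured as a cross-validated grid search. The sketch below tunes K-Nearest Neighbors only; the grid values, the scaling step, the 5-fold setting, and the synthetic data are illustrative assumptions, not the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with the real features and labels.
X, y = make_classification(n_samples=300, n_features=12, random_state=0)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical search grid; the values are for illustration only.
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9],
    "knn__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern would apply to the other models by swapping the estimator and its parameter grid.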

For each algorithm, we evaluated and compared accuracy, ROC-AUC, and precision. Based on testing accuracy, we concluded that the Random Forest classifier scored highest (1) among the classification algorithms used in this project.
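To make the three metrics concrete, here is a small dependency-free sketch of how each one is defined. The function names and the toy labels and scores are made up for illustration; in practice the equivalents in `sklearn.metrics` would normally be used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of the samples predicted positive, the fraction that really are positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(y_true, y_score, positive=1):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == positive]
    neg = [s for t, s in zip(y_true, y_score) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = US accent, 0 = non-US (made-up labels and scores).
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.45, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5

print(f"accuracy={accuracy(y_true, y_pred):.3f}")
print(f"precision={precision(y_true, y_pred):.3f}")
print(f"roc_auc={roc_auc(y_true, y_score):.3f}")
```

Accuracy and precision depend on the chosen threshold, while ROC-AUC is computed from the raw scores, which is one reason to report them together.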

#LIBRARIES USED

The following libraries were imported from the Anaconda distribution and used in the project.

pandas
numpy
matplotlib (pyplot)
seaborn (imported as sns)
scikit-learn (sklearn)

#GETTING STARTED

Import the CSV file "accent-mfcc-data-1.csv" from the project folder.
Read the CSV and store the dataset in the variable "SAR_dt".
The entire notebook (ProjectCode.ipynb) should then run from start to finish without interruption.
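The loading step might look like the sketch below. Only the file name and the variable name `SAR_dt` come from this README; the helper `load_accent_data`, the label column name `language`, and the label value `US` are assumptions that should be checked against the actual CSV header.

```python
import io

import pandas as pd


def load_accent_data(source, label_column="language", us_label="US"):
    """Read the CSV and derive a binary target: 1 = US accent, 0 = non-US.

    `label_column` and `us_label` are assumed names; adjust them to the file.
    """
    SAR_dt = pd.read_csv(source)
    X = SAR_dt.drop(columns=[label_column])          # feature columns only
    y = (SAR_dt[label_column] == us_label).astype(int)
    return SAR_dt, X, y


# In the notebook this would be:
#   SAR_dt, X, y = load_accent_data("accent-mfcc-data-1.csv")
# Here a tiny inline sample with made-up values keeps the sketch self-contained.
sample = io.StringIO("language,X1,X2\nUS,1.0,2.0\nES,0.5,-1.0\nUK,0.1,0.3\n")
SAR_dt, X, y = load_accent_data(sample)
print(SAR_dt.shape, y.tolist())
```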

