Baseball EDA Case Study Analysis

Exploratory Data Analysis Project

This project conducts exploratory data analysis (EDA) on a dataset of career statistics for Major League Baseball players. The goal is to understand the relationships between performance metrics and player salary. The R Markdown HTML file can be found here.

The analysis focuses on key variables such as AtBat, Hits, and Salary. After the data are cleaned and missing values are handled, univariate analysis examines the distribution of each variable. Bivariate analysis then explores relationships between variables through visualizations such as scatterplots, boxplots, and spread-level plots.

Some key steps include:

-- Handling missing data

-- Exploring distributions of key variables

-- Checking for outliers

-- Using transformations to make distributions more symmetric

-- Fitting resistant models to understand relationships

-- Binning data and constructing rootograms
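
The missing-data, outlier, and resistant-fit steps above can be sketched as follows. The project itself is written in R Markdown; this is only a minimal Python illustration using the standard library, not the project's code. The rows are made-up values rather than the real dataset, the helper names (`drop_missing`, `tukey_fences`, `resistant_line`) are hypothetical, and the choice of Tukey's three-group line as the resistant model is an assumption.

```python
import statistics

# Made-up (AtBat, Hits, Salary) rows for illustration only; None marks a missing salary.
rows = [
    (200, 50, 100.0), (250, 60, 150.0), (300, 80, 200.0),
    (320, 85, None),
    (350, 90, 250.0), (400, 110, 300.0), (450, 120, 350.0),
    (480, 130, None),
    (500, 140, 400.0), (550, 150, 2400.0), (600, 170, 500.0),
]


def drop_missing(records):
    """Keep only records in which every field is present."""
    return [r for r in records if all(v is not None for v in r)]


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def resistant_line(xs, ys):
    """Three-group resistant line (needs at least 3 points).

    The slope comes from the medians of the outer thirds; the intercept
    is the average intercept implied by the three group medians.
    """
    pairs = sorted(zip(xs, ys))
    k = len(pairs) // 3
    groups = [pairs[:k], pairs[k:len(pairs) - k], pairs[len(pairs) - k:]]
    med_x = [statistics.median([p[0] for p in g]) for g in groups]
    med_y = [statistics.median([p[1] for p in g]) for g in groups]
    slope = (med_y[2] - med_y[0]) / (med_x[2] - med_x[0])
    intercept = statistics.mean([y - slope * x for x, y in zip(med_x, med_y)])
    return slope, intercept


complete = drop_missing(rows)
at_bat = [r[0] for r in complete]
salary = [r[2] for r in complete]

lower, upper = tukey_fences(salary)
outliers = [s for s in salary if s < lower or s > upper]
slope, intercept = resistant_line(at_bat, salary)

print(len(complete), outliers, slope, intercept)
```

On these made-up rows, the single extreme salary falls outside the upper fence, while the fitted slope is set by group medians and so is not pulled toward it.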

The analysis provides insights like:

-- Salary has a positive correlation with AtBat

-- Hits is right-skewed while AtBat is left-skewed

-- Power transformations improve symmetry

-- Roots of Hits/AtBat deviate from normality
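
Observations like the skewness and normality findings above can be checked numerically along these lines. This is a minimal Python sketch using only the standard library (the project itself uses R Markdown). The sample is made up, the helper names (`skewness`, `rootogram`) are hypothetical, and fitting the normal curve by mean and standard deviation is an assumption; a resistant fit based on the median and quartiles could be substituted.

```python
import math
import statistics


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (positive means right-skewed)."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def rootogram(values, edges):
    """Compare binned counts with a fitted normal on the square-root scale.

    Returns one (observed, expected, sqrt(observed) - sqrt(expected))
    tuple per bin, where each bin is the half-open interval [lo, hi).
    """
    fit = statistics.NormalDist(statistics.fmean(values), statistics.stdev(values))
    result = []
    for lo, hi in zip(edges, edges[1:]):
        observed = sum(1 for v in values if lo <= v < hi)
        expected = len(values) * (fit.cdf(hi) - fit.cdf(lo))
        result.append((observed, expected, math.sqrt(observed) - math.sqrt(expected)))
    return result


# Made-up right-skewed sample for illustration only (not the project's data).
raw = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20, 40]
logged = [math.log(v) for v in raw]  # log sits at power 0 on the ladder of powers

skew_raw = skewness(raw)
skew_log = skewness(logged)
bins = rootogram(logged, [0, 1, 2, 3, 4])

print(skew_raw, skew_log)
for observed, expected, residual in bins:
    print(observed, round(expected, 2), round(residual, 2))
```

Rootogram residuals near zero mark bins where the normal fit matches the data; large positive or negative residuals flag departures from normality.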

Overall, this project demonstrates core EDA concepts and workflows that can be applied to any dataset. The methods help uncover insights and inform future modeling.