This project is a data analysis solution designed to extract critical business insights from Walmart sales data. It uses Python for data processing and analysis, SQL for advanced querying, and structured problem-solving techniques to answer key business questions.
- Tools Used: Visual Studio Code (VS Code), Python (`pandas`, `numpy`, `sqlalchemy`, `mysql-connector-python`, `psycopg2`), SQL (MySQL and PostgreSQL), Jupyter Notebook
- Data Source: Kaggle’s Walmart Sales Dataset
- Goal: Conduct an initial data exploration to understand the data distribution, check column names and types, and identify potential issues.
- Analysis: Use functions like `.info()`, `.describe()`, and `.head()` to get a quick overview of the data structure and statistics.
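
As a rough illustration of this exploration step, here is a minimal sketch. The small inline DataFrame and its column names are made up for the example and stand in for the Kaggle file, which would normally be loaded with `pd.read_csv()`.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset; the column
# names and values are assumptions, not taken from the Kaggle file.
df = pd.DataFrame({
    "invoice_id": [1, 2, 3],
    "branch": ["A", "B", "A"],
    "unit_price": ["$74.69", "$15.28", "$46.33"],
    "quantity": [7, 5, 7],
})

df.info()              # column names, non-null counts, and dtypes
print(df.describe())   # summary statistics for the numeric columns
print(df.head())       # first rows, to eyeball the raw values
```

Here `unit_price` is stored as text with a currency symbol, so `.describe()` leaves it out of the numeric summary. That is the kind of issue this step is meant to surface.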
- Remove Duplicates: Identify and remove duplicate entries to avoid skewed results.
- Handle Missing Values: Drop rows or columns with missing values if they are insignificant; fill values where essential.
- Fix Data Types: Ensure all columns have consistent data types (e.g., dates as `datetime`, prices as `float`).
- Currency Formatting: Use `.replace()` to handle and format currency values for analysis.
- Validation: Check for any remaining inconsistencies and verify the cleaned data.
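
The cleaning steps above could look roughly like the sketch below. The sample data, the `dd/mm/yy` date format, and the choice to drop (rather than fill) the row with a missing price are all assumptions for illustration. The sketch uses the string-accessor variant `.str.replace()` to strip the currency symbol.

```python
import pandas as pd

# Hypothetical raw rows: one exact duplicate and one missing price.
raw = pd.DataFrame({
    "invoice_id": [1, 2, 2, 3, 4],
    "date": ["05/01/19", "08/03/19", "08/03/19", "03/03/19", "27/01/19"],
    "unit_price": ["$74.69", "$15.28", "$15.28", None, "$58.22"],
    "quantity": [7, 5, 5, 7, 8],
})

# Remove duplicate rows, then drop rows that lack a price.
clean = raw.drop_duplicates().dropna(subset=["unit_price"]).copy()

# Strip the "$" literally (regex=False) and convert prices to float.
clean["unit_price"] = (
    clean["unit_price"].str.replace("$", "", regex=False).astype(float)
)

# Parse dates; the day/month/two-digit-year format is an assumption.
clean["date"] = pd.to_datetime(clean["date"], format="%d/%m/%y")

# Validation: no duplicates and no missing values should remain.
assert not clean.duplicated().any()
assert clean.notna().all().all()
```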
- Create New Columns: Calculate the `Total Amount` for each transaction by multiplying `unit_price` by `quantity` and adding this as a new column.
- Enhance Dataset: Adding this calculated field will streamline further SQL analysis and aggregation tasks.
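
A minimal sketch of the calculated column, assuming prices are already numeric. The column name `total_amount` and the sample values are illustrative choices, not confirmed by the dataset.

```python
import pandas as pd

# Hypothetical cleaned transactions with numeric prices.
df = pd.DataFrame({
    "unit_price": [74.69, 15.28, 46.33],
    "quantity": [7, 5, 7],
})

# Element-wise product: one total per transaction.
df["total_amount"] = df["unit_price"] * df["quantity"]
print(df)
```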
- Business Problem-Solving: Write and execute complex SQL queries to answer critical business questions, such as:
  - Revenue trends across branches and categories.
  - Identifying best-selling product categories.
  - Sales performance by time, city, and payment method.
  - Analyzing peak sales periods and customer buying patterns.
  - Profit margin analysis by branch and category.
- Documentation: Keep clear notes of each query's objective, approach, and results.
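
To show the shape of one such query, the sketch below computes revenue per branch. It uses an in-memory SQLite database as a stand-in so the example runs without a server; the project itself targets MySQL and PostgreSQL. The table name `walmart`, the column names, and the sample rows are assumptions.

```python
import sqlite3

import pandas as pd

# Hypothetical sales rows; values are made up for illustration.
sales = pd.DataFrame({
    "branch": ["A", "A", "B", "B"],
    "category": [
        "Health and beauty",
        "Electronic accessories",
        "Health and beauty",
        "Home and lifestyle",
    ],
    "total_amount": [100.0, 50.0, 70.0, 30.0],
})

# SQLite stands in for MySQL/PostgreSQL here (assumption).
con = sqlite3.connect(":memory:")
sales.to_sql("walmart", con, index=False)

# Objective: total revenue per branch, highest first.
query = """
SELECT branch, SUM(total_amount) AS revenue
FROM walmart
GROUP BY branch
ORDER BY revenue DESC
"""
revenue_by_branch = pd.read_sql_query(query, con)
con.close()

print(revenue_by_branch)
```

The same `SELECT ... GROUP BY` statement should carry over to MySQL or PostgreSQL largely unchanged; only the connection setup would differ.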
This section will include analysis findings:
- Sales Insights: Key categories, branches with highest sales, and preferred payment methods.
- Profitability: Insights into the most profitable product categories and locations.
- Customer Behavior: Trends in ratings, payment preferences, and peak shopping hours.