The major aim of this project is to gain insight into the sales data of Amazon to understand the different factors that affect sales of the different branches.
This dataset contains sales transactions from three different branches of Amazon, respectively located in Mandalay, Yangon, and Naypyitaw. The data contains 17 columns and 1000 rows:
Column | Description | Data Type |
---|---|---|
invoice_id | Invoice of the sales made | VARCHAR(30) |
branch | Branch at which sales were made | VARCHAR(5) |
city | The location of the branch | VARCHAR(30) |
customer_type | The type of the customer | VARCHAR(30) |
gender | Gender of the customer making purchase | VARCHAR(10) |
product_line | Product line of the product sold | VARCHAR(100) |
unit_price | The price of each product | DECIMAL(10, 2) |
quantity | The amount of the product sold | INT |
VAT | The amount of tax on the purchase | FLOAT(6, 4) |
total | The total cost of the purchase | DECIMAL(10, 2) |
date | The date on which the purchase was made | DATE |
time | The time at which the purchase was made | TIMESTAMP |
payment_method | The total amount paid | DECIMAL(10, 2) |
cogs | Cost Of Goods sold | DECIMAL(10, 2) |
gross_margin_percentage | Gross margin percentage | FLOAT(11, 9) |
gross_income | Gross Income | DECIMAL(10, 2) |
rating | Rating | FLOAT(2, 1) |
Conduct analysis on the data to understand the different product lines, the product lines performing best, and the product lines that need to be improved.
This analysis aims to answer the question of the sales trends of products. The result of this can help us measure the effectiveness of each sales strategy the business applies and what modifications are needed to gain more sales.
This analysis aims to uncover the different customer segments, purchase trends, and the profitability of each customer segment.
This is the first step where inspection of data is done to make sure NULL values and missing values are detected and data replacement methods are used to replace missing or NULL values.
- Build a database.
- Create a table and insert the data.
- Select columns with null values in them. There are no null values in our database as in creating the tables, we set NOT NULL for each field, hence null values are filtered out.
This will help us generate some new columns from existing ones.
- Add a new column named
timeofday
to give insight into sales in the Morning, Afternoon, and Evening. This will help answer the question on which part of the day most sales are made. - Add a new column named
dayname
that contains the extracted days of the week on which the given transaction took place (Mon, Tue, Wed, Thu, Fri). This will help answer the question on which day of the week each branch is busiest. - Add a new column named
monthname
that contains the extracted months of the year on which the given transaction took place (Jan, Feb, Mar). This will help determine which month of the year has the most sales and profit.
Exploratory data analysis is done to answer the listed questions and aims of this project.
- What is the count of distinct cities in the dataset?
- For each branch, what is the corresponding city?
- What is the count of distinct product lines in the dataset?
- Which payment method occurs most frequently?
- Which product line has the highest sales?
- How much revenue is generated each month?
- In which month did the cost of goods sold reach its peak?
- Which product line generated the highest revenue?
- In which city was the highest revenue recorded?
- Which product line incurred the highest Value Added Tax?
- For each product line, add a column indicating "Good" if its sales are above average, otherwise "Bad."
- Identify the branch that exceeded the average number of products sold.
- Which product line is most frequently associated with each gender?
- Calculate the average rating for each product line.
- Count the sales occurrences for each time of day on every weekday.
- Identify the customer type contributing the highest revenue.
- Determine the city with the highest VAT percentage.
- Identify the customer type with the highest VAT payments.
- What is the count of distinct customer types in the dataset?
- What is the count of distinct payment methods in the dataset?
- Which customer type occurs most frequently?
- Identify the customer type with the highest purchase frequency.
- Determine the predominant gender among customers.
- Examine the distribution of genders within each branch.
- Identify the time of day when customers provide the most ratings.
- Determine the time of day with the highest customer ratings for each branch.
- Identify the day of the week with the highest average ratings.
- Determine the day of the week with the highest average ratings for each branch.