For this project, I combined three datasets containing information about sellers, articles, and orders from a one-month trade log to answer a set of questions about the month's sales. My analytical process used Pandas for exploratory data analysis, NumPy for analysis of specific columns, and Matplotlib/Seaborn for visualizing the results.
First, I scoped and collected the necessary data, followed by data exploration and preparation. Next, I defined the model and pipelines necessary to achieve the desired results. Using these tools, I was able to answer the questions and draw appropriate conclusions.
- articles.db: database with article data
- sellers.xlsx: Excel file with seller data
- orders.csv: CSV file with sales records within the month
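As a rough illustration of how these three sources could be read into Pandas, here is a minimal sketch. The helper names and the `articles` table name are assumptions made for this example, since the actual table name inside articles.db is not stated here; it also assumes the database is a SQLite file and that openpyxl is installed for the Excel file.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def load_articles(db_path, table="articles"):
    """Read one table from a SQLite file into a DataFrame.

    The default table name is an assumption; query sqlite_master
    to find the real name in articles.db.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)


def load_sellers(xlsx_path):
    """Read the first sheet of the Excel file (openpyxl does the parsing)."""
    return pd.read_excel(xlsx_path, engine="openpyxl")


def load_orders(csv_path):
    """Read the sales records from the CSV file."""
    return pd.read_csv(csv_path)
```

Keeping one small loader per source makes it easy to inspect each data frame on its own before merging.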
- What is the best-selling item (in units)?
- Which item provided us with the highest revenue?
- Which seller should be awarded the "Best Seller of the Month" bonus?
- Are there significant variations in sales throughout the month?
- What were the top 5 countries in terms of purchases, and what was the total amount of their purchases?
- Notebooks or CPUs? Which did the top 5 purchasing countries buy more of?
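Several of these questions reduce to a group-by and an aggregation on the merged data frame. The sketch below shows one possible shape for three of them. The column names (`article_name`, `quantity`, `unit_price`, `order_date`) are hypothetical and the sample rows are made up for illustration; they are not the project's data or results.

```python
import pandas as pd


def best_selling_item(df):
    """Return (article, units) for the article with the most units sold."""
    units = df.groupby("article_name")["quantity"].sum()
    return units.idxmax(), int(units.max())


def top_revenue_item(df):
    """Return (article, revenue) for the article with the highest revenue."""
    revenue = (df["quantity"] * df["unit_price"]).groupby(df["article_name"]).sum()
    return revenue.idxmax(), float(revenue.max())


def daily_units(df):
    """Units sold per calendar day, useful for spotting variation in the month."""
    days = pd.to_datetime(df["order_date"]).dt.date
    return df.groupby(days)["quantity"].sum()


# Made-up rows with hypothetical column names, only to exercise the helpers.
sample = pd.DataFrame({
    "article_name": ["Mouse", "Notebook", "Mouse", "CPU"],
    "quantity": [10, 2, 5, 3],
    "unit_price": [20.0, 900.0, 20.0, 300.0],
    "order_date": ["2022-05-01", "2022-05-01", "2022-05-02", "2022-05-02"],
})
```

On the sample rows, the best seller by units and the top item by revenue differ, which is why the two questions are asked separately.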
- To begin with, I was tasked with collecting and organizing data from various sources, including CSV and Excel files, as well as a database. To accomplish this, I used a range of Python libraries, such as Pandas, SQLite3, and openpyxl. During the exploratory analysis, I checked each dataset's number of columns and entries, null values, data types, and unique indexes, and then prepared the data for analysis. Finally, I merged all the data frames into a single one for ease of use.
- In the analytical section, I was tasked with answering four questions using both analytical and graphical approaches. Additionally, I had to formulate three new questions and provide responses to them.
- Finally, I presented the conclusions and recommendations of the project in a clear and concise manner.
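The exploratory checks and the merge described in the first stage could look roughly like the sketch below. The join keys `article_id` and `seller_id` are assumptions for illustration, as are the function names; the real column names depend on the three source files.

```python
import pandas as pd


def summarize(df):
    """Collect the basic checks: size, null counts, dtypes, duplicate rows."""
    return {
        "shape": df.shape,
        "nulls": df.isna().sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


def merge_all(orders, articles, sellers):
    """Attach article and seller details to every order row.

    Left joins keep every order; validate raises an error if a key is
    duplicated on the lookup side (hypothetical keys: article_id, seller_id).
    """
    merged = orders.merge(articles, on="article_id", how="left",
                          validate="many_to_one")
    merged = merged.merge(sellers, on="seller_id", how="left",
                          validate="many_to_one")
    return merged
```

Running `summarize` on the merged frame is a quick way to confirm that the joins did not introduce unexpected nulls, which would point to orders whose article or seller is missing from the lookup tables.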
I used Python and its libraries Pandas, NumPy, and Matplotlib/Seaborn for data analysis and visualization.
If you have any questions, comments, or suggestions, do not hesitate to contact me (melisa.s.rossi@gmail.com).