This project is conducted as part of the Probability Theory and Statistics course at AGH University of Kraków. It aims to analyze Formula 1 seasons from 2019 to 2024 using statistical methods. The project is implemented in a Jupyter Notebook using Python and employs various libraries for data collection, processing, visualization, and statistical analysis.
Note: The project and its documentation are written in Polish.
The project is written in Python and utilizes the following libraries:
import sqlite3
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import fastf1 as ff1
import fastf1.plotting
import statsmodels
from scipy import stats
import requests
- fastf1: For fetching and analyzing Formula 1 timing data.
- requests: For collecting data from the Ergast API.
- sqlite3: For storing the collected data in a local database.
- pandas, numpy: For data manipulation and statistical computation.
- matplotlib, seaborn: For visualization.
- statsmodels, scipy: For statistical tests and regression analysis.
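As a rough illustration of how the collection and storage libraries fit together, the sketch below flattens a results payload shaped like the Ergast JSON layout and stores it with sqlite3. It is not the notebook's actual code: the function names `flatten_results` and `save_results`, the table schema, and the payload values are all hypothetical placeholders. It uses only the standard library so that it runs without network access.

```python
import sqlite3

# Hand-written placeholder payload that mimics the nested layout of an
# Ergast results response (MRData -> RaceTable -> Races -> Results).
# The names and values are illustrative only, not real race data.
sample_payload = {
    "MRData": {
        "RaceTable": {
            "Races": [
                {
                    "season": "2019",
                    "round": "1",
                    "raceName": "Example Grand Prix",
                    "Results": [
                        {"position": "1", "grid": "2", "points": "25",
                         "Driver": {"driverId": "driver_a"},
                         "Constructor": {"constructorId": "team_x"}},
                        {"position": "2", "grid": "1", "points": "18",
                         "Driver": {"driverId": "driver_b"},
                         "Constructor": {"constructorId": "team_y"}},
                    ],
                }
            ]
        }
    }
}


def flatten_results(payload):
    """Turn the nested payload into one flat tuple per driver result."""
    rows = []
    for race in payload["MRData"]["RaceTable"]["Races"]:
        for result in race["Results"]:
            rows.append((
                int(race["season"]),
                int(race["round"]),
                race["raceName"],
                result["Driver"]["driverId"],
                result["Constructor"]["constructorId"],
                int(result["grid"]),
                int(result["position"]),
                float(result["points"]),
            ))
    return rows


def save_results(conn, rows):
    """Create the (hypothetical) results table if needed and insert the rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               season INTEGER, round_number INTEGER, race_name TEXT,
               driver_id TEXT, constructor_id TEXT,
               grid INTEGER, position INTEGER, points REAL)"""
    )
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


# An in-memory database keeps the example free of side effects.
conn = sqlite3.connect(":memory:")
rows = flatten_results(sample_payload)
save_results(conn, rows)

stored = conn.execute(
    "SELECT driver_id, grid, position, points FROM results ORDER BY position"
).fetchall()
print(stored)
```

In the notebook itself, the payload would presumably come from a `requests.get(...).json()` call against the Ergast API, and the connection would point at a database file rather than `:memory:`.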
The project is structured into the following sections:
- Introduction – Overview of the project and objectives.
- Data Gathering and Cleaning – Collecting data using the Ergast API and processing it.
- Visual Analysis – Creating graphical representations of key insights.
- Exploratory Analysis – Examining statistical properties and distributions.
- Statistical Tests and Regression Analysis – Applying hypothesis tests and regression models.
- Conclusion – Summarizing findings and potential applications.
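To give a feel for the kind of analysis the statistical tests and regression section covers, here is a minimal sketch using scipy. It tests whether starting grid position is associated with finishing position and fits a simple linear regression. The data are synthetic and generated with a fixed seed purely for illustration; the choice of Spearman correlation and `linregress` is an assumption and may differ from the tests used in the notebook.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only: 20 grid slots, with finishing order
# derived from the grid position plus random noise. Not real race results.
rng = np.random.default_rng(seed=0)
grid = np.arange(1, 21)
noisy = grid + rng.normal(loc=0.0, scale=2.0, size=grid.size)
finish = stats.rankdata(noisy)  # convert noisy scores to positions 1..20

# Spearman rank correlation: is a better grid slot associated with a
# better finishing position? H0: no monotonic association.
rho, p_value = stats.spearmanr(grid, finish)

# Simple linear regression of finishing position on grid position.
fit = stats.linregress(grid, finish)

print(f"Spearman rho = {rho:.3f}, p-value = {p_value:.4g}")
print(f"finish = {fit.intercept:.2f} + {fit.slope:.2f} * grid")
```

A small p-value here only reflects how the synthetic data were constructed; with real results the same calls would be applied to the grid and position columns of the collected data.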
The project provides two export files:
- With code blocks: Includes all the implemented Python code.
- Without code blocks: A clean export with analysis and results, excluding code.
The project uses the Ergast API and the fastf1 Python library to retrieve F1 race data for statistical analysis.
This project was created for academic purposes and is released under an open-source license. Contributions and feedback are welcome!