This is the repository for my final project in the UCLA Extension class Introduction to Data Science. For this project I scraped the FIVB Beach Volleyball schedule/results and rankings from 2001 to 2019. The goals of this project are to expand my data wrangling, EDA, and visualization skills, and to build a database of historical performances that can be used for future research.
The schedule data was scraped from 753 tournament web pages across the old FIVB site, the current FIVB site, and the FIVB V.I.S. public pages.
The end-of-year ranking for each year from 2001 to 2019 was taken as of Oct. 1, using the FIVB ranking pages.
Originally, I planned to use the BigTimeStats dataset. However, after investigating its naming conventions, I found it difficult to add an end-of-year ranking column by matching player names. In the end I had to manually adjust the naming conventions so that player names in the ranking and schedule data matched correctly. Below is an example of one row of the final dataset:
player | hour | team | team_country | tourn_rank | tourn | date | phase | year | match_no | court | time | team_rank | gender | opp | opp_team_rank | total_matches | match_won | match_lost | match_win | total_set_threes | total_sets_played | rank |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
"a-kim jihee" | 16 | "a-kim jihee/choi dan" | "KOR" | 149.47 | "w13avckr" | 2013-07-04 | "Pool A" | 2013 | 17 | 1 | "16:10" | 150 | "w" | "cavestro/tonon" | 260 | 1 | 0 | 1 | 0 | 0 | 2 | 150 |
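
The name-matching problem described above can be sketched roughly as follows. This is a minimal illustration, not the code used in this project: the helper names `normalize_name` and `attach_rank` are hypothetical, the ranking-side spelling `"A-Kim  Jihee"` is made up for the example, and the normalization rules (lowercasing, dropping accents and quotes, collapsing whitespace) are assumptions about what a consistent naming convention might look like. Only the player, year, and rank values come from the sample row.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, drop accents and double quotes, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Remove combining marks (accents) left behind by the decomposition.
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.replace('"', "").lower().split())


def attach_rank(matches, rankings):
    """Add a 'rank' key to each match row, keyed on (normalized player, year)."""
    lookup = {(normalize_name(r["player"]), r["year"]): r["rank"] for r in rankings}
    return [
        {**m, "rank": lookup.get((normalize_name(m["player"]), m["year"]))}
        for m in matches
    ]


# Hypothetical inputs: the schedule row and the ranking row spell the name differently.
matches = [{"player": "a-kim jihee", "year": 2013}]
rankings = [{"player": "A-Kim  Jihee", "year": 2013, "rank": 150}]

merged = attach_rank(matches, rankings)
print(merged)
```

Rows with no matching ranking entry get a `rank` of `None`, which makes the names that still need manual edits easy to list.
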
File | Description |
---|---|
FINAL_PROJECT_CODE.R | R code used for the final portion of the project |
final_df.csv | Finalized CSV file from the web scrape of schedule/results |
Report.pdf | Report about hypothesis |
Tech_Report.pdf | Report about code and EDA |
Ranking_scrape.py | The scrape code I used to pull the FIVB Beach rankings |
- R for final code and reports
- Python for web scraping
- Excel/Google Sheets for manual edits to naming conventions
Tyler Widdison