This project is a dynamic website that recommends movies and TV shows. It was inspired by a solution to a Kaggle task that used a Netflix dataset of movies and TV shows.
My first step was to extend the database to other streaming services and to sort the recommendations by the average IMDb score of each title. I obtained this data from similar datasets on Kaggle.
To obtain the IMDb scores, I had to download two datasets directly from the IMDb website: the title.basics.tsv.gz file to get each title's ID, and the title.ratings.tsv.gz file to get each title's score.
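A minimal sketch of how the two IMDb files can be joined with pandas. The column names (tconst, primaryTitle, averageRating) and the "\N" missing-value marker follow the published IMDb file format. The function name load_imdb_scores and the tiny in-memory sample data are assumptions made only for illustration, not the notebook's actual code.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for title.basics.tsv.gz and title.ratings.tsv.gz
# (illustrative rows only, not real IMDb data).
BASICS_TSV = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\n"
    "tt0000001\tmovie\tExample Movie\t2001\n"
    "tt0000002\ttvSeries\tExample Show\t\\N\n"
)
RATINGS_TSV = (
    "tconst\taverageRating\tnumVotes\n"
    "tt0000001\t7.5\t1200\n"
    "tt0000002\t8.1\t300\n"
)


def load_imdb_scores(basics_src, ratings_src):
    """Join titles and ratings on the shared IMDb id column (tconst)."""
    # IMDb writes missing values as the literal string \N.
    basics = pd.read_csv(basics_src, sep="\t", na_values="\\N", dtype=str)
    ratings = pd.read_csv(ratings_src, sep="\t", na_values="\\N")
    merged = basics.merge(ratings, on="tconst", how="inner")
    return merged[["tconst", "primaryTitle", "averageRating"]]


imdb = load_imdb_scores(io.StringIO(BASICS_TSV), io.StringIO(RATINGS_TSV))
print(imdb)
```

With the real downloads, the same function should accept the file paths directly, since pandas infers gzip compression from the .gz extension.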
I then placed these datasets in a Google Drive account and created a Python notebook. In this notebook I built a database by joining the dataset files, removed the unnecessary data, and applied the TfidfVectorizer algorithm to create a similarity coefficient between each title and all other titles. This algorithm uses natural language processing (NLP) to create that coefficient. In this project I fed the algorithm the following data: the description, the parental guideline rating, and the genre of each title. Finally, I created a function that, given a title, returns a list of the ten most similar titles sorted by IMDb score.
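The similarity step can be sketched roughly as follows, using scikit-learn's TfidfVectorizer and a dot product between the resulting vectors. Because TF-IDF vectors are L2-normalized by default, this equals cosine similarity. The column names (title, description, rating, genre, imdb_score), the function names, and the sample rows are assumptions for illustration. The notebook's actual code may differ.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Illustrative catalog; the column names are assumed, not taken from the notebook.
titles = pd.DataFrame({
    "title": ["Space Wars", "Galaxy Battle", "Star Pilot", "Baking Days"],
    "description": [
        "A pilot fights an empire in space",
        "Rebels fight a space empire with starships",
        "A young pilot explores space",
        "Friends open a bakery in a small town",
    ],
    "rating": ["PG-13", "PG-13", "PG", "G"],  # parental guideline
    "genre": ["Sci-Fi Action", "Sci-Fi Action", "Sci-Fi Adventure", "Comedy"],
    "imdb_score": [8.0, 7.0, 8.5, 9.0],
})


def build_similarity(df):
    """Return a square matrix of TF-IDF similarities between all titles."""
    text = df["description"] + " " + df["rating"] + " " + df["genre"]
    matrix = TfidfVectorizer(stop_words="english").fit_transform(text)
    # Rows are L2-normalized, so the dot product is the cosine similarity.
    return linear_kernel(matrix, matrix)


def recommend(title, df, sim, n=10):
    """Pick the n most similar titles, then order them by IMDb score."""
    df = df.reset_index(drop=True)  # row positions must match the matrix
    idx = df.index[df["title"] == title][0]
    scores = pd.Series(sim[idx], index=df.index).drop(idx)  # skip the title itself
    top = scores.nlargest(n).index
    ranked = df.loc[top].sort_values("imdb_score", ascending=False)
    return ranked["title"].tolist()


similarity = build_similarity(titles)
print(recommend("Space Wars", titles, similarity, n=2))
```

Selecting by similarity first and sorting by score afterwards keeps the list relevant, while still showing the best-rated titles at the top.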
Then I decided to upgrade the project by adding the HBO Max dataset and getting the latest data. Unfortunately, I could not find such a dataset, so I decided to scrape the data. I found that the Flixable site has the data I need.
After scraping the data for each streaming service, I cleaned and merged the datasets to create a single dataset with all the desired fields. The Python notebook follows below.
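One possible way to combine the per-service tables into a single dataset is sketched below with pandas. The function name combine_services, the column names, and the cleaning rules (trimming titles, dropping rows without a description, removing duplicates within a service) are assumptions for illustration. The notebook may apply different rules.

```python
import pandas as pd


def combine_services(frames):
    """Stack per-service tables into one, tagging each row with its service.

    frames: dict mapping a service name to its scraped DataFrame.
    """
    tagged = [df.assign(service=name) for name, df in frames.items()]
    combined = pd.concat(tagged, ignore_index=True)
    combined["title"] = combined["title"].str.strip()
    # Rows without a title or description cannot feed the recommender.
    combined = combined.dropna(subset=["title", "description"])
    combined = combined.drop_duplicates(subset=["title", "service"])
    return combined.reset_index(drop=True)


# Illustrative rows only, not scraped data.
netflix = pd.DataFrame({
    "title": ["Example Movie ", "Example Show"],
    "description": ["A film.", None],
})
hbo_max = pd.DataFrame({
    "title": ["Example Movie", "Other Movie"],
    "description": ["A film.", "Another film."],
})

catalog = combine_services({"Netflix": netflix, "HBO Max": hbo_max})
print(catalog)
```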
Finally, with the filtered data, I created the web app with Flask. The code is in this Git repository. The URL of the site follows below.
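For readers unfamiliar with Flask, a minimal sketch of how a recommendation function can be exposed as a web endpoint is shown here. The /recommend route, the title query parameter, and the placeholder recommend function with its hard-coded data are all hypothetical. The actual app in the repository may be organized differently.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the real recommender built in the notebook.
SAMPLE_RECOMMENDATIONS = {"Space Wars": ["Star Pilot", "Galaxy Battle"]}


def recommend(title):
    """Return recommendations for a title, or an empty list if unknown."""
    return SAMPLE_RECOMMENDATIONS.get(title, [])


@app.route("/recommend")
def recommend_view():
    # Read the title from the query string, e.g. /recommend?title=Space%20Wars
    title = request.args.get("title", "")
    return jsonify(title=title, recommendations=recommend(title))


if __name__ == "__main__":
    app.run(debug=True)
```

Running the file starts a local development server, and requesting /recommend?title=Space%20Wars returns the recommendations as JSON.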