zucchero-sintattico/spotify-playlist-analyzer

Spotify Playlist Analyzer

This repository contains the project for the Big Data course.

Jobs

The project contains two jobs, each implemented in two versions: one non-optimized and one optimized.

  • The first job calculates the average number of tracks per artist (TracksForArtist) across all the playlists.
  • The second job, given a specific song, finds the track most similar to it (the song that appears most often in the same playlists as the target) and the number of playlists the two share.
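
The two computations can be illustrated with a small sketch in plain Python. It is not the repository's Spark implementation, and it rests on assumptions: the toy `PLAYLISTS` data and the function names are hypothetical, and the first job is read here as "count the distinct tracks of each artist over all playlists, then average over artists", which is only one possible reading of the description above.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: playlist id -> list of (track, artist) pairs.
PLAYLISTS = {
    "p1": [("song_a", "artist_x"), ("song_b", "artist_x"), ("song_c", "artist_y")],
    "p2": [("song_a", "artist_x"), ("song_b", "artist_x")],
    "p3": [("song_a", "artist_x"), ("song_b", "artist_x"), ("song_d", "artist_z")],
    "p4": [("song_b", "artist_x"), ("song_d", "artist_z")],
}


def average_tracks_per_artist(playlists):
    """Average number of distinct tracks per artist (assumed reading of job 1)."""
    tracks_by_artist = defaultdict(set)
    for entries in playlists.values():
        for track, artist in entries:
            tracks_by_artist[artist].add(track)
    if not tracks_by_artist:
        return 0.0
    total_tracks = sum(len(tracks) for tracks in tracks_by_artist.values())
    return total_tracks / len(tracks_by_artist)


def most_similar_track(playlists, target):
    """Return (track, shared_playlists) for the track co-occurring most with target."""
    shared = Counter()
    for entries in playlists.values():
        tracks = {track for track, _ in entries}
        if target in tracks:
            # Count every other track in this playlist once.
            shared.update(tracks - {target})
    if not shared:
        return None
    # Highest count wins; ties are broken alphabetically for determinism.
    return min(shared.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(average_tracks_per_artist(PLAYLISTS))
    print(most_similar_track(PLAYLISTS, "song_a"))
```

In a distributed setting the same logic would typically become a group-by on artist for the first job and a self-join on playlist id for the second, which is where an optimized version has room to reduce shuffling.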

The results of the jobs, for both the optimized and the non-optimized versions, are stored in the report/results folder.

Statistics for the jobs are available in the report/stats folder.

Dataset

Download dataset:

#!/bin/bash
curl -L -o ~/Downloads/spotify-millions-playlist.zip \
  https://www.kaggle.com/api/v1/datasets/download/adityak80/spotify-millions-playlist

READMEs

Additional readmes:

  • Initial Setup (instructions for setting up the environment)
  • AWS CLI cheatsheet (a collection of the most common AWS CLI commands)
  • AWS Workflow (a checklist of the steps needed to set up the AWS environment and use it to deploy Spark jobs)
  • Exam Project (instructions for the project, which is mandatory for the exam)
