Spotify Playlist Analyzer

This repository contains the project for the Big Data course.

Jobs

It contains two jobs, each implemented in two versions: one non-optimized and one optimized.

The first job calculates the average number of tracks per artist (TracksForArtist) across all the playlists.
The second job, given a specific song, finds the track most similar to it (the song that appears most often in the same playlists as the target) and the number of playlists the two share.
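
The logic of the two jobs can be sketched in plain Python on a tiny in-memory sample. This is only an illustration, not the Spark code under src/main: the `playlists` structure, the sample data, and both function names are hypothetical. It also assumes that "tracks per artist" means the number of distinct tracks each artist has across all playlists, averaged over artists, which the description above does not state explicitly.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the dataset: playlist id -> list of (track, artist).
playlists = {
    "p1": [("Song A", "Artist X"), ("Song B", "Artist X"), ("Song C", "Artist Y")],
    "p2": [("Song A", "Artist X"), ("Song C", "Artist Y")],
    "p3": [("Song B", "Artist X"), ("Song D", "Artist Z")],
}


def average_tracks_per_artist(playlists):
    """Job 1 (assumed reading): mean number of distinct tracks per artist."""
    tracks_by_artist = defaultdict(set)
    for entries in playlists.values():
        for track, artist in entries:
            tracks_by_artist[artist].add(track)
    if not tracks_by_artist:
        return 0.0
    total_tracks = sum(len(tracks) for tracks in tracks_by_artist.values())
    return total_tracks / len(tracks_by_artist)


def most_similar_track(playlists, target):
    """Job 2: track sharing the most playlists with `target`, plus that count."""
    shared = Counter()
    for entries in playlists.values():
        # Use a set so a track is counted at most once per playlist.
        names = {track for track, _ in entries}
        if target in names:
            shared.update(names - {target})
    if not shared:
        return None  # target is absent or never co-occurs with another track
    return shared.most_common(1)[0]


if __name__ == "__main__":
    print(average_tracks_per_artist(playlists))
    print(most_similar_track(playlists, "Song A"))
```

In this sample, "Song C" shares two playlists with "Song A", so the second function returns that pair. On the full dataset the same grouping and counting steps would have to be expressed as distributed operations, which is where the optimized and non-optimized versions can differ.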

The results of the jobs, for both the optimized and the non-optimized versions, are stored in the report/results folder.

The statistics of the jobs can be viewed in the report/stats folder.

Download dataset:

#!/bin/bash
curl -L -o ~/Downloads/spotify-millions-playlist.zip \
  https://www.kaggle.com/api/v1/datasets/download/adityak80/spotify-millions-playlist

Additional readmes:

Initial Setup (instructions to set up the environment)
AWS CLI cheatsheet (a collection of the most common AWS CLI commands)
AWS Workflow (a checklist of the steps needed to set up the AWS environment and use it to deploy Spark jobs)
Exam Project (instructions for the project, which is mandatory for the exam)
