The-Art-of-Analyzing-Big-Data

This repository includes the notebooks for all home assignments, the final project, and the final exam, created as part of the Ben-Gurion University of the Negev course "The Art of Analyzing Big Data". The course covers techniques for mining massive datasets. In it, we learned how to perform common tasks such as classification, clustering, and the analysis of large datasets. Each task tackles a different machine learning or big data problem. Course web site.

Final Exam

In the exam notebook, I tackled four different tasks.

COVID-19 Data Analysis

In this question, we were asked to describe the spread of the coronavirus in a specific country, chosen from a list of ~200 countries; I chose South Africa. We were also asked to show the effect of the 4th/5th COVID wave on three domains of our choice (e.g., the education system, public transportation, the health system, or employment), compared with the 1st/2nd wave.
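
The README does not include the analysis code. A minimal pandas sketch of two steps such a wave comparison typically needs, assuming a date-indexed cumulative case series and a date-indexed indicator for one chosen domain (both hypothetical inputs, not the exam's data): smoothing daily new cases so the waves become visible, and measuring the percent change of an indicator between an early and a late wave window. The function names and the 7-day window are illustrative choices.

```python
from __future__ import annotations

import pandas as pd


def daily_new_cases(cumulative: pd.Series, window: int = 7) -> pd.Series:
    """Turn a date-indexed cumulative case count into smoothed daily new cases."""
    # Day-over-day differences; the first day has no predecessor, so it is dropped.
    daily = cumulative.diff().dropna()
    # Downward data corrections would appear as negative counts; treat them as 0.
    daily = daily.clip(lower=0)
    # A trailing rolling mean smooths out weekday reporting effects.
    return daily.rolling(window, min_periods=1).mean()


def wave_change(metric: pd.Series, early: tuple[str, str], late: tuple[str, str]) -> float:
    """Percent change in a metric's mean between an early and a late wave window."""
    # Label-based slicing on a DatetimeIndex includes both endpoints.
    early_mean = metric.loc[early[0]:early[1]].mean()
    late_mean = metric.loc[late[0]:late[1]].mean()
    return 100.0 * (late_mean - early_mean) / early_mean
```

The wave windows passed to `wave_change` would be read off the smoothed case curve; the same call can then be repeated for each of the three chosen domains.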

Communities Identification

In this question, we were asked to use the Meta Kaggle dataset to build a social network of its users. I then applied a NetworkX community detection algorithm to identify communities in that network.
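
The README does not say how users were linked or which NetworkX algorithm was used. A minimal sketch, assuming a hypothetical linking rule (two users are connected when they posted in the same forum topic, with the edge weight counting shared topics) and using NetworkX's `greedy_modularity_communities` as one possible community detection method; `build_user_graph` and `find_communities` are illustrative names.

```python
from collections import defaultdict
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def build_user_graph(posts):
    """Build a weighted user graph from (user, topic) pairs.

    Hypothetical linking rule: two users share an edge when they posted in the
    same topic, and the edge weight counts how many topics they share.
    """
    users_by_topic = defaultdict(set)
    for user, topic in posts:
        users_by_topic[topic].add(user)

    graph = nx.Graph()
    for users in users_by_topic.values():
        graph.add_nodes_from(users)
        for u, v in combinations(sorted(users), 2):
            # Add 1 to the existing weight if this pair already co-occurred elsewhere.
            prev = graph.get_edge_data(u, v, default={"weight": 0})["weight"]
            graph.add_edge(u, v, weight=prev + 1)
    return graph


def find_communities(graph):
    """Return the communities found by greedy modularity maximization."""
    return [set(c) for c in greedy_modularity_communities(graph, weight="weight")]
```

Modularity-based methods favor groups with many more internal edges than expected by chance; NetworkX also offers other algorithms (for example Louvain in recent versions) that could be swapped in.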

Forums Topic Models

Using forum post data, I had to find the topics discussed each year. I built a separate topic model for each year (using nltk and turicreate) and visualized each year's topics as a word cloud.
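
The README does not show how the per-year models were built. As a deliberately simplified stand-in (plain term frequencies, not a topic model such as LDA), the sketch groups posts by year and counts the most frequent non-stop-word terms, which is the kind of term-to-weight mapping a word cloud renders. The `(year, text)` input format and the tiny stop-word list are assumptions.

```python
import re
from collections import Counter, defaultdict

# A tiny illustrative stop-word list; a real pipeline would use nltk's stop words.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
              "for", "on", "with", "how", "i"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_terms_by_year(posts, k=10):
    """Map each year to its k most frequent terms.

    posts: iterable of (year, text) pairs (an assumed input format).
    """
    counts = defaultdict(Counter)
    for year, text in posts:
        counts[year].update(tokenize(text))
    # Sort by year so the result reads chronologically.
    return {year: counter.most_common(k) for year, counter in sorted(counts.items())}
```

Each year's `(term, count)` pairs can be turned into a dict and passed to a word-cloud renderer, for example `WordCloud.generate_from_frequencies` in the `wordcloud` package.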

Olympic Medals Analysis

In this question, we were asked to propose a method for finding remarkable performances by Olympic athletes and a way to detect surprising results at the Olympic Games. I used clustering to meet these requirements. Clustering groups the gold, silver, and bronze medalists, whose samples should lie close to each other around the matching cluster centers. It can also help identify an athlete's extraordinary performance: that athlete's sample will lie very far from the cluster centers. The pipeline applies PCA for dimensionality reduction and then runs k-means on the computed principal components.
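
A minimal NumPy-only sketch of the PCA, k-means, and distance-to-center pipeline described above, assuming a numeric feature matrix with one row per athlete result (a hypothetical input; the exam's actual features are not listed here). PCA is computed via SVD on standardized features, k-means is plain Lloyd's algorithm with random restarts, and the distance of each sample to its own cluster center serves as the "surprise" score.

```python
import numpy as np


def pca(X, n_components):
    """Standardize the features and project them onto the top principal components."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Xs = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ Vt[:n_components].T


def _nearest(Z, centers):
    """Index of the closest center for every row of Z."""
    dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(Z, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns (labels, centers) of the best run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = Z[rng.choice(len(Z), size=k, replace=False)]
        for _ in range(n_iter):
            labels = _nearest(Z, centers)
            # Move each center to the mean of its points (keep it if its cluster is empty).
            new_centers = np.array([
                Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = _nearest(Z, centers)
        inertia = ((Z - centers[labels]) ** 2).sum()  # within-cluster sum of squares
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]


def distance_to_center(Z, labels, centers):
    """Distance of each sample to its own cluster center; large values flag unusual results."""
    return np.linalg.norm(Z - centers[labels], axis=1)
```

Setting `k=3` mirrors the gold/silver/bronze grouping; the samples with the largest `distance_to_center` values are the candidates for surprising results. In practice, scikit-learn's `PCA` and `KMeans` provide the same steps with more options.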

Home Assignments

Assignment 1 - Databases and SQL on various datasets, using the sqlite3 package.

Assignment 2 - Scraping with Beautiful Soup, working with APIs and pandas, networkx.

Assignment 3 - Data visualization using turicreate, pandas and seaborn.

Assignment 4 - Working with graphs.

Assignment 5 - Link prediction and graph analysis.

Assignment 6 - NLP, sentiment analysis, and classification.

Assignment 7 - NLP, entity extraction, networks and visualization.

Assignment 8 - GeoPandas, Plotly Express.

Assignment 9 - Extracting Data from Images and Sounds - working with PySpark, classifiers, map visualization, and more.
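
Assignment 1 is summarized in the list above only as SQL work through the `sqlite3` package. A minimal sketch of that workflow, assuming a hypothetical `stats(country, year, value)` table rather than any of the assignment's actual datasets: load rows into an in-memory database with parameterized inserts, then run a `GROUP BY` aggregation.

```python
import sqlite3


def load_and_aggregate(rows):
    """Load (country, year, value) rows into an in-memory table and total value per country."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE stats (country TEXT, year INTEGER, value REAL)")
        # Parameterized inserts let sqlite3 handle quoting safely.
        conn.executemany("INSERT INTO stats VALUES (?, ?, ?)", rows)
        query = """
            SELECT country, SUM(value) AS total
            FROM stats
            GROUP BY country
            ORDER BY total DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

The same pattern works with a file-backed database by passing a path to `sqlite3.connect` instead of `":memory:"`.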

amitshakarchy/The-Art-of-Analyzing-Big-Data

Folders and files

Latest commit

History

Repository files navigation

The-Art-of-Analyzing-Big-Data

Final Exam

Covid-19 data analysis.

Communities Identification

Home Assignments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages