big_data_analysis_hadoop_stack

A full-fledged data analysis project using the Hadoop stack.

Steps performed in the project:

Acquire the top 200,000 posts by view count.
Using Pig or MapReduce, extract, transform and load the data as applicable.
Using Hive Query Language, compute: (I) the top 10 posts by score; (II) the top 10 users by post score; (III) the number of distinct users who used the word “Hadoop” in one of their posts.
Using MapReduce, calculate the per-user TF-IDF and find each user's 10 most used words, excluding stop words.
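
The per-user TF-IDF step can be sketched in plain Python to show the logic before it is ported to a real MapReduce job. This is a minimal illustration, not the project's actual implementation. It assumes that all posts by one user form a single "document", that TF is a word's count divided by the user's total non-stop words, and that IDF is the natural log of (number of users / number of users who used the word). The names `STOP_WORDS`, `map_post`, `reduce_counts` and `per_user_tfidf` are hypothetical, and the stop-word list is deliberately tiny.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately small stop-word list; a real job would load a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "on", "for", "with", "i"}


def map_post(user_id, text):
    """Map phase: emit ((user_id, word), 1) for every non-stop word in a post."""
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in STOP_WORDS:
            yield (user_id, word), 1


def reduce_counts(pairs):
    """Reduce phase: sum the emitted counts per (user_id, word) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def per_user_tfidf(posts, top_n=10):
    """Return {user_id: [(word, score), ...]} holding the top_n words per user.

    `posts` is an iterable of (user_id, text) tuples (an assumed input shape).
    """
    counts = reduce_counts(
        pair for user, text in posts for pair in map_post(user, text)
    )

    words_per_user = defaultdict(int)  # total non-stop words written by each user
    users_per_word = defaultdict(set)  # set of users who used each word
    for (user, word), n in counts.items():
        words_per_user[user] += n
        users_per_word[word].add(user)

    num_users = len(words_per_user)
    scores = defaultdict(list)
    for (user, word), n in counts.items():
        tf = n / words_per_user[user]
        idf = math.log(num_users / len(users_per_word[word]))
        scores[user].append((word, tf * idf))

    # Highest score first; ties are broken alphabetically for stable output.
    return {
        user: sorted(ws, key=lambda item: (-item[1], item[0]))[:top_n]
        for user, ws in scores.items()
    }
```

Calling `per_user_tfidf(posts)` with a list of `(user_id, text)` tuples returns up to ten `(word, score)` pairs per user. A word used by every user gets an IDF of zero under this definition, so it ranks last regardless of how often it appears. In an actual Hadoop job the same three aggregations (per user-word counts, per-user totals, per-word user counts) would typically be split across chained MapReduce passes.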

Refer to "Documentation" for a step-by-step guide.