CS498E: Sentiment Clustering Analysis of 2016 Presidential Race Twitter Data

rz4/Twitter-Presidential-Race-Sentiment-Clustering

Twitter Election Img

Twitter Presidential Race Sentiment Clustering

Current Version

Last Updated: December 4, 2016

Lead Maintainers: Rafael Zamora, Justin Murphy

Overview

The goal of this project is to analyze Twitter data related to the 2016 United States presidential race. We hope to discover classes of tweets by applying clustering techniques to two features of each tweet: its sentiment and how strongly it references the Republican or the Democratic candidate. The clusters are then used to analyze tweeting behavior over the last few weeks of the election, with the aim of seeing how specific events during the race influenced Twitter sentiment towards either candidate.

Data was gathered from 3 weeks prior to the election through 1 week after the election. The data was pulled from Twitter using Python with the following parameters:

  • Start Date: 2016-10-16
  • End Date: 2016-11-14
  • Keywords: @hillaryclinton OR #hillaryclinton OR Hillary Clinton OR Hillary OR @RealDonaldTrump OR #donaldtrump OR Donald Trump OR Trump
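
As a rough illustration of what these parameters select, the sketch below checks whether a single tweet falls inside the date range and mentions any of the keywords. It is not the project's collection script: the function name `matches_query`, the inclusive end date, and the case-insensitive substring matching are all assumptions made for this example.

```python
from datetime import date

# Parameters copied from the list above.
START_DATE = date(2016, 10, 16)
END_DATE = date(2016, 11, 14)
KEYWORDS = [
    "@hillaryclinton", "#hillaryclinton", "Hillary Clinton", "Hillary",
    "@RealDonaldTrump", "#donaldtrump", "Donald Trump", "Trump",
]


def matches_query(tweet_date, text):
    """Return True if a tweet is in the date range and mentions a keyword.

    Hypothetical helper: treats the end date as inclusive and matches
    keywords as case-insensitive substrings (both are assumptions).
    """
    in_range = START_DATE <= tweet_date <= END_DATE
    lowered = text.lower()
    has_keyword = any(keyword.lower() in lowered for keyword in KEYWORDS)
    return in_range and has_keyword


print(matches_query(date(2016, 11, 8), "Trump and Hillary on election night"))
print(matches_query(date(2016, 12, 1), "Trump and Hillary after the election"))
```

Because the keywords are joined with OR, the short terms "Hillary" and "Trump" already cover the longer variants under substring matching.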

The following values were gathered from each tweet:

  • Author-ID
  • Date with Time
  • Text

The following is an example of a tweet and the values produced through processing:

  • Tweet text: Donald Trump Angry at Mike Pence For Doing Great Job At Vice Presidential Debate.
  • Noun Phrases: [ 'donald trump angry', 'mike pence', 'job', 'vice presidential debate' ]
  • Sentiment Value: 0.803
  • Clinton Reference Value: 0.263
  • Trump Reference Value: 0.800
  • Candidate Reference Value: -0.537*

*-1 = Trump, 1 = Clinton
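
The example values are consistent with the Candidate Reference Value being the Clinton Reference Value minus the Trump Reference Value (0.263 - 0.800 = -0.537). The README does not state the formula, so the sketch below is an inference from this one example, and `candidate_reference` is a hypothetical name.

```python
def candidate_reference(clinton_ref, trump_ref):
    """Combine the two reference values into a single score.

    Assumed formula (inferred from the example above, not confirmed by
    the project): Clinton value minus Trump value, so negative scores
    lean towards Trump and positive scores lean towards Clinton.
    """
    return clinton_ref - trump_ref


# Values from the example tweet above.
score = candidate_reference(0.263, 0.800)
print(round(score, 3))
```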

The Birch clustering algorithm from scikit-learn was used to cluster the processed data. The following are graph examples of processed and clustered data:

Processed Data Graph Example

Clustered Data Graph Example
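
For readers unfamiliar with Birch, here is a minimal sketch of clustering points in the (sentiment, candidate reference) plane with scikit-learn. The points are synthetic values invented for this example, and the `threshold` and `n_clusters` settings are illustrative assumptions, not the parameters used in this project.

```python
import numpy as np
from sklearn.cluster import Birch

# Synthetic (sentiment, candidate reference) pairs, invented for illustration.
points = np.array([
    [0.80, -0.50], [0.75, -0.55], [0.82, -0.52],   # positive, Trump-leaning
    [-0.60, 0.70], [-0.65, 0.72], [-0.58, 0.68],   # negative, Clinton-leaning
    [0.00, 0.00], [0.05, -0.02], [-0.03, 0.04],    # neutral, no clear lean
])

# threshold caps the radius of each subcluster; n_clusters sets the
# number of final clusters. Both values are assumptions for this sketch.
model = Birch(threshold=0.2, n_clusters=3)
labels = model.fit_predict(points)

# Report the size and mean position of each cluster.
for cluster_id in np.unique(labels):
    members = points[labels == cluster_id]
    print(cluster_id, len(members), members.mean(axis=0))
```

Cluster sizes and mean positions printed this way correspond to the kind of summary a results file would hold, although the real values depend on the full data set and the settings actually used.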

Getting Started

Requirements:

Requires Python 3.5 and R.

Requires the following Python Packages:

Setup and Installation:

To install, download or clone the repository and install the required packages.

The /src/ folder includes all scripts used for this project. The following are short descriptions of each script:

The /doc/ folder contains an R Notebook used for analyzing data and results. It also contains a /figures/ folder, which includes graphs of all processed and clustered data.

The /data/ folder contains pre-processed and processed Twitter data, while the final clustered data can be found in the results folder.

Cluster sizes and centroid coordinates can be found in results.txt.

A graph of the total number of tweets per day can be found in TweetsPerDay.png.

License

This project is licensed under the MIT License.

Citation

See CITATION for how to cite this project.
