Twitter Presidential Race Sentiment Clustering

Last Updated: December 4, 2016

Lead Maintainers: Rafael Zamora, Justin Murphy

Overview

The goal of this project is to analyze Twitter data related to the 2016 United States presidential race. We hope to discover classes of tweets by applying clustering techniques to two features of each tweet: its sentiment and how strongly it references the Republican or the Democratic candidate. The clusters will then be used to analyze tweeting behavior over the last few weeks of the election. We hope to see how specific events during the race influenced sentiment on Twitter toward each candidate.

Data was gathered from 3 weeks prior to the election through 1 week after the election. The data was pulled from Twitter using Python with the following parameters:

Start Date: 2016-10-16
End Date: 2016-11-14
Keywords: @hillaryclinton OR #hillaryclinton OR Hillary Clinton OR Hillary OR @RealDonaldTrump OR #donaldtrump OR Donald Trump OR Trump
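
As a rough illustration of the parameters above, the search string and date window could be assembled as follows. This is only a sketch: the names `KEYWORDS`, `build_query`, and `WINDOW_DAYS` are hypothetical and are not taken from PullTwitterData.py, and no call to GOT3 is shown.

```python
from datetime import date

# Search window and keywords copied from the parameters listed above.
START_DATE = date(2016, 10, 16)
END_DATE = date(2016, 11, 14)
KEYWORDS = [
    "@hillaryclinton", "#hillaryclinton", "Hillary Clinton", "Hillary",
    "@RealDonaldTrump", "#donaldtrump", "Donald Trump", "Trump",
]


def build_query(keywords):
    """Join the keywords into one OR-separated search string (hypothetical helper)."""
    return " OR ".join(keywords)


QUERY = build_query(KEYWORDS)
# Number of days between the start and end dates of the pull.
WINDOW_DAYS = (END_DATE - START_DATE).days

print(QUERY)
print(WINDOW_DAYS)
```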

The following values were gathered from each tweet:

Author-ID
Date with Time
Text
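
The three values above map naturally onto one CSV row per tweet. The sketch below shows one way such rows might be written and read back with the standard library. The column names and the sample row are placeholders chosen for illustration, not the layout actually used in /data/.

```python
import csv
import io

# Hypothetical column names for the three values gathered per tweet.
FIELDNAMES = ["author_id", "date_time", "text"]


def write_tweets(rows, handle):
    """Write tweet dictionaries to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def read_tweets(handle):
    """Read tweet dictionaries back from an open CSV text handle."""
    return list(csv.DictReader(handle))


# Placeholder row, not real data; the comma checks that quoting is handled.
sample = [{
    "author_id": "0",
    "date_time": "2016-10-16 00:00:00",
    "text": "Placeholder tweet, with a comma",
}]

buffer = io.StringIO()
write_tweets(sample, buffer)
buffer.seek(0)
round_trip = read_tweets(buffer)
print(round_trip)
```

Using `csv.DictWriter` rather than joining strings by hand keeps tweet text containing commas or quotes intact.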

The following is an example of a tweet and the values produced through processing:

Tweet text: Donald Trump Angry at Mike Pence For Doing Great Job At Vice Presidential Debate.
Noun Phrases: [ 'donald trump angry', 'mike pence', 'job', 'vice presidential debate' ]
Sentiment Value: 0.803
Clinton Reference Value: 0.263
Trump Reference Value: 0.800
Candidate Reference Value: -0.537*

*-1 = Trump, 1 = Clinton
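
The sample values are consistent with the Candidate Reference Value being the Clinton Reference Value minus the Trump Reference Value (0.263 - 0.800 = -0.537). That relationship is inferred from this one example and is not confirmed by the project scripts. Under that assumption, a minimal sketch looks like this (`candidate_reference` is a hypothetical name):

```python
def candidate_reference(clinton_value, trump_value):
    """Combine the two reference values into one score.

    Assumption inferred from the sample tweet: the score is the Clinton
    value minus the Trump value, so -1 leans Trump and 1 leans Clinton.
    """
    return clinton_value - trump_value


# Values from the sample tweet above.
value = candidate_reference(0.263, 0.800)
print(round(value, 3))
```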

scikit-learn's Birch clustering algorithm was used to cluster the processed data. Graphs of the processed and clustered data can be found in the /figures/ folder inside /doc/.
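
The clustering step itself can be sketched with scikit-learn's `Birch` class. The example below runs on synthetic two-column points standing in for (candidate reference value, sentiment value) pairs. The data, the `threshold` of 0.1, and the choice of 3 clusters are all assumptions made for illustration and do not reflect the settings used in ClusterTwitterData.py.

```python
import numpy as np
from sklearn.cluster import Birch

# Synthetic stand-in data: three tight groups of 2-D points.
# Column 0 plays the role of the candidate reference value,
# column 1 the sentiment value. These are NOT project data.
rng = np.random.default_rng(0)
centers = np.array([[-0.6, 0.5], [0.6, 0.5], [0.0, -0.5]])
points = np.vstack(
    [center + rng.normal(scale=0.05, size=(50, 2)) for center in centers]
)

# threshold and n_clusters are illustrative choices, not the project's.
model = Birch(threshold=0.1, n_clusters=3)
labels = model.fit_predict(points)

# Summaries of the kind a results file might report:
# the size of each cluster and the mean point of each cluster.
cluster_sizes = np.bincount(labels)
centroids = np.array([points[labels == k].mean(axis=0) for k in range(3)])

print(cluster_sizes)
print(centroids)
```

A smaller `threshold` makes Birch build more subclusters before the final grouping into `n_clusters`, which matters when the features live in a narrow range such as -1 to 1.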

Getting Started

Requirements:

Requires Python 3.5 and R.

Requires the following Python Packages:

GOT3 (modified version is included in /src/)
TextBlob
scikit-learn

Setup and Installation:

To install, download or clone the repository and install the required packages.

The /src/ folder includes all scripts used for this project. The following are short descriptions of each script:

PullTwitterData.py - Used to pull and write data to CSV
ProcessTwitterData.py - Used to process and run sentiment analysis on pulled data
ClusterTwitterData.py - Used to run Birch clustering on processed data
GraphTwitterData.R - Used to export PNG graphs of processed and clustered data

The /doc/ folder contains an R Notebook used for analyzing data and results. It also contains a /figures/ folder, which includes graphs of all processed and clustered data.

The /data/ folder contains pre-processed and processed Twitter data, while the final clustered data can be found in the /results/ folder.

Cluster sizes and centroid coordinates can be found in results.txt.

A graph of the total number of tweets per day can be found in TweetsPerDay.png.

License

This project is licensed under the MIT License.

Citation

See the CITATION file for how to cite this project.
