IMDb provides a list of celebrities born on the current date: http://m.imdb.com/feature/bornondate
Scrape this page for the celebrities it displays (i.e., the top 10) and extract the following information for each one:
- Name of the celebrity
- Celebrity Image
- Profession
- Best Work
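
The extraction step could be sketched roughly as follows with Beautifulsoup4. The real markup of the IMDb page is not described in this document, so the tag names, class names, and sample values below are placeholders that would need to be replaced after inspecting the live page. The sketch uses Python's built-in "html.parser" so that it runs without extra dependencies; "lxml" can be passed instead.

```python
from bs4 import BeautifulSoup

# Placeholder markup: the structure, class names, and values are assumptions
# made for illustration, not the actual IMDb page.
SAMPLE_HTML = """
<div class="celebrity">
  <img src="https://example.com/a.jpg" alt="Example Person A">
  <span class="name">Example Person A</span>
  <span class="profession">Actress</span>
  <span class="best-work">Example Movie</span>
</div>
<div class="celebrity">
  <img src="https://example.com/b.jpg" alt="Example Person B">
  <span class="name">Example Person B</span>
  <span class="profession">Director</span>
  <span class="best-work">Another Example Film</span>
</div>
"""


def parse_celebrities(html, limit=10):
    """Return up to `limit` celebrity records parsed from `html`."""
    soup = BeautifulSoup(html, "html.parser")
    celebrities = []
    # Each "celebrity" block (hypothetical class name) holds one person.
    for card in soup.find_all("div", class_="celebrity")[:limit]:
        celebrities.append({
            "name": card.find(class_="name").get_text(strip=True),
            "image": card.find("img")["src"],
            "profession": card.find(class_="profession").get_text(strip=True),
            "best_work": card.find(class_="best-work").get_text(strip=True),
        })
    return celebrities


if __name__ == "__main__":
    for person in parse_celebrities(SAMPLE_HTML):
        print(person)
```

In the application itself, the HTML would come from the page source that Selenium returns after the dynamic content has loaded, rather than from a hard-coded string.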
Once you have this list, run a sentiment analysis on Twitter for each celebrity. The final output should be in the following format:
- Name of the celebrity:
- Celebrity Image:
- Profession:
- Best Work:
- Overall Sentiment on Twitter: Positive, Negative or Neutral
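
One way to arrive at the overall label is to average the polarity of a celebrity's tweets and compare it against a small threshold. This is a minimal sketch, not necessarily how App.py does it: the function names and the 0.05 threshold are assumptions chosen for illustration. Only `TextBlob(text).sentiment.polarity`, which returns a value between -1.0 and 1.0, comes from the Textblob library.

```python
from statistics import mean


def tweet_polarity(text):
    """Polarity of one tweet in [-1.0, 1.0], computed with Textblob."""
    # Imported lazily so the aggregation below runs without Textblob installed.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def overall_sentiment(polarities, threshold=0.05):
    """Map a list of polarity scores to Positive, Negative or Neutral.

    The threshold is an assumed cut-off, not a value from the project.
    """
    if not polarities:
        return "Neutral"  # no tweets found, so no evidence either way
    average = mean(polarities)
    if average > threshold:
        return "Positive"
    if average < -threshold:
        return "Negative"
    return "Neutral"
```

With tweets fetched through Tweepy, the label for one celebrity would then be `overall_sentiment([tweet_polarity(t) for t in tweets])`.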
Requirements:
- Beautifulsoup4 - Python library for pulling data out of HTML and XML files.
- Tweepy - Open-source Twitter API client for Python.
- Selenium - The WebDriver kit emulates a web browser and executes JavaScript to load dynamic content.
- Textblob - Python library that uses NLTK to find the polarity of a text or tweet.
- lxml - A fast HTML and XML parser for Beautifulsoup4.
- Mozilla Firefox - Web browser used to perform the web scraping.
- Gecko Driver - Driver that lets Selenium invoke Firefox.
- The Twitter API keys must be placed in /data/twitter_api_keys.json (see sample_twitter_api_keys.json for the format).
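
A minimal sketch of reading the keys file and failing early when an entry is missing. The key names below are assumptions based on the four credentials Twitter's OAuth 1.0a flow normally uses; the authoritative names are whatever sample_twitter_api_keys.json defines. `load_twitter_keys` is a hypothetical helper, not a function from this project.

```python
import json

# Assumed key names; check sample_twitter_api_keys.json for the real ones.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def load_twitter_keys(path):
    """Load the Twitter API keys from a JSON file and validate them."""
    with open(path, encoding="utf-8") as handle:
        keys = json.load(handle)
    # Treat absent and empty values alike: neither can authenticate.
    missing = [name for name in REQUIRED_KEYS if not keys.get(name)]
    if missing:
        raise ValueError("Missing Twitter API keys: " + ", ".join(missing))
    return keys
```

Validating up front gives a clear error message before any scraping starts, instead of an authentication failure halfway through the run.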
- Make sure all the requirements are installed. See requirements.txt, or run:
pip install -r requirements.txt --upgrade
- Make sure the latest version of Mozilla Firefox is installed and the latest version of geckodriver is in the utils folder.
Run the application using:
python App.py
Maneesh D - maneeshd77@gmail.com