Using the industry-standard libraries spaCy, NLTK, and scikit-learn, this study examines which keywords in a post title most positively affect user engagement. The exploratory data analysis and visualizations in the following notebook also factor in other features of the supplied data, including author, post date, and post time. For the purposes of this study, positive user engagement is measured in upvotes.
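To make the idea of "keywords that affect upvotes" concrete, here is a minimal sketch that averages upvotes over the titles containing each word. It is not the notebook's method: it uses only the Python standard library in place of spaCy, NLTK, and scikit-learn, and the toy rows, the `STOPWORDS` set, and the function names are all hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows of (title, upvotes); not taken from the real dataset.
posts = [
    ("Earthquake strikes coastal city", 120),
    ("Earthquake relief funds approved", 80),
    ("Parliament approves new budget", 10),
    ("City council debates budget", 6),
]

# Tiny illustrative stopword list; a real analysis would use a library-provided one.
STOPWORDS = {"the", "a", "an", "of", "in", "new"}


def tokenize(title):
    """Return the set of lowercase alphabetic words in a title, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS}


def mean_upvotes_by_word(rows, min_posts=2):
    """Average upvotes over the titles containing each word.

    Words seen in fewer than `min_posts` titles are dropped, since a single
    popular post would otherwise dominate the ranking.
    """
    votes = defaultdict(list)
    for title, upvotes in rows:
        for word in tokenize(title):
            votes[word].append(upvotes)
    return {w: mean(v) for w, v in votes.items() if len(v) >= min_posts}


scores = mean_upvotes_by_word(posts)
# Rank words from highest to lowest average upvotes.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

A per-word average like this only shows association, not cause, and it ignores the author and posting time that the analysis also considers.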
- world_news_posts.csv: Supplied dataset of roughly 500,000 post titles from a "world news" message board, including the date, time, and author of each post, along with user interaction data.
- world_news_posts_az.csv: Cleaned version of the original world_news_posts dataframe with additional engineered features.
Feature | Type | Dataset | Description |
---|---|---|---|