Video game reviews can be abstruse, especially non-professional user reviews. Reviews written by the average gamer may be short, pithy, and full of jokes and memes; they may also be lengthy and full of information. Steam is a digital game store and social media service for PC gamers. Gamers often post reviews of games they like or dislike on Steam, and other gamers may vote those reviews "funny" or "helpful". These reviews are an important source for developers, who can read exactly what gamers are saying about their work as well as the work of their competitors. Informal, community-rated text of this kind is a natural application for machine learning.
My project is a Natural Language Processing (NLP) classification and (tentative) topic analysis of Steam reviews. Steam data are available via an API, which I will query with an open-source program I am writing in Rust. My preprocessing and models use several common Python libraries, including pandas for DataFrames, scikit-learn for machine learning, Keras for neural networks, and spaCy for NLP. The project is both fun and practical. Is the review "PEWPEWPEWPEW" (an actual review) positive or negative? Can a classifier figure that out? Topic analysis may be able to pull out the common negative or positive trends in game reviews. The crux of the project is careful preprocessing in a spaCy pipeline, which facilitates modelling.
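To give a feel for what "careful preprocessing" might involve for a review like "PEWPEWPEWPEW", here is a minimal sketch in plain Python. It is not the project's spaCy pipeline; the function names `normalize_token` and `preprocess` and the regular expressions are hypothetical, chosen only for illustration. The idea is to lowercase the text and reduce meme-style repetition so that variants of the same token collapse to one form before modelling.

```python
import re

# Hypothetical helpers for illustration only; the project's real
# preprocessing is done in a spaCy pipeline, which is not shown here.

# A whole token made of one short unit (2-6 chars) repeated, e.g. "pewpewpewpew".
_REPEATED_UNIT = re.compile(r"^(\w{2,6}?)\1+$")
# Three or more of the same character in a row, e.g. "sooooo".
_ELONGATED = re.compile(r"(\w)\1{2,}")
# Very rough tokenizer: runs of lowercase letters, digits, and apostrophes.
_TOKEN = re.compile(r"[a-z0-9']+")


def normalize_token(token: str) -> str:
    """Lowercase a token and reduce repeated units and elongated characters.

    This is a lossy heuristic: a real word that happens to be a repeated
    unit (e.g. "mama") is also collapsed.
    """
    token = token.lower()
    match = _REPEATED_UNIT.match(token)
    if match:
        # Keep a single copy of the repeated unit.
        return match.group(1)
    # Cap any run of 3+ identical characters at two.
    return _ELONGATED.sub(r"\1\1", token)


def preprocess(review: str) -> list[str]:
    """Split a review into normalized tokens."""
    return [normalize_token(t) for t in _TOKEN.findall(review.lower())]


if __name__ == "__main__":
    print(preprocess("PEWPEWPEWPEW"))
    print(preprocess("Sooooo GOOD!!!"))
```

Whether "pew" then reads as positive or negative is still up to the classifier; a step like this would only reduce the number of distinct spellings the model has to learn from. In the actual project, similar logic could presumably be attached to the spaCy pipeline as a custom component.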