Glassdoor Review (an ongoing project)

Scripts and notebooks

  • script: read_large_dta.py: reads the original 16 GB Stata data file and randomly selects a representative subsample for text analysis
  • script: data_preprocessing.py: preprocesses the reviews of company pros and cons by removing stop words and lemmatizing
  • script: LDA_ntopics.py: fits LDA models with different numbers of topics, plots the coherence scores to find the optimal number of topics, and trains the final LDA models with that number
  • notebook: LDA_visualization.ipynb: visualizes the topics found by LDA using pyLDAvis
  • script: label_topics.py: assigns topics to reviews using the trained LDA models
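
As an illustration of the subsampling step, here is a minimal sketch of drawing a uniform random sample from a file too large to hold in memory. It is not the code in read_large_dta.py: reservoir sampling is an assumption about how such a subsample could be drawn, and `reservoir_sample`, `iter_rows`, and the `review_id` field are hypothetical names. With pandas, the chunks could come from `pandas.read_stata(path, chunksize=...)`; plain lists of dicts stand in for them here so that the example runs on its own.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Draw k records uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


def iter_rows(chunks: Iterable[Iterable[T]]) -> Iterator[T]:
    """Flatten an iterable of chunks into a single stream of rows."""
    for chunk in chunks:
        yield from chunk


# Toy stand-in for chunked reads of the large data file (hypothetical rows).
chunks = (
    [{"review_id": i} for i in range(start, start + 1000)]
    for start in range(0, 10_000, 1000)
)
sample = reservoir_sample(iter_rows(chunks), k=500, seed=42)
print(len(sample))
```

Only the reservoir of `k` rows is kept in memory at any time, so the memory cost does not grow with the size of the input file.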

Topic visualization

The screenshot below shows a visualization of one topic found by the LDA model fitted to all employee reviews on cons.

[Image: pyLDAvis visualization of one topic from the reviews on cons]

Topic labeling

The topics are then hand-labeled based on the most frequent words associated with each topic.

Topics of reviews on pros:

  • Salary and Benefits
  • Flexible Schedule
  • Career Opportunity
  • Work-Life Balance
  • Supportive Management
  • Culture and Value
  • Food and Perks
  • Friendly and Smart Colleagues
  • Friendly to Juniors

Topics of reviews on cons:

  • Low Pay and High Turnover Rate
  • Long Working Hours
  • Limited Career Opportunity
  • Demanding Work
  • Bad Manager
  • Poor Communication
  • Pressure from Sales and Customer Service
  • Slow to Adapt to Changes
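
Once topics have hand-assigned names such as those listed above, labeling a review amounts to picking its most probable topic and looking up the name. The sketch below is an assumption about how that step could work, not the code in label_topics.py. The topic ids in `PROS_LABELS` are hypothetical (the real id-to-label mapping depends on the trained model), and `dominant_label` and `min_prob` are illustrative names. The input format, a list of `(topic_id, probability)` pairs, matches what gensim's `LdaModel.get_document_topics` returns for a document.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical topic ids; the label names come from the pros list above.
PROS_LABELS: Dict[int, str] = {
    0: "Salary and Benefits",
    1: "Flexible Schedule",
    2: "Career Opportunity",
}


def dominant_label(
    topic_probs: List[Tuple[int, float]],
    labels: Dict[int, str],
    min_prob: float = 0.0,
) -> Optional[str]:
    """Return the hand-assigned label of the most probable topic.

    Returns None when the review has no topics or when the best topic's
    probability is below min_prob.
    """
    if not topic_probs:
        return None
    topic_id, prob = max(topic_probs, key=lambda pair: pair[1])
    if prob < min_prob:
        return None
    # Fall back to a generic name for topics that were not hand-labeled.
    return labels.get(topic_id, f"Topic {topic_id}")


# Toy topic distribution for a single review (made-up numbers).
review_topics = [(0, 0.2), (2, 0.7), (1, 0.1)]
print(dominant_label(review_topics, PROS_LABELS))
```

Setting `min_prob` above zero leaves reviews without a clearly dominant topic unlabeled instead of forcing them into a category.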

Compare topic distribution across various companies (in progress)

Screenshots from a work-in-progress Tableau dashboard:

[Image: Tableau dashboard screenshot 1]

[Image: Tableau dashboard screenshot 2]
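
For readers who want to reproduce this kind of comparison outside Tableau, here is a minimal sketch that computes each company's share of reviews per topic label. It assumes every review already carries one topic label; the company names, the `topic_shares` function, and the toy rows are all hypothetical and do not come from the project data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def topic_shares(
    labeled_reviews: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Map each company to the fraction of its reviews under each topic label."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for company, label in labeled_reviews:
        counts[company][label] += 1

    shares: Dict[str, Dict[str, float]] = {}
    for company, counter in counts.items():
        total = sum(counter.values())
        shares[company] = {label: n / total for label, n in counter.items()}
    return shares


# Toy (company, topic label) pairs; labels are taken from the cons list.
rows = [
    ("Company A", "Long Working Hours"),
    ("Company A", "Long Working Hours"),
    ("Company A", "Bad Manager"),
    ("Company A", "Poor Communication"),
    ("Company B", "Bad Manager"),
    ("Company B", "Demanding Work"),
]
shares = topic_shares(rows)
for company, dist in shares.items():
    print(company, dist)
```

Normalizing by each company's review count makes the distributions comparable across companies with very different numbers of reviews.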