Nothing too fancy happening here. Just a small side project.
Things required:
- MongoDB. Docker compose file included.
- Python 3.6.8
- Packages given in requirements.txt
Operating:
- get_videos.py is what obtains video IDs from the YouTube API. The correct endpoint is given, but operation requires an API key in your environment variables.
- get_transcripts.py is what does the heavy lifting in obtaining the video transcripts and word counts. Should just require a button push once you've got the videos.
- nl_words_analysis.ipynb is a Jupyter Notebook with various analysis things.
- Stop_Words.txt is a large list of stop words to combine with whatever is already in NLTK.
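
For a rough idea of the moving parts, here is a minimal sketch of three small pieces: reading the API key from the environment, merging Stop_Words.txt with another stop word list, and counting words in a transcript. It is not the code from the scripts in this repo. The function names, the YOUTUBE_API_KEY variable name, the one-word-per-line file format, and the simple whitespace tokenizer are all assumptions made for illustration.

```python
import os
from collections import Counter
from pathlib import Path


def require_api_key(var_name="YOUTUBE_API_KEY"):
    """Return the API key from the environment, or fail with a clear message.

    The variable name is an assumption; use whatever get_videos.py expects.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


def load_stop_words(path, base_words=()):
    """Merge stop words from a file with a base collection of stop words.

    Assumes the file holds one word per line; blank lines are skipped.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    extra = {line.strip().lower() for line in lines if line.strip()}
    return extra | {word.lower() for word in base_words}


def count_words(text, stop_words):
    """Count lowercase tokens in text, skipping anything in stop_words.

    Tokenizing is a plain whitespace split with edge punctuation stripped,
    which is cruder than what NLTK offers.
    """
    tokens = (token.strip(".,!?;:\"'()[]") for token in text.lower().split())
    return Counter(token for token in tokens if token and token not in stop_words)
```

With NLTK installed and its stopwords corpus downloaded, `stopwords.words("english")` from `nltk.corpus` can be passed as `base_words`, so that `load_stop_words("Stop_Words.txt", stopwords.words("english"))` gives the combined set.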