Nothing too fancy happening here. Just a small side project.
Things required:
- MongoDB. Docker compose file included.
- Python 3.6.8
- Packages given in
One file obtains video IDs from the YouTube API. The correct endpoint is already set, but running it requires an API key in your environment.
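
For a rough idea of what that step involves, here is a minimal sketch that reads an API key from the environment and asks the YouTube Data API v3 `search` endpoint for video IDs. The environment variable name `YOUTUBE_API_KEY`, the helper names, and the choice of the `search` endpoint are all assumptions for illustration; the file in this repo may use a different endpoint or variable.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: the YouTube Data API v3 search endpoint; the repo may use another.
SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key, max_results=25):
    """Build a search request URL for videos matching `query`."""
    params = {
        "part": "snippet",
        "type": "video",
        "q": query,
        "maxResults": max_results,
        "key": api_key,
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def extract_video_ids(response):
    """Pull the video IDs out of a decoded search response."""
    return [
        item["id"]["videoId"]
        for item in response.get("items", [])
        if "videoId" in item.get("id", {})
    ]


def fetch_video_ids(query):
    """Query the API with the key found in the environment."""
    api_key = os.environ["YOUTUBE_API_KEY"]  # hypothetical variable name
    with urllib.request.urlopen(build_search_url(query, api_key)) as resp:
        return extract_video_ids(json.loads(resp.read().decode("utf-8")))
```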
Another file does the heavy lifting of obtaining the video transcripts and word counts. It should just require a button push once you've got the videos.
nl_words_analysis.ipynb is a Jupyter Notebook with various analysis things.
Stop_Words.txt is a large list of stop words to combine with whatever is already in NLTK.
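
The stop-word merge and the word counting could look roughly like the sketch below. It assumes Stop_Words.txt holds one word per line and that the English NLTK list is the one being extended; the function names are made up for illustration and are not taken from the repo.

```python
import re
from collections import Counter


def load_stop_words(path):
    """Read one stop word per line (assumed file format), lowercased."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def nltk_stop_words():
    """English stop words from NLTK; needs nltk.download('stopwords') once."""
    from nltk.corpus import stopwords

    return set(stopwords.words("english"))


def count_words(text, stop_words):
    """Count lowercase word tokens in `text`, skipping stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)
```

With NLTK installed, the two lists can be merged with `nltk_stop_words() | load_stop_words("Stop_Words.txt")` and the result passed to `count_words` along with a transcript's text.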