This course introduces learners to the basics of text mining and text manipulation. It begins with how Python handles text, the structure of text for both machines and humans, and an overview of the NLTK framework for text manipulation. The second week focuses on common manipulation needs, including regular expressions (searching for text), cleaning text, and preparing text for use in machine learning processes. The third week applies basic natural language processing methods to text and demonstrates how text classification is achieved. The final week explores more advanced methods for detecting topics in documents and grouping them by similarity (topic modeling).
This course should be taken after completing the following courses:
- Introduction to Data Science in Python
- Applied Plotting, Charting & Data Representation in Python
- Applied Machine Learning in Python