Note: Please see the following projects for good examples of my personal work:
- 489_Homework_3: Cleaning, interpolation, and interpretation of narcotics data.
- 489_Homework_4: Scraping and curation of a web-sourced dataset.
- Kaggle Titanic: Final project from STAT 489 in which I perform Exploratory Data Analysis (EDA) on the Titanic dataset.
- Unstructured Exploration of Data: Unstructured EDA on NCI-60 Cancer and Wine Tasting datasets to discover features.
Each of these projects was carried out in Jupyter Notebooks for readability.
This repository holds useful code and notes from my adventures in STAT 489: Principles of Data Science and Statistics.
Layout:
- Homework One: An introductory Python assignment turned into a treatise on simple functional analytics
- Homework Two: Generation of random standard normal and gamma distribution samples, with t-values and p-values.
- Notes on Pandas: In-class notes on how to read in and handle data in Python with Pandas.
- Notes on PyPlot: In-class notes on how to create basic graphs in Python with PyPlot.
- Notes on Statistical Experiments: In-class notes on statistical simulations using Python.
- Linear Analysis: A survey of Linear Algebra
- Statistics Analysis: A survey of Statistics
- Probability Analysis: A survey of Probability Theory
- Hypothesis Analysis: A survey of Hypothesis Testing, Confidence Intervals, P-hacking, A/B Tests, and Bayesian Inference
- Gradient Analysis: A survey of Gradient Descent, the optimization method widely used in Machine Learning
- Data Acquisition: A survey of introductory Regular Expressions, Web Scraping, and Beautiful Soup for mining text
- Data Insights: A survey of methods for exploring a dataset for insights before applying Machine Learning
- Full Machine Learning Project: A quick manual on addressing a dataset, preparing the data, and running a classifier.
- Note: this project is from the book Hands-On Machine Learning with Scikit-Learn and TensorFlow.
- All subsequent assignments are either from that book or from class.
- Myers Briggs Analysis: An analysis of the 50 most recent Tumblr posts by personality type, as an exercise in social media data mining.
- MNIST Classifiers: A follow-up to the Full Machine Learning Project covering multilabel and multioutput classification and evaluation metrics.
- Unstructured Exploration of Data: A treatise on Hierarchical Clustering, KNN, and PCA methods and their metrics.
- Training Models: An introduction to Linear Regression and a comparison of Gradient Descent algorithms as applied to ML.
Purpose:
I use this "drawer" as a repository dedicated to learning, practicing, and eventually mastering the fundamental concepts needed to carry out exploratory data analysis, data mining, machine learning, and more.