Data_Science_Statistics_Drawer

Note: Please check out the following for good examples of my personal work:

  • 489_Homework_3: Cleaning, interpolation, and interpretation of narcotics data.
  • 489_Homework_4: Scraping and curation of a web-sourced dataset.
  • Kaggle Titanic: Final project from STAT 489, in which I perform Exploratory Data Analysis (EDA) on the Titanic dataset.
  • Unstructured Exploration of Data: Unstructured EDA on NCI-60 Cancer and Wine Tasting datasets to discover features.

Each of these projects was carried out with Jupyter Notebooks to enhance readability.

Contents

This repository holds useful code and notes from my adventures in STAT 489: Principles of Data Science and Statistics.

Layout:

  • Homework One: An introductory Python assignment turned into a treatise on simple functional analytics
  • Homework Two: Generation of random standard normal and gamma distribution samples, with t-tests and p-values.
  • Notes on Pandas: In-class notes on how to read in and handle data in Python with Pandas.
  • Notes on PyPlot: In-class notes on how to create basic Python graphs with PyPlot.
  • Notes on Statistical Experiments: In-class notes on statistical simulations using Python.
  • Linear Analysis: A survey of Linear Algebra
  • Statistics Analysis: A survey of Statistics
  • Probability Analysis: A survey of Probability Theory
  • Hypothesis Analysis: A survey of Hypothesis Testing, Confidence Intervals, P-hacking, A/B Tests, Bayesian Inference
  • Gradient Analysis: A survey of Gradient Descent, a popular optimization technique in Machine Learning
  • Data Acquisition: A survey of introductory Regular Expressions, Web Scraping, and Beautiful Soup for mining text
  • Data Insights: A survey of exploring insights on a dataset before applying Machine Learning
  • Full Machine Learning Project: A quick manual on addressing a dataset, preparing the data, and running a classifier.
    • Note: this is from the book Hands-On Machine Learning with Scikit-Learn and TensorFlow.
    • All further assignments will be either from there or in-class.
  • Myers Briggs Analysis: An analysis of the 50 most recent Tumblr posts by personality type, as an exercise in social media data mining.
  • MNIST Classifiers: A follow-up to the Full Machine Learning Project, covering Multilabel and Multioutput classification and verification metrics.
  • Unstructured Exploration of Data: A treatise on Hierarchical Clustering, KNN, and PCA methods and their metrics.
  • Training Models: An introduction to Linear Regression and a comparison of Gradient Descent algorithms as applied to ML.

Purpose:

My purpose for this "drawer" is to have a repository dedicated to learning, practicing, and eventually mastering the fundamental concepts needed to carry out exploratory data analysis, data mining, machine learning, and more.