This is a repository holding useful code or notes from my adventures in STAT 489: Principles of Data Science and Statistics, as well as personal experiments.

Hanel32/Datascience_Statistics_Drawer_Python

Data_Science_Statistics_Drawer

Note: Please check out the following for good examples of personal work:

  • 489_Homework_3: Cleaning, interpolation, and interpretation of narcotics data.
  • 489_Homework_4: Scraping and curation of a web-sourced dataset.
  • Kaggle Titanic: Final project from STAT 489, wherein I perform Exploratory Data Analysis (EDA) on the Titanic dataset.
  • Unstructured Exploration of Data: Unstructured EDA on NCI-60 Cancer and Wine Tasting datasets to discover features.

Each of these projects was carried out with Jupyter Notebooks to enhance readability.

Contents

This repository holds useful code and notes from my adventures in STAT 489: Principles of Data Science and Statistics. Layout:

  • Homework One: An introductory Python assignment turned into a treatise on simple functional analytics
  • Homework Two: Generation of random samples from standard normal and gamma distributions, with t-values and p-values computed from them.
  • Notes on Pandas: In-class notes of how to read-in and handle data in Python with Pandas.
  • Notes on PyPlot: In-class notes of how to create basic Python graphs with PyPlot.
  • Notes on Statistical Experiments: In-class notes on statistical simulations using Python.
  • Linear Analysis: A survey of Linear Algebra
  • Statistics Analysis: A survey of Statistics
  • Probability Analysis: A survey of Probability Theory
  • Hypothesis Analysis: A survey of Hypothesis Testing, Confidence Intervals, P-hacking, A/B Tests, Bayesian Inference
  • Gradient Analysis: A survey of Gradient Descent, the popular optimization technique used in Machine Learning
  • Data Acquisition: A survey of introductory Regular Expressions, Web Scraping, and Beautiful Soup to mine text
  • Data Insights: A survey of exploring insights on a dataset before applying Machine Learning
  • Full Machine Learning Project: A quick manual on addressing a dataset, preparing the data, and running a classifier.
    • Note: this is from the book Hands-On Machine Learning with Scikit-Learn and TensorFlow.
    • All further assignments will be either from there or in-class.
  • Myers Briggs Analysis: An analysis of the 50 most recent Tumblr posts by personality type, for social media data mining.
  • MNIST Classifiers: A follow-up to the Full Machine Learning Project; Multilabel and Multioutput classification and verification metrics.
  • Unstructured Exploration of Data: A treatise on Hierarchical Clustering, KNN, and PCA methods and their metrics.
  • Training Models: An introduction to Linear Regression and a comparison of Gradient Descent algorithms as applied to ML.
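
For a flavor of what the Gradient Analysis and Training Models notebooks deal with, here is a minimal sketch of batch gradient descent fitting a straight line by minimizing mean squared error. It is not code taken from this repository: the function name `fit_line_gd`, the learning rate, the epoch count, and the synthetic data are all assumptions chosen for illustration.

```python
import random


def fit_line_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; lr and epochs are assumed values
    that happen to converge for the small synthetic dataset below.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current line against every data point.
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(err^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic data (assumed): true slope 3, true intercept 1, small Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [3.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

w, b = fit_line_gd(xs, ys)
print(f"w={w:.2f}, b={b:.2f}")
```

The fitted `w` and `b` should land close to the true slope and intercept. If the learning rate is set too high for the scale of `xs`, the updates diverge instead of converging, which is the kind of trade-off a comparison of gradient descent variants explores.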

Purpose:

My purpose in keeping this "drawer" is to have a repository dedicated to learning, practicing, and eventually mastering the fundamental concepts needed to carry out exploratory data analysis, data mining, machine learning, and more.
