Nanodegree Portfolio

A collection of projects completed as part of Udacity's Data Analyst Nanodegree.

Jupyter Data Analysis
R Exploratory Data Analysis
Tableau Data Visualization
SQL Data Wrangling
Inferential Statistics
Scikit-Learn Machine Learning

Jupyter Data Analysis

Baseball Statistics

This project used Sean Lahman's Major League Baseball data set to investigate whether the overall skill level of professional baseball players had improved over time. The inquiry covered the seasons from 1955 to 2017. It emphasized batter ability (measured with On-Base plus Slugging, OPS) and pitcher ability (measured with Fielding Independent Pitching, FIP).
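The two metrics named above can be written out directly. This is a minimal sketch, not the project's notebook code. It assumes season totals are passed in as plain numbers whose meanings follow the Lahman Batting columns (AB, H, 2B, 3B, HR, BB, HBP, SF) and Pitching columns (HR, BB, HBP, SO, IPouts). The parameters `doubles` and `triples` stand in for `2B` and `3B`, which are not valid Python identifiers. The FIP constant of 3.10 is only a placeholder. In practice it is recomputed each season so that league-average FIP matches league-average ERA.

```python
def on_base_pct(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denom = ab + bb + hbp + sf
    return (h + bb + hbp) / denom if denom else 0.0


def slugging_pct(h, doubles, triples, hr, ab):
    """SLG = total bases / AB, with singles derived as H - 2B - 3B - HR."""
    if ab == 0:
        return 0.0
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


def ops(h, doubles, triples, hr, bb, hbp, ab, sf):
    """On-Base plus Slugging: OBP + SLG."""
    return on_base_pct(h, bb, hbp, ab, sf) + slugging_pct(h, doubles, triples, hr, ab)


def fip(hr, bb, hbp, so, ip_outs, fip_constant=3.10):
    """Fielding Independent Pitching.

    FIP = (13*HR + 3*(BB + HBP) - 2*SO) / IP + constant, where IP = IPouts / 3.
    The default constant is a placeholder; some FIP variants also exclude
    intentional walks from BB, which this sketch does not do.
    """
    innings = ip_outs / 3
    if innings == 0:
        raise ValueError("FIP is undefined with zero innings pitched")
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / innings + fip_constant
```

FIP deliberately uses only home runs, walks, hit batters, and strikeouts. These are outcomes a pitcher controls regardless of the fielders behind him.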

Highlights

No trend, positive or negative, was observed in player ability.
Environmental factors (such as changes to the strike zone) account for far more of the variability in these statistics than player ability does.
Uses Python, matplotlib, pandas, and numpy.

R Exploratory Data Analysis

U.S. College Statistics

This project investigated a few key variables from College Scorecard, a dataset created by the U.S. Department of Education to evaluate universities across the nation. The analysis focused on four-year universities and on variables related to admissions, finances, and location.

Highlights

There appears to be a noticeable trend relating tuition to five-year completion rates.
There is also a distinct relationship between funding type (public, non-profit, for-profit) and completion rate.
Uses R and ggplot2.
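The R project's two findings rest on two analyses: a correlation between two numeric columns (tuition and completion rate), and a comparison of average completion rate across a categorical column (funding type). The sketch below is a minimal Python analogue, not the project's R code. It assumes the relevant College Scorecard columns have already been extracted into plain lists, with missing or suppressed values dropped. The function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def mean_by_group(groups, values):
    """Average of `values` within each label in `groups` (e.g. funding type)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in zip(groups, values):
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}
```

Pearson's r only captures linear association. A scatter plot (as in the ggplot2 exploration) is still needed to judge whether a trend is linear at all.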

Tableau Data Visualization

U.S. College Statistics

This project focused specifically on for-profit universities. Unlike the R data exploration project (which used the same dataset), this project analyzed data across many years.

https://public.tableau.com/views/NanodegreeDataVisProjectII/For-ProfitUniversityConcerns?:embed=y&:display_count=yes

Highlights

A Tableau Story which details some of the concerns surrounding for-profit universities.
Multiple interactive charts that can be filtered by year.

SQL Data Wrangling

OpenStreetMap Southwest Idaho

This project attempted to clean and organize a set of geographical data for Southwest Idaho.

Highlights

Conversions between XML, CSV, and SQL data.
SQL queries and simple regular expressions.
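The XML-to-CSV-to-SQL pipeline and the regular-expression cleaning step can be sketched with the Python standard library. This is an illustrative sketch, not the project's script. The `STREET_MAPPING` table, the `nodes` schema, and the function names are hypothetical. Only `<node>` elements with an `addr:street` tag are handled; real OSM extracts also contain `<way>` and `<relation>` elements.

```python
import csv
import io
import re
import sqlite3
import xml.etree.ElementTree as ET

# Last whitespace-free run of characters in a street name, e.g. "St." in "Main St."
STREET_TYPE_RE = re.compile(r"\S+$")

# Hypothetical abbreviation table; a real audit would build this from the data.
STREET_MAPPING = {
    "St": "Street", "St.": "Street",
    "Ave": "Avenue", "Ave.": "Avenue",
    "Rd": "Road", "Rd.": "Road",
}


def clean_street(name):
    """Expand an abbreviated street type at the end of a street name."""
    name = name.strip()
    match = STREET_TYPE_RE.search(name)
    if match and match.group() in STREET_MAPPING:
        return name[: match.start()] + STREET_MAPPING[match.group()]
    return name


def nodes_to_csv(osm_xml):
    """Flatten <node> elements (id, lat, lon, cleaned addr:street) into CSV text."""
    root = ET.fromstring(osm_xml)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "lat", "lon", "street"])
    for node in root.iter("node"):
        street = ""
        for tag in node.iter("tag"):
            if tag.get("k") == "addr:street":
                street = clean_street(tag.get("v", ""))
        writer.writerow([node.get("id"), node.get("lat"), node.get("lon"), street])
    return buf.getvalue()


def load_nodes(csv_text):
    """Load the CSV rows into an in-memory SQLite table named `nodes`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat REAL, lon REAL, street TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany("INSERT INTO nodes VALUES (:id, :lat, :lon, :street)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    # Toy input for illustration only; not taken from the Southwest Idaho extract.
    sample = """<osm>
      <node id="1" lat="43.6" lon="-116.2">
        <tag k="addr:street" v="Main St."/>
      </node>
    </osm>"""
    conn = load_nodes(nodes_to_csv(sample))
    print(conn.execute("SELECT street, COUNT(*) FROM nodes GROUP BY street").fetchall())
```

Cleaning happens before the CSV is written, so the SQL layer only ever sees normalized street names. That keeps `GROUP BY` queries on `street` from splitting "St." and "Street" into separate groups.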

Inferential Statistics

Provided Stroop Effect Data

This project used descriptive and inferential statistics to assess whether the Stroop Effect was statistically significant in a provided data set.

Highlights

Formal report of statistical significance written in LaTeX.
Histograms generated with RStudio.
Data analyzed with Google Spreadsheets.
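In a within-subjects design like the Stroop task, each participant is timed on both the congruent and incongruent word lists. A common choice of test for this design is a dependent-samples (paired) t-test on the per-participant differences: t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom. The project's own calculations were done in Google Spreadsheets. The Python function below is only a sketch of the same arithmetic, and assumes the two lists are aligned by participant.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(congruent, incongruent):
    """Dependent-samples t statistic for (incongruent - congruent) times.

    Both lists must be aligned by participant. Returns (t, degrees_of_freedom).
    """
    if len(congruent) != len(incongruent) or len(congruent) < 2:
        raise ValueError("need paired samples with at least 2 participants")
    diffs = [i - c for c, i in zip(congruent, incongruent)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("t is undefined when every difference is identical")
    t = mean(diffs) / (sd / sqrt(n))
    return t, n - 1
```

The resulting t is then compared against a critical value from a t table at the chosen significance level and degrees of freedom, typically one-tailed if the hypothesis is that incongruent times are longer.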

Scikit-Learn Machine Learning

Enron Data

This project scanned a pool of Enron email data for patterns, then built a classifier to identify persons likely to have been involved in illicit activities.

Highlights

Multiple algorithms evaluated, with parameter tuning.
Charts illustrating the efficacy of particular features.
A writeup detailing the forms of assessment used (accuracy, precision, recall, F1).
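All four assessment measures listed above can be computed from confusion-matrix counts. The sketch below is a plain-Python illustration, not the project's evaluation code. It assumes binary labels where 1 marks a person flagged as likely involved, and it returns 0.0 when a ratio's denominator is zero. In a scikit-learn workflow, `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` for the same purpose.

```python
def classification_scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = flagged)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged in error
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly cleared
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Precision and recall matter more than accuracy here because the positive class is rare. A classifier that labels everyone as uninvolved can still score high accuracy while identifying no one.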

justinrgarrard/NanodegreePortfolio

Folders and files

Latest commit

History

Repository files navigation

Nanodegree Portfolio

Jupyter Data Analysis

R Exploratory Data Analysis

Tableau Data Visualization

SQL Data Wrangling

Inferential Statistics

Scikit-Learn Machine Learning

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages