Titanic Dashboard

Project Outline

Selected Topic and Rationale

We will be using machine learning to create a model that predicts which passengers survived the Titanic shipwreck. Our team selected this topic because we wanted to obtain a deeper understanding of the tragedy and how different passenger attributes impacted their odds of survival.

Data Source

Our data will come from the Titanic dataset included in the "stablelearner" R package, exported and stored as CSV files.

Technologies Used

The project uses PostgreSQL (via pgAdmin) for data storage, Jupyter Notebook and Python for data cleaning and machine learning, and Flask with HTML/CSS for the dashboard. Each component is described in the sections below.

Questions We Hope to Answer

We will run statistical analyses to see how different groups fared based on factors such as age, gender, and socio-economic status. We also hope to add a section to our dashboard that lets users input their own information and generate their predicted probability of survival.

Importing titanic.csv into pgAdmin

  • Create a database named titanic_project and ensure it is selected with an active connection.
  • Add a config.py file to your Notebooks folder in the group repo. (This file is listed in .gitignore, so it will not be committed.) It should read:
    db_password = '[insert your password here]'
    
  • Open a command line terminal in this same Notebooks folder and run jupyter notebook
  • Open the create_database.ipynb file and execute the four cells.
    • The notebook creates two tables: passenger_registry and embarked. (A sketch of the notebook's workflow follows this list.)
  • Refresh your database. You should see both new tables, passenger_registry and embarked.
    • To confirm the data imported properly, run the following query:
      select * from passenger_registry where country='United States';
      
      You should see 264 records returned.
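
For reference, here is a minimal sketch of the kind of code create_database.ipynb runs, assuming the tables are written with pandas and SQLAlchemy. The table names come from this README; the exact schema, column names, and file paths are assumptions, not the repo's actual code.

    # Assumed workflow: load titanic.csv and write two tables to the local
    # titanic_project database using pandas + SQLAlchemy.
    import pandas as pd
    from sqlalchemy import create_engine
    from config import db_password  # the config.py file created above

    engine = create_engine(
        f"postgresql://postgres:{db_password}@localhost:5432/titanic_project"
    )

    titanic_df = pd.read_csv("titanic.csv")

    # Main passenger table (column layout is illustrative, not the exact schema).
    titanic_df.to_sql("passenger_registry", engine, if_exists="replace", index=False)

    # Lookup table for port of embarkation (this split is an assumption).
    embarked_df = titanic_df[["embarked"]].drop_duplicates()
    embarked_df.to_sql("embarked", engine, if_exists="replace", index=False)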

Running the Machine Learning Model

  • First, we cleaned the data and ensured it was ready for entry into the model
  • Next, we set up the machine learning model
    • Machine Learning Setup
  • Then, we fit the model using the decision tree method
    • Fitting The Model
  • After that, we ran the model and arrived at the following conclusions:
    • Confusion Matrix
  • Precision (True Positives divided by sum of True and False Positives): 79%

  • Recall (True Positives divided by sum of True Positives and False Negatives): 90%

  • The recall score is higher than the precision score, meaning the model misses relatively few actual survivors (few false negatives). When the model is wrong, it is more often because it predicts survival for a passenger who did not actually survive (a false positive).

    • Accuracy Score, Classification Report
  • Finally, we saved the fitted scaler to a file using the pickle library. (A condensed sketch of these steps follows this list.)
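
As a reference, the following is a condensed sketch of the workflow described above, assuming scikit-learn's StandardScaler and DecisionTreeClassifier. The column names, train/test split, and output file name are illustrative rather than the repo's exact code.

    # Condensed sketch of the training steps: scale features, fit a decision
    # tree, evaluate, and pickle the fitted scaler for later use.
    import pickle
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import confusion_matrix, classification_report

    df = pd.read_csv("titanic.csv")  # cleaned data; column names are illustrative
    X = pd.get_dummies(df.drop(columns=["survived"]))
    y = df["survived"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    scaler = StandardScaler().fit(X_train)
    X_train_scaled = scaler.transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    model = DecisionTreeClassifier(random_state=1)
    model.fit(X_train_scaled, y_train)

    predictions = model.predict(X_test_scaled)
    print(confusion_matrix(y_test, predictions))
    print(classification_report(y_test, predictions))

    # Persist the fitted scaler (the file name is an assumption).
    with open("scaler.pkl", "wb") as f:
        pickle.dump(scaler, f)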

Flask: Running the application locally

  • Ensure that your development environment is active with

    conda activate [development-environment-name]
    
  • If you haven't already, install Flask with the following command

    pip install flask
    
  • For additional dependencies, see the requirements.txt file.

  • Navigate to the /webapp folder of the repo. Run the following command:

    flask run
    

    or

    python wsgi.py
    

    The app should start on localhost (typically http://127.0.0.1:5000/). Copy this address into your browser and enjoy!

  • When you finish using the app, press Ctrl + C in the terminal to end the local connection. (A minimal sketch of a possible wsgi.py entry point follows below.)
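
For illustration only, here is a minimal sketch of what a wsgi.py entry point for the app could look like. The pickle file names, template name, route, and form fields shown here are hypothetical and are not taken from the repo.

    # Hypothetical minimal Flask entry point; the repo's actual routes,
    # templates, and pickle file names may differ.
    import pickle
    from flask import Flask, render_template, request

    app = Flask(__name__)

    # Assumed artifact names for the saved scaler and model.
    with open("scaler.pkl", "rb") as f:
        scaler = pickle.load(f)
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/")
    def index():
        # Render the dashboard page (assumed to live in templates/index.html).
        return render_template("index.html")

    @app.route("/predict", methods=["POST"])
    def predict():
        # Illustrative input fields; the real dashboard defines its own form.
        features = [[float(request.form[name]) for name in ("pclass", "age", "fare")]]
        probability = model.predict_proba(scaler.transform(features))[0][1]
        return {"survival_probability": round(float(probability), 3)}

    if __name__ == "__main__":
        app.run(debug=True)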
