The data for this project is taken from https://www.kaggle.com/mauriciocap/crunchbase2013
AWS deployment (Startup Success Predictor): http://ec2-3-16-188-11.us-east-2.compute.amazonaws.com:5000/
Data visualization (Tableau): https://public.tableau.com/profile/aishwarya.bansode#!/vizhome/Healthcarecrowdfunding/Dashboard2
The main aim of the project is to estimate the probability that a startup succeeds, to help elevate revenue growth. We selected companies from the last decade that had raised more than one round of funding. The company status (IPO, Acquired, Closed, or Operating) was converted to a binary success/failure label. After tuning, the logistic regression model was deployed in a Flask application, which was dockerized and hosted on an AWS EC2 instance. Of the thousands of companies across different domains, we focused on one industry, healthcare. We used the following features for prediction:
- Total Funding (USD)
- Funding Rounds
- Time Between First Round
- Avg Raised USD
- Avg Time Between Rounds
- Investor Counts
- Type of Industry
- Regions (In USA)
- Homepage URL (existing or not)
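The label conversion and a few of the derived features listed above can be sketched as follows. This is a minimal illustration in plain Python, not the code from the notebooks: the record layout, the field names, and the `build_features` helper are hypothetical, and the status mapping (IPO, Acquired, and Operating as success, Closed as failure) is an assumption that should be checked against the cleaning notebook.

```python
from datetime import date

# Assumption: IPO/Acquired/Operating count as success (1), Closed as failure (0).
# The exact mapping is not stated in this README; adjust it to match the notebooks.
STATUS_TO_LABEL = {"ipo": 1, "acquired": 1, "operating": 1, "closed": 0}


def build_features(company):
    """Turn one raw company record (hypothetical layout) into numeric features."""
    rounds = sorted(company["rounds"], key=lambda r: r["date"])
    amounts = [r["raised_usd"] for r in rounds]
    # Gaps in days between consecutive funding rounds.
    gaps = [(b["date"] - a["date"]).days for a, b in zip(rounds, rounds[1:])]
    return {
        "total_funding_usd": sum(amounts),
        "funding_rounds": len(rounds),
        "avg_raised_usd": sum(amounts) / len(rounds),
        "avg_days_between_rounds": sum(gaps) / len(gaps) if gaps else 0.0,
        "has_homepage": int(bool(company.get("homepage_url"))),
        "label": STATUS_TO_LABEL[company["status"].lower()],
    }


# Made-up example record, for illustration only.
example = {
    "status": "Acquired",
    "homepage_url": "http://example.com",
    "rounds": [
        {"date": date(2010, 1, 1), "raised_usd": 1_000_000},
        {"date": date(2011, 1, 1), "raised_usd": 3_000_000},
    ],
}
features = build_features(example)
print(features)
```

The same logic would translate to pandas group-by operations when applied to the full dataset.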
The entire workflow is shown below:
This project requires Python 3 and the following Python libraries:
- scikit-learn (Logistic Regression, Support Vector Machine, Random Forest)
- xgboost
- pickle (part of the Python standard library)
- Flask
- pandas
- Matplotlib
Make sure you have Jupyter Notebook installed.
The simplest option is to install the Anaconda distribution of Python, which already includes most of the above packages and more.
You also need an Amazon Web Services (AWS) account so that you can host the application on EC2.
There are two main notebooks:
- Data_Cleaning1.ipynb: cleans the data and prepares it for modeling
- Data_Modelling2.ipynb: trains the ML models and compares their performance
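A model comparison of the kind done in the modeling notebook might look like the sketch below. It uses synthetic data from scikit-learn's `make_classification` instead of the cleaned Crunchbase features, compares only two of the models, and uses ROC AUC as the metric; the dataset, hyperparameters, and metric are all assumptions for illustration, not the notebook's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned feature matrix and success labels.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical hyperparameters; the tuned values live in the notebook.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive ("success") class on held-out data.
    proba = model.predict_proba(X_test)[:, 1]
    auc_scores[name] = roc_auc_score(y_test, proba)

print(auc_scores)
```

Scoring on predicted probabilities fits this project because the deployed application reports a probability of success, not just a class label.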
The web app folder contains the Flask application, and the Docker folder contains the Dockerfile and the requirements.txt file.
The web application takes the input parameters and returns the predicted probability of success; the factors on which the prediction is based are shown in the dashboard.
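The core of such a prediction step, loading a pickled model and turning input parameters into a probability, can be sketched without Flask or scikit-learn. Everything here is a stand-in: the coefficients are hand-picked and hypothetical, the feature names are illustrative, and `predict_success_probability` is not a function from the web app folder.

```python
import math
import pickle

# Hypothetical coefficients standing in for a trained logistic regression model.
model_params = {
    "bias": -1.0,
    "weights": {"funding_rounds": 0.5, "investor_count": 0.1, "has_homepage": 0.5},
}

# Serialize once after training (kept in memory here; normally written to a file).
blob = pickle.dumps(model_params)


def predict_success_probability(blob, features):
    """Load pickled parameters and apply the logistic (sigmoid) function."""
    params = pickle.loads(blob)
    score = params["bias"] + sum(
        weight * features[name] for name, weight in params["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-score))


probability = predict_success_probability(
    blob, {"funding_rounds": 2, "investor_count": 5, "has_homepage": 1}
)
print(f"Predicted probability of success: {probability:.3f}")
```

In a web application the model would typically be unpickled once at startup and reused across requests, and only pickle files from a trusted source should be loaded.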