Learning how to build data science apps from freeCodeCamp.org
Tutorial by Data Professor: Build 12 Data Science Apps with Python and Streamlit - Full Course (actually there are only 10)
The tutorial above covers how to build interactive, data-driven Python web apps using the Streamlit library. This repo contains 9 web apps built from the tutorial with slight modifications and enhancements.
To install the Streamlit library:
pip install streamlit
To run a web app:
streamlit run myapp.py
A list of web apps in this repo:
📈 Simple Stock Price App
🧬 Simple Bioinformatics DNA Count App
🤾 Sports Exploratory Data Analysis App
🏭 S&P 500 Stock Price App
💹 Cryptocurrency Price App
🌷 Simple Iris Flower Prediction App
🐧 Simple Palmer Penguin Prediction App
🏡 Boston Housing Price Prediction App
🧪 Molecular Solubility Prediction App
A simple web app that shows the stock prices of several multinational companies, retrieving the stock data directly from Yahoo Finance.
Towards Data Science article on How to Get Stock Data Using Python (using yfinance)
- Using streamlit to build a basic web app
- Using yfinance to get stock price data
A DNA nucleotide count web app that counts the nucleotide composition (A, T, G, C) of a query DNA sequence.
- Displaying images
- Taking input and showing output in different formats (dictionary, text, dataframe, plot)
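The counting step can be sketched in a few lines of plain Python. This is a minimal illustration, not the app's actual code: the function name `nucleotide_count` is hypothetical, and it assumes the query may be pasted in FASTA style, so lines starting with `>` are treated as headers and skipped.

```python
from typing import Dict


def nucleotide_count(sequence: str) -> Dict[str, int]:
    """Count A, T, G and C in a DNA sequence given as plain or FASTA-style text."""
    # Skip FASTA header lines (starting with ">") and join the remaining lines
    lines = [line.strip() for line in sequence.splitlines()]
    bases = "".join(line for line in lines if line and not line.startswith(">")).upper()
    # Report only the four canonical nucleotides; other symbols (e.g. N) are ignored
    return {base: bases.count(base) for base in "ATGC"}


query = ">DNA Query\nGAACACGTGG\nAGGCAAACAG"
print(nucleotide_count(query))  # {'A': 8, 'T': 1, 'G': 7, 'C': 4}
```

In a Streamlit app, the returned dictionary could be shown directly with `st.write` or converted to a `pandas` DataFrame for a table or bar chart.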
Scrapes data from Basketball Reference and Football Reference and performs a simple exploratory data analysis by creating a heatmap. The EDA Basketball and EDA Football apps are combined into one.
- Select widget and multiselect widget in streamlit
- Web scraping with pandas
- Filtering data with conditions in pandas (data wrangling)
- Downloading CSV files in streamlit
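The filtering and download steps above could look roughly like the sketch below. The player table is hypothetical (its `Tm`/`Pos` column names only mimic a Basketball Reference table), and the helper names are illustrative. In the app, the `teams` and `positions` lists would come from the multiselect widgets, and the CSV bytes could be passed to `st.download_button(data=...)`.

```python
import pandas as pd

# Hypothetical per-player stats; in the app this table is scraped with pandas
players = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Tm": ["BOS", "LAL", "BOS", "MIA"],
    "Pos": ["PG", "C", "C", "SF"],
    "PTS": [20.1, 25.3, 12.4, 18.0],
    "TRB": [4.0, 11.2, 9.5, 6.1],
})


def filter_players(df, teams, positions):
    """Keep rows whose team AND position are among the user's selections."""
    mask = df["Tm"].isin(teams) & df["Pos"].isin(positions)
    return df[mask]


def to_csv_bytes(df):
    """Serialize a DataFrame to UTF-8 CSV bytes, e.g. for st.download_button(data=...)."""
    return df.to_csv(index=False).encode("utf-8")


selected = filter_players(players, teams=["BOS", "MIA"], positions=["C", "SF"])
print(selected)

# Pairwise correlations of the numeric columns: the matrix a heatmap would display
print(players[["PTS", "TRB"]].corr())
```

Combining the conditions with `&` keeps a row only when both selections match; switching to `|` would instead keep rows matching either one.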
The S&P 500 (Standard and Poor's 500) is a market-capitalization-weighted stock market index of 500 of the largest companies listed on stock exchanges in the United States. It is one of the most commonly followed equity indices.
A web app that scrapes the latest list of S&P 500 companies from Wikipedia, fetches their respective stock price data, and plots their closing prices.
- Web scraping with pandas
- Fetching stock prices with yfinance
- Filtering data with conditions in pandas (data wrangling)
- Plotting with altair
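Altair works best with long-form ("tidy") data, so a wide table of closing prices (one column per ticker) is usually reshaped before plotting. The sketch below assumes such a wide table; the prices are made up for illustration and are not real quotes. The Altair call is left as a comment so the snippet runs with `pandas` alone.

```python
import pandas as pd

# Hypothetical closing prices, one column per ticker, indexed by date
close = pd.DataFrame(
    {"AAPL": [150.0, 151.2, 149.8], "MSFT": [300.5, 302.1, 301.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
)
close.index.name = "Date"


def to_long_format(wide):
    """Reshape a Date x ticker table into Date / Ticker / Close rows."""
    return wide.reset_index().melt(id_vars="Date", var_name="Ticker", value_name="Close")


long_df = to_long_format(close)
print(long_df)

# With altair installed, one line per ticker could then be declared as:
# alt.Chart(long_df).mark_line().encode(x="Date:T", y="Close:Q", color="Ticker:N")
```

Keeping the ticker in its own column lets Altair assign colors through a single `color` encoding instead of one layer per company.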
A web app that scrapes the latest cryptocurrency data from CoinMarketCap and allows users to select different cryptocurrencies and compare them.
Medium article on Web Scraping Crypto Prices with Python
- Web scraping with BeautifulSoup
- Page layout in streamlit
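A rough sketch of the scrape-then-compare idea with BeautifulSoup is shown below. It parses a small static HTML string rather than a live page: the `__NEXT_DATA__` script id and the JSON fields are assumptions for illustration, not a description of CoinMarketCap's current markup, which changes over time. The helper names are hypothetical.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical page: the script id and JSON layout are assumptions for illustration
PAGE = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"coins": [{"symbol": "BTC", "price": 50000.0, "percent_change_24h": 1.5},
           {"symbol": "ETH", "price": 4000.0, "percent_change_24h": -0.8}]}
</script>
</body></html>
"""


def parse_coins(page_html):
    """Return the list of coin records embedded as JSON in a <script> tag."""
    soup = BeautifulSoup(page_html, "html.parser")  # parser from the standard library
    script = soup.find("script", id="__NEXT_DATA__")
    return json.loads(script.string)["coins"]


def compare(coins, symbols):
    """Keep only the user-selected coins, sorted by 24h change (best first)."""
    chosen = [c for c in coins if c["symbol"] in symbols]
    return sorted(chosen, key=lambda c: c["percent_change_24h"], reverse=True)


coins = parse_coins(PAGE)
print(compare(coins, {"BTC", "ETH"}))
```

Against a live site, `PAGE` would be replaced by the downloaded HTML, and the selector would have to match whatever structure the page actually uses.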
A web app that predicts the iris flower type from user input. It accepts four input parameters (sepal length, sepal width, petal length, petal width) and makes the prediction with a Random Forest classifier.
The model is trained on the Iris Plants Dataset provided in the scikit-learn library.
- Built a classifier using the RandomForestClassifier model from sklearn
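The model side of the Iris app can be sketched as follows: train a `RandomForestClassifier` on the bundled dataset, then predict from the four measurements. The helper `predict_species` is a hypothetical name. In the app, its arguments would come from sidebar widgets such as sliders.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Fixed random_state so repeated runs give the same model
clf = RandomForestClassifier(random_state=42)
clf.fit(X, y)


def predict_species(sepal_length, sepal_width, petal_length, petal_width):
    """Return the predicted species name and class probabilities for one flower."""
    features = [[sepal_length, sepal_width, petal_length, petal_width]]
    label = clf.predict(features)[0]
    proba = clf.predict_proba(features)[0]
    return iris.target_names[label], proba


species, proba = predict_species(5.1, 3.5, 1.4, 0.2)
print(species)  # setosa
print(proba)
```

`predict_proba` returns one probability per class, in the order of `iris.target_names`, which is convenient for displaying the model's confidence next to the prediction.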
Predicts the Palmer penguin species (Chinstrap, Gentoo, or Adélie) with a classification model. User input is accepted either as an uploaded CSV file or through direct input.
The original Palmer Penguins Dataset is provided by Allison Horst. The model in this web app uses the cleaned dataset provided by Data Professor.
- Built a classifier using the RandomForestClassifier model from sklearn
- File uploader in streamlit
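Uploaded input has to be encoded exactly like the training data, or the classifier receives the wrong columns. The sketch below shows one way to do this: one-hot encode with `pd.get_dummies` and then `reindex` to the training columns. This may differ from how the app itself handles it. The tiny training frame is hypothetical (its column names only mirror the penguins data), and `io.StringIO` stands in for the file-like object that `st.file_uploader` returns, which `pd.read_csv` can read directly.

```python
import io

import pandas as pd

CATEGORICAL = ["island", "sex"]

# Tiny hypothetical training frame; the app would use the cleaned penguins CSV instead
train = pd.DataFrame({
    "island": ["Biscoe", "Dream", "Torgersen", "Biscoe"],
    "bill_length_mm": [45.2, 39.5, 38.9, 48.7],
    "bill_depth_mm": [14.8, 17.8, 17.8, 15.1],
    "flipper_length_mm": [212, 188, 181, 222],
    "body_mass_g": [5200, 3300, 3625, 5350],
    "sex": ["female", "male", "female", "male"],
})
train_columns = pd.get_dummies(train, columns=CATEGORICAL).columns


def encode_input(df):
    """One-hot encode user input and align it with the training columns."""
    encoded = pd.get_dummies(df, columns=CATEGORICAL)
    # Categories absent from the input become all-zero columns; unseen ones are dropped
    return encoded.reindex(columns=train_columns, fill_value=0)


# Simulates the file-like object returned by st.file_uploader
uploaded = io.StringIO(
    "island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex\n"
    "Dream,41.1,18.1,205,4300,male\n"
)
user_df = pd.read_csv(uploaded)
print(encode_input(user_df))
```

Without the `reindex` step, a single uploaded row would produce only the dummy columns for its own island and sex, and the feature count would not match the fitted model.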
💻 Web app deployed on Heroku: https://boston-house-predict.herokuapp.com/
Predicts the median Boston house price from the following input parameters:
- crim: per capita crime rate by town.
- zn: proportion of residential land zoned for lots over 25,000 sq.ft.
- indus: proportion of non-retail business acres per town.
- chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
- nox: nitrogen oxides concentration (parts per 10 million).
- rm: average number of rooms per dwelling.
- age: proportion of owner-occupied units built prior to 1940.
- dis: weighted mean of distances to five Boston employment centres.
- rad: index of accessibility to radial highways.
- tax: full-value property-tax rate per $10,000.
- ptratio: pupil-teacher ratio by town.
- b: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.
- lstat: lower status of the population (percent).
The regression model is a Random Forest Regressor trained on the Boston House Prices Dataset provided in the scikit-learn library. SHAP values are used to explain the model's predictions.
- Built a regressor using the RandomForestRegressor model from sklearn
- Using shap to explain the outcome of the prediction
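Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so recent versions no longer ship this dataset. The sketch below therefore trains the same kind of model on synthetic stand-in data that uses the feature names listed above. The data and target are made up and are not the Boston dataset. `predict_price` and `explain` are hypothetical helpers. The SHAP step uses `shap.TreeExplainer` and imports `shap` lazily, so the rest of the snippet runs without it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["crim", "zn", "indus", "chas", "nox", "rm", "age",
            "dis", "rad", "tax", "ptratio", "b", "lstat"]

# Synthetic stand-in data (NOT the Boston dataset): random features in [0, 1)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((200, len(FEATURES))), columns=FEATURES)
# Arbitrary synthetic target, for illustration only
y = 20 + 10 * X["rm"] - 8 * X["lstat"] + rng.normal(0, 1, len(X))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)


def predict_price(inputs):
    """Predict the target for one row given as {feature_name: value}."""
    row = pd.DataFrame([inputs], columns=FEATURES)
    return float(model.predict(row)[0])


def explain(fitted_model, data):
    """Return SHAP values (one row per sample, one column per feature)."""
    import shap  # optional dependency, imported only when explanations are needed

    explainer = shap.TreeExplainer(fitted_model)
    return explainer.shap_values(data)


print(predict_price({name: 0.5 for name in FEATURES}))
```

Building the input row as a DataFrame with the training column names keeps the feature order explicit, which matters because the model only sees positions, not labels, once the data reaches the trees.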
Predicts the solubility (LogS) of input molecules, which must be given in SMILES notation. Using the RDKit library, the app computes four descriptors for each molecule:
- MolLogP
- Molecular Weight
- Number Of Rotatable Bonds
- Aromatic Proportion
These four descriptors are the inputs to the linear regression model, as proposed by John S. Delaney in ESOL: Estimating Aqueous Solubility Directly from Molecular Structure. A cleaned dataset for training the model is provided by Data Professor.
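The model described above is a plain linear combination of the four descriptors. The coefficients $\beta_0, \dots, \beta_4$ are fitted by `LinearRegression` on the training data and are not given here. Following Delaney, the aromatic proportion is the fraction of heavy atoms in the molecule that are aromatic.

```latex
\mathrm{LogS} = \beta_0 + \beta_1\,\mathrm{MolLogP} + \beta_2\,\mathrm{MolWt}
              + \beta_3\,\mathrm{NumRotatableBonds} + \beta_4\,\mathrm{AromaticProportion}

\mathrm{AromaticProportion} = \frac{\text{number of aromatic atoms}}{\text{number of heavy atoms}}
```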
At the time of writing, installing the RDKit library with pip could be impossible, so it is suggested to install RDKit with Conda, which involves creating a conda environment for the installation. (Recent RDKit releases are also published on PyPI as rdkit.)
- Built a regressor using the LinearRegression model from sklearn
- Using rdkit for computing the required variables for the prediction