This repo aims to be a Python version of cdsw-simple-serving.
This repo includes:
- data preparation with RDD
- a simple machine learning pipeline built with Spark.ml
- export of the built model
- example web server code for scoring
Currently, this repo doesn't have the following features:
- export of the built model as PMML
pip install -r requirements.txt -c constraints.txt
HDFS_HOST: for handling HDFS files via the hdfs package
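A minimal sketch of how a job might pick up this setting, assuming HDFS_HOST is supplied as an environment variable (this README does not say how it is set). The helper name get_hdfs_host and its default parameter are hypothetical and are not part of this repo.

```python
import os


def get_hdfs_host(default=None):
    """Return the HDFS host name (hypothetical helper, not part of this repo).

    Assumption: HDFS_HOST is provided as an environment variable.
    """
    host = os.environ.get("HDFS_HOST", default)
    if not host:
        # Fail early with a clear message instead of a connection error later.
        raise RuntimeError("HDFS_HOST is not set")
    return host
```

Failing early like this keeps a missing setting from surfacing later as an opaque connection error inside a Spark job.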
- as a template for collaboration between Data Engineers and Data Scientists
- create job dependencies from data preparation to model serving
- Create a virtualenv for your app:
virtualenv -p python2 venv && source ./venv/bin/activate
- Install dependent libraries:
pip install -r requirements-webapp.txt
- Run example app:
spark-submit serving/web_app.py
Then you can POST data as follows:
$ curl -v -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"Temperature":23.18,"Humidity":27.272,"Light":426,"CO2":721.25,"HumidityRatio":0.00478}' http://localhost:5000/api/predict
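The same request can be sent from Python. This is a sketch using only the standard library: the URL and field names are copied from the curl command above, while the helper names build_predict_request and predict are hypothetical. The shape of the response is not documented here, so the body is returned as raw text.

```python
import json

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

# Endpoint taken from the curl example.
PREDICT_URL = "http://localhost:5000/api/predict"

# Sample payload taken from the curl example.
sample = {
    "Temperature": 23.18,
    "Humidity": 27.272,
    "Light": 426,
    "CO2": 721.25,
    "HumidityRatio": 0.00478,
}


def build_predict_request(features, url=PREDICT_URL):
    """Build a JSON request for the scoring endpoint (hypothetical helper)."""
    body = json.dumps(features).encode("utf-8")
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    # Supplying a request body makes urllib send a POST.
    return Request(url, data=body, headers=headers)


def predict(features, url=PREDICT_URL):
    """Send the request and return the raw response text."""
    response = urlopen(build_predict_request(features, url))
    try:
        return response.read().decode("utf-8")
    finally:
        response.close()
```

With the example app running locally, `print(predict(sample))` sends the same payload as the curl command.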
Or, if you want to use gunicorn:
pip install -r requirements-webapp.txt
- Download the Spark repo
- Install pyspark dependencies:
cd some-spark-directory/python && pip install -e .
- Run example app:
cd serving; gunicorn web_app:app --log-file -
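In the command above, `web_app:app` tells gunicorn to import the module `web_app` and serve the WSGI callable named `app`. The repo's actual serving/web_app.py is not reproduced here. The following is only a standard-library stand-in showing what such a callable could look like: the feature names and the /api/predict path come from the curl example, while make_app, the `{"prediction": ...}` response shape, and the stub predictor that always returns 0 are assumptions for illustration.

```python
import json

# Feature names taken from the curl example in this README.
FEATURES = ("Temperature", "Humidity", "Light", "CO2", "HumidityRatio")


def _respond(start_response, status, payload):
    """Send a JSON response through the WSGI start_response callable."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def make_app(predict_fn):
    """Build a WSGI app around predict_fn (hypothetical factory)."""
    def app(environ, start_response):
        is_post = environ.get("REQUEST_METHOD") == "POST"
        if not is_post or environ.get("PATH_INFO") != "/api/predict":
            return _respond(start_response, "404 Not Found", {"error": "not found"})
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(length).decode("utf-8"))
            # Reject requests with missing or non-numeric features.
            features = {name: float(payload[name]) for name in FEATURES}
        except (KeyError, TypeError, ValueError):
            return _respond(start_response, "400 Bad Request", {"error": "invalid input"})
        # Response shape is an assumption, not the repo's documented format.
        return _respond(start_response, "200 OK", {"prediction": predict_fn(features)})
    return app


# Stub predictor standing in for the exported model; it always returns 0.
app = make_app(lambda features: 0)
```

Passing the predictor into a factory keeps the HTTP handling separate from model loading, so the stub can be swapped for the exported model without touching the request code.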