The Workshop Environment You Are Using

Your workshop environment consists of several components which have been pre-installed and are ready to use. Depending on which parts of the workshop you are doing, you will use one or more of:

  • Red Hat OpenShift - You will use one or more projects (Kubernetes namespaces) that are your own and are isolated from other workshop students

  • Red Hat CodeReady Workspaces - based on Eclipse Che, it is a cloud-based, in-browser IDE (similar to IntelliJ IDEA, VSCode, Eclipse IDE). You’ve been provisioned your own personal workspace for use with this workshop. You will write, test, and deploy code from here.

  • Red Hat AMQ Streams - A streaming data platform based on Apache Kafka.

  • Buildah - Buildah is a tool that facilitates building Open Container Initiative (OCI) container images. We will be using Buildah in this workshop to build container images of the constructed models.

  • Jupyter Notebook - An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Jupyter Notebook will be used to build, train and test models in this workshop.

  • JupyterHub - A multi-user Hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server. In this workshop we will be logging into this environment to provision instances of Jupyter Notebook.

  • Tekton - An open source project that provides a framework to create cloud-native CI/CD pipelines quickly. We will be creating pipelines based on Tekton in this workshop.

  • Argo CD - Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes.

  • Prometheus - An open-source systems monitoring toolkit for event monitoring and alerting. Prometheus will be collating the metrics from the produced models in this workshop.

  • Grafana - An open-source visualization and analytics software. In this workshop Grafana allows you to query, visualize, alert on, and explore the metrics collected by Prometheus.

  • Gogs - Gogs is a simple, stable, and extensible self-hosted Git service, which will be created for the purposes of this workshop.

  • Seldon - Provides a set of tools for deploying machine learning models at scale. In this workshop, Seldon will be used for a variety of purposes, including providing endpoints for metric collection and allowing us to send feedback to the created models.

  • Nexus - Nexus Repository OSS is an open source repository that supports many artifact formats, including Docker, Java™, and npm.

You will be provided clickable URLs throughout the workshop to access the services that have been installed for you.

Workshop Flow

The goal of this workshop is to deliver an end-to-end AI/ML solution for fraud detection, built from the software components described above and their associated processes. The main steps are as follows:

1) Create an AI/ML model that predicts fraudulent transactions, using Jupyter Notebook (a minimal, illustrative sketch of this kind of model follows these steps).

2) Productize the model. Use CodeReady Workspaces, provided by OpenShift, to convert the notebook code into Python, then deliver it into a pipeline that builds, tests, and deploys the code across three environments:

  • Development Environment: The models are converted into Python code here via CodeReady Workspaces, then trained and tested with a mixture of fraudulent and non-fraudulent inputs. Versioning is applied to ensure reproducibility of the model.

  • Staging Environment: Once the model has undergone all the relevant testing, a container image is built with Buildah and stored in Nexus.

  • Production Environment: Once the model has been verified to be working correctly, it is promoted to the production environment, where the approved image is deployed.

3) Monitor the model. Prometheus and Grafana provide insight into the credibility of the fraud detection model and act as a sanity check on the results derived from the associated data. The visualized metrics reveal any performance degradation of the model, prompting us to decide whether to start new experiment iterations or to retrain the model with new data, given that fraud practices continually evolve in the real world.
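To make step 1 more concrete, here is a minimal, illustrative sketch of the kind of fraud-detection model you might build in a Jupyter notebook. The file name transactions.csv, the fraud label column, and the choice of scikit-learn's RandomForestClassifier are assumptions made for illustration only; the workshop notebooks define the actual dataset, features, and model.

# Illustrative sketch only; the workshop notebooks define the real dataset and model.
# Assumes a CSV of labelled transactions with a binary "fraud" column (hypothetical schema).
import pandas as pd
from joblib import dump
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("transactions.csv")   # hypothetical input file
X = df.drop(columns=["fraud"])         # feature columns
y = df["fraud"]                        # 1 = fraudulent, 0 = legitimate

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
dump(model, "model.joblib")            # persisted so the pipeline can package it later

Once exported like this, the serialized model is the artifact that the later steps convert, containerize, and deploy.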

Pipeline

A pipeline in software development is an automated process that drives software through a path of building, testing, and deploying code. By automating the process, the objective is to minimize human error and maintain a consistent process for how software is deployed. Steps in the pipeline can include compiling code, running unit tests, code analysis, security scanning, and installer creation. For containerized environments, the pipeline also includes packaging the code into a container image to be deployed across the hybrid cloud. A pipeline is critical in supporting continuous integration and continuous deployment (CI/CD) processes.

In this workshop, the CI/CD experience is provided by a combination of OpenShift Pipelines (based on Tekton) and Argo CD.

OpenShift Pipelines is responsible for the CI process, in which models are built and trained during staging and then pushed to the Nexus registry. Argo CD delivers the CD process, monitoring the deployment configuration and pushing any new changes to OpenShift.

GPU

In this workshop, we will not be using GPUs to train the models.

GPUs are supported on OpenShift via the NVIDIA GPU Operator and the Node Feature Discovery (NFD) Operator. The NFD Operator identifies hardware device features on nodes. It solves the general problem of identifying and cataloging hardware resources in the infrastructure nodes so they can be made available to OpenShift.

(Figure: NVIDIA GPU Operator manual installation)

The NFD Operator uses vendor PCI IDs to identify hardware in a node; NVIDIA uses the PCI vendor ID 10de.

$ oc describe node ip-10-0-132-138.us-east-2.compute.internal | egrep 'Roles|10de'

Roles: worker
feature.node.kubernetes.io/pci-10de.present=true

The vendor ID 10de appears in the feature labels for the GPU-enabled node. These labels, created by the NFD Operator, are what the GPU Operator uses to determine where to deploy the driver containers for the GPU(s).

You can then consume these GPUs from your containers by requesting nvidia.com/gpu just like you request cpu or memory.

resources:
  requests:
    nvidia.com/gpu: 1 # Num of GPUs
  limits:
    nvidia.com/gpu: 1

Model Training And Serving Frameworks

We will be using Seldon to package the trained model into a Flask app.
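As an illustration of what that packaging looks like, a Seldon Python model wrapper is typically a class that exposes a predict method (and optionally send_feedback and metrics methods), which the seldon-core wrapper turns into an HTTP/gRPC microservice. The class name, file name, and metric key below are hypothetical; the workshop provides the actual wrapper code.

# Illustrative Seldon Python wrapper sketch (class, file, and metric names are hypothetical).
from joblib import load

class FraudModel:
    def __init__(self):
        # Load the model artifact produced during training.
        self.model = load("model.joblib")

    def predict(self, X, features_names=None):
        # Return fraud probabilities for a batch of transactions.
        return self.model.predict_proba(X)

    def send_feedback(self, features, feature_names, reward, truth, routing=None):
        # Hook Seldon calls when feedback (for example, confirmed fraud labels) is sent to the model.
        return []

    def metrics(self):
        # Custom metrics that Seldon exposes for Prometheus to scrape.
        return [{"type": "COUNTER", "key": "predictions_total", "value": 1}]

A class like this is typically packaged together with the seldon-core Python module (for example via S2I) so that Seldon can serve it over REST or gRPC and expose its metrics for Prometheus.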

In Kubernetes, there are other Custom Resources (CRs) for training and serving models, such as:

  • TensorFlow: the TFJob CR provided by Kubeflow. You can serve the model directly from S3 or use S2I.

  • XGBoost: the Kubeflow XGBoost Operator.

  • PyTorch: PyTorch jobs to run distributed training.

First Step: Confirm Your Username!

Look in the box at the top of your screen. Is your username set already? If so it will look like this:

Set User ID above

If your username is properly set, then you can move on. If not, in the above box, enter the user ID you were assigned like this:

Set User ID above

This will customize the links and copy/paste code for this workshop. If you have accidentally typed the wrong username, just click the green recycle icon to reset it.

Click-to-Copy

You will see various code and command blocks throughout these exercises which can be copy/pasted directly by clicking anywhere on the block of text. Simply click once and the whole block is copied to your clipboard, ready to be pasted with CTRL+V (or Command+V on macOS).

echo "This is a bash shell command that you can copy/paste by clicking"

Your Environment

Your user ID is {{ USER_ID }}

OpenShift Console URL is {{ CONSOLE_URL }}. Username/password is {{ USER_ID }}/{{ OPENSHIFT_USER_PASSWORD }}.

CodeReady Workspaces URL is {{ ECLIPSE_CHE_URL }}. Username/password is {{ USER_ID }}/{{ CHE_USER_PASSWORD }}.

Git URL is {{ GIT_URL }}. Username/password is {{ USER_ID }}/{{ GIT_USER_PASSWORD }}.

JupyterHub URL is {{ JUPYTERHUB_URL }}

Grafana URL is {{ GRAFANA_URL }}

Argo CD URL is {{ ARGOCD_URL }}

Nexus URL is {{ NEXUS_URL }}

How to complete this workshop

Simply follow these instructions end-to-end. You will need to do quite a bit of copy/paste for Linux commands and source code modifications, as well as clicking around on various consoles used in the labs. When you get to the end of each section, you can click the Next > button at the bottom to advance to the next topic. You can also use the menu on the left to move around the instructions at will.

The entire workshop is split into one or more modules. Look at the top of the screen in the header to see which module you are on. After you complete this module, your instructor may have additional modules for you to complete.

Good luck, and let’s get started!