Please check the release notes for the new features in v0.14.0. The upgrade can take hours, depending on the number of nodes in the cluster and the network speed. Running jobs will fail during the upgrade and will be retried automatically after the upgrade is done.
The dev-box is a docker container that provides the dependency environment for deployment. All the commands in this doc are executed in the dev-box, so you first need to build the v0.14.0 dev-box and start it.
# build dev-box
git clone https://github.com/Microsoft/pai.git
cd pai
git checkout v0.14.0
cd src/dev-box
sudo docker build -t dev-box . --file=./build/dev-box.dockerfile
# create dev-box
sudo docker run -itd \
-e COLUMNS=$COLUMNS -e LINES=$LINES -e TERM=$TERM \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /pathConfiguration:/cluster-configuration \
-v /hadoop-binary:/hadoop-binary \
--pid=host \
--privileged=true \
--net=host \
--name=dev-box \
dev-box
# Working in your dev-box
sudo docker exec -it dev-box /bin/bash
pip install future
cd /pai
# setup kubernetes environments
./paictl.py cluster k8s-set-env
# then input master node ip
Notice: make sure the Python file has execution permission.
If this is the first time you deploy a PAI cluster, refer to the deployment doc to prepare the configuration.
If you have deployed PAI before and the cluster version is v0.9 or higher, you can use paictl to pull the config from the cluster:
./paictl.py config pull -o <path_of_config>
There should be four files under <path_of_config>:
- layout.yaml (or cluster-configuration.yaml in version 0.8/0.9)
- k8s-role-definition.yaml
- kubernetes-configuration.yaml
- services-configuration.yaml
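A quick way to confirm that the pull produced the expected files is a small shell check. This is a minimal sketch and not part of paictl: the function name `check_config_files` is hypothetical, and it only tests that the four file names listed above exist, accepting cluster-configuration.yaml in place of layout.yaml for 0.8/0.9 configs.

```shell
# Hypothetical helper (not part of paictl): verify that a pulled config
# directory contains the four expected files.
check_config_files() {
    local dir="$1"
    local missing=0
    local f

    # layout.yaml is named cluster-configuration.yaml in version 0.8/0.9.
    if [ ! -f "$dir/layout.yaml" ] && [ ! -f "$dir/cluster-configuration.yaml" ]; then
        echo "missing: layout.yaml (or cluster-configuration.yaml)"
        missing=1
    fi

    for f in k8s-role-definition.yaml kubernetes-configuration.yaml services-configuration.yaml; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done

    # Returns 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

It could be called as `check_config_files <path_of_config> && echo "config looks complete"`.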
Whenever you are asked to input the cluster ID, you can run
./paictl.py config get-id
to get it.
Check the version in the cluster configuration file services-configuration.yaml.
It looks like:
cluster:
  ...
  docker-registry:
    namespace: openpai
    domain: docker.io
    tag: v0.8.3 # It's your cluster version
    ...
Change the version of tag to v0.14.0
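The tag can be edited by hand, or with a one-line `sed` substitution. The sketch below is an illustration only: `set_image_tag` is a hypothetical helper, and it assumes the docker-registry tag is the only `tag:` key in the file, because it rewrites every line whose key is `tag`. A copy of the original file is kept as `<file>.bak`.

```shell
# Hypothetical helper: rewrite the value of every "tag:" key in a YAML file.
# Assumption: the docker-registry tag is the only "tag:" key in the file.
# The original file is preserved next to it with a .bak suffix.
set_image_tag() {
    local file="$1"
    local new_tag="$2"
    # Keep indentation and any trailing comment; replace only the value.
    sed -i.bak -E "s/^([[:space:]]*tag:[[:space:]]*)[^[:space:]#]+/\1${new_tag}/" "$file"
}
```

For example, `set_image_tag <path_of_config>/services-configuration.yaml v0.14.0`, followed by a look at the file to confirm the result.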
To make sure your config uses the new style, PAI provides a script to convert an old-style configuration.
Usage:
./deployment/tools/configMigration.py <path_of_config> <path_of_new_config>
Then you can customize the generated config under the directory <path_of_new_config> based on the configuration doc.
If you were using the webportal plugin before v0.11 (that is, if webportal.plugins is set in services-configuration.yaml), you need to run an extra script after the above command:
./deployment/tools/pluginIdMigration.py <path_of_new_config> <path_of_new_config>
PAI provides a check command for validating the configuration. Usage:
./paictl.py check -p <path_of_new_config>
Notice: the configuration pushed to the cluster won't take effect until the PAI services are restarted. Push it with the command below:
./paictl.py config push -p <path_of_new_config>
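The check and push steps can be chained so that a configuration is pushed only after it validates. This is a sketch under one stated assumption: that `./paictl.py check` exits with a non-zero status when validation fails, which the text above does not confirm. The `PAICTL` variable and the `check_and_push` name are hypothetical; only the two paictl subcommands come from this doc.

```shell
# Path to paictl; overridable so the sketch can be tried against a stub.
PAICTL="${PAICTL:-./paictl.py}"

# Push the config only if the check passes.
# Assumption: "paictl.py check" exits non-zero when validation fails.
check_and_push() {
    local config_dir="$1"
    if ! "$PAICTL" check -p "$config_dir"; then
        echo "config check failed; nothing was pushed" >&2
        return 1
    fi
    "$PAICTL" config push -p "$config_dir"
}
```

It would be run from /pai inside the dev-box as `check_and_push <path_of_new_config>`. paictl may still prompt interactively, for example for the cluster ID.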
If you have a PAI cluster running a previous version, first stop all PAI services:
./paictl.py service stop
PAI is now down, and the PAI dashboard is not accessible.
Data won't be lost during the upgrade, so the backup is optional but recommended.
Now log in to the master node and back up the data for etcd, ZooKeeper, and so on. The following directories should be backed up:
- PAI common data path: check services-configuration.yaml for the config cluster.common.data-path. Please don't change it unless you know exactly what you are doing.
- Etcd data path: check kubernetes-configuration.yaml for the config kubernetes.etcd-data-path.
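One simple way to back up these directories is to archive each of them with `tar`. The helper below is a minimal sketch, not a PAI tool: `backup_dir` is a hypothetical name, the backup destination is your choice, and the source paths must be read from your own configuration files as described above.

```shell
# Hypothetical helper: archive one directory into a timestamped tar.gz
# under a destination directory and print the path of the archive.
backup_dir() {
    local src="$1"
    local dest="$2"
    local name archive
    name="$(basename "$src")"
    archive="$dest/${name}-$(date +%Y%m%d%H%M%S).tar.gz"
    mkdir -p "$dest" || return 1
    # -C makes paths inside the archive relative to the parent of $src.
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    echo "$archive"
}
```

For example, `backup_dir <etcd_data_path> <backup_directory>`, run on the master node with enough privileges to read the data. Keep the destination outside the directory being archived.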
The Kubernetes cluster will be reinstalled with the new configuration, so destroy the existing cluster first:
./paictl.py cluster k8s-clean -p <path_of_new_config>
Now the Kubernetes cluster is down.
Install the Kubernetes cluster:
./paictl.py cluster k8s-bootup -p <path_of_new_config>
Now the Kubernetes cluster is up, you can check the Kubernetes dashboard.
This step is optional. By default, all PAI services use the official v0.14.0 images on docker.io. This section applies only if your cluster has a private docker registry and you want to make customizations based on v0.14.0.
Build and push all the service images with version 0.14.0. Make sure the PAI code is up to date. Refer to the build doc for more details.
./build/pai_build.py build -c <path_of_new_config>
If the build succeeds, push the images to the registry. Before you push the docker images, make sure the file services-configuration.yaml has the correct docker registry info, like this:
docker-registry:
  domain: xxx.azurecr.io
  namespace: yyy
  password: zzz
  secret-name: aaa
  tag: 'v0.14.0'
./build/pai_build.py push -c <path_of_new_config>
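Before running the push, the registry settings can be given a quick sanity check. The sketch below is a plain text match, not a YAML parser: `check_registry_info` is a hypothetical helper that only verifies that each key shown in the example above appears somewhere in the file with a non-empty value. It does not verify that the values are correct or that they sit under docker-registry.

```shell
# Hypothetical helper: report docker-registry keys that are missing or empty.
# Plain text matching only; it does not parse YAML or check nesting.
check_registry_info() {
    local file="$1"
    local missing=0
    local key
    for key in domain namespace password secret-name tag; do
        # A key counts as set when something other than whitespace
        # or a comment follows the colon.
        if ! grep -Eq "^[[:space:]]*${key}:[[:space:]]*[^[:space:]#]" "$file"; then
            echo "missing or empty: $key"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_registry_info <path_of_new_config>/services-configuration.yaml` prints nothing and returns 0 when all five keys are set.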
Use paictl to start all services.
./paictl.py service start
PAI is now up, and you can visit the PAI dashboard.