Merge pull request #135 from getindata/release-0.10.0
Release 0.10.0
Lasica authored Nov 22, 2023
2 parents 0bcb35e + d803e4a commit b70081c
Showing 39 changed files with 2,109 additions and 1,290 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.9.1
current_version = 0.10.0

[bumpversion:file:pyproject.toml]

2 changes: 1 addition & 1 deletion .copier-answers.yml
@@ -9,7 +9,7 @@ description: Kedro plugin with GCP Vertex AI support
docs_url: https://kedro-vertexai.readthedocs.io/
full_name: Kedro Vertex AI Plugin
github_url: https://github.com/getindata/kedro-vertexai
initial_version: 0.9.1
initial_version: 0.10.0
keywords:
- kedro-plugin
- kedro
15 changes: 15 additions & 0 deletions .github/pre-commit-config.yaml
@@ -0,0 +1,15 @@
repos:
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
args: ["--profile", "black", "--line-length=79"]
- repo: https://github.com/psf/black
rev: 22.3.0
hooks:
- id: black
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
hooks:
- id: flake8
args: ['--ignore=E203,W503', '--max-line-length=120'] # see https://github.com/psf/black/issues/315 https://github.com/psf/black/issues/52
4 changes: 2 additions & 2 deletions .github/workflows/prepare-release.yml
@@ -17,9 +17,9 @@ jobs:
env:
PYTHON_PACKAGE: kedro_vertexai
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v1
uses: actions/setup-python@v4.7.1
with:
python-version: ${{ matrix.python-version }}
- name: Validate inputs
6 changes: 3 additions & 3 deletions .github/workflows/spellcheck.yml
@@ -15,6 +15,6 @@ jobs:
# Steps represent a sequence of tasks that will be executed as part of the job
steps:
# Spellcheck
- uses: actions/checkout@v2
- uses: rojopolis/spellcheck-github-actions@0.25.0
name: Spellcheck
- uses: actions/checkout@v4
- uses: rojopolis/spellcheck-github-actions@0.35.0
name: Spellcheck
45 changes: 28 additions & 17 deletions .github/workflows/test_and_publish.yml
@@ -21,10 +21,10 @@ jobs:
python-version: [ '3.8', '3.9', '3.10']

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4

- name: Setup python ${{ matrix.python-version }}
uses: actions/setup-python@v2.2.1
uses: actions/setup-python@v4.7.1
with:
python-version: ${{ matrix.python-version }}

@@ -36,7 +36,7 @@ jobs:
- name: Check pre-commit status
run: |
poetry install -v
poetry run pre-commit run --all-files
poetry run pre-commit run --all-files -c .github/pre-commit-config.yaml
- name: Test with tox
run: |
@@ -55,7 +55,7 @@
needs: unit_tests
steps:

- uses: actions/checkout@v2
- uses: actions/checkout@v4
with:
fetch-depth: 0

@@ -82,7 +82,7 @@

steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v4

# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
@@ -98,12 +98,16 @@

e2e_tests:
runs-on: ubuntu-latest
timeout-minutes: 60
needs: [unit_tests, sonarcloud]
strategy:
matrix:
e2e_case: ["standard", "grouping"]
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4

- name: Setup python
uses: actions/setup-python@v2.2.1
uses: actions/setup-python@v4.7.1
with:
python-version: 3.8

@@ -120,7 +124,7 @@
# kedro 0.18.1 is on purpose here, due to https://github.com/kedro-org/kedro-starters/issues/99
run: |
pip install $(find "./dist" -name "*.tar.gz")
kedro new --starter spaceflights --config tests/e2e/starter-config.yml --verbose
kedro new --starter spaceflights --config tests/e2e/${{ matrix.e2e_case }}/starter-config.yml --verbose
- name: Install project dependencies
working-directory: ./spaceflights
@@ -139,29 +143,35 @@
sed -i 's/\(COPY src\/requirements.txt.*\)$/\1\nCOPY kedro-vertexai.tar.gz ./g' Dockerfile
echo "!data/01_raw" >> .dockerignore
kedro vertexai init gid-ml-ops-sandbox europe-west4
mv ../tests/e2e/catalog.yml conf/base/catalog.yml
mv ../tests/e2e/vertexai.yml conf/base/vertexai.yml
cp ../tests/e2e/${{ matrix.e2e_case }}/catalog.yml conf/base/catalog.yml
cp ../tests/e2e/${{ matrix.e2e_case }}/vertexai.yml conf/base/vertexai.yml
# Introducing tagging to pipelines
if [[ "${{ matrix.e2e_case }}" == "grouping" ]]; then
mv ../tests/e2e/${{ matrix.e2e_case }}/pipeline_data_processing.py src/spaceflights/pipelines/data_processing/pipeline.py
mv ../tests/e2e/${{ matrix.e2e_case }}/pipeline_data_science.py src/spaceflights/pipelines/data_science/pipeline.py
fi
- name: Prepare docker env
uses: docker/setup-buildx-action@v1
uses: docker/setup-buildx-action@v3
id: buildx
with:
install: true

- name: Build pipeline docker image
run: |
cd ./spaceflights
docker build --build-arg BASE_IMAGE=python:3.8-buster --tag kedro-vertexai-e2e:latest --load .
docker pull gcr.io/gid-ml-ops-sandbox/kedro-vertexai-e2e:${{ matrix.e2e_case }} || true
docker build --build-arg BASE_IMAGE=python:3.8-buster --tag kedro-vertexai-e2e:${{ matrix.e2e_case }} --load --cache-from=gcr.io/gid-ml-ops-sandbox/kedro-vertexai-e2e:${{ matrix.e2e_case }} .
- name: Publish docker image to GCR
uses: mattes/gce-docker-push-action@v1
with:
creds: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
src: kedro-vertexai-e2e:latest
dst: gcr.io/gid-ml-ops-sandbox/kedro-vertexai-e2e:latest
src: kedro-vertexai-e2e:${{ matrix.e2e_case }}
dst: gcr.io/gid-ml-ops-sandbox/kedro-vertexai-e2e:${{ matrix.e2e_case }}

- name: Set up GCP Credentials
uses: google-github-actions/auth@v0.6.0
uses: google-github-actions/auth@v1.1.1
with:
credentials_json: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
create_credentials_file: true
@@ -172,6 +182,7 @@ jobs:
cd ./spaceflights
export KEDRO_CONFIG_COMMIT_ID=$GITHUB_SHA
kedro vertexai run-once --wait-for-completion
publish:
if: github.event.pull_request == null && github.ref == 'refs/heads/master'
needs: [ e2e_tests, codeql ]
@@ -180,11 +191,11 @@
PYTHON_PACKAGE: kedro_vertexai
steps:
- name: Checkout the repo
uses: actions/checkout@v2
uses: actions/checkout@v4
with:
fetch-depth: 0 # necessary to enable merging, all the history is needed
- name: Set up Python 3.8
uses: actions/setup-python@v1
uses: actions/setup-python@v4.7.1
with:
python-version: "3.8"
- name: Build package dist from source # A better way will be : https://packaging.python.org/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/ but pep 517 is still marked as experimental
5 changes: 5 additions & 0 deletions .gitignore
@@ -174,3 +174,8 @@ terraform/terraform.tfstate
.idea
conf/azure/credentials.yml

# pyspelling
dictionary.dic

# vs code
.vscode
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -1,3 +1,5 @@
# THIS IS LOCAL PRE-COMMIT CONFIG
# CICD uses separate config in .github until spellcheck hook gets improved
repos:
- repo: https://github.com/pycqa/isort
rev: 5.12.0
@@ -13,3 +15,7 @@ repos:
hooks:
- id: flake8
args: ['--ignore=E203,W503', '--max-line-length=120'] # see https://github.com/psf/black/issues/315 https://github.com/psf/black/issues/52
- repo: https://github.com/getindata/py-pre-commit-hooks
rev: v0.2.1
hooks:
- id: pyspelling-docker
42 changes: 21 additions & 21 deletions .spellcheck.yml
@@ -10,26 +10,26 @@ matrix:
- "README.md"
default_encoding: utf-8
pipeline:
- pyspelling.filters.context:
context_visible_first: true
escapes: \\[\\`~]
delimiters:
# Ignore multiline content between fences (fences can have 3 or more back ticks)
# ```
# content
# ```
- open: '^(?s)(?P<open>`{1,3})[^`]'
close: '(?P=open)'
# Ignore text between inline back ticks
- open: '(?P<open>`)[^`]'
close: '(?P=open)'
# Ignore text in brackets [] and ()
- open: '\['
close: '\]'
- open: '\('
close: '\)'
- open: '\{'
close: '\}'
- pyspelling.filters.context:
context_visible_first: true
escapes: \\[\\`~]
delimiters:
# Ignore multiline content between fences (fences can have 3 or more back ticks)
# ```
# content
# ```
- open: '(?s)^(?P<open>`{1,3})[^`]'
close: '(?P=open)'
# Ignore text between inline back ticks
- open: '(?P<open>`)[^`]'
close: '(?P=open)'
# Ignore text in brackets [] and ()
- open: '\['
close: '\]'
- open: '\('
close: '\)'
- open: '\{'
close: '\}'
dictionary:
wordlists:
- docs/spellcheck_exceptions.txt
- docs/spellcheck_exceptions.txt
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,12 @@

## [Unreleased]

## [0.10.0] - 2023-11-22

- Added explicit pyarrow dependency to avoid a critical vulnerability
- Updated dependencies and tested for kedro `0.18.14`
- [Feature 🚀] Node grouping: added an option to group multiple Kedro nodes together for execution in a single Vertex AI process, allowing better optimization - fewer steps, shorter delays when running Vertex AI nodes, and less time wasted on data serialization thanks to the possibility of using the MemoryDataset

## [0.9.1] - 2023-08-16

- Updated dependencies of kedro to `0.18.8`, mlflow to `2.3.2` and others
@@ -68,7 +74,9 @@

- Initial version of **kedro-vertexai** plugin extracted from [kedro-kubeflow v0.6.0](https://github.com/getindata/kedro-kubeflow/tree/0.6.0)

[Unreleased]: https://github.com/getindata/kedro-vertexai/compare/0.9.1...HEAD
[Unreleased]: https://github.com/getindata/kedro-vertexai/compare/0.10.0...HEAD

[0.10.0]: https://github.com/getindata/kedro-vertexai/compare/0.9.1...0.10.0

[0.9.1]: https://github.com/getindata/kedro-vertexai/compare/0.9.0...0.9.1

20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
7 changes: 7 additions & 0 deletions docs/source/02_installation/01_installation.md
@@ -49,6 +49,13 @@ Commands:
ui Open VertexAI Pipelines UI in new browser tab
```

````{warning}
The `vertexai` sub-command group only becomes visible when used inside a Kedro project context. Make sure that you're inside one if you see the message:
```
Error: No such command 'vertexai'.
```
````

### `init`

The `init` command takes two arguments: `PROJECT_ID` and `REGION`. This command generates a sample
58 changes: 56 additions & 2 deletions docs/source/02_installation/02_configuration.md
@@ -26,6 +26,15 @@ run_config:
# Optional pipeline description
#description: "Very Important Pipeline"

# Optional config for node execution grouping - 2 classes are provided:
# - default no-grouping option IdentityNodeGrouper
# - tag based grouping with TagNodeGrouper
grouping:
cls: kedro_vertexai.grouping.IdentityNodeGrouper
# cls: kedro_vertexai.grouping.TagNodeGrouper
# params:
# tag_prefix: "group:"

# How long to keep underlying Argo workflow (together with pods and data
# volume after pipeline finishes) [in seconds]. Default: 1 week
ttl: 604800
@@ -34,8 +43,10 @@ run_config:
# pipeline status. Used to send notifications or raise the alerts
# on_exit_pipeline: notify_via_slack

# Optional section allowing adjustment of the resources
# reservations and limits for the nodes
# Optional section allowing adjustment of the resource reservations and limits
# for the nodes. You can specify node names or tags to select which nodes the requirements
# apply to (also in node selectors). When not provided, they're set to 500m cpu and 1024Mi memory.
# If you don't want to specify pipeline resources, set both to None in __default__.
resources:

# For nodes that require more RAM you can increase the "memory"
@@ -170,6 +181,49 @@ def generate_config(self) -> dict:
The first one, `target_config_file`, should return the name of the configuration file to be generated (e.g. `credentials.yml`), and `generate_config` should return a dictionary, which will then be serialized into the target file as YAML. If the target file already exists during the invocation, it will be merged (see the method `kedro_vertexai.dynamic_config.DynamicConfigProvider.merge_with_existing`) with the existing one and then saved again.
Note that `generate_config` has access to the initialized plugin config via the `self.config` property, so any values from the `vertexai.yml` configuration are accessible.
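
For illustration, a minimal custom provider could look like the sketch below. Only `target_config_file`, `generate_config`, and `self.config` come from the interface described above; the class name and returned values are hypothetical, and whether `target_config_file` is declared as a property is an assumption here:

```python
from kedro_vertexai.dynamic_config import DynamicConfigProvider


class StaticTokenProvider(DynamicConfigProvider):
    # Hypothetical provider that generates a credentials file at runtime.

    @property
    def target_config_file(self) -> str:
        # Name of the configuration file to be generated
        return "credentials.yml"

    def generate_config(self) -> dict:
        # `self.config` exposes the initialized plugin config (vertexai.yml),
        # so its values can be reused here; the token below is illustrative.
        return {"my_service": {"token": "example-token"}}
```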


## Grouping feature

The optional `grouping` section enables the grouping feature, which aggregates the execution of many Kedro nodes into a single VertexAI node (or several). It allows you to freely subdivide Kedro pipelines into as many steps as logically make sense, while keeping the advantages of in-memory data transmission. It also saves a lot of time by avoiding the Docker container start-up delay on Vertex nodes, which can amount to about 2 minutes per VertexAI node.

The API allows you to implement your own aggregation method. You can provide the aggregating class and its additional init params as a `kwargs` dictionary. The default class is `IdentityNodeGrouper`, which does not actually group the nodes (the plugin behaves as in versions before `0.9.1`). The class that implements grouping using a configured tag prefix is called `TagNodeGrouper`. The default prefix is `"group:"`. It uses whatever follows the tag prefix as the name of a group of nodes. Only one tag with this grouping prefix is allowed per node; more than that results in a `GroupingException`. Example configuration:
```yaml
grouping:
cls: kedro_vertexai.grouping.TagNodeGrouper
params:
tag_prefix: "group:"
```

The above configuration yields the following result for this sample pipeline:
```python
Pipeline([
node(some_operation, "A", "B", name="node1", tags=["foo", "group:nodegroup"]),
node(some_operation, "B", "C", name="node2", tags=["bar", "group:nodegroup"]),
node(some_operation, "C", "D", name="node3", tags=["baz"]),
])
```
This results in 2 VertexAI nodes for the pipeline: the first, named `nodegroup`, runs the Kedro nodes `node1` and `node2` and provides the output `C`; the second VertexAI node is `node3`. An additional MLflow node can be present if `kedro-mlflow` is used; right now it is not possible to group it. If you need that functionality, search for or create an issue on the [GitHub page of the plugin](https://github.com/getindata/kedro-vertexai/issues).

The grouping class is used during pipeline translation in the plugin's pipeline generator. It implements the interface of the `NodeGrouper` class with a `group` function that accepts `pipeline.node_dependencies` and returns a `Grouping`. `Grouping` is a `dataclass` with two dictionaries:
- `node_mapping` - which defines the names of the groups and which set of nodes is part of each group
- `dependencies` - which defines the child-parent relations of all groups in `node_mapping`
Upon creation, `Grouping` also validates the dependencies to check that the grouping is valid, i.e. that it does not introduce a cycle in the dependency graph.
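
As a sketch of what implementing this interface could look like — the keyword-argument construction of `Grouping` and the exact shape of `pipeline.node_dependencies` (assumed here to map each node to the set of its parent nodes) are assumptions; see `kedro_vertexai.grouping` for the actual definitions:

```python
from kedro_vertexai.grouping import Grouping, NodeGrouper


class PrefixNodeGrouper(NodeGrouper):
    """Hypothetical grouper: groups nodes by the part of their name before the first dot."""

    def group(self, node_dependencies) -> Grouping:
        node_mapping = {}  # group name -> set of node names in that group
        dependencies = {}  # group name -> set of parent group names
        for node in node_dependencies:
            group_name = node.name.split(".")[0]
            node_mapping.setdefault(group_name, set()).add(node.name)
        for node, parents in node_dependencies.items():
            group_name = node.name.split(".")[0]
            parent_groups = {p.name.split(".")[0] for p in parents}
            # Only record dependencies between distinct groups
            dependencies.setdefault(group_name, set()).update(parent_groups - {group_name})
        # Grouping validates on creation that the group graph is acyclic
        return Grouping(node_mapping=node_mapping, dependencies=dependencies)
```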

````{warning}
Make sure that all nodes in the pipeline have names and that those names are unique within the pipeline when using this feature, as the grouping classes and the naming of VertexAI nodes depend on it.
````

### Example

Here you can see how the standard spaceflights pipeline changes after enabling the grouping feature configured with `TagNodeGrouper`, using the following tagging (view from Kedro-Viz):

![Vertex AI Pipeline](grouped_kedro_viz.png)

We get the following result:

![Vertex AI Pipeline](grouping_visualisation.png)


## Resources configuration

Optional `resources` and `node_selectors` sections enable adjustment of the resources reservations and limits for the