Commit

Merge branch 'main' into feat/daemon-retry-strategy
MenD32 authored Jan 28, 2025
2 parents e904c14 + 3ae8b31 commit b45cb7b
Showing 53 changed files with 1,052 additions and 456 deletions.
4 changes: 2 additions & 2 deletions .devcontainer/pre-build.sh
@@ -13,7 +13,7 @@ sudo mv ./kubectl /usr/local/bin/kubectl
kubectl cluster-info

# install kit
curl -q https://raw.githubusercontent.com/kitproj/kit/main/install.sh | sh
make kit

# install protocol buffer compiler (protoc)
sudo apt update
@@ -25,7 +25,7 @@ sudo chown vscode:vscode /home/vscode/go/src || true
sudo chown vscode:vscode /home/vscode/go/src/github.com || true

# download dependencies and do first-pass compile
CI=1 kit pre-up
kit build

# Patch CoreDNS to have host.docker.internal inside the cluster available
kubectl get cm coredns -n kube-system -o yaml | sed "s/ NodeHosts: |/ NodeHosts: |\n `grep host.docker.internal /etc/hosts`/" | kubectl apply -f -
11 changes: 1 addition & 10 deletions Makefile
@@ -532,16 +532,7 @@ dist/argosay:

.PHONY: kit
kit: Makefile
ifeq ($(shell command -v kit),)
ifeq ($(shell uname),Darwin)
brew tap kitproj/kit --custom-remote https://github.com/kitproj/kit
brew install kit
else
@echo "Downloading Kit"
curl -fsL --retry 99 "https://github.com/kitproj/kit/releases/download/v0.1.8/kit_0.1.8_$$(uname)_$$(uname -m | sed 's/aarch64/arm64/').tar.gz" | sudo tar -C /usr/local/bin -xzf - kit
endif
endif

go install github.com/kitproj/kit@v0.1.79

.PHONY: start
ifeq ($(RUN_MODE),local)
7 changes: 7 additions & 0 deletions api/jsonschema/schema.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions api/openapi-spec/swagger.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions docs/executor_swagger.md
@@ -3807,6 +3807,7 @@ of the first container processes are calculated.
|------|------|---------|:--------:| ------- |-------------|---------|
| activeDeadlineSeconds | [IntOrString](#int-or-string)| `IntOrString` | | | | |
| affinity | [Affinity](#affinity)| `Affinity` | | | | |
| annotations | map of string| `map[string]string` | | | Annotations is a list of annotations to add to the template at runtime | |
| archiveLocation | [ArtifactLocation](#artifact-location)| `ArtifactLocation` | | | | |
| automountServiceAccountToken | boolean| `bool` | | | AutomountServiceAccountToken indicates whether a service account token should be automatically mounted in pods.</br>ServiceAccountName of ExecutorConfig must be specified if this value is false. | |
| container | [Container](#container)| `Container` | | | | |
1 change: 1 addition & 0 deletions docs/fields.md
@@ -1747,6 +1747,7 @@ Template is a reusable and composable unit of execution in a workflow
|:----------:|:----------:|---------------|
|`activeDeadlineSeconds`|[`IntOrString`](#intorstring)|Optional duration in seconds relative to the StartTime that the pod may be active on a node before the system actively tries to terminate the pod; value must be positive integer This field is only applicable to container and script templates.|
|`affinity`|[`Affinity`](#affinity)|Affinity sets the pod's scheduling constraints Overrides the affinity set at the workflow level (if any)|
|`annotations`|`Map< string , string >`|Annotations is a list of annotations to add to the template at runtime|
|`archiveLocation`|[`ArtifactLocation`](#artifactlocation)|Location in which all files related to the step will be stored (logs, artifacts, etc...). Can be overridden by individual items in Outputs. If omitted, will use the default artifact repository location configured in the controller, appended with the <workflowname>/<nodename> in the key.|
|`automountServiceAccountToken`|`boolean`|AutomountServiceAccountToken indicates whether a service account token should be automatically mounted in pods. ServiceAccountName of ExecutorConfig must be specified if this value is false.|
|`container`|[`Container`](#container)|Container is the main container image to run in the pod|
28 changes: 28 additions & 0 deletions docs/metrics.md
@@ -247,6 +247,7 @@ Metrics for the [Four Golden Signals](https://sre.google/sre-book/monitoring-dis
#### `cronworkflows_concurrencypolicy_triggered`

A counter of the number of times a CronWorkflow has triggered its `concurrencyPolicy` to limit the number of workflows running.

| attribute | explanation |
|----------------------|----------------------------------------------------------------------------------|
| `name` | ⚠️ The name of the CronWorkflow |
@@ -257,6 +258,7 @@ A counter of the number of times a CronWorkflow has triggered its `concurrencyPo

A counter of the total number of times a CronWorkflow has been triggered.
Suppressed runs due to `concurrencyPolicy: Forbid` will not be counted.

| attribute | explanation |
|-------------|-------------------------------------------|
| `name` | ⚠️ The name of the CronWorkflow |
@@ -267,6 +269,7 @@ Suppressed runs due to `concurrencyPolicy: Forbid` will not be counted.
Incidents of a deprecated feature being used.
Deprecated features are [explained here](deprecations.md).
🚨 This counter may go up much more than once for a single use of the feature.

| attribute | explanation |
|-------------|---------------------------------------|
| `feature` | The name of the feature used |
@@ -282,6 +285,7 @@ Deprecated features are [explained here](deprecations.md).
#### `error_count`

A counter of certain errors incurred by the controller by cause.

| attribute | explanation |
|-----------|------------------------|
| `cause` | The cause of the error |
@@ -297,6 +301,7 @@ The currently tracked specific errors are
A gauge of the number of workflows currently in the cluster in each phase.
The `Running` count does not mean that a workflow's pods are running, just that the controller has scheduled them.
A workflow can be stuck in `Running` with pending pods for a long time.

| attribute | explanation |
|-----------|----------------------------|
| `status` | Boolean: `true` or `false` |
@@ -308,11 +313,13 @@ A gauge indicating if this Controller is the [leader](high-availability.md#workf

- `1` if leader or in standalone mode via [`LEADER_ELECTION_DISABLE=true`](environment-variables.md#controller).
- `0` otherwise, indicating that this controller is a standby that is not currently running workflows.

This metric has no attributes.

#### `k8s_request_duration`

A histogram recording the API requests sent to the Kubernetes API.

| attribute | explanation |
|---------------|--------------------------------------------------------------------|
| `kind` | The kubernetes `kind` involved in the request such as `configmaps` |
@@ -325,6 +332,7 @@ This contains all the information contained in `k8s_request_total` along with ti
#### `k8s_request_total`

A counter of the number of API requests sent to the Kubernetes API.

| attribute | explanation |
|---------------|--------------------------------------------------------------------|
| `kind` | The kubernetes `kind` involved in the request such as `configmaps` |
@@ -336,6 +344,7 @@ This metric is calculable from `k8s_request_duration`, and it is suggested you j
#### `log_messages`

A count of log messages emitted by the controller by log level: `error`, `warn` and `info`.

| attribute | explanation |
|-----------|------------------------------|
| `level` | The log level of the message |
@@ -345,14 +354,17 @@ A count of log messages emitted by the controller by log level: `error`, `warn`
A histogram of durations of operations.
An operation is a single workflow reconciliation loop within the workflow-controller.
It's the time for the controller to process a single workflow after it has been read from the cluster and is a measure of the performance of the controller affected by the complexity of the workflow.

This metric has no attributes.

The environment variables `OPERATION_DURATION_METRIC_BUCKET_COUNT` and `MAX_OPERATION_TIME` configure the bucket sizes for this metric, unless they are specified using a `histogramBuckets` modifier in the `metricsConfig` block.

#### `pod_missing`

Incidents of pod missing.
A counter of pods that were not seen, for example because they were deleted by Kubernetes.
You should only see this under high load.

| attribute | explanation |
|--------------------|----------------------------------------|
| `node_phase` | The phase that the pod's node was in |
@@ -363,6 +375,7 @@ You should only see this under high load.
#### `pod_pending_count`

Total number of pods that started pending by reason.

| attribute | explanation |
|-------------|----------------------------------------------|
| `reason` | Summary of the kubernetes Reason for pending |
@@ -373,13 +386,15 @@ Total number of pods that started pending by reason.
A gauge of the number of workflow created pods currently in the cluster in each phase.
It is possible for a workflow to start, but no pods be running (for example cluster is too busy to run them).
This metric sheds light on actual work being done.

| attribute | explanation |
|-----------|------------------------------|
| `phase` | The phase that the pod is in |

#### `pods_total_count`

Total number of pods that have entered each phase.

| attribute | explanation |
|-------------|----------------------------------|
| `phase` | The phase that the pod is in |
@@ -393,6 +408,7 @@ This is not directly controlled by the workflow controller, so it is possible fo

A counter of additions to the work queues inside the controller.
The rate of this shows how busy that area of the controller is.

| attribute | explanation |
|--------------|-----------------------|
| `queue_name` | The name of the queue |
@@ -411,6 +427,7 @@ This and associated metrics are all directly sourced from the [client-go workque

A gauge of the current depth of the queues.
If these get large then the workflow controller is not keeping up with the cluster.

| attribute | explanation |
|--------------|-----------------------|
| `queue_name` | The name of the queue |
@@ -428,6 +445,7 @@ This and associated metrics are all directly sourced from the [client-go workque
#### `queue_duration`

A histogram of the time events in the queues are taking to be processed.

| attribute | explanation |
|--------------|-----------------------|
| `queue_name` | The name of the queue |
@@ -446,6 +464,7 @@ This and associated metrics are all directly sourced from the [client-go workque
#### `queue_latency`

A histogram of the time events in the queues are taking before they are processed.

| attribute | explanation |
|--------------|-----------------------|
| `queue_name` | The name of the queue |
@@ -464,6 +483,7 @@ This and associated metrics are all directly sourced from the [client-go workque
#### `queue_longest_running`

A gauge of the number of seconds that this queue's longest running processor has been running for.

| attribute | explanation |
|--------------|-----------------------|
| `queue_name` | The name of the queue |
@@ -481,6 +501,7 @@ This and associated metrics are all directly sourced from the [client-go workque
#### `queue_retries`

A counter of the number of times a message has been retried in the queue.

| attribute | explanation |
|--------------|-----------------------|
| `queue_name` | The name of the queue |
@@ -498,6 +519,7 @@ This and associated metrics are all directly sourced from the [client-go workque
#### `queue_unfinished_work`

A gauge of the number of queue items that have not been processed yet.

| attribute | explanation |
|--------------|-----------------------|
| `queue_name` | The name of the queue |
@@ -515,6 +537,7 @@ This and associated metrics are all directly sourced from the [client-go workque
#### `total_count`

A counter of workflows that have entered each phase for tracking them through their life-cycle, by namespace.

| attribute | explanation |
|-------------|-----------------------------------------|
| `phase` | The phase that the Workflow has entered |
@@ -523,6 +546,7 @@ A counter of workflows that have entered each phase for tracking them through th
#### `version`

Build metadata for this Controller.

| attribute | explanation |
|------------------|-------------------------------------------------------------------------------------------------------|
| `version` | The version of Argo |
@@ -537,6 +561,7 @@ Build metadata for this Controller.
#### `workers_busy_count`

A gauge of queue workers that are busy.

| attribute | explanation |
|---------------|-------------------|
| `worker_type` | The type of queue |
@@ -555,6 +580,7 @@ This and associated metrics are all directly sourced from the [client-go workque

A gauge of the number of workflows with different conditions.
This will tell you the number of workflows with running pods.

| attribute | explanation |
|-----------|----------------------------------------------------|
| `type` | The type of condition, currently only `PodRunning` |
@@ -565,6 +591,7 @@ This will tell you the number of workflows with running pods.
A histogram of the runtime of workflows using `workflowTemplateRef` only.
Counts both WorkflowTemplate and ClusterWorkflowTemplate usage.
Records time between entering the `Running` phase and completion, so does not include any time in `Pending`.

| attribute | explanation |
|-----------------|-------------------------------------------------------------|
| `name` | ⚠️ The name of the WorkflowTemplate/ClusterWorkflowTemplate. |
@@ -575,6 +602,7 @@ Records time between entering the `Running` phase and completion, so does not in

A counter of workflows using `workflowTemplateRef` only, as they enter each phase.
Counts both WorkflowTemplate and ClusterWorkflowTemplate usage.

| attribute | explanation |
|-----------------|-------------------------------------------------------------|
| `name` | ⚠️ The name of the WorkflowTemplate/ClusterWorkflowTemplate. |
1 change: 0 additions & 1 deletion docs/security.md
@@ -63,7 +63,6 @@ rules:
- workflowtemplates
- clusterworkflowtemplates
- cronworkflows
- cronworkflows
- workflowtaskresults
verbs:
- get
66 changes: 66 additions & 0 deletions docs/walk-through/annotations.md
@@ -0,0 +1,66 @@
# Annotations

Argo Workflows now supports annotations as a new field in workflow templates.

## Adding Annotations to a template

To add annotations to a workflow template, include the `annotations` field in the template definition, for example:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: example-workflow-template
spec:
  entrypoint: whalesay
  templates:
    - name: whalesay
      annotations:
        workflows.argoproj.io/display-name: "my-custom-display-name"
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
```

In this example, the annotation `workflows.argoproj.io/display-name` is used to change the node name in the UI to "my-custom-display-name".

## Annotation Templates

Annotation values can also be set dynamically from input parameters.

Here is an example:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: templated-annotations-workflow
spec:
  entrypoint: whalesay
  arguments:
    parameters:
      - name: display-name
        value: "default-display-name"
  templates:
    - name: whalesay
      annotations:
        workflows.argoproj.io/display-name: "{{inputs.parameters.display-name}}"
      inputs:
        parameters:
          - name: display-name
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
```

In this example, the annotation `workflows.argoproj.io/display-name` is set using the `display-name` parameter. You can override this parameter when submitting the workflow to dynamically change the annotation value.
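
To make the substitution concrete, here is a minimal Python sketch of how a `{{inputs.parameters.<name>}}` placeholder in an annotation value could be resolved. It is only an illustration: `render_annotations` and `PLACEHOLDER` are hypothetical names, the regular expression is a simplification, and this is not how the Argo controller itself is implemented.

```python
import re

# Hypothetical, simplified pattern for placeholders such as
# {{inputs.parameters.display-name}}; the real template engine supports more.
PLACEHOLDER = re.compile(r"\{\{\s*inputs\.parameters\.([A-Za-z0-9_.-]+)\s*\}\}")


def render_annotations(annotations, parameters):
    """Return a copy of `annotations` with input-parameter placeholders filled in."""

    def substitute(match):
        name = match.group(1)
        # Unknown parameters are left untouched rather than guessed.
        return parameters.get(name, match.group(0))

    return {key: PLACEHOLDER.sub(substitute, value) for key, value in annotations.items()}


annotations = {
    "workflows.argoproj.io/display-name": "{{inputs.parameters.display-name}}",
}

# Value taken from the default declared under `arguments`.
default = render_annotations(annotations, {"display-name": "default-display-name"})
# Value supplied at submission time instead of the default.
overridden = render_annotations(annotations, {"display-name": "nightly-build"})

print(default)
print(overridden)
```

The same idea covers values that mix literal text with several parameters, such as `os-{{inputs.parameters.image}}-{{inputs.parameters.tag}}`. With the `argo` CLI, an override would typically be passed with the `-p` flag, for example `argo submit --from workflowtemplate/templated-annotations-workflow -p display-name=nightly-build`; the value `nightly-build` is an assumption chosen for illustration.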

## Supported Annotation Types

Here is a table of all supported annotation types in Argo Workflows:

| Annotation Key | Description |
|----------------------------------------------|-----------------------------------------------------------------------------|
| `workflows.argoproj.io/display-name` | Changes the node name in the UI. |
2 changes: 2 additions & 0 deletions docs/walk-through/loops.md
@@ -150,6 +150,8 @@ spec:

# This template is the same as in the previous example
- name: cat-os-release
annotations:
workflows.argoproj.io/display-name: "os-{{inputs.parameters.image}}-{{inputs.parameters.tag}}" # this sets a custom name for the node in the UI, based on the template's parameters
inputs:
parameters:
- name: image