Better download error handling #44

Draft
wants to merge 35 commits into
base: main
1780781
strip whitespace on job name. Remove span attrs
msarahan Feb 4, 2025
3f4d1af
handle downloads more quietly
msarahan Feb 6, 2025
30117eb
allow clone in load-then-clone regardless of telemetry enabled
msarahan Feb 6, 2025
686ae1b
debug telemetry enabled
msarahan Feb 6, 2025
c3416f3
skip unzip if file does not exist
msarahan Feb 6, 2025
149a203
pass GITHUB_TOKEN to shared actions; split attr upload
msarahan Feb 6, 2025
6d834c4
debugging output for sccache stats
msarahan Feb 10, 2025
6ebd7ef
use branch
msarahan Feb 10, 2025
b68f94e
add shell
msarahan Feb 10, 2025
4b9cde5
remove empty step
msarahan Feb 10, 2025
34b998d
move name script in setup
msarahan Feb 10, 2025
455a37e
skip telemetry steps when load-then-clone fails to download file
msarahan Feb 10, 2025
0a3c309
try to use nodejs for env var download
msarahan Feb 10, 2025
cea7a37
use pnpm
msarahan Feb 10, 2025
5d7abb5
pnpm versions stuff
msarahan Feb 10, 2025
ed54688
can't use pnpm action-setup
msarahan Feb 10, 2025
fc1b3f9
remove cache for now
msarahan Feb 10, 2025
e4f7e9d
revise npm version strings
msarahan Feb 10, 2025
f00bb93
revise npm version strings
msarahan Feb 10, 2025
e165fac
node to 20. call github-script action
msarahan Feb 10, 2025
c9286c1
change to getArtifact method
msarahan Feb 10, 2025
7561ad8
go back to only attempting download if run_attempt is 1
msarahan Feb 10, 2025
eb50111
download
msarahan Feb 10, 2025
c4a16aa
add artifact subpath
msarahan Feb 10, 2025
a4d1a74
go back to only attempting download if run_attempt is 1
msarahan Feb 10, 2025
6000a0c
shell
msarahan Feb 10, 2025
3f1bf40
don't use TELEMETRY_ENABLED in dispatch setup
msarahan Feb 10, 2025
f191813
can't use vars.ANYTHING in actions
msarahan Feb 10, 2025
3dde07a
remove GH_TOKEN where not needed
msarahan Feb 10, 2025
e206d6c
restore clone for stashing artifacts
msarahan Feb 10, 2025
4e15564
remove run_attempt condition from actions
msarahan Feb 10, 2025
056cbf7
add missing stash action
msarahan Feb 10, 2025
243d700
adding support for sccache file
msarahan Feb 11, 2025
986774e
tweak send_trace.py for folder structure
msarahan Feb 11, 2025
07f61c0
attrs -> artifacts
msarahan Feb 11, 2025
32 changes: 0 additions & 32 deletions telemetry-dispatch-load-base-env-vars/action.yml

This file was deleted.

34 changes: 26 additions & 8 deletions telemetry-dispatch-setup/action.yml
@@ -5,6 +5,8 @@ description: |
   current job, so that this metadata can be associated with spans during the final
   parsing of job metadata.

+  Obtains GitHub Actions job list and matches current job using runner name and attempt number.
+
   This action should be called at the beginning of child workflows, generally as the first
   step in any job other than computing the matrix.

@@ -15,13 +17,29 @@ inputs:
 runs:
   using: 'composite'
   steps:
-    - uses: rapidsai/shared-actions/telemetry-impls/load-then-clone@main
-    # overrides loaded value
+    - uses: rapidsai/shared-actions/telemetry-impls/load-then-clone@better-download-error-handling
+    - name: Creating folder for job-created telemetry artifacts
+      shell: bash
+      run: mkdir -p telemetry-artifacts
+    - uses: ./shared-actions/telemetry-impls/github-actions-job-info
+    - shell: bash
+      run:
+        echo JOB_ID="$(cat job_info.json | jq -r '.id')" >> ${GITHUB_ENV};
+    # overrides loaded value.
     - name: Set OTEL_SERVICE_NAME from job
       uses: ./shared-actions/telemetry-impls/set-otel-service-name
-    - name: Store attributes to use as metadata when creating spans
-      # This also sets OTEL_RESOURCE_ATTRIBUTES, for any subsequent steps
-      # in the calling workflow that might use it.
-      uses: ./shared-actions/telemetry-impls/stash-job-attributes
-      with:
-        extra_attributes: ${{ inputs.extra_attributes }}
+    - name: Add attribute metadata beyond the stashed basic stuff
+      shell: bash
+      run:
+        attributes="${OTEL_RESOURCE_ATTRIBUTES}";
+        labels="$(jq -r '.labels | join(" ")' job_info.json)";
+        if [ "${labels}" != "" ]; then
+          attributes="${attributes},rapids.labels=${labels}";
+        fi;
+        if [ "${{ inputs.extra_attributes }}" != "" ]; then
+          attributes="${attributes},${{ inputs.extra_attributes }}";
+        fi;
+        attributes=$(echo "${attributes}" | sed 's/^,//');
+        attributes=$(echo "${attributes}" | sed 's/,$//');
+        attributes=$(echo "${attributes}" | sed -r "s/(git.job_url=[^,]+)/\1\/job\/${JOB_ID}/");
+        echo OTEL_RESOURCE_ATTRIBUTES="${attributes}" >> ${GITHUB_ENV};
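Reviewer note: the attribute-assembly step above can be tried outside of Actions with a small sketch like the one below. The function name `build_attributes` and its positional arguments are hypothetical and not part of this PR. The sketch takes the labels and extra attributes as plain strings instead of reading `job_info.json` and `inputs.extra_attributes`, and it trims commas with parameter expansion where the step uses `sed`.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the "Add attribute metadata" step.
# Arguments: existing attributes, space-joined labels, extra attributes, job id.
build_attributes() {
  local attributes="$1" labels="$2" extra="$3" job_id="$4"

  # Append the runner labels only when there are any.
  if [ -n "${labels}" ]; then
    attributes="${attributes},rapids.labels=${labels}"
  fi
  # Append caller-supplied attributes only when present.
  if [ -n "${extra}" ]; then
    attributes="${attributes},${extra}"
  fi

  # Drop one leading and one trailing comma left behind by empty inputs.
  attributes="${attributes#,}"
  attributes="${attributes%,}"

  # Extend the git.job_url value with /job/<job id>.
  attributes="$(printf '%s\n' "${attributes}" \
    | sed -E "s/(git.job_url=[^,]+)/\1\/job\/${job_id}/")"

  printf '%s\n' "${attributes}"
}
```

Passing values as function arguments, rather than interpolating `${{ inputs.extra_attributes }}` directly into the script text, also keeps quotes in the input from changing how the shell parses the script.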
16 changes: 16 additions & 0 deletions telemetry-dispatch-stash-job-artifacts/action.yml
@@ -0,0 +1,16 @@
+name: dispatch-stash-job-artifacts
+description: |
+  Clones a particular branch/ref of a shared-actions repo, then
+  calls the stash-artifacts implementation script, which writes
+  some environment variables so that downstream jobs can refer to them.
+
+  Inputs here are all assumed to be env vars set outside of this script.
+  Set them in your main repo's workflows (export to ${GITHUB_ENV}!!)
+
+runs:
+  using: 'composite'
+  steps:
+    - uses: rapidsai/shared-actions/telemetry-impls/load-then-clone@better-download-error-handling
+    # Stash current job's OTEL_RESOURCE_ATTRIBUTES and any files in the telemetry-artifacts directory
+    - name: Stash job artifacts
+      uses: ./shared-actions/telemetry-impls/stash-job-artifacts
20 changes: 0 additions & 20 deletions telemetry-dispatch-stash-job-attributes/action.yml

This file was deleted.

9 changes: 2 additions & 7 deletions telemetry-dispatch-summarize/action.yml
@@ -3,18 +3,13 @@ description: |
   This action is run in a final job on the top-level workflow, after all other
   jobs are completed. This action downloads the JSON records of all jobs from
   the current run. It then associates metadata records that were uploaded with
-  the telemetry-dispatch-stash-job-attributes action with jobs. This is
+  the telemetry-dispatch-stash-job-artifacts action with jobs. This is
   effectively label metadata. Finally, this action creates OpenTelemetry spans
   with the timing and label metadata, and sends it to the configured Tempo
   endpoint (or forwarder).

 runs:
   using: 'composite'
   steps:
-    - uses: rapidsai/shared-actions/telemetry-impls/load-then-clone@main
-      if: ${{ github.run_attempt == '1' }}
+    - uses: rapidsai/shared-actions/telemetry-impls/load-then-clone@better-download-error-handling
     - uses: ./shared-actions/telemetry-impls/summarize
-      if: ${{ github.run_attempt == '1' }}
-    - shell: bash
-      run: echo "Skipping telemetry summary on rerun jobs."
-      if: ${{ github.run_attempt != '1' }}
4 changes: 0 additions & 4 deletions telemetry-impls/load-base-env-vars/action.yml
@@ -9,10 +9,6 @@ description: |
 runs:
   using: 'composite'
   steps:
-    - name: Download base environment variables file
-      uses: actions/download-artifact@v4
-      with:
-        name: telemetry-tools-env-vars
     - name: Set environment variables from file into GITHUB_ENV
       shell: bash
       # Only set the env var if it is not already set
7 changes: 4 additions & 3 deletions telemetry-impls/load-then-clone/action.yml
@@ -21,6 +21,7 @@ runs:
       uses: actions/download-artifact@v4
       with:
         name: telemetry-tools-env-vars
+        path: telemetry-artifacts
     # We can't use ./telemetry-implementation/load-base-env-vars here
     # because at this point we have not cloned the repo.
     - name: Set environment variables from file into GITHUB_ENV
@@ -38,10 +39,10 @@ runs:
           else
             echo "Load base env info: ignoring new value for "${env_var_name}" in loading base env vars. It is already set to "${!env_var_name}"." >&2;
           fi
-        done <telemetry-env-vars
+        done <telemetry-artifacts/telemetry-env-vars
     - name: Clone shared-actions repo with loaded env vars
       uses: actions/checkout@v4
       with:
-        repository: ${{ env.SHARED_ACTIONS_REPO }}
-        ref: ${{ env.SHARED_ACTIONS_REF }}
+        repository: ${{ env.SHARED_ACTIONS_REPO || 'rapidsai/shared-actions' }}
+        ref: ${{ env.SHARED_ACTIONS_REF || 'main' }}
         path: ./shared-actions
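Reviewer note: only the tail of the env-var loading loop is visible in this hunk, so the following is a rough, standalone sketch of the "set only if not already set" behaviour, not the action's actual script. The function name `load_env_file`, its arguments, and the early return on a missing file are assumptions. The early return illustrates the quieter failure handling this PR is aiming for. The sketch also assumes the file holds simple `NAME=value` lines.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: copy NAME=value lines from env_file into out_file
# (standing in for $GITHUB_ENV) unless NAME is already set in the environment.
load_env_file() {
  local env_file="$1" out_file="$2" line name

  # Assumed behaviour: a missing download is not fatal, so skip quietly.
  if [ ! -f "${env_file}" ]; then
    echo "No env file at ${env_file}; skipping." >&2
    return 0
  fi

  while IFS= read -r line || [ -n "${line}" ]; do
    [ -z "${line}" ] && continue
    name="${line%%=*}"
    if [ -z "${!name:-}" ]; then
      # Not set yet: pass the line through.
      printf '%s\n' "${line}" >> "${out_file}"
    else
      echo "Ignoring new value for ${name}; already set to ${!name}." >&2
    fi
  done < "${env_file}"
}
```

With this shape, the `|| 'rapidsai/shared-actions'` and `|| 'main'` fallbacks on the checkout step cover the case where nothing was loaded.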
23 changes: 23 additions & 0 deletions telemetry-impls/stash-job-artifacts/action.yml
@@ -0,0 +1,23 @@
+
+name: stash-job-artifacts
+description: |
+  Saves and uploads a file with telemetry attributes that should be attached to spans from this run.
+
+  We stash only the attributes here because we retrieve the rest of the timing
+  info later. We get info for all jobs at once, so we wait to retrieve that info
+  at the very end of the top-level job.
+
+runs:
+  using: 'composite'
+  steps:
+    - name: Write attributes to file, one per line
+      shell: bash
+      run:
+        IFS=, read -ra values <<< "$OTEL_RESOURCE_ATTRIBUTES";
+        printf "%s\n" "${values[@]}" > telemetry-artifacts/attrs;
+
+    - name: Upload attr file and any other files
+      uses: actions/upload-artifact@v4
+      with:
+        name: telemetry-tools-artifacts-${{ env.JOB_ID }}
+        path: telemetry-artifacts
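Reviewer note: the "one attribute per line" step can be checked locally with a sketch along these lines. The wrapper function `write_attrs` and its arguments are hypothetical. Unlike the action, which relies on `telemetry-artifacts` having been created earlier by the setup action, the sketch creates the output directory itself.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the split-and-write logic from the step above.
# Arguments: comma-separated attribute string, output directory.
write_attrs() {
  local attrs="$1" out_dir="$2"
  local -a values

  mkdir -p "${out_dir}"
  # Split on commas only, so values containing spaces stay intact.
  IFS=, read -ra values <<< "${attrs}"
  printf '%s\n' "${values[@]}" > "${out_dir}/attrs"
}
```

Because the split happens on commas alone, a value such as `rapids.labels=linux gpu` stays on a single line.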
50 changes: 0 additions & 50 deletions telemetry-impls/stash-job-attributes/action.yml

This file was deleted.

3 changes: 2 additions & 1 deletion telemetry-impls/summarize/action.yml
@@ -30,11 +30,12 @@ runs:
       with:
         path: telemetry-artifacts
         pattern: telemetry-tools-*
-        merge-multiple: true
+        merge-multiple: false

     - name: Run parse and send trace/spans to endpoint
       shell: bash
       run: |
+        ls -lR telemetry-artifacts
         timeout 5m python3 ./shared-actions/telemetry-impls/summarize/send_trace.py
     - name: Clean up attributes artifacts from all jobs
       uses: ./shared-actions/telemetry-impls/clean-up-artifacts
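Reviewer note: with `merge-multiple: false`, `actions/download-artifact@v4` extracts each matched artifact into its own directory, named after the artifact, under `path`. The sketch below shows one way to walk that layout and recover a job ID per `attrs` file. It assumes the artifact names follow the `telemetry-tools-artifacts-<JOB_ID>` pattern used by the upload step. The function `list_job_attr_files` is hypothetical and is not taken from `send_trace.py`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print "<job id> <path to attrs>" for every per-job
# artifact directory found under the given root.
list_job_attr_files() {
  local root="$1" dir job_id

  for dir in "${root}"/telemetry-tools-artifacts-*/; do
    # Skip directories without an attrs file (and the unexpanded glob).
    [ -f "${dir}attrs" ] || continue
    job_id="${dir%/}"
    job_id="${job_id##*telemetry-tools-artifacts-}"
    printf '%s %s\n' "${job_id}" "${dir}attrs"
  done
}
```

Keeping one directory per job avoids the file-name collisions that merging every job's `attrs` file into a single directory would cause.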