
Feature/multi loader logs collection #598

Draft: wants to merge 9 commits into main

nosnelmil (Contributor)

Summary

Extends the multi-loader to collect key logs and metrics from cluster nodes on the Knative platform. Users can optionally collect the following:

  • TOP (resource usage metrics)
  • Prometheus snapshots
  • Logs from the Activator Pod
  • Logs from the Autoscaler Pod
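In the multi-loader config, these four items correspond to the `Metrics` values top, prometheus, activator, and autoscaler. The Python sketch below illustrates how such a field might be validated before any experiment runs. It is not the PR's code: `validate_metrics` and `ALLOWED_METRICS` are hypothetical names, and case-sensitive matching plus silently dropping duplicates are assumptions rather than documented behavior.

```python
from typing import List

# Values accepted by the multi-loader `Metrics` field, as listed in this PR.
ALLOWED_METRICS = frozenset({"top", "prometheus", "activator", "autoscaler"})


def validate_metrics(metrics: List[str]) -> List[str]:
    """Hypothetical validator: reject unknown values, drop duplicates.

    Assumptions (not stated in the PR): matching is case-sensitive and
    duplicate entries are tolerated rather than treated as errors.
    """
    unknown = [m for m in metrics if m not in ALLOWED_METRICS]
    if unknown:
        raise ValueError(
            f"unsupported Metrics values {unknown}; allowed: {sorted(ALLOWED_METRICS)}"
        )
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(metrics))


if __name__ == "__main__":
    print(validate_metrics(["top", "activator", "top"]))  # ['top', 'activator']
```

Failing fast on an unknown value keeps a typo such as "prometheous" from silently disabling a collector during a long experiment run.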

Implementation Notes ⚒️

  • Added a Metrics field to the multi-loader config that accepts an array containing any of the following values: top, prometheus, activator, autoscaler.
  • Introduced optional fields MasterNode, ActivatorNode, AutoscalerNode, and WorkerNodes so users can specify node IPs manually instead of relying on the multi-loader to discover them (rarely needed in typical setups).
  • Uses kubectl to discover node IPs automatically and classify each node by its role.
  • Resets TOP metrics for all nodes before starting any experiment.
  • Collects Activator Pod logs from:
    /var/log/pods/knative-serving_activator-*/activator/*
  • Collects Autoscaler Pod logs from:
    /var/log/pods/knative-serving_autoscaler-*/autoscaler/*
  • Copies Prometheus snapshots by first triggering a snapshot via the Prometheus API on the master node and then retrieving the generated snapshot.
  • Additionally, the log collection logic runs during a multi-loader dry run to:
    • Validate that identified IPs are reachable.
    • Ensure SSH access and necessary permissions.
    • Execute log retrieval commands to detect potential errors.
    • Delete any collected logs after validation, as no experiments have been executed yet.
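Node discovery via kubectl can be sketched roughly as follows. This is an illustrative Python sketch, not the PR's implementation: it assumes the control-plane node carries the standard `node-role.kubernetes.io/control-plane` (or legacy `master`) label, that each node's address is its `InternalIP`, and that the activator and autoscaler are located through their container names (the same names that appear in the `/var/log/pods` paths above). `kubectl_json`, `internal_ip`, and `classify_nodes` are hypothetical names; non-empty manual overrides (MasterNode, ActivatorNode, AutoscalerNode, WorkerNodes) take precedence.

```python
import json
import subprocess
from typing import Dict, List, Optional

# Labels commonly used to mark control-plane nodes (the older "master" label is
# kept for older clusters). Assumption: the PR's exact role check is not shown.
CONTROL_PLANE_LABELS = (
    "node-role.kubernetes.io/control-plane",
    "node-role.kubernetes.io/master",
)


def kubectl_json(*args: str) -> str:
    """Run `kubectl <args> -o json` and return stdout (needs kubectl on PATH)."""
    return subprocess.run(
        ["kubectl", *args, "-o", "json"], capture_output=True, text=True, check=True
    ).stdout


def internal_ip(node: dict) -> Optional[str]:
    """Return the node's InternalIP address, if it reports one."""
    for addr in node.get("status", {}).get("addresses", []):
        if addr.get("type") == "InternalIP":
            return addr.get("address")
    return None


def classify_nodes(nodes_json: str, pods_json: str,
                   overrides: Optional[Dict[str, object]] = None) -> Dict[str, object]:
    """Classify node IPs by role; non-empty manual overrides win."""
    result: Dict[str, object] = {
        "MasterNode": None, "WorkerNodes": [], "ActivatorNode": None, "AutoscalerNode": None,
    }
    workers: List[str] = []
    for node in json.loads(nodes_json)["items"]:
        ip = internal_ip(node)
        if ip is None:
            continue
        labels = node.get("metadata", {}).get("labels", {})
        if any(label in labels for label in CONTROL_PLANE_LABELS):
            result["MasterNode"] = result["MasterNode"] or ip  # keep the first one
        else:
            workers.append(ip)
    result["WorkerNodes"] = workers
    # Find the host of each Knative component by container name.
    # If several replicas exist, the last one listed wins in this sketch.
    for pod in json.loads(pods_json)["items"]:
        names = {c.get("name") for c in pod.get("spec", {}).get("containers", [])}
        host_ip = pod.get("status", {}).get("hostIP")
        if "activator" in names:
            result["ActivatorNode"] = host_ip
        if "autoscaler" in names:
            result["AutoscalerNode"] = host_ip
    # Only known fields with non-empty values override discovered ones.
    for key, value in (overrides or {}).items():
        if key in result and value:
            result[key] = value
    return result
```

Usage, against a live cluster: `classify_nodes(kubectl_json("get", "nodes"), kubectl_json("get", "pods", "-n", "knative-serving"))`. Whether the PR also counts the control-plane node as a worker is not stated; this sketch excludes it.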
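The activator and autoscaler log paths follow the kubelet's pod log layout, `/var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/<n>.log`. The Python sketch below rebuilds the two glob patterns listed above and expands them on the local filesystem; it is illustrative only, since the PR collects these files from remote nodes over SSH and its copy mechanism is not shown. `pod_log_pattern` and `find_pod_logs` are hypothetical names.

```python
import glob
import os
from typing import List

# Kubelet writes container logs as
# /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/<n>.log
POD_LOG_ROOT = "/var/log/pods"


def pod_log_pattern(component: str, namespace: str = "knative-serving",
                    root: str = POD_LOG_ROOT) -> str:
    """Build the glob used in this PR, e.g. knative-serving_activator-*/activator/*."""
    return os.path.join(root, f"{namespace}_{component}-*", component, "*")


def find_pod_logs(component: str, root: str = POD_LOG_ROOT) -> List[str]:
    """List log files for a Knative component under `root` on the local node."""
    return sorted(glob.glob(pod_log_pattern(component, root=root)))
```

The container-name segment matters: the pod-directory part `knative-serving_autoscaler-*` would also match a pod named, for example, `autoscaler-hpa-…`, but such a pod's container directory has a different name, so the trailing `autoscaler/*` filters it out.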
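Triggering a Prometheus snapshot uses the TSDB admin API: a `POST` to `/api/v1/admin/tsdb/snapshot` returns JSON whose `data.name` is the new snapshot's directory under `<data-dir>/snapshots/`. These admin endpoints are only available when Prometheus runs with `--web.enable-admin-api`. The Python sketch below assumes Prometheus is reachable on the master node at its default port 9090; the actual port, data directory, and how the PR copies the snapshot off the node are not described here. All function names are hypothetical.

```python
import json
import posixpath
import urllib.request

# Prometheus's default listen port; a given cluster may expose it elsewhere.
DEFAULT_PROMETHEUS_PORT = 9090


def snapshot_url(master_ip: str, port: int = DEFAULT_PROMETHEUS_PORT) -> str:
    """URL of the TSDB snapshot endpoint (requires --web.enable-admin-api)."""
    return f"http://{master_ip}:{port}/api/v1/admin/tsdb/snapshot"


def parse_snapshot_response(body: str) -> str:
    """Extract the snapshot name from the API's JSON reply."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"snapshot failed: {payload.get('error', payload)}")
    return payload["data"]["name"]


def trigger_snapshot(master_ip: str, port: int = DEFAULT_PROMETHEUS_PORT,
                     timeout: float = 30.0) -> str:
    """POST to the snapshot endpoint and return the new snapshot's name.

    Non-2xx replies surface as urllib.error.HTTPError from urlopen.
    """
    request = urllib.request.Request(snapshot_url(master_ip, port), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_snapshot_response(response.read().decode("utf-8"))


def snapshot_dir(data_dir: str, name: str) -> str:
    """Snapshots are written under <data-dir>/snapshots/<name>."""
    return posixpath.join(data_dir, "snapshots", name)
```

Keeping the URL building and response parsing separate from the network call makes both testable without a running Prometheus.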
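The first dry-run check, that each identified IP is reachable, can be approximated with a plain TCP connection to the SSH port. This Python sketch is an assumption about how such a pre-check could look, not the PR's code: an open port 22 does not prove that SSH authentication or file permissions work, which the dry run also verifies by actually running the retrieval commands. `is_port_reachable` and `unreachable_nodes` are hypothetical names.

```python
import socket
from typing import Iterable, List


def is_port_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unroutable, DNS failure, ...
        return False


def unreachable_nodes(ips: Iterable[str], port: int = 22,
                      timeout: float = 3.0) -> List[str]:
    """Return the IPs whose SSH port could not be reached (dry-run pre-check)."""
    return [ip for ip in ips if not is_port_reachable(ip, port, timeout)]
```

Reporting every unreachable node at once, rather than stopping at the first failure, lets a user fix the whole node list before rerunning the dry run.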

External Dependencies 🍀

  • N/A

Breaking API Changes ⚠️

  • N/A

nosnelmil force-pushed the feature/multi-loader-adv-logs branch 2 times, most recently from 3dfaaaf to d5d74ac (February 4, 2025 03:16)
nosnelmil marked this pull request as draft (February 4, 2025 03:17)
nosnelmil force-pushed the feature/multi-loader-adv-logs branch from 35c1870 to 4320621 (February 6, 2025 13:57)
fix log collection test

commit a05990d
Author: Lenson <nosnelmil@gmail.com>
Date:   Mon Feb 3 15:39:39 2025 +0800

    update test trigger

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit 3edb3b4
Author: Lenson <nosnelmil@gmail.com>
Date:   Mon Feb 3 15:33:06 2025 +0800

    update test

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit 56a0f7d
Author: Lenson <nosnelmil@gmail.com>
Date:   Mon Feb 3 15:18:40 2025 +0800

    fix

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit 67c520d
Author: Lenson <nosnelmil@gmail.com>
Date:   Mon Feb 3 15:06:20 2025 +0800

    fix

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit 48ff845
Author: Lenson <nosnelmil@gmail.com>
Date:   Mon Feb 3 14:46:29 2025 +0800

    test

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit 295c761
Author: Lenson <nosnelmil@gmail.com>
Date:   Mon Feb 3 14:45:35 2025 +0800

    add adv log collection tests

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit 8469bdb
Author: Lenson <nosnelmil@gmail.com>
Date:   Mon Feb 3 14:45:05 2025 +0800

    update logging

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit 10e295a
Author: Lenson <nosnelmil@gmail.com>
Date:   Mon Feb 3 14:44:42 2025 +0800

    update kind ssh update script

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit c56a9d8
Author: Lenson <nosnelmil@gmail.com>
Date:   Mon Feb 3 13:19:27 2025 +0800

    add KinD ssh setup script

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit bf9a804
Author: Lenson <nosnelmil@gmail.com>
Date:   Mon Feb 3 10:31:55 2025 +0800

    update condition for node discovery

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit b3f078b
Author: Lenson <nosnelmil@gmail.com>
Date:   Fri Jan 31 18:35:03 2025 +0800

    add multi loader log collection

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add node discovery validators

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add collect TOP metric functions

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add multi-loader metric_manager

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add autoscaler log collection

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add activator log collection

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add prometh log collection

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    refactor metric manager constants

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    minor fix for node discovery

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    fix node discovery

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    minor fix

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    minor fix

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add logs for prometh

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add pause between prometh collection

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update wait time

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit 9bac3c4
Author: Lenson <nosnelmil@gmail.com>
Date:   Tue Jan 21 13:00:50 2025 +0800

    update multi loader docs

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update multi-loader docs

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit bfd17be
Author: Lenson <nosnelmil@gmail.com>
Date:   Mon Jan 20 16:30:13 2025 +0800

    minor multi loader fix

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    fix incorrect retry logging

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    remove iat and generated cli args

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    remove make clean from clean up

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit 91042aa
Author: Lenson <nosnelmil@gmail.com>
Date:   Thu Jan 16 15:53:19 2025 +0800

    update tests

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update multi loader e2e tests

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    revert setup.cfg

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    chmod script

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update unit tests

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    fix e2e test

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update tests

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit 69c3c3a
Author: Lenson <nosnelmil@gmail.com>
Date:   Tue Dec 31 11:49:55 2024 +0800

    add failfast flag

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update failfast flag description

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update comments

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update wordlist with multiloader specific words

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    simplify run experiment logic

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    refactor partial experiment naming

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    fix wrong indexing

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add progress in logging

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit fc3ad98
Author: Lenson <nosnelmil@gmail.com>
Date:   Sun Nov 17 14:07:35 2024 +0800

    refactor multi loader

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add multi-loader tests

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update test

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    refactor multi-loader tests

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add loader experiment

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update logs

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update log verbosity

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update logs

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update logs

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    rename multiloader driver to runner

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    refactor common files to multiloader folder

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    refactor multiloader functions

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    rename createNewStudy function name

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    fix formatting

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    remove extra features

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    remove extra features

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add validation for platform

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit ca5e2ad
Author: Lenson <nosnelmil@gmail.com>
Date:   Sat Nov 16 18:49:35 2024 +0800

    add multi loader documentation

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update docs

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    fix docs

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update documentation

    Signed-off-by: Lenson <nosnelmil@gmail.com>

commit 3c7e6b5
Author: Lenson <nosnelmil@gmail.com>
Date:   Sat Nov 16 12:36:43 2024 +0800

    add multi-loader

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add multi-loader config reader

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add multi loader base

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add multi loader base

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add node group struct

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add multi loader runner

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    refactor multi loader config

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add multi loader config validators

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add knative specific config enricher

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add additional knative platform type

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add base runner entry point

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    refactor multi loader config

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update multi loader config struct

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update unpack study doc

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add unpack study

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add prepare experiment

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update experiment config temp path

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add run loader function

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update log parser

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update log parser

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update log parser

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add clean up function

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add logs to indicate run status

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    expose entry points for multi loader runner

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add multi loader runner execution

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update default multi loader config path

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add cpu limit validator

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    remove extra knative feature

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    remove knative extra features

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add multi loader tests

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add basic config

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update basic config

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update basic config

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    add basic configs

    Signed-off-by: Lenson <nosnelmil@gmail.com>

    update base config

    Signed-off-by: Lenson <nosnelmil@gmail.com>

nosnelmil force-pushed the feature/multi-loader-adv-logs branch from 55c0a36 to 01ad794 (February 11, 2025 02:15)
nosnelmil force-pushed the feature/multi-loader-adv-logs branch from c5304d8 to da4d31c (February 11, 2025 02:40)