Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add disable plugins #569

Merged
merged 6 commits into from
Feb 2, 2024

Conversation

adrianchiris
Copy link
Collaborator

@adrianchiris adrianchiris commented Dec 31, 2023

Add option to disable sriov network config daemon plugins

some plugins perform vendor specific firmware configuration
to enable SR-IOV (e.g mellanox plugin). certain deployment environments may prefer to perform such configuration
once during node provisioning, while ensuring the configuration will be compatible with any sriov network node policy
defined for the particular environment. This will reduce or completely eliminate the need for reboot of nodes during SR-IOV
configurations by the operator.

This can be done by setting SriovOperatorConfig default CR spec.disablePlugins with the list of desired plugins
to disable.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented Dec 31, 2023

Pull Request Test Coverage Report for Build 7744576897

  • -64 of 124 (48.39%) changed or added relevant lines in 11 files are covered.
  • 6 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.9%) to 25.912%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/plugins/intel/intel_plugin.go 0 2 0.0%
pkg/daemon/daemon.go 6 10 60.0%
pkg/plugins/mellanox/mellanox_plugin.go 0 4 0.0%
pkg/daemon/plugin.go 34 39 87.18%
pkg/plugins/virtual/virtual_plugin.go 0 6 0.0%
pkg/plugins/k8s/k8s_plugin.go 0 9 0.0%
api/v1/zz_generated.deepcopy.go 5 18 27.78%
pkg/plugins/generic/generic_plugin.go 4 25 16.0%
Files with Coverage Reduction New Missed Lines %
controllers/sriovnetworkpoolconfig_controller.go 6 69.7%
Totals Coverage Status
Change from base Build 7742637229: 0.9%
Covered Lines: 2913
Relevant Lines: 11242

💛 - Coveralls

@SchSeba
Copy link
Collaborator

SchSeba commented Jan 1, 2024

general question do we want to have this as part of SriovOperatorConfig CR instead of another env variable in the operator? :)

@adrianchiris
Copy link
Collaborator Author

adrianchiris commented Jan 1, 2024

general question do we want to have this as part of SriovOperatorConfig CR instead of another env variable in the operator?

im not sure, i see this decision (deciding if/which plugins to disable) is made during deployment time and should not change.

moreover deleting sriov operator config CR will trigger creation of the default config which will have no plugins disabled by default which may be problematic.

maybe we need to revisit the creation of default sriov operator config CR ?

@SchSeba
Copy link
Collaborator

SchSeba commented Jan 1, 2024

I am not fully familiar with helm but can't we request a CR also when we do the helm install?

@bn222
Copy link
Collaborator

bn222 commented Jan 3, 2024

im not sure, i see this decision (deciding if/which plugins to disable) is made during deployment time and should not change.

Even if the decision is not made, I also prefer to have it be part of the CR since that would cover more use cases. It feels cleaner.

@adrianchiris
Copy link
Collaborator Author

I also prefer to have it be part of the CR since that would cover more use cases.

what other use-cases ?

@bn222
Copy link
Collaborator

bn222 commented Jan 3, 2024

I also prefer to have it be part of the CR since that would cover more use cases.

what other use-cases ?

Suppose the customer decides that they want to disable some plugin, then at a later point in time, they change their mind.

@adrianchiris
Copy link
Collaborator Author

adrianchiris commented Jan 3, 2024

Suppose the customer decides that they want to disable some plugin, then at a later point in time, they change their mind.

then IMO they need to redeploy. our (NVIDIA) use case that it should not change, depends on specific datacenter configuration, moreover accidently changing it may have catastrophic effect to workers (firmware parameters resetting).

my concern is creating the default operator config obj in the operator. if you are ok with dropping its creation, we can set this in operator config CR.

we discussed this in community meeting. Sebastian needs to double check internally.

allowedDisablePlugins := map[string]struct{}{"mellanox": {}}
for _, p := range startOpts.disabledPlugins {
if _, ok := allowedDisablePlugins[p]; !ok {
return fmt.Errorf("%s plugin is cannot be disabled", p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove is

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack


| Name | Type | Default | description |
| ---- | ---- | ------- | ----------- |
| `sriovConfigDaemon.disablePlugins` | []string | `nil` | List of plugins to disable, currently only `mellanox` plugin is allowed |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we add "use-systemd" now that you're adding this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not sure, that one can be configured during runtime.
this one is during deployment time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, didn't notice that this readme is for the helm chart

#sriovConfigDaemon:
# disablePlugins:
# - mellanox
# - intel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove intel?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

deployment/sriov-network-operator/templates/operator.yaml Outdated Show resolved Hide resolved
pkg/daemon/plugin.go Outdated Show resolved Hide resolved
@@ -43,6 +43,29 @@ import (
"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/version"
)

// stringList is a list of strings, implements pflag.Vaue interface
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// stringList is a list of strings, implements pflag.Vaue interface
// stringList is a list of strings, implements pflag.Value interface

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed will update PR soon

@@ -81,6 +106,14 @@ func runStartCmd(cmd *cobra.Command, args []string) error {
startOpts.nodeName = name
}

// TODO(adrianc): move allowed disable plugins to globals pkg
allowedDisablePlugins := map[string]struct{}{"mellanox": {}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowedDisablePlugins -> DisableablePlugins

WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about deactivatablePlugins or activatablePlugins if we switch to a positive instead (which is maybe better to avoid double negatives and have shorter names throughout).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed, will upload PR soon

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

README.md Outdated
configurations by the operator.

This can be done by providing the names of plugins to disable as a comma separated list of strings via
`SRIOV_NETWORK_CONFIG_DAEMON_DISABLE_PLUGINS` environment variable to the operator deployment or using
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't know what will be better, maybe _DISABLED_PLUGINS

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i prefer to keep it the same name as the flag (disable-plugins)

README.md Outdated
defined for the particular environment. This will reduce or completely eliminate the need for reboot of nodes during SR-IOV
configurations by the operator.

This can be done by providing the names of plugins to disable as a comma separated list of strings via
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "comma-separated" here and below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -66,6 +90,7 @@ func init() {
startCmd.PersistentFlags().StringVar(&startOpts.kubeconfig, "kubeconfig", "", "Kubeconfig file to access a remote cluster (testing only)")
startCmd.PersistentFlags().StringVar(&startOpts.nodeName, "node-name", "", "kubernetes node name daemon is managing")
startCmd.PersistentFlags().BoolVar(&startOpts.systemd, "use-systemd-service", false, "use config daemon in systemd mode")
startCmd.PersistentFlags().VarP(&startOpts.disabledPlugins, "disable-plugins", "", "comma separated list of plugins")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comma separated => comma-separated

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@e0ne
Copy link
Collaborator

e0ne commented Jan 12, 2024

my concern is creating the default operator config obj in the operator. if you are ok with dropping its creation, we can set this in operator config CR.

+1 on this. this option isn't supposed to be changed during runtime

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@adrianchiris
Copy link
Collaborator Author

@SchSeba PTAL

@mlguerrero12
Copy link
Contributor

please rebase

LGTM

prevK8sPluginFn := K8sPlugin

BeforeEach(func() {
vars.ClusterType = consts.ClusterTypeKubernetes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you reset global variables to its origin value?

Suggested change
vars.ClusterType = consts.ClusterTypeKubernetes
backupClusterType := vars.ClusterType
vars.ClusterType = consts.ClusterTypeKubernetes
DeferCleanup(func() {
vars.ClusterType = backupClusterType
})

Copy link
Collaborator Author

@adrianchiris adrianchiris Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack.

@@ -45,6 +45,8 @@ type SriovOperatorConfigSpec struct {
ConfigurationMode ConfigurationModeType `json:"configurationMode,omitempty"`
// Flag to enable Container Device Interface mode for SR-IOV Network Device Plugin
UseCDI bool `json:"useCDI,omitempty"`
// DisablePlugins is a list of sriov-network-config-daemon plugins to disable
DisablePlugins []string `json:"disablePlugins,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add a validation at the api level

// +kubebuilder:validation:Enum={"mellanox"}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. thx

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried this out, it doesnt work for lists just for scalar values

spec.disablePlugins: Unsupported value: []interface {}{"mellanox"}: supported values: "mellanox"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, perhaps this kubernetes-sigs/kubebuilder#2812?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works. PR updated.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Contributor

@mlguerrero12 mlguerrero12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Failing tests are not related. They are also failing in other PRs.
This seems to help #600

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@adrianchiris
Copy link
Collaborator Author

rebased. @zeeke PTAL :)

- remove _plugin suffix from plugin names as it doesnt
  add any important information.
- rename logs

Signed-off-by: adrianc <adrianc@nvidia.com>
This better reflects the meaning of whats being done.
In addition it serves as prep work to support disable plugins
as having both enabledPlugins and disabledPlugins attr in Daemon
is confusing.

Signed-off-by: adrianc <adrianc@nvidia.com>
- Add --disable-plugins CLI flag
- parse and check flag values and pass to daemon
- extend plugin load logic to skip disabled plugins

Signed-off-by: adrianc <adrianc@nvidia.com>
- Make fake plugin name configurable via attribute
- adjust daemon test code
- add daemon plugins tests to cover plugin loading logic

Signed-off-by: adrianc <adrianc@nvidia.com>
- Extend sriovOperatorConfig CRD with disablePlugins list attribute
  which contains list of plugins to disable in
  sriov-network-config-daemon
- sriovOperatorConfig controller: Update template data
- Set disable-plugin flag in daemonset yaml
- Add tests

Signed-off-by: adrianc <adrianc@nvidia.com>
update documentation for disable plugin feature

Signed-off-by: adrianc <adrianc@nvidia.com>
Copy link

github-actions bot commented Feb 1, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zeeke zeeke merged commit e409337 into k8snetworkplumbingwg:master Feb 2, 2024
11 checks passed
zeeke added a commit to zeeke/sriov-network-operator that referenced this pull request Feb 22, 2024
Changes to CustomResourceDefinition have been introduced by
- k8snetworkplumbingwg/sriov-network-operator#497
- k8snetworkplumbingwg/sriov-network-operator#569

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
SchSeba pushed a commit to SchSeba/sriov-network-operator that referenced this pull request May 7, 2024
Changes to CustomResourceDefinition have been introduced by
- k8snetworkplumbingwg/sriov-network-operator#497
- k8snetworkplumbingwg/sriov-network-operator#569

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
@adrianchiris adrianchiris deleted the add-disable-plugins branch July 16, 2024 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants