Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add design doc for switchdev mode refactoring #559

Merged

Conversation

ykulazhenkov
Copy link
Collaborator

We need to refactor the implementation used for NICs in switchdev mode and align its behavior with the systemd
mode of the operator. The refactoring is required to simplify the development of the new switchdev-related
features for the sriov-network-operator.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@ykulazhenkov
Copy link
Collaborator Author

@adrianchiris @SchSeba @e0ne please, take a look

@coveralls
Copy link

coveralls commented Dec 14, 2023

Pull Request Test Coverage Report for Build 7475965250

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.05%) to 23.306%

Totals Coverage Status
Change from base Build 7473924622: 0.05%
Covered Lines: 2298
Relevant Lines: 9860

💛 - Coveralls

Copy link
Collaborator

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good @ykulazhenkov, added a few comments/questions.

doc/design/switchdev-refactoring.md Outdated Show resolved Hide resolved
doc/design/switchdev-refactoring.md Outdated Show resolved Hide resolved
doc/design/switchdev-refactoring.md Show resolved Hide resolved
doc/design/switchdev-refactoring.md Outdated Show resolved Hide resolved
doc/design/switchdev-refactoring.md Show resolved Hide resolved
@ykulazhenkov ykulazhenkov force-pushed the switchdev-refactoring-design branch from 1a27f69 to 0c6163a Compare January 2, 2024 09:07
Copy link

github-actions bot commented Jan 2, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Collaborator

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! great wok @ykulazhenkov

@zeeke
Copy link
Member

zeeke commented Jan 3, 2024

Just dropping an idea that could simplify here:

At the moment, the field SriovOperatorConfig.configurationMode defines whether the host configuration is applied by the config-daemon pod or the systemd service.

We can make the pod apply the configuration despite the field value, which only enables the systemd service. The service does the same logic as the pod, but it does that early in the boot process.
Then, when the config-daemon pod starts, it doesn't make any changes to the host because everything is already in place (unless there are bugs)

This could remove some if/then/else statements and simplify some workflows. It can be addressed and discussed in a different PR/design_doc if deemed out of scope.

@ykulazhenkov
Copy link
Collaborator Author

Just dropping an idea that could simplify here:

At the moment, the field SriovOperatorConfig.configurationMode defines whether the host configuration is applied by the config-daemon pod or the systemd service.

We can make the pod apply the configuration despite the field value, which only enables the systemd service. The service does the same logic as the pod, but it does that early in the boot process. Then, when the config-daemon pod starts, it doesn't make any changes to the host because everything is already in place (unless there are bugs)

This could remove some if/then/else statements and simplify some workflows. It can be addressed and discussed in a different PR/design_doc if deemed out of scope.

I like the idea. This means that configurationMode daemon will mean "daemon only" and configurationMode systemd means "daemon+systemd service for early configuration." I believe this can certainly simplify the code, and the main reason for this is that it looks like we will not need to propagate the result of the execution of the systemd service to the daemon (currently, this is required to update the status in the API) because the daemon will always recheck the configuration of the host.

This refactoring can be easily implemented on top of the changes that are proposed in this PR, but I prefer to discuss them separately to reduce the scope a bit.

e0ne
e0ne previously approved these changes Jan 4, 2024
@e0ne
Copy link
Collaborator

e0ne commented Jan 4, 2024

Just dropping an idea that could simplify here:

At the moment, the field SriovOperatorConfig.configurationMode defines whether the host configuration is applied by the config-daemon pod or the systemd service.

We can make the pod apply the configuration despite the field value, which only enables the systemd service. The service does the same logic as the pod, but it does that early in the boot process. Then, when the config-daemon pod starts, it doesn't make any changes to the host because everything is already in place (unless there are bugs)

This could remove some if/then/else statements and simplify some workflows. It can be addressed and discussed in a different PR/design_doc if deemed out of scope.

I like this idea. We can add some note in this design doc as a further improvement action item but this is out of scope of propesed idea

@adrianchiris
Copy link
Collaborator

+1 on having systemd service "assisting" config daemon, in case systemd failed we shoud take care to report that (e.g via event on srionetworknodestate) and allow config daemon to align configuration.

that said, i do not think this is related to switchdev refactoring directly and can be added on top.
@zeeke if you agree, mind opening a separate issue for this one ?

@zeeke
Copy link
Member

zeeke commented Jan 4, 2024

Thanks for the quick feedback. I'll take care of starting a discussion on this

@mlguerrero12
Copy link
Contributor

@ykulazhenkov, I'm not familiar with the switchdev related code but I've seen that the k8s plugin has some config for that. This plugin is not mentioned in the proposal. Will this be affected? What's the impact?

@ykulazhenkov
Copy link
Collaborator Author

@ykulazhenkov, I'm not familiar with the switchdev related code but I've seen that the k8s plugin has some config for that. This plugin is not mentioned in the proposal. Will this be affected? What's the impact?

@mlguerrero12
k8s plugin is responsible for creation of bash based swithcdev systemd services
"Drop existing bash-based implementation which is used for NICs with switchdev configurationt" assumes that we will clean up this code + Openshift specific code in SriovNetworkPoolConfig controller which creates these services for Openshift

@mlguerrero12
Copy link
Contributor

cool, thanks for the clarification

Copy link
Collaborator

@SchSeba SchSeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea!
but one think we need to cover is when we apply a policy and we are in systemd mode we must reboot the host because the user may need to vfs on boot .
also with your solution it me be a bit slower as we will apply the config (create the vfs with the daemon) then reboot the node systemd will configure it again and then daemon will just pass because everything is in place

Both services run bash scripts. The script from the first service is responsible for VFs creation and for
switching a NIC to the switchdev eSwitch mode. The script from the second service binds VFs to the required drivers.

If a NIC has _**switchdev**_ configuration, then `configurationMode` of the operator does not affect it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi think that is because the switchdev was only to support HWoffload right? we didn't had any other support in the operator because we don't have any CNI for it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. But now we want to propose support for additional use cases for NICs in switchdev mode(with separate discussion and design docs). That is why we want to align switched configs with configurationMode and systemd.

The operator should behave for these configurations the same way as it does now.

Users using NICs with _**switchdev**_ configurations will need to explicitly set operator's
`configurationMode` to `systemd` if they expect the configuration of the NIC to happen
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment here "to continue and support the hwoffload use-case"
because after we implement this design we should be able and configure switchdev even in daemon mode

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SchSeba can you elaborate what is hwoffload use-case ?

as i see it, for now its switching nic to switchdev and configuring sriov which after this enhancement can be done in either daemon or systemd mode,

for Openshift current use-case the latter will satisfy he requirement doesnt it ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed in community meeting. my question is answered.

hwoffload == setting up systemd service to enable hw-offload config in ovs.

Change in the operator's behavior: `configurationMode` option now have effect
on NICs with _**switchdev**_ configurations.

### Implementation Details/Notes/Constraints
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also consider to have a special controller to handle the case of HWoffload?
meaning switch the daemon to systemd mode?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also consider to have a special controller to handle the case of HWoffload? meaning switch the daemon to systemd mode?

Sorry, I didn't get the point. If you want to configure NIC to switchdev on boot, you should simply configure the sriov operator to systemd mode(by changing the OperatorConfig CR, the same way as it works now). The generic plugin will be extended to do the required actions for HW offload (tc offload enabling, vdpa device creation, etc)

Copy link
Collaborator

@adrianchiris adrianchiris Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed in community meeting: its best to reduce complexity and have users explicitly setting the daemon to systemd.

users that use hwoffload (we dont know if/how-many there are) can:

  1. scale down controller
  2. update operator config to move to systemd mode
  3. upgrade operator
  4. scaleup opeator deployment

@ykulazhenkov ykulazhenkov force-pushed the switchdev-refactoring-design branch from 0c6163a to 0d22d52 Compare January 8, 2024 08:00
Copy link

github-actions bot commented Jan 8, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Signed-off-by: Yury Kulazhenkov <ykulazhenkov@nvidia.com>
@ykulazhenkov ykulazhenkov force-pushed the switchdev-refactoring-design branch from 0d22d52 to 6508c69 Compare January 10, 2024 13:57
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@ykulazhenkov
Copy link
Collaborator Author

while I was coding POC I discovered that we will need a minor change in the API - add vdpaType field to the interface status. I updated the design doc to reflect that this change is needed. @e0ne @adrianchiris please, re-review the PR

Copy link
Collaborator

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@e0ne e0ne dismissed their stale review January 16, 2024 08:46

will re-approve the latest version

Copy link
Collaborator

@e0ne e0ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ykulazhenkov
Copy link
Collaborator Author

@zeeke @SchSeba can we merge this one?

Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zeeke zeeke merged commit 6e8a861 into k8snetworkplumbingwg:master Jan 16, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants