DFBUGS-342: object: also use system certs for validating RGW cert #772

Merged

BlaineEXE
@BlaineEXE BlaineEXE commented Nov 11, 2024

Backport of rook#14911 (rook#14835) to release-4.16

When generating the HTTP client used for RGW admin ops, use both the system certs and the user-given cert.

As a real-world example, admins may use ACME to rotate Let's Encrypt certs every 2 months. For an external CephObjectStore, the cert given to Rook and the cert used by RGW may not be rotated at the same time. This can cause the Rook operator to fail CephObjectStore reconciliation until both certs agree.
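
The failure mode described above can be reproduced in isolation. The following is a hedged sketch, not Rook code: it assumes the user-given cert is a pinned copy of the server's old leaf certificate, and every name in it (`issue`, `leafTemplate`, `verifies`, `rotationDemo`, the `rgw.example.com` hostname, and the throwaway "Example Root CA") is hypothetical. It issues an old and a rotated leaf from the same CA, then checks the rotated leaf against a pool that pins only the old leaf and against a pool that holds the issuing CA.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// issue creates a certificate from tmpl signed by parent/parentKey.
// Passing a nil parent produces a self-signed certificate.
func issue(tmpl, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	if parent == nil {
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// leafTemplate describes a server certificate for the hypothetical RGW host.
func leafTemplate(serial int64) *x509.Certificate {
	return &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: "rgw.example.com"},
		DNSNames:     []string{"rgw.example.com"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
}

// verifies reports whether cert is trusted by pool for the RGW hostname.
func verifies(cert *x509.Certificate, pool *x509.CertPool) bool {
	_, err := cert.Verify(x509.VerifyOptions{Roots: pool, DNSName: "rgw.example.com"})
	return err == nil
}

// rotationDemo reports whether a rotated leaf validates against
// (a) a pool pinning only the old leaf and (b) a pool holding the issuing CA.
func rotationDemo() (pinnedOK, caOK bool, err error) {
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "Example Root CA"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	ca, caKey, err := issue(caTmpl, nil, nil)
	if err != nil {
		return false, false, err
	}
	oldLeaf, _, err := issue(leafTemplate(2), ca, caKey)
	if err != nil {
		return false, false, err
	}
	newLeaf, _, err := issue(leafTemplate(3), ca, caKey) // the rotated cert
	if err != nil {
		return false, false, err
	}

	pinned := x509.NewCertPool() // client still trusts only the old leaf
	pinned.AddCert(oldLeaf)
	withCA := x509.NewCertPool() // client trusts the issuing CA
	withCA.AddCert(ca)

	return verifies(newLeaf, pinned), verifies(newLeaf, withCA), nil
}

func main() {
	pinnedOK, caOK, err := rotationDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("pinned old cert accepts rotated cert:", pinnedOK)
	fmt.Println("issuing CA accepts rotated cert:", caOK)
}
```

Under these assumptions the pinned pool rejects the rotated cert while the pool containing the issuing CA accepts it, which matches the reconciliation failure lasting until both sides agree.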

When Rook also relies on system certs in the container, reconciliation will not fail, because Let's Encrypt's well-known and trusted root certificates can be loaded from the system to validate RGW's newly rotated cert.
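
For illustration, the approach described above can be sketched with the Go standard library. This is a minimal sketch under stated assumptions, not the code from this PR: the function names `newCertPool` and `newAdminOpsHTTPClient`, the 15 second timeout, and the TLS 1.2 minimum are all hypothetical choices made for the example.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// newCertPool returns the system cert pool with the user-given PEM appended.
// If the system pool cannot be loaded, it falls back to an empty pool so the
// user-given cert can still be used on its own.
func newCertPool(userPEM []byte) (*x509.CertPool, error) {
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		pool = x509.NewCertPool()
	}
	if len(userPEM) == 0 {
		return pool, nil // nothing to add; rely on system certs only
	}
	if !pool.AppendCertsFromPEM(userPEM) {
		return nil, errors.New("user-given cert contains no valid PEM certificates")
	}
	return pool, nil
}

// newAdminOpsHTTPClient builds an HTTP client that trusts both the system
// certs and the user-given cert (hypothetical helper for this example).
func newAdminOpsHTTPClient(userPEM []byte) (*http.Client, error) {
	pool, err := newCertPool(userPEM)
	if err != nil {
		return nil, err
	}
	return &http.Client{
		Timeout: 15 * time.Second, // assumed value, for illustration only
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				RootCAs:    pool,
				MinVersion: tls.VersionTLS12,
			},
		},
	}, nil
}

func main() {
	client, err := newAdminOpsHTTPClient(nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("client timeout:", client.Timeout)
}
```

`x509.SystemCertPool` returns a copy of the system pool, so appending the user-given cert does not affect other TLS clients in the same process.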

Signed-off-by: Blaine Gardner blaine.gardner@ibm.com
(cherry picked from commit 7bb72a0)
(cherry picked from commit 92267b5)

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@agarwal-mudit agarwal-mudit changed the title from "object: also use system certs for validating RGW cert" to "DFBUGS-342: object: also use system certs for validating RGW cert" Nov 19, 2024
@openshift-ci-robot
openshift-ci-robot commented Nov 19, 2024

@BlaineEXE: This pull request references [Jira Issue DFBUGS-342](https://issues.redhat.com//browse/DFBUGS-342), which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (odf-4.16.4) matches configured target version for branch (odf-4.16.4)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @nehaberry

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot commented Nov 19, 2024

@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: nehaberry.

Note that only red-hat-storage members and repo collaborators can review this PR, and authors cannot review their own PRs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@agarwal-mudit
Member

/retest

@agarwal-mudit agarwal-mudit added the approved and lgtm labels Nov 19, 2024

openshift-ci bot commented Nov 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: BlaineEXE

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@agarwal-mudit
Member

/retest

@BlaineEXE BlaineEXE force-pushed the bp-rgw-sys-certs-to-odf-4.16 branch from 545df84 to 9b4dc8f Compare November 20, 2024 16:37
@openshift-ci openshift-ci bot removed the lgtm label Nov 20, 2024

openshift-ci bot commented Nov 20, 2024

New changes are detected. LGTM label has been removed.

When generating the HTTP client used for RGW admin ops, use both the
system certs and the user-given cert.

As a real-world example, admins may use ACME to rotate Let's Encrypt
certs every 2 months. For an external CephObjectStore, the cert given to
Rook and the cert used by RGW may not be rotated at the same time. This
can cause the Rook operator to fail CephObjectStore reconciliation until
both certs agree.

When Rook also relies on system certs in the container, reconciliation
will not fail, because Let's Encrypt's well-known and trusted root
certificates can be loaded from the system to validate RGW's newly
rotated cert.

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
(cherry picked from commit 7bb72a0)

# Conflicts:
#	Documentation/CRDs/Cluster/external-cluster/provider-export.md
(cherry picked from commit 92267b5)
@BlaineEXE BlaineEXE force-pushed the bp-rgw-sys-certs-to-odf-4.16 branch from 9b4dc8f to 18ed081 Compare November 20, 2024 16:40
@agarwal-mudit agarwal-mudit added the lgtm label Nov 20, 2024
Backport minimal CI fixes from 69eeaab

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
@openshift-ci openshift-ci bot removed the lgtm label Nov 20, 2024

openshift-ci bot commented Nov 20, 2024

New changes are detected. LGTM label has been removed.

@BlaineEXE
Author

BlaineEXE commented Nov 20, 2024

A unit test unrelated to this PR is also failing. It appears to be a consistent failure, not a flake.

=== RUN   TestPostReconcileUpdateOSDProperties/test_resize_Osd_Crush_Weight
2024-11-20 18:15:13.215428 I | op-osd: ExecuteCommandWithOutput: ceph [osd df --connect-timeout=15 --cluster=ns --conf=/var/lib/rook/ns/ns.config --name= --keyring=/var/lib/rook/ns/.keyring --format json]
2024-11-20 18:15:13.215502 I | cephclient: updating osd.3 crush weight to "9.166024" for cluster in namespace "ns"
2024-11-20 18:15:13.215519 I | op-osd: ExecuteCommandWithOutput: ceph [osd crush reweight osd.3 9.166024 --connect-timeout=15 --cluster=ns --conf=/var/lib/rook/ns/ns.config --name= --keyring=/var/lib/rook/ns/.keyring --format json]
2024-11-20 18:15:13.215532 I | cephclient: updating osd.4 crush weight to "9.305722" for cluster in namespace "ns"
2024-11-20 18:15:13.215541 I | op-osd: ExecuteCommandWithOutput: ceph [osd crush reweight osd.4 9.305722 --connect-timeout=15 --cluster=ns --conf=/var/lib/rook/ns/ns.config --name= --keyring=/var/lib/rook/ns/.keyring --format json]
    osd_test.go:455: 
        	Error Trace:	/home/runner/work/rook/rook/pkg/operator/ceph/cluster/osd/osd_test.go:455
        	Error:      	Not equal: 
        	            	expected: []string{"osd.3", "osd.4"}
        	            	actual  : []string{"osd.3", "osd.4", "osd.3", "osd.4"}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,4 @@
        	            	-([]string) (len=2) {
        	            	+([]string) (len=4) {
        	            	+ (string) (len=5) "osd.3",
        	            	+ (string) (len=5) "osd.4",
        	            	  (string) (len=5) "osd.3",
        	Test:       	TestPostReconcileUpdateOSDProperties/test_resize_Osd_Crush_Weight
    osd_test.go:456: 
        	Error Trace:	/home/runner/work/rook/rook/pkg/operator/ceph/cluster/osd/osd_test.go:456
        	Error:      	Not equal: 
        	            	expected: []string{"9.166024", "9.305722"}
        	            	actual  : []string{"9.166024", "9.305722", "9.166024", "9.305722"}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,4 @@
        	            	-([]string) (len=2) {
        	            	+([]string) (len=4) {
        	            	+ (string) (len=8) "9.166024",
        	            	+ (string) (len=8) "9.305722",
        	            	  (string) (len=8) "9.166024",
        	Test:       	TestPostReconcileUpdateOSDProperties/test_resize_Osd_Crush_Weight
--- FAIL: TestPostReconcileUpdateOSDProperties (0.00s)
    --- PASS: TestPostReconcileUpdateOSDProperties/test_device_class_change (0.00s)
    --- FAIL: TestPostReconcileUpdateOSDProperties/test_resize_Osd_Crush_Weight (0.00s)

@parth-gr it looks like you modified these recently. Do you know what the issue might be here? I'm wondering if a code change was committed without unit tests for downstream -- or vice versa?

@BlaineEXE
Author

BlaineEXE commented Nov 20, 2024

We are also hitting a separate issue on the object/smoke tests, reported in rook#14947. The failure is in the very last step of those tests, and all of the prior steps passed, so I think we can ignore it based on manual inspection.

@BlaineEXE BlaineEXE merged commit 9052714 into red-hat-storage:release-4.16 Nov 20, 2024
40 of 49 checks passed
@BlaineEXE BlaineEXE deleted the bp-rgw-sys-certs-to-odf-4.16 branch November 20, 2024 19:17
@openshift-ci-robot
openshift-ci-robot commented Nov 20, 2024

@BlaineEXE: [Jira Issue DFBUGS-342](https://issues.redhat.com//browse/DFBUGS-342): All pull requests linked via external trackers have merged:

[Jira Issue DFBUGS-342](https://issues.redhat.com//browse/DFBUGS-342) has been moved to the MODIFIED state.

Labels
approved, jira/valid-bug, jira/valid-reference
3 participants