
Commit d4f86a0

Madhu-1 authored and subhamkrai committed
csi: disable fencing in Rook
Disabling the RBD and CephFS fencing in Rook for now, as it has bugs where Rook is blocklisting the wrong IP address due to timing issues.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 1d1ed5e)
(cherry picked from commit 34f81ec)
(cherry picked from commit fc94e11)
1 parent e5d9d76 commit d4f86a0
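For context on what is being disabled: the fencing feature blocklists a lost node's client IP in Ceph so it can no longer write to RBD or CephFS images. A rough manual equivalent is sketched below; it assumes a recent Ceph release where the subcommand is `blocklist`, with `<NODE_IP>` as a placeholder, and is not part of this commit.

```console
# Blocklist the lost node's client IP so it can no longer write to the cluster
# (placeholder <NODE_IP>; run from the rook-ceph toolbox or with an admin keyring)
ceph osd blocklist add <NODE_IP>

# Verify the entry
ceph osd blocklist ls

# Remove the entry once the node is recovered
ceph osd blocklist rm <NODE_IP>
```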

File tree

3 files changed: +8 -8 lines changed

Documentation/Storage-Configuration/Block-Storage-RBD/block-storage.md (+5 -2)

@@ -196,8 +196,6 @@ The erasure coded pool must be set as the `dataPool` parameter in
 
 If a node goes down where a pod is running where a RBD RWO volume is mounted, the volume cannot automatically be mounted on another node. The node must be guaranteed to be offline before the volume can be mounted on another node.
 
-!!! Note
-    These instructions are for clusters with Kubernetes version 1.26 or greater. For K8s 1.25 or older, see the [manual steps in the CSI troubleshooting guide](../../Troubleshooting/ceph-csi-common-issues.md#node-loss) to recover from the node loss.
 
 ### Configure CSI-Addons
 

@@ -217,6 +215,11 @@ kubectl patch cm rook-ceph-operator-config -n<namespace> -p $'data:\n "CSI_ENABL
 
 ### Handling Node Loss
 
+!!! warning
+    Automated node loss handling is currently disabled, please refer to the [manual steps](../../Troubleshooting/ceph-csi-common-issues.md#node-loss) to recover from the node loss.
+    We are actively working on a new design for this feature.
+    For more details see the [tracking issue](https://github.com/rook/rook/issues/14832).
+
 When a node is confirmed to be down, add the following taints to the node:
 
 ```console
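
The taint values themselves fall outside this hunk, so the following is only a sketch of what the documented step typically looks like, assuming the standard Kubernetes out-of-service taint (`node.kubernetes.io/out-of-service`) and `<node-name>` as a placeholder:

```console
# Mark the confirmed-down node as out of service so stale volume attachments
# and pods can be cleaned up (placeholder <node-name>; assumed taint key)
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoSchedule
```

Once the node is recovered, the same taints would be removed by appending `-` to each taint value.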

Documentation/Troubleshooting/ceph-csi-common-issues.md (-3)

@@ -413,9 +413,6 @@ Where `-m` is one of the mon endpoints and the `--key` is the key used by the CS
 
 When a node is lost, you will see application pods on the node stuck in the `Terminating` state while another pod is rescheduled and is in the `ContainerCreating` state.
 
-!!! important
-    For clusters with Kubernetes version 1.26 or greater, see the [improved automation](../Storage-Configuration/Block-Storage-RBD/block-storage.md#recover-rbd-rwo-volume-in-case-of-node-loss) to recover from the node loss. If using K8s 1.25 or older, continue with these instructions.
-
 ### Force deleting the pod
 
 To force delete the pod stuck in the `Terminating` state:
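
The command itself is cut off at the hunk boundary; a typical force delete of a pod stuck in `Terminating`, with `<pod-name>` and `<namespace>` as placeholders, looks like:

```console
# Force delete the stuck pod; --grace-period=0 together with --force removes it
# from the API server without waiting for the kubelet on the lost node
kubectl -n <namespace> delete pod <pod-name> --grace-period=0 --force
```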

pkg/operator/ceph/cluster/watcher.go (+3 -3)

@@ -81,9 +81,9 @@ func (c *clientCluster) onK8sNode(ctx context.Context, object runtime.Object) bo
     cluster := c.getCephCluster()
 
     // Continue reconcile in case of failure too since we don't want to block other node reconcile
-    if err := c.handleNodeFailure(ctx, cluster, node); err != nil {
-        logger.Errorf("failed to handle node failure. %v", err)
-    }
+    // if err := c.handleNodeFailure(ctx, cluster, node, opNamespace); err != nil {
+    //     logger.Errorf("failed to handle node failure. %v", err)
+    // }
 
     // skip reconcile if node is already checked in a previous reconcile
     if nodesCheckedForReconcile.Has(node.Name) {
