Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

restoreSize field not being set for the snapshots created for VolumeGroupSnapshot #1271

Open
yati1998 opened this issue Feb 13, 2025 · 20 comments
Assignees

Comments

@yati1998
Copy link
Contributor

The VolumeSnapshot and VolumeSnapshotContent that get created for VolumeGroupSnapshot do not have the "restoreSize" field in the status section

Note: manual-pvc-snapshot created manually.
snapshot-f2439c5900140e7bf4f5eda7d652a589cf6ae3f8893d057355dca060fc59fe37 created as a result of a VolumeGroupSnapshot


$ oc get volumesnapshot
NAME                                                                        READYTOUSE   SOURCEPVC    SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                               SNAPSHOTCONTENT                                                                CREATIONTIME   AGE
manual-pvc-snapshot                                                         true         cephfs-pvc                           2Gi           ocs-storagecluster-cephfsplugin-snapclass   snapcontent-fe7bb6bb-745b-46c8-b9eb-fcba113b4e93                               20s            20s
snapshot-f2439c5900140e7bf4f5eda7d652a589cf6ae3f8893d057355dca060fc59fe37   true         cephfs-pvc                                                                                     snapcontent-f2439c5900140e7bf4f5eda7d652a589cf6ae3f8893d057355dca060fc59fe37   2m43s          2m48s


$ oc get volumesnapshot -o yaml
apiVersion: v1
items:
- apiVersion: snapshot.storage.k8s.io/v1
  kind: VolumeSnapshot
  metadata:
    annotations:
      snapshot.storage.kubernetes.io/pvc-access-modes: ReadWriteOnce
      snapshot.storage.kubernetes.io/pvc-storage-class: ocs-storagecluster-cephfs
      snapshot.storage.kubernetes.io/pvc-volume-mode: Filesystem
    creationTimestamp: "2025-02-10T11:53:02Z"
    finalizers:
    - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
    - snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
    generation: 1
    name: manual-pvc-snapshot
    namespace: test-cephfs
    resourceVersion: "175058"
    uid: fe7bb6bb-745b-46c8-b9eb-fcba113b4e93
  spec:
    source:
      persistentVolumeClaimName: cephfs-pvc
    volumeSnapshotClassName: ocs-storagecluster-cephfsplugin-snapclass
  status:
    boundVolumeSnapshotContentName: snapcontent-fe7bb6bb-745b-46c8-b9eb-fcba113b4e93
    creationTime: "2025-02-10T11:53:02Z"
    readyToUse: true
    restoreSize: 2Gi
- apiVersion: snapshot.storage.k8s.io/v1
  kind: VolumeSnapshot
  metadata:
    creationTimestamp: "2025-02-10T11:50:34Z"
    finalizers:
    - snapshot.storage.kubernetes.io/volumesnapshot-in-group-protection
    - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
    generation: 1
    name: snapshot-f2439c5900140e7bf4f5eda7d652a589cf6ae3f8893d057355dca060fc59fe37
    namespace: test-cephfs
    ownerReferences:
    - apiVersion: groupsnapshot.storage.k8s.io/v1beta1
      kind: VolumeGroupSnapshot
      name: cephfs-groupsnapshot-1
      uid: c313dfa1-fb05-49dc-9e2c-83e2706b70bd
    resourceVersion: "173879"
    uid: bf409078-08bb-4aae-a135-e3381349ac76
  spec:
    source:
      persistentVolumeClaimName: cephfs-pvc
  status:
    boundVolumeSnapshotContentName: snapcontent-f2439c5900140e7bf4f5eda7d652a589cf6ae3f8893d057355dca060fc59fe37
    creationTime: "2025-02-10T11:50:39Z"
    readyToUse: true
    volumeGroupSnapshotName: cephfs-groupsnapshot-1
kind: List
metadata:
  resourceVersion: ""
$ oc get volumesnapshotcontent
NAME                                                                           READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                         VOLUMESNAPSHOT                                                              VOLUMESNAPSHOTNAMESPACE   AGE
snapcontent-f2439c5900140e7bf4f5eda7d652a589cf6ae3f8893d057355dca060fc59fe37   true                       Delete           openshift-storage.cephfs.csi.ceph.com                                               snapshot-f2439c5900140e7bf4f5eda7d652a589cf6ae3f8893d057355dca060fc59fe37   test-cephfs               6m34s
snapcontent-fe7bb6bb-745b-46c8-b9eb-fcba113b4e93                               true         2147483648    Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   manual-pvc-snapshot                                                         test-cephfs               4m6s

Steps to Reproduce:

  1. Create a PVC

  2. Create VolumeGroupSnapshot for the above PVC/s

  3. Check the VolumeSnapshot and VolumeSnapshotContent resources

@xing-yang
Copy link
Collaborator

Can you check "oc get volumesnapshotcontent -o yaml snapcontent-f2439c5900140e7bf4f5eda7d652a589cf6ae3f8893d057355dca060fc59fe37"?

@yati1998
Copy link
Contributor Author

apiVersion: v1
items:
- apiVersion: snapshot.storage.k8s.io/v1
  kind: VolumeSnapshotContent
  metadata:
    annotations:
      groupsnapshot.storage.k8s.io/volumeGroupSnapshotHandle: 0001-0011-openshift-storage-0000000000000001-87247f25-b0c0-4e31-8026-443581940b3d
      snapshot.storage.kubernetes.io/deletion-secret-name: rook-csi-cephfs-provisioner
      snapshot.storage.kubernetes.io/deletion-secret-namespace: openshift-storage
    creationTimestamp: "2025-02-10T11:50:34Z"
    finalizers:
    - snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
    generation: 2
    name: snapcontent-f2439c5900140e7bf4f5eda7d652a589cf6ae3f8893d057355dca060fc59fe37
    resourceVersion: "173876"
    uid: eb70cba6-a2fb-44d7-8301-ebdf26017c94
  spec:
    deletionPolicy: Delete
    driver: openshift-storage.cephfs.csi.ceph.com
    source:
      volumeHandle: 0001-0011-openshift-storage-0000000000000001-b4409473-012e-4388-9857-c7a13e23c23b
    sourceVolumeMode: Filesystem
    volumeSnapshotRef:
      kind: VolumeSnapshot
      name: snapshot-f2439c5900140e7bf4f5eda7d652a589cf6ae3f8893d057355dca060fc59fe37
      namespace: test-cephfs
      uid: bf409078-08bb-4aae-a135-e3381349ac76
  status:
    creationTime: 1739188239709563750
    readyToUse: true
    snapshotHandle: 0001-0011-openshift-storage-0000000000000001-5d35b719-b265-49fd-a02b-1ea8fe34684c
    volumeGroupSnapshotHandle: 0001-0011-openshift-storage-0000000000000001-87247f25-b0c0-4e31-8026-443581940b3d

@xing-yang
Copy link
Collaborator

@leonardoce can you take a look of this?

@yati1998
Copy link
Contributor Author

Here are few other things that I noticed while testing,

there is some mismatch between the releases and crds

I am using latest v1beta crds and trying to create volumegorupsnapshot with v8.0.2 controller and sidecar
but the snapshotter is looking for alpha1 snapshotclass

W0214 05:12:34.388567       1 reflector.go:547] github.com/kubernetes-csi/external-snapshotter/client/v8/informers/externalversions/factory.go:142: failed to list *v1alpha1.VolumeGroupSnapshotContent: the server could not find the requested resource (get volumegroupsnapshotcontents.groupsnapshot.storage.k8s.io)
E0214 05:12:34.388682       1 reflector.go:150] github.com/kubernetes-csi/external-snapshotter/client/v8/informers/externalversions/factory.go:142: Failed to watch *v1alpha1.VolumeGroupSnapshotContent: failed to list *v1alpha1.VolumeGroupSnapshotContent: the server could not find the requested resource (get volumegroupsnapshotcontents.groupsnapshot.storage.k8s.io)
[10:43](https://kubernetes.slack.com/archives/D059A0J2K88/p1739509982433639)

in addition to this, I see the featuregates PR is not included in v8.0.2, it still uses the enable-volume-group-snapshot flag

@leonardoce can you please help me to verify this

@leonardoce
Copy link
Contributor

/assign

@leonardoce
Copy link
Contributor

leonardoce commented Feb 25, 2025

Hi @yati1998.
I tested the latest main of the external-snapshot controller, of the sidecar, and of the csi-driver-hostpath.
Everything looks fine in my tests.

This is an example:

apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshot
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"groupsnapshot.storage.k8s.io/v1beta1","kind":"VolumeGroupSnapshot","metadata":{"annotations":{},"name":"new-groupsnapshot-demo","namespace":"default"},"spec":{"source":{"selector":{"matchLabels":{"cnpg.io/instanceName":"cluster-example-1"}}},"volumeGroupSnapshotClassName":"csi-hostpath-groupsnapclass"}}
  creationTimestamp: "2025-02-25T08:01:17Z"
  finalizers:
  - groupsnapshot.storage.kubernetes.io/volumegroupsnapshot-bound-protection
  generation: 1
  name: new-groupsnapshot-demo
  namespace: default
  resourceVersion: "4863"
  uid: 33a7dcdc-9e6c-4772-971d-bcb44ec06ad8
spec:
  source:
    selector:
      matchLabels:
        cnpg.io/instanceName: cluster-example-1
  volumeGroupSnapshotClassName: csi-hostpath-groupsnapclass
status:
  boundVolumeGroupSnapshotContentName: groupsnapcontent-33a7dcdc-9e6c-4772-971d-bcb44ec06ad8
  creationTime: "2025-02-25T08:01:17Z"
  readyToUse: true
apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshotContent
metadata:
  creationTimestamp: "2025-02-25T08:01:17Z"
  finalizers:
  - groupsnapshot.storage.kubernetes.io/volumegroupsnapshotcontent-bound-protection
  generation: 1
  name: groupsnapcontent-33a7dcdc-9e6c-4772-971d-bcb44ec06ad8
  resourceVersion: "4829"
  uid: f82e85dc-6d2c-44a2-9cdd-0cc82616c36c
spec:
  deletionPolicy: Delete
  driver: hostpath.csi.k8s.io
  source:
    volumeHandles:
    - 0fbf45f6-f34d-11ef-be86-febd037d90a9
    - 0fbf538f-f34d-11ef-be86-febd037d90a9
  volumeGroupSnapshotClassName: csi-hostpath-groupsnapclass
  volumeGroupSnapshotRef:
    apiVersion: groupsnapshot.storage.k8s.io/v1beta1
    kind: VolumeGroupSnapshot
    name: new-groupsnapshot-demo
    namespace: default
    resourceVersion: "4814"
    uid: 33a7dcdc-9e6c-4772-971d-bcb44ec06ad8
status:
  creationTime: "2025-02-25T08:01:17Z"
  readyToUse: true
  volumeGroupSnapshotHandle: b0ff8842-f34e-11ef-be86-febd037d90a9
  volumeSnapshotHandlePairList:
  - snapshotHandle: b0ff885d-f34e-11ef-be86-febd037d90a9
    volumeHandle: 0fbf45f6-f34d-11ef-be86-febd037d90a9
  - snapshotHandle: b1490bba-f34e-11ef-be86-febd037d90a9
    volumeHandle: 0fbf538f-f34d-11ef-be86-febd037d90a9

As you can see, the generated VolumeSnapshots have the restoreSize field correctly set.
This is one of them:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  creationTimestamp: "2025-02-25T08:01:18Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-in-group-protection
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  generation: 1
  name: snapshot-80c588a540d3152c8ab374e3badbd43ee04c6f62e5ecf55c13bac761520599c3
  namespace: default
  ownerReferences:
  - apiVersion: groupsnapshot.storage.k8s.io/v1beta1
    kind: VolumeGroupSnapshot
    name: new-groupsnapshot-demo
    uid: 33a7dcdc-9e6c-4772-971d-bcb44ec06ad8
  resourceVersion: "4853"
  uid: 90369e82-427d-4d8b-a0c6-1fd402bf5b49
spec:
  source:
    persistentVolumeClaimName: cluster-example-1
status:
  boundVolumeSnapshotContentName: snapcontent-80c588a540d3152c8ab374e3badbd43ee04c6f62e5ecf55c13bac761520599c3
  creationTime: "2025-02-25T08:01:17Z"
  readyToUse: true
  restoreSize: 1Gi
  volumeGroupSnapshotName: new-groupsnapshot-demo
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  annotations:
    groupsnapshot.storage.k8s.io/volumeGroupSnapshotHandle: b0ff8842-f34e-11ef-be86-febd037d90a9
  creationTimestamp: "2025-02-25T08:01:18Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
  generation: 2
  name: snapcontent-80c588a540d3152c8ab374e3badbd43ee04c6f62e5ecf55c13bac761520599c3
  resourceVersion: "4844"
  uid: d3b16508-54ba-4756-9021-a6374a600f29
spec:
  deletionPolicy: Delete
  driver: hostpath.csi.k8s.io
  source:
    volumeHandle: 0fbf45f6-f34d-11ef-be86-febd037d90a9
  sourceVolumeMode: Filesystem
  volumeSnapshotRef:
    kind: VolumeSnapshot
    name: snapshot-80c588a540d3152c8ab374e3badbd43ee04c6f62e5ecf55c13bac761520599c3
    namespace: default
    uid: 90369e82-427d-4d8b-a0c6-1fd402bf5b49
status:
  creationTime: 1740470477430996381
  readyToUse: true
  restoreSize: 1073741824
  snapshotHandle: b0ff885d-f34e-11ef-be86-febd037d90a9
  volumeGroupSnapshotHandle: b0ff8842-f34e-11ef-be86-febd037d90a9

@yati1998
Copy link
Contributor Author

@leonardoce can you please try the same with v8.2.0? I tried with v8.2.0 and csi driver and it had missing restore size, if this even fails, we can check, maybe it is csi driver

@Madhu-1
Copy link
Contributor

Madhu-1 commented Feb 25, 2025

@leonardoce with host path things might be working because of optional RPC implementation of LIST operation, as its optional not all csi driver have implemented it.

@leonardoce
Copy link
Contributor

leonardoce commented Feb 25, 2025

@leonardoce can you please try the same with v8.2.0? I tried with v8.2.0 and csi driver and it had missing restore size, if this even fails, we can check, maybe it is csi driver

I confirm everything looks good with 8.2.0 and csi-driver-host-path.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  creationTimestamp: "2025-02-25T13:11:11Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-in-group-protection
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  generation: 1
  name: snapshot-066a9ce7dc0e49b12da217db669e267bb73f7a5f54b3082c7fde7c33f36267b4
  namespace: default
  ownerReferences:
  - apiVersion: groupsnapshot.storage.k8s.io/v1beta1
    kind: VolumeGroupSnapshot
    name: new-groupsnapshot-demo
    uid: 0b271bb7-a4db-44da-bd97-58e52e701d9a
  resourceVersion: "3309"
  uid: 4e5e429e-830b-410a-b2b9-9204da6b9e7d
spec:
  source:
    persistentVolumeClaimName: cluster-example-1
status:
  boundVolumeSnapshotContentName: snapcontent-066a9ce7dc0e49b12da217db669e267bb73f7a5f54b3082c7fde7c33f36267b4
  creationTime: "2025-02-25T13:11:10Z"
  readyToUse: true
  restoreSize: 1Gi
  volumeGroupSnapshotName: new-groupsnapshot-demo
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  annotations:
    groupsnapshot.storage.k8s.io/volumeGroupSnapshotHandle: fb577341-f379-11ef-a02a-8edf8cccdddf
  creationTimestamp: "2025-02-25T13:11:11Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
  generation: 2
  name: snapcontent-066a9ce7dc0e49b12da217db669e267bb73f7a5f54b3082c7fde7c33f36267b4
  resourceVersion: "3299"
  uid: 4e0aae1a-5231-4c74-9725-18ba35718555
spec:
  deletionPolicy: Delete
  driver: hostpath.csi.k8s.io
  source:
    volumeHandle: b2083446-f379-11ef-a02a-8edf8cccdddf
  sourceVolumeMode: Filesystem
  volumeSnapshotRef:
    kind: VolumeSnapshot
    name: snapshot-066a9ce7dc0e49b12da217db669e267bb73f7a5f54b3082c7fde7c33f36267b4
    namespace: default
    uid: 4e5e429e-830b-410a-b2b9-9204da6b9e7d
status:
  creationTime: 1740489070517946266
  readyToUse: true
  restoreSize: 1073741824
  snapshotHandle: fb577350-f379-11ef-a02a-8edf8cccdddf
  volumeGroupSnapshotHandle: fb577341-f379-11ef-a02a-8edf8cccdddf

I directly used the image you can find in the registry for this test (registry.k8s.io/sig-storage/snapshot-controller:v8.2.0).

@leonardoce
Copy link
Contributor

@leonardoce with host path things might be working because of optional RPC implementation of LIST operation, as its optional not all csi driver have implemented

This is far more difficult to debug for me as I don't have access to your implementation.

From what I can see, we get the snapshots size from the ListSnapshots and from the CreateSnapshot GRPC calls.
Let's exclude from the discussion the first one, for the reasons you explained.

The CreateSnapshot entrypoint is used for individual volume snapshots and I guess in your implementation this will return the size, and the codebase is using that field.

The CreateGroupSnapshot GRPC entrypoint have that information too, but it looks like the current codebase doesn't use it.

I guess this is what you are experiencing. Can you confirm @Madhu-1 ?

@Madhu-1
Copy link
Contributor

Madhu-1 commented Feb 25, 2025

@leonardoce with host path things might be working because of optional RPC implementation of LIST operation, as its optional not all csi driver have implemented

This is far more difficult to debug for me as I don't have access to your implementation.

From what I can see, we get the snapshots size from the ListSnapshots and from the CreateSnapshot GRPC calls. Let's exclude from the discussion the first one, for the reasons you explained.

The CreateSnapshot entrypoint is used for individual volume snapshots and I guess in your implementation this will return the size, and the codebase is using that field.

The CreateGroupSnapshot GRPC entrypoint have that information too, but it looks like the current codebase doesn't use it.

I guess this is what you are experiencing. Can you confirm @Madhu-1 ?

Yes thats correct, after the rework on this PR #1171 we got into this one.

@Madhu-1
Copy link
Contributor

Madhu-1 commented Feb 25, 2025

@leonardoce with host path things might be working because of optional RPC implementation of LIST operation, as its optional not all csi driver have implemented

This is far more difficult to debug for me as I don't have access to your implementation.
From what I can see, we get the snapshots size from the ListSnapshots and from the CreateSnapshot GRPC calls. Let's exclude from the discussion the first one, for the reasons you explained.

Yes we have problem with this expectation, its not always true that all csi drivers will implement ListSnapshots as its optional.

The CreateSnapshot entrypoint is used for individual volume snapshots and I guess in your implementation this will return the size, and the codebase is using that field.
The CreateGroupSnapshot GRPC entrypoint have that information too, but it looks like the current codebase doesn't use it.
I guess this is what you are experiencing. Can you confirm @Madhu-1 ?

Yes thats correct, after the rework on this PR #1171 we got into this one.

@yati1998
Copy link
Contributor Author

Seems the fix should be in snapshot-controller in this case, while individual snapshot is getting created ??

@leonardoce
Copy link
Contributor

Seems the fix should be in snapshot-controller in this case, while individual snapshot is getting created ??

We don't have access to the CSI driver there and we don't have that info.

@xing-yang
Copy link
Collaborator

Discussed with @leonardoce about this. We can call GetVolumeGroupSnapshot in csi-snapshotter to update the restoreSize of VolumeSnapshotContent. That should return a list of "message Snapshot" which contains size_bytes (the restore size).

CSI driver supporting CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability is required to implement GetVolumeGroupSnapshot. Basically we have the same capability for create/delete/get volume group snapshot.

@Madhu-1
Copy link
Contributor

Madhu-1 commented Feb 26, 2025

Discussed with @leonardoce about this. We can call GetVolumeGroupSnapshot in csi-snapshotter to update the restoreSize of VolumeSnapshotContent. That should return a list of "message Snapshot" which contains size_bytes (the restore size).

CSI driver supporting CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability is required to implement GetVolumeGroupSnapshot. Basically we have the same capability for create/delete/get volume group snapshot.

This optional RPC MAY be called by the CO to fetch current information about a volume group snapshot.

@xing-yang do we need remove optional in the statement at https://github.com/container-storage-interface/spec/blob/master/spec.md#getvolumegroupsnapshot because its not optional or is it correct one?

@leonardoce
Copy link
Contributor

I found another problem in that code: we're using the ListSnapshots call for volume group snapshot members and we won't have a reference to a VolumeSnapshotClass.
Because of that, we are not passing the credentials to the driver.

I'm fixing that too by recovering the reference to the VolumeGroupSnapshotClass and passing the correct credentials for the GetVolumeGroupSnapshot call.

cc: @Madhu-1

@xing-yang
Copy link
Collaborator

@Madhu-1 Actually it is still "optional" because a CSI driver is not required to implement it. We also have the line below that stated a driver must implement it if the capability is supported.

A Controller Plugin MUST implement this GetVolumeGroupSnapshot RPC call if it has CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability.

@Madhu-1
Copy link
Contributor

Madhu-1 commented Feb 28, 2025

@Madhu-1 Actually it is still "optional" because a CSI driver is not required to implement it. We also have the line below that stated a driver must implement it if the capability is supported.

A Controller Plugin MUST implement this GetVolumeGroupSnapshot RPC call if it has CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability.

@xing-yang For me it created a confusion where its optional and also must be implemented if CSI Driver has CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability, because we have only one capability i.e CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT for group snapshot

@xing-yang
Copy link
Collaborator

@Madhu-1 I submitted a PR to remove "optional": container-storage-interface/spec#580

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants