File-based disk-only VM snapshot with KVM as hypervisor #10632

Open
wants to merge 1 commit into
base: main


JoaoJandre
Contributor

Description

This PR implements the spec available at #9524. For more information regarding it, please read the spec.

Furthermore, the following changes, which are not covered by the spec, were added:

  1. The snapshot.merge.timeout agent property was added. It is only considered if libvirt.events.enabled is true;
  2. A new snapshot merge process (which affects normal volume snapshots and this feature) was created. When libvirt.events.enabled is true, ACS will register to gather events from Libvirt and will collect information on the process, providing a progress report in the logs. If the configuration is false, the old process is used;
  3. Volumes attached to VMs that have file-based disk-only VM snapshots on KVM can now be resized.
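
As a rough illustration of item 1, the sketch below shows how an agent could decide whether snapshot.merge.timeout applies at all. It is not the code in this PR: the class and method names are hypothetical, treating a missing libvirt.events.enabled as false is an assumption, and treating 0 (or a missing value) as "no time limit" is only inferred from the timeout tests described further down.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.OptionalLong;
import java.util.Properties;

// Hypothetical helper; only the two property names come from the PR description.
final class MergeTimeoutConfigSketch {

    /** Parses agent.properties-style text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return props;
    }

    /**
     * Returns the merge timeout to enforce, or empty when none applies.
     * The property is ignored unless libvirt.events.enabled is true.
     * Assumption: a missing value or a value <= 0 means "no time limit".
     */
    static OptionalLong effectiveMergeTimeout(Properties props) {
        String enabled = props.getProperty("libvirt.events.enabled", "false").trim();
        if (!Boolean.parseBoolean(enabled)) {
            return OptionalLong.empty();
        }
        String raw = props.getProperty("snapshot.merge.timeout");
        if (raw == null) {
            return OptionalLong.empty();
        }
        long timeout = Long.parseLong(raw.trim());
        return timeout > 0 ? OptionalLong.of(timeout) : OptionalLong.empty();
    }
}
```

The unit of the value is deliberately left out of the sketch, since the description does not state it.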

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Basic Tests

I created a test VM to carry out the tests below. After each relevant operation, I also checked the VM's XML and the storage to confirm that the snapshots existed.

Snapshot Creation

The tests below were also repeated with the VM stopped.

| N | Test | Result |
|---|------|--------|
| 1 | Take a snapshot of VM 1 without specifying quiesceVM | Snapshot created |
| 2 | Take a snapshot of VM 2 specifying quiesceVM | Snapshot created |

Snapshot Reversion

| N | Test | Result |
|---|------|--------|
| 1 | Revert VM in Running state to any snapshot | Error thrown |
| 2 | Revert VM in Stopped state to snapshot 1 and start it | VM reverted and started successfully |

Snapshot Removal

| N | Test | Result |
|---|------|--------|
| 1 | Create a new snapshot 3 after the second reversion test and delete snapshot 1 | The snapshot was no longer listed and had the correct database metadata. The file still existed because more than one delta depended on it |
| 2 | Delete snapshot 2 | Snapshot deleted; snapshot 1 was merged with snapshot 3 since it only had the latter as a dependency |
| 3 | Delete snapshot 3 (current) | Snapshot removed, merged with the VM's volume |
| 4 | Create 3 snapshots and remove the first one | Snapshot removed, merged with the second snapshot |
| 5 | Create two snapshots, revert to the first, and delete the second | Snapshot deleted |

Advanced Tests

Deletion Test

All tests were carried out with the VM stopped.

  1. I created 3 snapshots: s1, s2, and s3.
  2. I reverted the VM to snapshot s2.
  3. I created snapshot s4.
  4. I removed snapshot s2.

Snapshot s2 was marked as hidden and was not removed from storage.

  5. I removed snapshot s3.

Snapshot s3 was removed normally. Snapshot s2 was merged with snapshot s4.

  6. I created snapshot s5.
  7. I reverted to snapshot s4.
  8. I removed snapshot s4.

Snapshot s4 was marked as hidden and was not removed from storage.

  9. I removed snapshot s5.

Snapshot s5 was removed normally. Snapshot s4 was merged with the delta of the VM's volume.

  10. I removed the last remaining snapshot (s1). It was removed normally.
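
The outcomes above are consistent with a simple rule on the tree of backing files: a snapshot that several deltas depend on is only hidden, a snapshot with a single dependent is merged into it, and a snapshot with no dependents is deleted, after which a hidden parent left with one dependent is merged too. The toy model below is a sketch of that rule as inferred from these tests. It is not the PR's strategy class, it touches no real files, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of a backing-file tree; all names are hypothetical.
final class SnapshotTreeSketch {
    enum Outcome { HIDDEN, MERGED, DELETED }

    private static final class Node {
        final List<Node> children = new ArrayList<>();
        Node parent;
        boolean hidden;
    }

    private final Map<String, Node> snapshots = new HashMap<>(); // listed snapshots only
    private Node active = new Node();                            // delta the VM writes to

    /** The active delta becomes the snapshot's file; a fresh delta is stacked on it. */
    void take(String name) {
        snapshots.put(name, active);
        active = attach(new Node(), active);
    }

    /** Discards the active delta and stacks a fresh one on the chosen snapshot. */
    void revert(String name) {
        Node target = snapshots.get(name);
        if (target == null) throw new IllegalArgumentException("unknown snapshot: " + name);
        Node oldParent = active.parent;
        oldParent.children.remove(active);
        active = attach(new Node(), target);
        collapseIfHiddenWithSingleChild(oldParent);
    }

    Outcome delete(String name) {
        Node node = snapshots.remove(name);
        if (node == null) throw new IllegalArgumentException("unknown snapshot: " + name);
        if (node.children.size() > 1) {   // several deltas still depend on the file
            node.hidden = true;
            return Outcome.HIDDEN;
        }
        if (node.children.size() == 1) {  // exactly one dependent: merge into it
            mergeIntoOnlyChild(node);
            return Outcome.MERGED;
        }
        Node parent = node.parent;        // leaf: nothing depends on it
        if (parent != null) {
            parent.children.remove(node);
            collapseIfHiddenWithSingleChild(parent);
        }
        return Outcome.DELETED;
    }

    /** Number of files (snapshot files, hidden files, and the active delta) in the tree. */
    int fileCount() {
        Node root = active;
        while (root.parent != null) root = root.parent;
        return count(root);
    }

    private static int count(Node node) {
        int total = 1;
        for (Node child : node.children) total += count(child);
        return total;
    }

    private static Node attach(Node child, Node parent) {
        child.parent = parent;
        parent.children.add(child);
        return child;
    }

    private static void mergeIntoOnlyChild(Node node) {
        Node child = node.children.get(0);
        child.parent = node.parent;
        if (node.parent != null) {
            node.parent.children.set(node.parent.children.indexOf(node), child);
        }
    }

    private static void collapseIfHiddenWithSingleChild(Node node) {
        if (node != null && node.hidden && node.children.size() == 1) {
            mergeIntoOnlyChild(node);
        }
    }
}
```

Replaying steps 1 to 5 on this model (take s1, s2, s3; revert to s2; take s4; delete s2; delete s3) yields HIDDEN for s2 and DELETED for s3, and the file count drops from 5 to 3 because the hidden s2 file is merged once s4 is its only dependent.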

Reversion Test

  1. I created two snapshots: s1 and s2.
  2. I reverted to snapshot s1.
  3. I removed snapshot s1.

Snapshot s1 was marked as hidden and was not removed from storage.

  4. I reverted to snapshot s2. Snapshot s1 was merged with the base volume.

Concurrent Test

I created 4 VMs and took a VM snapshot of each. Then, I requested the removal of all four snapshots at the same time. All snapshots were removed concurrently and successfully.

Test with Multiple Volumes

I created a VM with one datadisk and attached 8 more datadisks (10 volumes in total, counting the root volume), took two VM snapshots, and then removed them one at a time. The snapshots were removed successfully.

Tests Changing the snapshot.merge.timeout Config

  1. I changed the config to 1 and restarted the host;
  2. I created a VM, took a VM snapshot, accessed the VM, and wrote 4GB of data to it;
  3. I tried to remove the snapshot and an error occurred; the logs showed that the merge had timed out;
  4. I manually aborted the blockcommit process;
  5. I changed the config to 0 and restarted the host;
  6. I tried to remove the snapshot again, and the removal completed correctly.
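
The behaviour observed in steps 3 and 6 can be pictured as waiting for a completion signal with an optional deadline. The sketch below is a minimal illustration using a CountDownLatch in place of the Libvirt event. It is not the PR's BlockCommitListener, the class and method names are hypothetical, and both the time unit and the reading of 0 as "wait without a limit" are assumptions drawn from these test results.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: the latch stands in for the "merge finished" event.
final class MergeWaitSketch {

    /**
     * Waits for the merge to finish.
     * Assumption: timeout <= 0 means "wait without a limit".
     * Returns true if the merge finished, false on timeout or interruption.
     */
    static boolean awaitMerge(CountDownLatch mergeFinished, long timeout, TimeUnit unit) {
        try {
            if (timeout <= 0) {
                mergeFinished.await();
                return true;
            }
            return mergeFinished.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return false;
        }
    }
}
```

A caller that receives false would still need to abort the block job itself, which matches the manual abort in step 4.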

Tests Related to Volume Resize with Disk-Only VM Snapshots on KVM

| Test | Result | Expected? |
|------|--------|-----------|
| Create a VM, take a snapshot, resize the volume | Resize performed successfully, both in metadata and when checked with qemu-img info | Y |
| Stop the VM and revert the snapshot | Revert performed successfully, volume size returned to original, both in metadata and qemu-img info | Y |
| Remove the snapshot with the VM stopped | The delta of the volume was correctly merged with the snapshot's, and the final size was that of the volume | Y |
| Start the VM, take a new snapshot, resize the volume, and remove the snapshot | Deltas were correctly merged, and the final size was that of the volume | Y |

The last two tests were repeated on a VM with several snapshots, so that a merge between snapshots was performed. The result was the same.
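
For anyone repeating the size checks, the comparison against qemu-img info can be scripted. The sketch below pulls the virtual size in bytes out of the human-readable output, assuming it contains a line of the form `virtual size: 10 GiB (10737418240 bytes)`. The class name and the sample text are hypothetical, and this helper is not part of the PR.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for checking sizes reported by `qemu-img info`.
final class QemuImgInfoSketch {
    // Matches e.g. "virtual size: 10 GiB (10737418240 bytes)" at the start of a line.
    private static final Pattern VIRTUAL_SIZE =
            Pattern.compile("^virtual size:.*\\((\\d+) bytes\\)", Pattern.MULTILINE);

    /** Returns the first virtual size found in the output, in bytes, if any. */
    static OptionalLong virtualSizeBytes(String infoOutput) {
        Matcher matcher = VIRTUAL_SIZE.matcher(infoOutput);
        return matcher.find()
                ? OptionalLong.of(Long.parseLong(matcher.group(1)))
                : OptionalLong.empty();
    }
}
```

Comparing the value before and after a revert or merge gives the same check as reading the output by eye.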

Tests Related to Events:

  1. Create a VM, take a disk-only VM snapshot, increase the root volume size by 1GB, stop the VM, and revert the snapshot. The cloud.usage_event table showed that the resize event was correctly triggered, and the GUI showed that the account's resource limit was updated.
  2. Repeat the test above with a VM with two volumes, with only one resized. The test had the same result, and only one resize event was triggered, for the volume that had been resized.

@JoaoJandre
Contributor Author

@blueorangutan package


codecov bot commented Mar 28, 2025

Codecov Report

Attention: Patch coverage is 10.84711% with 863 lines in your changes missing coverage. Please review.

Project coverage is 16.29%. Comparing base (8af021c) to head (2027f25).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| ...napshot/KvmFileBasedStorageVmSnapshotStrategy.java | 0.24% | 403 Missing and 1 partial ⚠️ |
| ...LibvirtCreateDiskOnlyVMSnapshotCommandWrapper.java | 1.04% | 95 Missing ⚠️ |
| .../LibvirtMergeDiskOnlyVMSnapshotCommandWrapper.java | 1.38% | 71 Missing ⚠️ |
| ...ervisor/kvm/resource/LibvirtComputingResource.java | 55.83% | 50 Missing and 3 partials ⚠️ |
| ...LibvirtRevertDiskOnlyVMSnapshotCommandWrapper.java | 1.96% | 50 Missing ⚠️ |
| ...d/hypervisor/kvm/resource/BlockCommitListener.java | 28.12% | 22 Missing and 1 partial ⚠️ |
| ...m/cloud/agent/api/storage/SnapshotMergeTreeTO.java | 0.00% | 21 Missing ⚠️ |
| ...nt/api/storage/MergeDiskOnlyVmSnapshotCommand.java | 0.00% | 18 Missing ⚠️ |
| ...java/org/apache/cloudstack/utils/qemu/QemuImg.java | 0.00% | 18 Missing ⚠️ |
| ...LibvirtDeleteDiskOnlyVMSnapshotCommandWrapper.java | 5.88% | 16 Missing ⚠️ |

... and 14 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #10632      +/-   ##
============================================
- Coverage     16.30%   16.29%   -0.02%     
- Complexity    13448    13463      +15     
============================================
  Files          5674     5687      +13     
  Lines        499213   500125     +912     
  Branches      60368    60458      +90     
============================================
+ Hits          81395    81486      +91     
- Misses       408755   409567     +812     
- Partials       9063     9072       +9     
| Flag | Coverage Δ |
|------|------------|
| uitests | 3.99% <ø> (+<0.01%) ⬆️ |
| unittests | 17.15% <10.84%> (-0.02%) ⬇️ |
