
ZTS: Fix replacement/resilver_restart_001 on FreeBSD #17279

Merged
amotin merged 1 commit into openzfs:master from fix-resilver_restart_001 on May 2, 2025

Conversation

mcmilk
Contributor

@mcmilk mcmilk commented Apr 27, 2025

Motivation and Context

Details on the issue can be found in #16822.

Increasing the size of the data files from 16M to 32M fixes the replacement/resilver_restart_001 test.

Closes: #16822

Description

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin
Member

amotin commented Apr 28, 2025

I wonder if it is actually the same problem as I see in #17269 -- "online" (re-)starts resilver asynchronously.

@mcmilk
Contributor Author

mcmilk commented Apr 29, 2025

> I wonder if it is actually the same problem as I see in #17269 -- "online" (re-)starts resilver asynchronously.

Yes, they seem to be related. But I didn't go into the details for this fix. Increasing the size just resolved the resilver_restart_001 test ... But of course, this is currently more of a workaround.

@amotin
Member

amotin commented Apr 29, 2025

@mcmilk I agree with @snajpa: it makes no logical sense to me. I don't have a particular problem with increasing the write size to 32MB, but I worry that the problem might reappear later when we change something in CI or the OS. I think a `sleep 1` after the online, the same as I added, could be more obvious and predictable.

Decrease the RESILVER_MIN_TIME_MS variable from 50 to 20,
so that the test, which expects 2 resilver starts, will see them.

Logfile of the seen failures before this fix:
log: NOTE: expected 2 resilver start(s) after offline/online, found 1
log: expected 2 resilver start(s) after offline/online, found 1

The test time also decreases from around 00:42 to 00:24.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes: openzfs#16822
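
The check behind the "expected 2 resilver start(s) ... found 1" log lines in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the test itself: both function names are hypothetical, and it assumes that each resilver start appears as a line containing `sysevent.fs.zfs.resilver_start` in the text being read (for example, the output of `zpool events`).

```shell
#!/bin/sh

# Hypothetical helper: count resilver-start events in text read from stdin.
# Assumption: every resilver start is reported on a line containing
# "sysevent.fs.zfs.resilver_start".
count_resilver_starts() {
	# grep -c prints 0 but exits non-zero when nothing matches,
	# so mask the exit status to keep the helper safe under `set -e`.
	grep -c 'sysevent.fs.zfs.resilver_start' || true
}

# Hypothetical check: compare the observed count with the expected one.
# Prints a note and returns 0 only when both counts agree.
check_resilver_starts() {
	expected=$1
	found=$(count_resilver_starts)
	echo "expected $expected resilver start(s), found $found"
	[ "$found" -eq "$expected" ]
}
```

On a system with a pool under test, this could be driven with something like `zpool events | check_resilver_starts 2`. A check of this kind only passes if the second resilver start has already been recorded by the time the events are read, which is where the timing concern raised in the discussion comes in.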
@mcmilk mcmilk force-pushed the fix-resilver_restart_001 branch from 69139c8 to 4e2a990 on May 1, 2025 09:28
Member

@amotin amotin left a comment

I still don't understand what this fixes or how, but if it does -- whatever, it should not hurt.

@amotin amotin added the Status: Accepted Ready to integrate (reviewed, tested) label May 2, 2025
@amotin amotin merged commit 3b18877 into openzfs:master May 2, 2025
22 of 24 checks passed
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Development

Successfully merging this pull request may close these issues.

ZTS: replacement/resilver_restart_001 fails on FreeBSD 14+
4 participants