
Add support for raid10 #1900

Merged: 1 commit into awslabs:main on Feb 25, 2025
Conversation

lassizci (Contributor)

Description of changes:
I would like to be able to migrate workloads away from a node gracefully in case of an instance storage drive failure. RAID10 would provide redundancy at the cost of disk space.

This adds support for creating RAID10 in addition to RAID0. It also removes the wait block for RAID resync, for two reasons:

  1. RAID0 does not have redundancy and therefore has no initial resync [1]
  2. with RAID10, the resync of 4x 1.9 TB disks takes from tens of minutes to multiple hours, depending on the sysctl parameters dev.raid.speed_limit_min and dev.raid.speed_limit_max and the speed of the disks. The initial resync for RAID10 is not strictly needed [1]

Filesystem creation: by default, mkfs.xfs attempts to TRIM the drive. This can also take tens of minutes or hours, depending on the size of the drives. TRIM can be skipped, as instances are delivered with disks fully trimmed [2].

[1] https://raid.wiki.kernel.org/index.php/Initial_Array_Creation
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#InstanceStoreTrimSupport
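
For illustration, a minimal sketch of skipping the discard pass at filesystem-creation time on the resulting array (device name as used in the testing below; mkfs.xfs's -K option disables the TRIM attempt, and the exact options used by setup-local-disks may differ):

# mkfs.xfs -K /dev/md/kubernetes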

Testing Done

on m6id.metal with kernel defaults:

# uname -a
Linux ip-10-24-0-65.eu-west-1.compute.internal 5.10.199-190.747.amzn2.x86_64 #1 SMP Sat Nov 4 16:55:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

#  sysctl dev.raid.speed_limit_min
dev.raid.speed_limit_min = 1000

# sysctl dev.raid.speed_limit_max
dev.raid.speed_limit_max = 200000

# mdadm --create --force --verbose /dev/md/kubernetes --level=10 --name=kubernetes --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 1855337472K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/kubernetes started.

# cat /proc/mdstat 
Personalities : [raid10] 
md127 : active raid10 nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
      3710674944 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      [>....................]  resync =  1.1% (41396352/3710674944) finish=304.3min speed=200910K/sec
      bitmap: 28/28 pages [112KB], 65536KB chunk

With increased resync limits:

# sysctl -w dev.raid.speed_limit_min=2146999999 ; sysctl -w dev.raid.speed_limit_max=2146999999
dev.raid.speed_limit_min = 2146999999
dev.raid.speed_limit_max = 2146999999

# cat /proc/mdstat 
Personalities : [raid10] 
md127 : active raid10 nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
      3710674944 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      [===>.................]  resync = 19.9% (740172096/3710674944) finish=20.4min speed=2418848K/sec
      bitmap: 23/28 pages [92KB], 65536KB chunk
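
Note: operators who still want to block until the initial resync completes can do so manually; something like the following should work (not part of this change, shown only as an optional manual step):

# mdadm --wait /dev/md/kubernetes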

Due to some... accidents, this replaces the old pull request, #1549, where there are further discussions about the details.

@Issacwww (Member)

The original PR, #1549, has enough context. The change LGTM.
My question would be: do we need to add this experimental feature for AL2023 (nodeadm) as well?

@ndbaker1 (Member) commented Jul 27, 2024

The original PR, #1549, has enough context. The change LGTM. My question would be: do we need to add this experimental feature for AL2023 (nodeadm) as well?

We should be able to pass anything that setup-local-disks accepts via spec.instance.localStorage.strategy, so I think it should work OOTB. A doc update there would be appreciated though!
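
As an illustration, a rough sketch of the nodeadm configuration fragment this corresponds to (assuming the node.eks.aws/v1alpha1 NodeConfig shape; cluster settings and other required fields omitted):

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  instance:
    localStorage:
      # RAID10 is the value added by this PR; it requires at least 4 local disks
      strategy: RAID10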

@lassizci (Contributor, Author)

The original PR, #1549, has enough context. The change LGTM. My question would be: do we need to add this experimental feature for AL2023 (nodeadm) as well?

We should be able to pass anything that setup-local-disks accepts via spec.instance.localStorage.strategy, so I think it should work OOTB. A doc update there would be appreciated though!

Added a documentation update. A question regarding the CRD though: the validation changes, but the versioning isn't changed. Would that actually need v1alpha2 or what's the policy here?

@cartermckinnon (Member)

the versioning isn't changed. Would that actually need v1alpha2 or what's the policy here?

We only bump the version on breaking changes; this is purely additive, so we're fine to stay on v1alpha1.

@lassizci (Contributor, Author) commented Jan 20, 2025

Is there anything further needed to get the raid10 support in?

@cartermckinnon (Member)

@lassizci thanks for your work on this! Giving the CI a final run before we merge.

/ci

Contributor

@cartermckinnon roger that! I've dispatched a workflow. 👍

Contributor

@cartermckinnon the workflow that you requested has completed. 🎉

AMI variant     Build        Test
1.25 / al2      success ✅   success ✅
1.25 / al2023   failure ❌   skipped ⏭️
1.26 / al2      success ✅   success ✅
1.26 / al2023   failure ❌   skipped ⏭️
1.27 / al2      success ✅   success ✅
1.27 / al2023   failure ❌   skipped ⏭️
1.28 / al2      success ✅   success ✅
1.28 / al2023   failure ❌   skipped ⏭️
1.29 / al2      success ✅   success ✅
1.29 / al2023   failure ❌   skipped ⏭️
1.30 / al2      success ✅   success ✅
1.30 / al2023   failure ❌   skipped ⏭️
1.31 / al2      success ✅   success ✅
1.31 / al2023   failure ❌   skipped ⏭️
1.32 / al2      success ✅   success ✅
1.32 / al2023   failure ❌   skipped ⏭️

@lassizci (Contributor, Author) commented Feb 21, 2025

@lassizci thanks for your work on this! Giving the CI a final run before we merge.

/ci

Thanks for coming back! OK, it seems I still need to dig into the build more, since it fails miserably for any AL2023. Sorry about that! I should be able to set up a test account to make sure.

type LocalStorageStrategy string

const (
// LocalStorageRAID0 will create a single raid0 volume from any local disks
LocalStorageRAID0 LocalStorageStrategy = "RAID0"

// LocalStorageRAID10 will create a single raid10 volume from any local disks. Minimum of 4.
LocalStorageRAID0 LocalStorageStrategy = "RAID10"
Member

Just a typo and I think we're good :)

Suggested change
LocalStorageRAID0 LocalStorageStrategy = "RAID10"
LocalStorageRAID10 LocalStorageStrategy = "RAID10"

Contributor Author

Oh, indeed, thanks! Fixed now.

This removes the wait block for raid resync for two reasons:
1) raid0 does not have redundancy and therefore no initial resync[1]
2) with raid10 the resync time for 4x 1.9TB disks takes from tens of minutes
   to multiple hours, depending on sysctl params `dev.raid.speed_limit_min` and
   `dev.raid.speed_limit_max` and the speed of the disks. Initial resync for
   raid10 is not strictly needed[1]

Filesystem creation: by default `mkfs.xfs` attempts to TRIM the drive. This is
also something that can take tens of minutes or hours, depending on the size of
drives. TRIM can be skipped, as instances are delivered with disks fully
trimmed[2].

[1] https://raid.wiki.kernel.org/index.php/Initial_Array_Creation
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#InstanceStoreTrimSupport
@ndbaker1 (Member)

/ci

Contributor

@ndbaker1 roger that! I've dispatched a workflow. 👍

Contributor

@ndbaker1 the workflow that you requested has completed. 🎉

AMI variant     Build        Test
1.25 / al2      success ✅   success ✅
1.25 / al2023   success ✅   success ✅
1.26 / al2      success ✅   success ✅
1.26 / al2023   success ✅   success ✅
1.27 / al2      success ✅   success ✅
1.27 / al2023   success ✅   success ✅
1.28 / al2      success ✅   success ✅
1.28 / al2023   success ✅   success ✅
1.29 / al2      success ✅   success ✅
1.29 / al2023   success ✅   success ✅
1.30 / al2      success ✅   success ✅
1.30 / al2023   success ✅   success ✅
1.31 / al2      success ✅   success ✅
1.31 / al2023   success ✅   success ✅
1.32 / al2      success ✅   success ✅
1.32 / al2023   success ✅   success ✅

ndbaker1 merged commit faaa555 into awslabs:main on Feb 25, 2025
11 checks passed