Move temporary charliecloud containers outside of process directory #5212
Conversation
Any comments or thoughts on this PR?
Any chance someone could review this and let me know if anything should be changed or done differently before this can be merged? @pditommaso @bentsherman
docs/config.md
Outdated
: Enable `writeFake` with charliecloud. This allows containers to be run from storage in writeable mode using overlayfs; see the [charliecloud documentation](https://hpc.github.io/charliecloud/ch-run.html#ch-run-overlay) for details. Using this is recommended, provided that the kernel supports it. If `writeFake` is not enabled, each process will run in an individual container, which can lead to the creation of many temporary containers for a pipeline with many processes.

`charliecloud.containerStore`
: If `writeFake` is enabled, this option has no effect. Directory that stores temporary containers. Unless `writeFake` is enabled, each process runs in an individual container. If this is not set, Nextflow will default to `work/charliecloud_store`. The path to a temporary container inside the `containerStore` is the full hash of the process.
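For illustration, a minimal sketch of how these options might be set in `nextflow.config`; the option names follow the descriptions above, the path is made up:

```groovy
// Hedged example: option names come from this PR's docs, values are illustrative.
charliecloud {
    enabled        = true
    writeFake      = true                            // preferred, provided the kernel supports it
    // Only consulted when writeFake is not enabled; defaults to work/charliecloud_store
    containerStore = '/scratch/charliecloud_store'
}
```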
Couldn't you just reuse `charliecloud.cacheDir` instead of adding yet another config option?
The singularity driver also has the concept of `libraryDir` for images in read-only locations, which might also be worth considering here to keep things more comparable.
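For comparison, a hedged sketch of the existing `singularity` options referred to above; whether charliecloud should mirror `libraryDir` is precisely the open question here:

```groovy
// Sketch of the singularity scope mentioned in the comment; paths are illustrative.
singularity {
    cacheDir   = '/scratch/nxf-singularity-cache'   // writable cache for pulled/converted images
    libraryDir = '/opt/shared-images'               // read-only location with pre-built images
}
```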
The issue here is that charliecloud does not allow running writable from storage, so another directory seems necessary to store the temporary container. If `cacheDir` and `$CH_IMAGE_STORAGE` are identical, charliecloud would refuse to run with `-w`, unless I am misunderstanding something.
`-w` isn't actually needed when `--write-fake` is provided, correct?
I think it is not
docs/config.md
Outdated
`charliecloud.useSquash`
: Create a temporary squashFS container image in the process work directory instead of a folder.
: If `writeFake` is enabled, this option has no effect. Create a temporary squashFS container image in `containerStore` (see above) named `container.squashfs`. If `writeFake` is not available but squashFS is an option, this should be enabled.
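A hedged sketch of the fallback this describes, for a system where `--write-fake` is unavailable but squashFS works; paths are illustrative:

```groovy
// Hedged example: option names come from this PR, values are illustrative.
charliecloud {
    enabled        = true
    useSquash      = true                            // build one container.squashfs per container
    containerStore = '/scratch/charliecloud_store'   // where the temporary squashFS images are placed
}
```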
I think `useSquash` should be made the default, if not the only option.
Using plain directory trees and making a copy for each process execution can lead to insane amounts of inodes for larger runs. Sysadmins won't like this.
If using squashfs images is possible, we should do it.
Could you be more specific about the meaning of "somewhat modern" systems?
I also do not like this, and I am not sure what exactly is preventing this, but the infrastructure I am using does not support squashfs. I talked to the admins, but this will not change in the foreseeable future.
I agree that it should be made the default. Maybe if squashfs is not available, only a single copy of the container could be created per process, instead of having one per task? This might be a bit unsafe, but maybe a good trade-off over cluttering the filesystem?
Creating one full copy of the container filesystem per task is not sane behaviour, I think we agree on that.
Instead (from what I understand), you could pull the container once (e.g. to `charliecloud.cacheDir`, which could be the same as `$CH_IMAGE_STORAGE`), and then use `--write-fake` to spawn as many container instances as you'd like without having to copy the container filesystem.
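In config terms, the proposed setup would look roughly like the sketch below; treating `cacheDir` as interchangeable with `$CH_IMAGE_STORAGE` is an assumption being discussed in this thread, not verified behaviour:

```groovy
// Hedged sketch: pull each image once into the cache, then run it via overlayfs.
charliecloud {
    enabled   = true
    cacheDir  = '/var/tmp/user.ch'   // illustrative; assumed here to coincide with $CH_IMAGE_STORAGE
    writeFake = true                 // ch-run --write-fake, no per-task container copies
}
```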
Yes, clearly using `--write-fake` with a storage directory is the best choice and avoids all of the extra creation of containers.
This is not an option on older systems though (https://hpc.github.io/charliecloud/ch-run.html#writeable-overlay-with-write-fake), such as the one I am using. Since charliecloud will not run an image in the storage writable (`-w`), a new image needs to be created somewhere that is not `$CH_IMAGE_STORAGE`. `cacheDir` may or may not be the same directory as `$CH_IMAGE_STORAGE`, so I guess that would need to be checked against? I will try to come up with a better way to handle the support for older systems. I agree that creating maybe thousands of container directories is not really a reasonable way to handle this.
There is also the additional issue that running `ch-convert` at the beginning of every task takes some time (minutes), which is not great.
@@ -62,6 +65,9 @@ class CharliecloudBuilder extends ContainerBuilder<CharliecloudBuilder> {
        if( params.containsKey('readOnlyInputs') )
            this.readOnlyInputs = params.readOnlyInputs?.toString() == 'true'

        if( params.containsKey('containerStore') )
not sure this is actually needed (see above)
.params(runOptions: '-j --no-home')
.build()
.runCommand == 'ch-convert -i ch-image --storage /cacheDir busybox "$NXF_TASK_WORKDIR"/container_busybox && ch-run --unset-env="*" -c "$NXF_TASK_WORKDIR" --set-env -w -b "$NXF_TASK_WORKDIR" -j --no-home "$NXF_TASK_WORKDIR"/container_busybox --'
.runCommand == 'mkdir -p /containerStorage/"${NXF_TASK_WORKDIR: -34}" && ch-convert -i ch-image --storage /cacheDir busybox /containerStorage/"${NXF_TASK_WORKDIR: -34}" && ch-run --unset-env="*" -c "$NXF_TASK_WORKDIR" --set-env -w -b "$NXF_TASK_WORKDIR" -j --no-home /containerStorage/"${NXF_TASK_WORKDIR: -34}" --'
Could you explain the magic default of `-34`? I don't understand the purpose of this.
The -34 will get the task hash to create containers per task.
I think this is worth reconsidering: is it really necessary to create one container per task, or would it be sufficient to have one dedicated container for each process?
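A hedged Groovy illustration of what the `-34` slice does, based on the explanation above; the example path and the exact length of the hash directory are assumptions for illustration only:

```groovy
// Hedged illustration: the last 34 characters of the task work dir are taken to be
// "/<2-char prefix>/<rest of the task hash>", giving each task its own container path.
def workDir       = '/scratch/work/9a/1b2c3d4e5f6789f123456789abcdef'   // hypothetical task work dir
def taskHashPath  = workDir[-34..-1]                                    // "/9a/1b2c3d4e5f6789f123456789abcdef"
def containerPath = '/containerStorage' + taskHashPath                  // per-task container location
assert containerPath == '/containerStorage/9a/1b2c3d4e5f6789f123456789abcdef'
```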
Thanks @phue, I think the review largely confirms what I have outlined in the initial comment: Creating one container per task (as was initially suggested here) is not going to be a useful route for handling containers, especially if
I understand the idea of maintaining the purity of containers in the Do you have any thoughts or comments on this @phue ?
I have updated this PR to follow what I have outlined above:
This greatly simplifies the container procedure for The commit messages that end early should include
Just to recap why The next best thing to do imo would be something along these lines:
I agree that
This was how the cache was used up until my latest commit. Even with this commit
I have contemplated this before going with the current approach. The main problem is that this would require some additional parts in the Conversion of the container into a single "working copy" cannot easily be done inside the I should add that the current approach may lead to Issue 3964 coming up again.
I don't quite understand this.
The main difference is that this will populate a new local cache each time you start a fresh pipeline run. It may not be faster but certainly cheaper because using the global cache at least gives you some deduplication.
Then extend
Yes it is. One difference is that
This mutex never worked properly and caused problems. See here #3566
You could use the same cache for different pipeline runs via
So, here is the latest set of changes:
In summary:
I would also like to add something that may not be obvious (at least to me) regarding this
The charliecloud build cache appears to be located in
To test if the image storage directory is the same as Edit:
To explain my reluctance to add some conversion step, besides the fact that it would be quite some work: This would only add support for cases where someone is not able to use
I like that it defaults to using overlayfs now.
The workaround that you implemented for older kernels isn't optimal, but it seems to be an improvement over the current state of things.
Left two comments below, but otherwise looks good to me.
Please note though that I currently don't have a good way to test this other than my laptop.
} else {
    // Otherwise run by path
    result << image
}
Would be good to have unit tests for both codepaths
I totally agree. I am not super familiar with the test framework though. Is there a way to change the environment in the test framework, or would I need to construct something similar to how it is done in the container builder?
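Not an authoritative answer, but one common Spock pattern is to stub an environment accessor on the class under test; the sketch below uses a made-up stand-in class rather than the real builder, purely to show the idea:

```groovy
import spock.lang.Specification

// Hedged, self-contained sketch: `PathResolver` is a hypothetical stand-in for the
// builder logic, only meant to show how an environment lookup can be stubbed in Spock.
class EnvDependentPathSpec extends Specification {

    static class PathResolver {
        // In real code this would read System.getenv('CH_IMAGE_STORAGE')
        String imageStorage() { System.getenv('CH_IMAGE_STORAGE') }

        String resolve(String image) {
            imageStorage() ? image : "/some/cache/${image}"
        }
    }

    def 'runs by image name when an image storage is available'() {
        given:
        def resolver = Spy(PathResolver) { imageStorage() >> '/var/tmp/user.ch' }

        expect:
        resolver.resolve('busybox') == 'busybox'
    }

    def 'runs by path when no image storage is set'() {
        given:
        def resolver = Spy(PathResolver) { imageStorage() >> null }

        expect:
        resolver.resolve('busybox') == '/some/cache/busybox'
    }
}
```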
Force-pushed from e23dc73 to 4c3725a.
Once again, something went sideways when I tried to sign off something. I will try to solve whatever went wrong.
Force-pushed from c988e86 to 416a496.
This addresses #5192.

Instead of placing the process container inside the process work directory, it is created in a folder that can be defined via `charliecloud.containerStore`, or by default in `work/charliecloud_store`. This solves the problem of `find` trying to work through the container.

There are some issues that remain:

- I do not know if `nextflow clean` should remove `work/charliecloud_store`, but I would assume it does not.
- The whole approach of creating process-specific containers comes with quite some storage overhead, since each process will have a dedicated container created, which can easily create hundreds to thousands of new containers during a single run. There are some work-arounds for this: on modern systems, `ch-run`'s `--write-fake` can be used, which skips process-specific container creation and instead runs containers in the container storage. This is probably the best way. On somewhat modern systems, `useSquash` can be enabled to create `.squashfs` containers, which would only create a single file per container instead of a whole directory tree. A config sketch for both cases is shown after this list.
- I think it would be nice to do a clean-up of the temporary (process) container after the process has finished, but I am not sure if / how this can be implemented via the charliecloud driver.
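A hedged sketch of how those two work-arounds could look as Nextflow profiles; the option names are taken from this PR, everything else (profile names, paths) is illustrative:

```groovy
// Hedged sketch, not shipped defaults: pick the profile matching what the kernel supports.
profiles {
    modern {
        charliecloud {
            enabled   = true
            writeFake = true                                 // run via overlayfs straight from the image storage
        }
    }
    legacy {
        charliecloud {
            enabled        = true
            useSquash      = true                            // one .squashfs file per container
            containerStore = '/scratch/charliecloud_store'   // instead of the default work/charliecloud_store
        }
    }
}
```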