Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mpi: add testing support for openmpi5 #16

Merged
merged 1 commit into from
Feb 15, 2024

Conversation

wihobbs
Copy link
Member

@wihobbs wihobbs commented Feb 15, 2024

Problem: flux-pmix is required by openmpi5, which means we should be testing openmpi5 on LC hardware.

Utilize modulefiles on Corona to test this. Since loading new modulefiles can cause unexpected behavior in tests for openmpi4, only load those modulefiles when openmpi5 is the requested MPI to test.

Comment on lines 17 to 21
if [[ $MPI == "openmpi5" ]]; then
module use -q /usr/global/tools/mpi/toss_4_x86_64_ib/modulefiles/Core/
fi
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should probably be a comment here about why this module use line is needed.

Should it also be a fatal error if module use fails?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think module use throws any errors, or, at least, it's very easily fooled. However, if the module isn't loaded, openmpi/5 won't be found and that will kill the script. (lines 20-21)

(s=125,d=0) corona82 ~ $ mkdir junkdir
(s=125,d=0) corona82 ~ $ module use /g/g0/hobbs17/junkdir/
(s=125,d=0) corona82 ~ $ vim junkdir/hello.txt
(s=125,d=0) corona82 ~ $ module use /g/g0/hobbs17/junkdir/
(s=125,d=0) corona82 ~ $ module use /g/g0/hobbs17/junkdir/hello.txt
(s=125,d=0) corona82 ~ $ $?
bash: 0: command not found...

export NAME="$COMPILER"_"$MPI"

test -n $COMPILER || die "COMPILER (argument 1) not set"
test -n $MPI || die "MPI (argument 2) not set"
if [[ $MPI == "openmpi5" ]]; then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be simpler to avoid removing the first / from the MPI name and instead just have

if [[ $MPI == "openmpi/5" ]]; then

This would also address the comment below regarding module load $2

module load $COMPILER || die "Compiler $COMPILER is unavailable on $LCSCHEDCLUSTER"
module load $MPI || die "MPI implementation $MPI is unavailable on $LCSCHEDCLUSTER"
module load $2 || die "MPI implementation $MPI is unavailable on $LCSCHEDCLUSTER"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion would be to avoid using the shell position parameters in this part of the script (just for your future sanity).

Maybe use

MPI_MODULENAME=$2
MPI=${MPI_MODULENAME///} # bash-specific

So $MPI_MODULENAME contains the MPI module name and MPI contains a string with / stripped from the name.

However, I don't think this is necessary at this point (see above)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's kinda necessary. The module name with a / in it can't be used in line 25 to create a directory (it'll end up trying to create, for example, a subdirectory 5 under clang_openmpi and fail.)

However, I see now on line 13 I'm creating another variable called name for the compiler_mpi name. Duh! I could just sed that, and keep things cleaner later. Fixing.

Problem: flux-pmix is required by openmpi5, which means
we should be testing openmpi5 on LC hardware.

Utilize modulefiles on Corona to test this. Since loading new
modulefiles can cause unexpected behavior in tests for openmpi4,
only load those modulefiles when openmpi5 is the requested MPI
to test.
Copy link

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@wihobbs
Copy link
Member Author

wihobbs commented Feb 15, 2024

Thanks @grondo!

@wihobbs wihobbs merged commit 81855f2 into flux-framework:main Feb 15, 2024
1 check passed
@wihobbs wihobbs deleted the openmpi5 branch February 15, 2024 23:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants