Add scripts to run containerized model outside of FRE #136

Open
wants to merge 1 commit into
base: main
158 changes: 158 additions & 0 deletions ci/NWA/Dockerfile
@@ -0,0 +1,158 @@
# Author: Tom Robinson
FROM intel/hpckit:2024.2.1-0-devel-ubuntu22.04 as builder
############ Set up build environment ############
## Clone spack
RUN mkdir -p /opt && cd /opt && git clone -b v0.22.2 https://github.com/spack/spack.git
# What we want to install and how we want to install it
# is specified in a manifest file (spack.yaml)
RUN mkdir -p /opt/spack-environment && \
set -o noclobber \
&& (echo spack: \
&& echo ' mirrors:'\
&& echo ' E4S: https://cache.e4s.io/noaa'\
&& echo ' definitions:' \
&& echo ' - packages_builtin:' \
&& echo ' - bacio%oneapi@2024.2.1' \
&& echo ' - hdf5@1.14.3%oneapi@2024.2.1' \
&& echo ' - ip%oneapi@2024.2.1' \
&& echo ' - libyaml@0.2.5%oneapi@2024.2.1' \
&& echo ' - nccmp@1.9.1.0%oneapi@2024.2.1' \
&& echo ' - netcdf-c@4.9.2%oneapi@2024.2.1' \
&& echo ' - netcdf-fortran@4.6.1%oneapi@2024.2.1' \
&& echo ' - sp@2.3.3%oneapi@2024.2.1' \
&& echo ' - w3emc@2.11.0' \
&& echo ' - w3nco@2.4.1' \
&& echo ' - zlib%oneapi@2024.2.1' \
&& echo ' - zlib-ng@2.1.4%oneapi@2024.2.1' \
&& echo ' packages:' \
&& echo ' intel-oneapi-mpi:' \
&& echo ' buildable: false' \
&& echo ' externals:' \
&& echo ' - spec: intel-oneapi-mpi@2021.10.0' \
&& echo ' path: /opt/intel/oneapi/mpi/2021.10.0' \
&& echo ' mpi:' \
&& echo ' require: intel-oneapi-mpi' \
&& echo ' hdf5:' \
&& echo ' variants: +fortran+hl+szip' \
&& echo ' netcdf-c:' \
&& echo ' variants: +dap' \
&& echo ' pango:' \
&& echo ' variants: +X' \
&& echo ' all:' \
&& echo ' target: [x86_64]' \
&& echo ' providers:' \
&& echo ' zlib-api: [zlib-ng+compat, zlib]' \
&& echo ' compiler: [oneapi]' \
&& echo ' specs:' \
&& echo ' - matrix:' \
&& echo ' - [$packages_builtin]' \
&& echo ' concretizer:' \
&& echo ' unify: True' \
&& echo ' config:' \
&& echo ' install_tree: /opt/software' \
&& echo ' view: /opt/views/view') > /opt/spack-environment/spack.yaml
# Install the software, remove unnecessary deps
RUN . /opt/spack/share/spack/setup-env.sh && cd /opt/spack-environment && spack compiler add
Review comment (Contributor):
Duplicated line

RUN . /opt/spack/share/spack/setup-env.sh && cd /opt/spack-environment && spack compiler add && spack --verbose env activate . && spack --verbose install --fail-fast && spack gc -y
## Set environment variables
ENV LD_LIBRARY_PATH=/opt/views/view/lib:$LD_LIBRARY_PATH
ENV LIBRARY_PATH=/opt/views/view/lib:$LIBRARY_PATH
ENV PATH=$PATH:/opt/views/view/bin
############ Set up build ############
## Set up code checkout
RUN export GIT_TERMINAL_PROMPT=0 && \
mkdir -p /apps/mom6_sis2_generic_4p_compile_symm_yaml/src && \
cd /apps/mom6_sis2_generic_4p_compile_symm_yaml/src && \
git clone --recursive --jobs=4 https://github.com/NOAA-GFDL/FMS.git -b 2024.01.02 FMS && \
git clone --recursive --jobs=4 https://github.com/NOAA-GFDL/CEFI-regional-MOM6.git -b main mom6 && \
git clone --recursive --jobs=4 https://github.com/NOAA-GFDL/ice_param.git sis2 && \
git clone --recursive --jobs=4 https://github.com/NOAA-GFDL/land_null.git land_null && \
git clone --recursive --jobs=4 https://github.com/NOAA-GFDL/atmos_null.git atmos_null && \
git clone --recursive --jobs=4 https://github.com/NOAA-GFDL/FMScoupler.git coupler
Review comment (Contributor):
Since the CEFI-regional-MOM6 main repo contains all the sub-components needed to build the mom6-sis2-cobalt model, we do not need to clone the sub-components individually. Consider removing them to ensure users build the model using the recommended tag for each sub-component.

## Clone mkmf
RUN cd /apps \
&& git clone --recursive https://github.com/NOAA-GFDL/mkmf \
&& cp mkmf/bin/* /usr/local/bin
## Create the build directory
RUN mkdir -p /apps/mom6_sis2_generic_4p_compile_symm_yaml/exec
## Use mkmf to create the Makefiles for the components
RUN bld_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/exec \
&& src_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/src \
&& mkmf_template=/apps/mkmf/templates/hpcme-intel24.mk \
&& mkdir -p $bld_dir/FMS \
&& list_paths -l -o $bld_dir/FMS/pathnames_FMS $src_dir/FMS \
Review comment (Contributor):
Switch to $src_dir/mom6/src/FMS to ensure the build process uses the recommended FMS tag from CEFI-regional-MOM6.

&& cd $bld_dir/FMS \
&& mkmf -m Makefile -a $src_dir -b $bld_dir -p libFMS.a -t $mkmf_template -c " -DINTERNAL_FILE_NML -g -Duse_libMPI -Duse_netCDF -Duse_yaml -DMAXFIELDMETHODS_=600" -IFMS/fms2_io/include -IFMS/include -IFMS/mpp/include $bld_dir/FMS/pathnames_FMS
Review comment (Contributor):
Again, please consider changing the FMS-related include folder to -Imom6/src/FMS/fms2_io/include -Imom6/src/FMS/include -Imom6/src/FMS/mpp/include. You will probably also need to make similar changes for the other components listed below.
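A rough sketch of what the suggested include-path change might look like, assuming (as the review comments state) that FMS is available under `mom6/src/FMS` in the CEFI-regional-MOM6 checkout. The helper name `fms_include_flags` is hypothetical and is not part of this PR; it only assembles the three `-I` flags relative to the mkmf `-a` root.

```shell
# Sketch only: fms_include_flags is a hypothetical helper, not part of this PR.
# It prints the FMS include flags relative to the mkmf "-a" root ($src_dir).
# The default root assumes FMS lives at mom6/src/FMS, per the review comments.
fms_include_flags() {
    local fms_root="${1:-mom6/src/FMS}"
    printf '%s\n' "-I$fms_root/fms2_io/include -I$fms_root/include -I$fms_root/mpp/include"
}

# Possible use inside the FMS build step (cpp definitions unchanged from the diff):
#   list_paths -l -o "$bld_dir/FMS/pathnames_FMS" "$src_dir/mom6/src/FMS"
#   cd "$bld_dir/FMS"
#   mkmf -m Makefile -a "$src_dir" -b "$bld_dir" -p libFMS.a -t "$mkmf_template" \
#        -c "<same cpp definitions as in the diff>" \
#        $(fms_include_flags) "$bld_dir/FMS/pathnames_FMS"
```

Keeping the root in one place would make it easier to apply the same change to the other component steps, which repeat the same three flags.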

RUN bld_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/exec \
&& src_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/src \
&& mkmf_template=/apps/mkmf/templates/hpcme-intel24.mk \
&& mkdir -p $bld_dir/mom6 \
&& list_paths -l -o $bld_dir/mom6/pathnames_mom6 $src_dir/mom6/src/MOM6/config_src/memory/dynamic_symmetric $src_dir/mom6/src/MOM6/config_src/drivers/FMS_cap $src_dir/mom6/src/MOM6/src/*/ $src_dir/mom6/src/MOM6/src/*/*/ $src_dir/mom6/src/MOM6/config_src/external/ODA_hooks $src_dir/mom6/src/MOM6/config_src/external/stochastic_physics $src_dir/mom6/src/MOM6/config_src/external/drifters $src_dir/mom6/src/MOM6/config_src/external/database_comms $src_dir/mom6/src/ocean_BGC/generic_tracers $src_dir/mom6/src/ocean_BGC/mocsy/src $src_dir/mom6/src/MOM6/pkg/GSW-Fortran/modules $src_dir/mom6/src/MOM6/pkg/GSW-Fortran/toolbox $src_dir/mom6/src/MOM6/config_src/infra/FMS2 \
&& cd $bld_dir/mom6 \
&& mkmf -m Makefile -a $src_dir -b $bld_dir -p libmom6.a -t $mkmf_template -c "-DINTERNAL_FILE_NML -g -DINTERNAL_FILE_NML -DMAX_FIELDS_=100 -DUSE_FMS2_IO -DNOT_SET_AFFINITY -D_USE_MOM6_DIAG -D_USE_GENERIC_TRACER -DUSE_PRECISION=2 " -o "-I$bld_dir/FMS " -IFMS/fms2_io/include -IFMS/include -IFMS/mpp/include -Imom6/src/MOM6/pkg/CVMix-src/include $bld_dir/mom6/pathnames_mom6
Review comment (Contributor):
Consider increasing -DMAX_FIELDS_=100 to -DMAX_FIELDS_=600, as the BGC model may need to register a large number of fields.

RUN bld_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/exec \
&& src_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/src \
&& mkmf_template=/apps/mkmf/templates/hpcme-intel24.mk \
&& mkdir -p $bld_dir/sis2 \
&& list_paths -l -o $bld_dir/sis2/pathnames_sis2 $src_dir/mom6/src/SIS2/config_src/dynamic_symmetric $src_dir/mom6/src/SIS2/config_src/external/Icepack_interfaces $src_dir/mom6/src/SIS2/src $src_dir/mom6/src/icebergs/src $src_dir/mom6/src/ice_param \
&& cd $bld_dir/sis2 \
&& mkmf -m Makefile -a $src_dir -b $bld_dir -p libsis2.a -t $mkmf_template -c "-DINTERNAL_FILE_NML -g -DUSE_FMS2_IO" -o "-I$bld_dir/FMS -I$bld_dir/mom6 " -IFMS/fms2_io/include -IFMS/include -IFMS/mpp/include -Imom6/src/MOM6/pkg/CVMix-src/include -Imom6/src/MOM6/src/framework $bld_dir/sis2/pathnames_sis2
RUN bld_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/exec \
&& src_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/src \
&& mkmf_template=/apps/mkmf/templates/hpcme-intel24.mk \
&& mkdir -p $bld_dir/land_null \
&& list_paths -l -o $bld_dir/land_null/pathnames_land_null $src_dir/mom6/src/land_null \
&& cd $bld_dir/land_null \
&& mkmf -m Makefile -a $src_dir -b $bld_dir -p libland_null.a -t $mkmf_template -c "" -o "-I$bld_dir/FMS " $bld_dir/land_null/pathnames_land_null
RUN bld_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/exec \
&& src_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/src \
&& mkmf_template=/apps/mkmf/templates/hpcme-intel24.mk \
&& mkdir -p $bld_dir/atmos_null \
&& list_paths -l -o $bld_dir/atmos_null/pathnames_atmos_null $src_dir/mom6/src/atmos_null \
&& cd $bld_dir/atmos_null \
&& mkmf -m Makefile -a $src_dir -b $bld_dir -p libatmos_null.a -t $mkmf_template -c "-DINTERNAL_FILE_NML -g" -o "-I$bld_dir/FMS " -IFMS/fms2_io/include -IFMS/include -IFMS/mpp/include -Imom6/src/MOM6/pkg/CVMix-src/include $bld_dir/atmos_null/pathnames_atmos_null
RUN bld_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/exec \
&& src_dir=/apps/mom6_sis2_generic_4p_compile_symm_yaml/src \
&& mkmf_template=/apps/mkmf/templates/hpcme-intel24.mk \
&& mkdir -p $bld_dir/coupler \
&& list_paths -l -o $bld_dir/coupler/pathnames_coupler $src_dir/mom6/src/coupler/shared $src_dir/mom6/src/coupler/full \
&& cd $bld_dir/coupler \
&& mkmf -m Makefile -a $src_dir -b $bld_dir -p libcoupler.a -t $mkmf_template -c "-DINTERNAL_FILE_NML -g -DUSE_FMS2_IO -D_USE_LEGACY_LAND_ -Duse_AM3_physics" -o "-I$bld_dir/FMS -I$bld_dir/mom6 -I$bld_dir/sis2 -I$bld_dir/land_null -I$bld_dir/atmos_null " -IFMS/fms2_io/include -IFMS/include -IFMS/mpp/include -Imom6/src/MOM6/pkg/CVMix-src/include $bld_dir/coupler/pathnames_coupler
## Create the main Makefile
RUN mkdir -p /apps/mom6_sis2_generic_4p_compile_symm_yaml/exec \
&& cd /apps/mom6_sis2_generic_4p_compile_symm_yaml/exec \
&& set -o noclobber \
&& (echo SRCROOT = /apps/mom6_sis2_generic_4p_compile_symm_yaml/src/ \
&& echo 'BUILDROOT = /apps/mom6_sis2_generic_4p_compile_symm_yaml/exec/' \
&& echo 'MK_TEMPLATE = /apps/mkmf/templates/hpcme-intel24.mk' \
&& echo 'include $(MK_TEMPLATE)' \
&& echo 'mom6_sis2_generic_4p_compile_symm_yaml.x: coupler/libcoupler.a sis2/libsis2.a mom6/libmom6.a land_null/libland_null.a atmos_null/libatmos_null.a FMS/libFMS.a ' \
&& echo '\t$(LD) $^ $(LDFLAGS) -o $@ $(STATIC_LIBS)' \
&& echo 'coupler/libcoupler.a: FMS/libFMS.a mom6/libmom6.a sis2/libsis2.a land_null/libland_null.a atmos_null/libatmos_null.a FORCE' \
&& echo '\t$(MAKE) SRCROOT=$(SRCROOT) BUILDROOT=$(BUILDROOT) MK_TEMPLATE=$(MK_TEMPLATE) --directory=coupler $(@F)' \
&& echo 'sis2/libsis2.a: FMS/libFMS.a mom6/libmom6.a FORCE' \
&& echo '\t$(MAKE) SRCROOT=$(SRCROOT) BUILDROOT=$(BUILDROOT) MK_TEMPLATE=$(MK_TEMPLATE) --directory=sis2 $(@F)' \
&& echo 'mom6/libmom6.a: FMS/libFMS.a FORCE' \
&& echo '\t$(MAKE) SRCROOT=$(SRCROOT) BUILDROOT=$(BUILDROOT) MK_TEMPLATE=$(MK_TEMPLATE) --directory=mom6 $(@F)' \
&& echo 'land_null/libland_null.a: FMS/libFMS.a FORCE' \
&& echo '\t$(MAKE) SRCROOT=$(SRCROOT) BUILDROOT=$(BUILDROOT) MK_TEMPLATE=$(MK_TEMPLATE) --directory=land_null $(@F)' \
&& echo 'atmos_null/libatmos_null.a: FMS/libFMS.a FORCE' \
&& echo '\t$(MAKE) SRCROOT=$(SRCROOT) BUILDROOT=$(BUILDROOT) MK_TEMPLATE=$(MK_TEMPLATE) --directory=atmos_null $(@F)' \
&& echo 'FMS/libFMS.a: FORCE' \
&& echo '\t$(MAKE) SRCROOT=$(SRCROOT) BUILDROOT=$(BUILDROOT) MK_TEMPLATE=$(MK_TEMPLATE) --directory=FMS $(@F)' \
&& echo 'FORCE:' \
&& echo '') > /apps/mom6_sis2_generic_4p_compile_symm_yaml/exec/Makefile
## Use make to build the model
RUN cd /apps/mom6_sis2_generic_4p_compile_symm_yaml/exec && make -j 4 PROD=on
############ Create the final stage of the container build ############
FROM intel/oneapi-runtime:2024.2.1-0-devel-ubuntu22.04 as final
## copy libs and executable from builder
COPY --from=builder /opt/software /opt/software
COPY --from=builder /opt/views /opt/views
# This will include the code and all of the build from the builder stage
COPY --from=builder /apps/mom6_sis2_generic_4p_compile_symm_yaml /apps/mom6_sis2_generic_4p_compile_symm_yaml
## Set up the run time environment
ENV LD_LIBRARY_PATH=/opt/views/view/lib:$LD_LIBRARY_PATH
ENV LIBRARY_PATH=/opt/views/view/lib:$LIBRARY_PATH
ENV PATH=/opt/views/view/bin:/apps/mom6_sis2_generic_4p_compile_symm_yaml/exec:$PATH
ENTRYPOINT ["/bin/bash"]
Empty file added ci/NWA/INPUT/.gitkeep
Empty file.
Empty file added ci/NWA/OUTPUT/stdout/.gitkeep
Empty file.
37 changes: 37 additions & 0 deletions ci/NWA/README.md
@@ -0,0 +1,37 @@
# Running Containerized Model Outside of FRE

## Introduction

This directory contains scripts to run the `CEFI_NWA12_COBALT_V1` experiment in a container outside of the `FRE` workflow. Since these scripts do not benefit from the years of development that have gone into `FRE`, they lack several features and make several assumptions:

1.) You will have to stage all the necessary input files to the `INPUT/` directory yourself, using the same naming scheme as `CEFI_NWA12_cobalt.xml`. All input files are available on gaea, and the `run_model.sh` script will stage annual `ERA5` and `GloFAS` runoff forcings for you if you provide a path to a directory where these files are located. You will have to move the other files manually. If you are on gaea, or a system with access to gaea, you can stage the necessary inputs with the following commands:
```
cp -r /gpfs/f5/cefi/scratch/Utheri.Wagura/DockerfileTest/INPUT/. ./INPUT/
Review comment (Contributor):
I also made copies of those inputs in the world-shared folders: /gpfs/f5/icefi/world-shared/datasets/container_input/NWA and /gpfs/f6/ira-cefi/world-shared/datasets/container_input/NWA.

ln -s ocean_topog.nc INPUT/topog.nc
```
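
The staging step can also be wrapped in a small helper. This is a minimal sketch, not part of the PR: the function name `stage_inputs` and its arguments are hypothetical. It copies the contents of a source directory into `INPUT/` and creates `topog.nc` as a symlink whose target is relative to the link's own directory, so the link resolves no matter where `INPUT/` is mounted.

```shell
# Sketch only: stage_inputs is a hypothetical helper, not part of this PR.
# Usage: stage_inputs <source_dir> <input_dir>
stage_inputs() {
    local src="$1"
    local dst="$2"

    [ -d "$src" ] || { echo "source directory not found: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1

    # The trailing "/." copies the directory's contents instead of nesting
    # the source directory inside the destination.
    cp -r "$src"/. "$dst"/ || return 1

    # A relative symlink target is resolved from the directory holding the
    # link, so the target is given without any INPUT/ prefix.
    ln -sf ocean_topog.nc "$dst/topog.nc"
}
```

For example, `stage_inputs /path/to/inputs ./INPUT` would populate `./INPUT` from a directory of your choosing.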

2.) Model output will not be staged to another system at the end of each model year. When a year of simulation is complete, `run_model.sh` will simply tar together all output files and move them to a folder in `OUTPUT` named after the start date of that particular run. If the `mppnccombine` tool is available in your `PATH`, `run_model.sh` will try to combine outputs from different ranks before tarring the files together.
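
The post-processing described in item 2 could look roughly like the sketch below. `run_model.sh` is not shown here, so everything in this block is an assumption: the function name `archive_year`, the archive name `output.tar`, the `*.nc.NNNN` per-rank naming, and the `mppnccombine` call form (output file first, then inputs) are illustrative only.

```shell
# Sketch only: archive_year and its arguments are hypothetical; run_model.sh
# itself is not shown here and may work differently.
# Usage: archive_year <run_dir> <output_root> <start_date, e.g. YYYY0101>
archive_year() {
    local run_dir="$1"
    local out_root="$2"
    local start_date="$3"
    local dest

    mkdir -p "$out_root/$start_date" || return 1
    # Resolve to an absolute path because the work below runs inside run_dir.
    dest=$(cd "$out_root/$start_date" && pwd) || return 1

    (
        cd "$run_dir" || exit 1

        # Combine per-rank files only when the tool is found on PATH.
        if command -v mppnccombine >/dev/null 2>&1; then
            for first in *.nc.0000; do
                [ -e "$first" ] || continue
                base=${first%.0000}
                # Assumed call form: output file first, then per-rank inputs.
                mppnccombine "$base" "$base".[0-9][0-9][0-9][0-9] &&
                    rm -f "$base".[0-9][0-9][0-9][0-9]
            done
        fi

        # Tar whatever NetCDF output remains; do nothing if there is none.
        set -- *.nc*
        [ -e "$1" ] || exit 0
        tar -cf "$dest/output.tar" "$@" && rm -f "$@"
    )
}
```

Running the file operations in a subshell keeps the caller's working directory unchanged.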

## Requirements
This workflow uses a containerized environment to compile and run the model. As such, to run the workflow as-is, you will need access to a system that has both `podman` and `apptainer/singularity` available. Since `podman` is a drop-in replacement for `docker`, the `compile_model.sh` script should still work if you replace `podman` with `docker`, though this has not been tested. Similarly, if you do not have access to `apptainer/singularity`, you may be able to skip the `apptainer build` step and replace `apptainer exec` with `docker exec` in `run_model.sh`, although this has not been tested either.

Running the model requires either `Intel MPI` or `MPICH`. The environment file in `/envs/container-gaea.env` uses `lmod` to load `MPICH` for you and sets environment variables that `apptainer` needs. If running on a system other than gaea, you will need to create an environment file that sets the relevant variables and loads the relevant `MPI` implementation for you.

**IMPORTANT**: If running on a system other than gaea, be sure to edit the variables `era5_dir`, `glofas_dir`, and `env_file` to point to directories containing annual `ERA5` and `GloFAS` runoff data, as well as your environment file, before running the workflow.
**IMPORTANT**: We have encountered some issues building the container on some gaea nodes. So far, the model has compiled successfully on the following nodes:
```
gaea56
```
Please use one of these nodes to create the container until all compilation issues are resolved.

## Running the workflow

After staging all files to the `INPUT` directory, compile the model by calling `compile_model.sh`:
```
./compile_model.sh
```
This will create a `CEFI_NWA12_COBALT_V1.sif` file containing the model executable. Note that this process can take about an hour to complete. Once you have the `.sif` file, and have set all of the relevant variables at the top of the `run_model.sh` script, run the following command to run the model for `n` years:
```
sbatch ./run_model.sh n
```
The model output from each simulation year will be available in `OUTPUT/YYYY0101` for year `YYYY`, while the `stdout` and `stderr` from the run will be available in `OUTPUT/stdout`.
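
Before submitting, it may help to check that the image and forcing directories exist. The sketch below is an optional pre-flight check; the function name `preflight` is hypothetical and not part of this PR. It takes the number of years, the `.sif` file, and any directories that must exist (for example the values you set for `era5_dir` and `glofas_dir`).

```shell
# Sketch only: preflight is a hypothetical helper, not part of this PR.
# Usage: preflight <n_years> <sif_file> [<required_dir> ...]
preflight() {
    [ "$#" -ge 2 ] || { echo "usage: preflight <n_years> <sif_file> [dir ...]" >&2; return 1; }
    local years="$1"
    local sif="$2"
    local d
    shift 2

    # The year count must consist of digits only and be greater than zero.
    case "$years" in
        ''|*[!0-9]*) echo "number of years must be a positive integer" >&2; return 1 ;;
    esac
    [ "$years" -gt 0 ] || { echo "number of years must be a positive integer" >&2; return 1; }

    [ -f "$sif" ] || { echo "container image not found: $sif" >&2; return 1; }
    for d in "$@"; do
        [ -d "$d" ] || { echo "directory not found: $d" >&2; return 1; }
    done
}
```

A possible invocation is `preflight 2 CEFI_NWA12_COBALT_V1.sif "$era5_dir" "$glofas_dir" && sbatch ./run_model.sh 2`, which only submits the job when every check passes.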
5 changes: 5 additions & 0 deletions ci/NWA/compile_model.sh
@@ -0,0 +1,5 @@
#!/bin/bash
podman build -f Dockerfile -t mom6_sis2_generic_4p_compile_symm_yaml:prod
rm -f mom6_sis2_generic_4p_compile_symm_yaml.tar mom6_sis2_generic_4p_compile_symm_yaml.sif
Review comment (Contributor):
Do we really need this line?

Review comment (Collaborator, Author):
This was copied over from the fre script to make sure the sif file doesn't exist before writing it in the next step. I kept it as a precaution, but it can be removed

podman save -o mom6_sis2_generic_4p_compile_symm_yaml-prod.tar localhost/mom6_sis2_generic_4p_compile_symm_yaml:prod
Review comment (Contributor):
I really like Podman—it's a great alternative to Docker, especially for Linux and HPC users. My concern is that regular users may simply want a Dockerfile or Singularity definition file to easily build an image for running the model. I don't have any issues with the run script, but perhaps we could consider adding a Singularity definition file as well, so users can just run apptainer build or singularity build to get the image they need. I'd be happy to provide a Singularity definition script that can do almost the same thing as the Dockerfile you've provided. What are your thoughts on this?

Review comment (Collaborator, Author):
Sounds good. I can work on creating a singularity definition file as well to provide this capability

apptainer build --disable-cache CEFI_NWA12_COBALT_V1.sif docker-archive://mom6_sis2_generic_4p_compile_symm_yaml-prod.tar
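
For discussion, here is a sketch of the same three steps wrapped in a function that stops at the first failure. The function name `build_sif` is hypothetical; the image, tarball, and `.sif` names are copied from the commands in this script. Note that `podman save` writes `mom6_sis2_generic_4p_compile_symm_yaml-prod.tar` and `apptainer build` writes `CEFI_NWA12_COBALT_V1.sif`, so a cleanup step intended to clear stale artifacts would need to target those two names.

```shell
# Sketch only: build_sif is a hypothetical wrapper, not part of this PR.
# File and image names are the ones used by the commands in compile_model.sh.
build_sif() {
    local image="mom6_sis2_generic_4p_compile_symm_yaml:prod"
    local tarball="mom6_sis2_generic_4p_compile_symm_yaml-prod.tar"
    local sif="CEFI_NWA12_COBALT_V1.sif"

    # Remove the artifacts that the save and build steps below write.
    rm -f "$tarball" "$sif"

    # Chain with && so a failed step does not feed a stale file to the next.
    podman build -f Dockerfile -t "$image" &&
        podman save -o "$tarball" "localhost/$image" &&
        apptainer build --disable-cache "$sif" "docker-archive://$tarball"
}
```

Calling `build_sif` from the directory containing the `Dockerfile` would reproduce the script's behavior, with the difference that a failed `podman build` prevents the later steps from running.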