Update the gdas.cd hash and enable GDASApp to run on WCOSS2 #3220

Merged

Conversation

@RussTreadon-NOAA commented Jan 10, 2025

Description

This PR does the following:

  1. update the sorc/gdas.cd hash to bring new GDASApp functionality into g-w
  2. update env/WCOSS2.env
  3. update the WCOSS2 section of ush/module-setup.sh

The change to WCOSS2.env is due to changes introduced during the fall 2024 WCOSS2 upgrade. The change to module-setup.sh is required when using spack-stack on WCOSS2.

Resolves #3219
Resolves #3100

Type of change

  • Maintenance (update gdas.cd hash)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? YES
    • GDAS - this PR points at the updated sorc/gdas.cd hash. No PRs are pending.

How has this been tested?

  • Clone and build on WCOSS2, Hera, Hercules, and Orion
  • Run g-w CI on WCOSS2, Hera, Hercules, and Orion

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • New and existing tests pass with my changes

@RussTreadon-NOAA
Contributor Author

This PR is opened in draft mode until g-w CI has been run on WCOSS2, Hera, Hercules, and Orion.

The g-w team is invited to review and comment on changes to env/WCOSS2.env and ush/module-setup.sh. The changes in env/WCOSS2.env originate from the discussion in WCOSS Ticket#2024111410000051.

@aerorahul
Contributor

No issues here with the hash update.
I am not sure we are cleared to use spack-stack on WCOSS2 for regular development. The installed stack was for demonstration and testing purposes for NCO staff.

env/WCOSS2.env Outdated
Comment on lines 16 to 19
# Add path to GDASApp libraries
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${HOMEgfs}/sorc/gdas.cd/build/lib"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/cray/pe/mpich/8.1.19/ofi/intel/19.0/lib"

Contributor

This is fine for now, but will not be acceptable for implementation. I hope there is a more robust solution than this by that time.

More importantly, this has an impact on every executable in every job -- not just GDASApp executables.

Contributor Author

I completely agree. I strongly dislike these two lines. They are temporary patches to allow GFS v17 testing and development to continue on WCOSS2.

The line

export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${HOMEgfs}/sorc/gdas.cd/build/lib"

was added because craype/2.7.17 adds

-static-libgcc -static-libstdc++ -Bstatic -lstdc++ -Bdynamic -lm -lpthread

to the ftn command. GDASApp executables failed because they could not find JEDI libraries. Might the addition of a GDASApp install option (something we must have) resolve this problem?

Another concern with the added ftn options is the following warning found in build_gdas.log

icpc: warning #10315: specifying -lm before files may supersede the Intel(R) math library and affect performance
ifort: warning #10315: specifying -lm before files may supersede the Intel(R) math library and affect performance

It would be unfortunate if default compiler options resulted in degraded code performance.

The line

export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/cray/pe/mpich/8.1.19/ofi/intel/19.0/lib"

was recommended by GDIT. GDASApp testing identified inconsistencies across system modules. Some GDASApp executables failed with undefined symbol messages for MPI routines. GDIT is working on a solution.
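
The concern raised in this review thread is that exporting LD_LIBRARY_PATH in env/WCOSS2.env changes the library search path for every executable in every job. One way to narrow that effect is sketched below, under stated assumptions: `append_ld_path` and `run_with_gdas_libs` are hypothetical helper names that do not exist in g-w, and the two directories are simply the ones quoted above. This is an illustration of scoping the patch to a single command, not the change made in this PR or in the later PR that moved these patches to load_ufsda_modules.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of g-w): append a directory to
# LD_LIBRARY_PATH only if it is not already listed.
append_ld_path() {
  local dir="$1"
  case ":${LD_LIBRARY_PATH:-}:" in
    *":${dir}:"*) ;;  # already present, nothing to do
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${dir}" ;;
  esac
  export LD_LIBRARY_PATH
}

# Hypothetical wrapper (not part of g-w): run one command with the two
# WCOSS2 patches applied inside a subshell, so the calling job's
# environment is left unchanged once the command returns.
run_with_gdas_libs() {
  (
    append_ld_path "${HOMEgfs}/sorc/gdas.cd/build/lib"
    append_ld_path "/opt/cray/pe/mpich/8.1.19/ofi/intel/19.0/lib"
    "$@"
  )
}
```

A job script could then call `run_with_gdas_libs` followed by the GDASApp executable and its arguments, while non-GDASApp executables in the same job would keep the unpatched path.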

@RussTreadon-NOAA RussTreadon-NOAA mentioned this pull request Jan 11, 2025
2 tasks
@RussTreadon-NOAA
Contributor Author

g-w CI

RussTreadon-NOAA:feature/update_gdas at 4e73a31 was installed on WCOSS2 (Cactus), Hera, Hercules, and Orion. The following g-w CI streams were run on each machine:

  • C48_ATM
  • C48_S2SWA_gefs
  • C48_S2SW
  • C48mx500_3DVarAOWCDA
  • C48mx500_hybAOWCDA
  • C96C48_hybatmDA
  • C96C48_hybatmaerosnowDA
  • C96C48_ufs_hybatmDA
  • C96_S2SWA_gefs_replay_ics
  • C96_atm3DVar

All jobs in all streams ran to completion on all machines, with one exception: the gdas_marineanlletkf job in C48mx500_hybAOWCDA failed on Hercules. It was the only job in that stream to fail, and all other streams ran to completion on Hercules.

Investigation of the failure indicates a missing job dependency in the experiment XML. A rewind and rerun resulted in successful completion of the job and, as a result, of the entire C48mx500_hybAOWCDA stream. @guillaumevernieres and @AndrewEichmann-NOAA have been contacted.

See issue #3219 for additional details.

@RussTreadon-NOAA RussTreadon-NOAA marked this pull request as ready for review January 13, 2025 11:20
@RussTreadon-NOAA RussTreadon-NOAA self-assigned this Jan 13, 2025
@RussTreadon-NOAA
Contributor Author

This PR is ready for review.

aerorahul previously approved these changes Jan 13, 2025
Contributor

@aerorahul aerorahul left a comment

Just the comments on LD_LIBRARY_PATH affecting non-gdasapp executables.
Acknowledging that gdasapp is now using spack-stack on wcoss but other components are not.

@RussTreadon-NOAA
Contributor Author

@aerorahul, incorporating @guillaumevernieres's suggestion dismissed your approval.

@RussTreadon-NOAA
Contributor Author

g-w issue #3222 has been opened to report the missing job dependency for C48mx500_hybAOWCDA gdas_marineanlletkf. @AndrewEichmann-NOAA will follow up on this item.

@RussTreadon-NOAA
Contributor Author

NCO confirmed that it is OK for GDASApp to use /apps/ops/test/spack-stack-1.6.0-nco/envs/nco-intel-19.1.3.304/install/modulefiles/Core, with the understanding that it is being actively worked on for new libraries or new versions and might change from time to time.

/apps/ops/test/spack-stack-nco/modulefiles/Core was shared as a more stable version. Tests of this version show that it does not work in GDASApp. The GDASApp build fails with

-- [bufr_query] (2.8.0)
-- Feature TESTS enabled
CMake Error at bufr-query/CMakeLists.txt:23 (find_package):
  By not providing "Findeckit.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "eckit", but
  CMake did not find one.

  Could not find a package configuration file provided by "eckit" (requested
  version 1.23.0) with any of the following names:

We cannot build GDASApp using /apps/dev/lmodules/core as we did in the past. Attempts to do so fail with the oops configure error:

-- Adding bundle project oops
CMake Error at oops/CMakeLists.txt:14 (cmake_minimum_required):
  CMake 3.23 or higher is required.  You are running version 3.20.2


-- Configuring incomplete, errors occurred!

cmake/3.23 is not available with hpc-stack.
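
The configure failure above only appears after the bundle has started configuring. A pre-build guard along the following lines could report the same problem earlier. This is a minimal sketch under stated assumptions: `version_ge` and `require_cmake` are hypothetical names, not functions in g-w or GDASApp, and the comparison relies on GNU `sort -V` being available. The 3.23 minimum is taken from the oops error message quoted above.

```shell
#!/usr/bin/env bash
# Return 0 when version $1 is greater than or equal to version $2.
# Uses GNU sort's -V (version sort): if $2 sorts first, then $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical guard: fail early if the cmake on PATH is older than the
# required minimum, or if no cmake is found at all.
require_cmake() {
  local required="$1" found
  # The first line of `cmake --version` reads "cmake version X.Y.Z".
  found="$(cmake --version 2>/dev/null | head -n 1 | awk '{print $3}')"
  if [ -z "${found}" ] || ! version_ge "${found}" "${required}"; then
    echo "cmake ${required} or higher is required; found '${found:-none}'" >&2
    return 1
  fi
}
```

A build script could call `require_cmake 3.23 || exit 1` after loading modules and before invoking the GDASApp build.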

We will stick with /apps/ops/test/spack-stack-1.6.0-nco/envs/nco-intel-19.1.3.304/install/modulefiles/Core for the time being. This allows GFS v17 aerosol, snow, and marine DA development to continue on WCOSS2.

@RussTreadon-NOAA
Contributor Author

g-w CI summary

CI-Hera-Passed, CI-Orion-Passed, CI-Wcoss2-Passed labels can be applied to this PR. Tests were manually run.

CI-Hercules-Passed applies to all cases except C48mx500_hybAOWCDA. Testing discovered a missing job dependency for gdas_marineanlletkf. g-w issue #3222 has been opened to track resolution of this problem. All other g-w CI cases passed on Hercules.

@RussTreadon-NOAA added the CI-Hera-Passed, CI-Orion-Passed, and CI-Wcoss2-Passed labels Jan 13, 2025
@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 26fb850 into NOAA-EMC:develop Jan 14, 2025
5 checks passed
@RussTreadon-NOAA RussTreadon-NOAA deleted the feature/update_gdas branch January 14, 2025 11:08
@RussTreadon-NOAA
Contributor Author

Thank you @WalterKolczynski-NOAA

KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this pull request Jan 15, 2025
…kf_sfc_update_com_in_out

* upstream/develop:
  Resolve bug with LMOD_TMOD_FIND_FIRST setting affecting build on WCOSS2 (NOAA-EMC#3229)
  Reinstate product groups (NOAA-EMC#3208)
  Additional fixes for downstream jobs (NOAA-EMC#3187)
  Turn IAU off during staging job for cold start experiments (NOAA-EMC#3215)
  Update the gdas.cd hash and enable GDASApp to run on WCOSS2 (NOAA-EMC#3220)
  Update upload-artifact to v4 (NOAA-EMC#3216)
  Prevent duplicate case generation in generate_workflows.sh (NOAA-EMC#3217)
  Update g-w to cycle with C1152 ATM (NOAA-EMC#3206)
tsga added a commit to tsga/global-workflow that referenced this pull request Jan 22, 2025
* develop:
  Add echgres as a dependency only for RUN=enkfgdas, not enkfgfs (NOAA-EMC#3246)
  Add domain level to wave gridded COM path (NOAA-EMC#3137)
  CI JJOB Tests using CMake (NOAA-EMC#3214)
  Make assorted updates to waves (NOAA-EMC#3190)
  Move WCOSS2 LD_LIBRARY_PATH patches to load_ufsda_modules.sh (NOAA-EMC#3236)
  Adding a gefs_arch task to GEFS workflow (NOAA-EMC#3211)
  Add additional GEFS variables needed for AI/ML applications  (NOAA-EMC#3221)
  Add bmat task dependency to marine LETKF task (NOAA-EMC#3224)
  Resolve bug with LMOD_TMOD_FIND_FIRST setting affecting build on WCOSS2 (NOAA-EMC#3229)
  Reinstate product groups (NOAA-EMC#3208)
  Additional fixes for downstream jobs (NOAA-EMC#3187)
  Turn IAU off during staging job for cold start experiments (NOAA-EMC#3215)
  Update the gdas.cd hash and enable GDASApp to run on WCOSS2 (NOAA-EMC#3220)
  Update upload-artifact to v4 (NOAA-EMC#3216)
  Prevent duplicate case generation in generate_workflows.sh (NOAA-EMC#3217)
  Update g-w to cycle with C1152 ATM (NOAA-EMC#3206)
  Separate use of initial increment/perturbation file from REPLAY/+03 ICs  (NOAA-EMC#3119)
  Update gsi_enkf hash and gsi_ver (NOAA-EMC#3207)
  Remove cpus-per-task from APRUN_OCNANALECEN on WCOSS2 (NOAA-EMC#3212)
  Remove 5WAVH from AWIPS GRIB2 parm files (NOAA-EMC#3146)
  Remove multi-grid wave support (NOAA-EMC#3188)
  Add echgres as a dependency for earc (NOAA-EMC#3202)
danholdaway added a commit to danholdaway/global-workflow that referenced this pull request Jan 27, 2025
* develop:
  Remove WAFS files and references from `develop` (NOAA-EMC#3263)
  fix intel stack version number on c5 (NOAA-EMC#3258)
  Update gsi_monitor and ufs_utils hashes to recent hashes for C5/C6 build and run (NOAA-EMC#3252)
  Enable DA cycling on gaea C5/C6 (NOAA-EMC#3255)
  Copy post-processed sea ice increment for diagnostics (NOAA-EMC#3235)
  Only run METplus in the 3Dvar tests (NOAA-EMC#3245)
  Clone, build, and run C48_ATM and C48_S2SW on Gaea C5 and C6 (NOAA-EMC#3106)
  Add echgres as a dependency only for RUN=enkfgdas, not enkfgfs (NOAA-EMC#3246)
  Add domain level to wave gridded COM path (NOAA-EMC#3137)
  CI JJOB Tests using CMake (NOAA-EMC#3214)
  Make assorted updates to waves (NOAA-EMC#3190)
  Move WCOSS2 LD_LIBRARY_PATH patches to load_ufsda_modules.sh (NOAA-EMC#3236)
  Adding a gefs_arch task to GEFS workflow (NOAA-EMC#3211)
  Add additional GEFS variables needed for AI/ML applications  (NOAA-EMC#3221)
  Add bmat task dependency to marine LETKF task (NOAA-EMC#3224)
  Resolve bug with LMOD_TMOD_FIND_FIRST setting affecting build on WCOSS2 (NOAA-EMC#3229)
  Reinstate product groups (NOAA-EMC#3208)
  Additional fixes for downstream jobs (NOAA-EMC#3187)
  Turn IAU off during staging job for cold start experiments (NOAA-EMC#3215)
  Update the gdas.cd hash and enable GDASApp to run on WCOSS2 (NOAA-EMC#3220)
  Update upload-artifact to v4 (NOAA-EMC#3216)
  Prevent duplicate case generation in generate_workflows.sh (NOAA-EMC#3217)
  Update g-w to cycle with C1152 ATM (NOAA-EMC#3206)
  Separate use of initial increment/perturbation file from REPLAY/+03 ICs  (NOAA-EMC#3119)
  Update gsi_enkf hash and gsi_ver (NOAA-EMC#3207)
  Remove cpus-per-task from APRUN_OCNANALECEN on WCOSS2 (NOAA-EMC#3212)
  Remove 5WAVH from AWIPS GRIB2 parm files (NOAA-EMC#3146)
  Remove multi-grid wave support (NOAA-EMC#3188)
  Add echgres as a dependency for earc (NOAA-EMC#3202)
  Ensure OCNRES and ICERES have 3 digits in the archive script (NOAA-EMC#3199)
  Set runtime shell requirements within Jenkins Pipeline (NOAA-EMC#3171)
  Add efcs and epos to ufs_hybatm xml (NOAA-EMC#3192) (NOAA-EMC#3193)
  Fix GEFS and SFS compile flags in build_all.sh (NOAA-EMC#3197)
  Remove early-cycle EnKF forecast (NOAA-EMC#3185)
  Fix mod_icec bug in atmos_prod (NOAA-EMC#3167)
  Create compute build option (NOAA-EMC#3186)
  Support global-workflow using Rocky 8 on CSPs (NOAA-EMC#2998)
  Change orog gravity wave drag scheme for grid sizes less than 10km (NOAA-EMC#3175)
  Switch snow DA to use 2DVar for deterministic and ensemble mean (NOAA-EMC#3163)
  Update compression options for GEFS history files (NOAA-EMC#3184)
  Update compression options for high res history files (NOAA-EMC#3178)
  Turn DO_TEST_MODE off (NOAA-EMC#3177)
  Hotfix for gdas_arch div/0 (NOAA-EMC#3169)
  Allow building of the ufs-weather-model, WW3 pre/post execs for GFS, GEFS, SFS in the same clone of global-workflow (NOAA-EMC#3098)
  Switch Aerosol DA to use JCB and Jedi class (NOAA-EMC#3125)
  Update ufs-weather-model to 2024-12-06 commit  (NOAA-EMC#3145)
  Enable traditional threading as an option (NOAA-EMC#3149)
  Update HPC_ACCOUNT on Hercules to fv3-cpu (NOAA-EMC#3164)
  Turn C96C48_ufs_hybatmDA and C48mx500_3DVarAOWCDA into a regression test (NOAA-EMC#3120)
  Update GSI analysis jobs to use COMIN/COMOUT (NOAA-EMC#3092)
  Update HPC Tier Definitions (NOAA-EMC#3138)
  Add marine hybrid envar (NOAA-EMC#3041)
  Archive the experiment directory along with git status/diff output (NOAA-EMC#3105)
  Use stochastic restart patterns on rerun (NOAA-EMC#3077)
  Point Jenkinsfile back to CI/ (NOAA-EMC#3139)
  Fix wave restart for cold start and add ic version file (NOAA-EMC#3112)
  Allow users to override the default account at setup time (NOAA-EMC#3127)
  Refactor gridded wave post (NOAA-EMC#3014)
  Update docs related to NOAA CSPs (NOAA-EMC#3043)
  Allow APP to differ between RUNs (NOAA-EMC#2943)
  Run one executable for soca2cice (instead of two) (NOAA-EMC#3118)
  Speed up GSI analysis jobs in CI testing (NOAA-EMC#3115)
  Make aerosol output frequency variable (NOAA-EMC#2982)
  Add new stations to GFS BUFR sounding products (NOAA-EMC#3107)
  JCB-based obs+bias staging, Jedi class updates, and marine B-matrix refactoring (NOAA-EMC#2992)
  Enable tapering of atm ens perts at the model top (NOAA-EMC#3097)
  Update JGDAS ENKF POST  job  (NOAA-EMC#3090)
  SFS Runs at C96mx100  (NOAA-EMC#2960)
  Move machine-based options from config.base to host files (NOAA-EMC#3053)
  Remove RUNDIRS before running CI cases to cover re-run events (NOAA-EMC#3076)
  CI GitHub pipeline (hotfix) update for fetching repo name (NOAA-EMC#3084)
  Update JGDAS ENKF ECEN job  (NOAA-EMC#3050)
  Update snow obs processing job (NOAA-EMC#3055)
  Update to action workflow pipeline in default repo for development  (NOAA-EMC#3062)
  Update to action workflow pipeline in default repo for development (NOAA-EMC#3061)
  Update workflow pipeline (NOAA-EMC#3060)
  PW CI pipeline update5 ready for review so it can be merged and tested (NOAA-EMC#3059)
  Revert "GitHub CI Pipeline update for debugging forked PR support" (NOAA-EMC#3057)
  GitHub CI Pipeline update for debugging forked PR support (NOAA-EMC#3056)
  Add more ocean variables for post-processing in GEFS (NOAA-EMC#2995)
  Auto provisioning of PW clusters from GitHub CI added (NOAA-EMC#3051)
  Fix the name of the TC tracker filenames in archive.py (NOAA-EMC#3030)
  Make wxflow links static instead of from link_workflow (NOAA-EMC#3008)
  Update global jdas enkf diag job with COMIN/COMOUT for COM prefix (NOAA-EMC#2959)
  Add run and finalize methods to marine LETKF task (NOAA-EMC#2944)
  Fix wave restarts and GEFS FHOUT/FHMAX (NOAA-EMC#3009)
  Disabling hyper-threading (NOAA-EMC#2965)
  GitHub Actions Pipeline Updates for Self-Hosted Runners on PW (NOAA-EMC#3018)
  CI jekninsfile update hotfix (NOAA-EMC#3038)
  Update gdas.cd (NOAA-EMC#2978)
  Add ability to add tag to pslots with generate_workflows (NOAA-EMC#3036)
  CI update to shell environment with HOMEgfs to HOME_GFS for systems that need the path (NOAA-EMC#3013)
  Quick updated to Jenkins (health check) launch script (NOAA-EMC#3033)
  Document the generate_workflows.sh script (NOAA-EMC#3028)
  Replace gfs_cyc with an interval (NOAA-EMC#2928)
  Hotfix: Fix generate_workflows.sh optional build flags (NOAA-EMC#3024)
  Add a tool to run multiple YAML cases locally (NOAA-EMC#3004)
  Hotfix: Correctly set overwrite option when specified (NOAA-EMC#3021)
Labels
CI-Hera-Passed, CI-Orion-Passed, CI-Wcoss2-Passed
Development

Successfully merging this pull request may close these issues.

  • Update sorc/gdas.cd hash
  • Unable to build GDASApp on Cactus following the system upgrade
5 participants