Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test for the propagation of queue_options from ertconfig to cluster #10147

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jonathan-eq
Copy link
Contributor

@jonathan-eq jonathan-eq commented Feb 25, 2025

Issue
Resolves #10141

Approach
Short description of the approach

(Screenshot of new behavior in GUI if applicable)

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure unit tests pass locally after every commit (git rebase -i main --exec 'just rapid-tests')

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@jonathan-eq jonathan-eq added the release-notes:misc Automatically categorise as miscellaneous change in release notes label Feb 25, 2025
Copy link

codspeed-hq bot commented Feb 25, 2025

CodSpeed Performance Report

Merging #10147 will not alter performance

Comparing jonathan-eq:fix-tests (9a8ec0e) with main (3d3894a)

Summary

✅ 25 untouched benchmarks

@jonathan-eq jonathan-eq force-pushed the fix-tests branch 2 times, most recently from 9e9fb21 to 3c87d49 Compare February 25, 2025 15:49
@jonathan-eq jonathan-eq marked this pull request as ready for review February 25, 2025 15:55
@jonathan-eq jonathan-eq force-pushed the fix-tests branch 2 times, most recently from 792e6b2 to 71393a7 Compare February 26, 2025 07:47
Copy link
Contributor

@JHolba JHolba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new tests are good :)

@@ -83,18 +83,16 @@ def main() -> None:

jobs_output: list[Job] = []
for job in args.jobs:
job_name: str = read(jobs_path / f"{job}.name") or "_"
job_name: str = read(jobs_path / job / "name") or "_"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes related to "Add tests for the propagation" or does these changes serve some other purpose (was looking for a longer description in the commit message but could not find it).

(two years down the road we will be looking at the git blame for this line and dont get why this was changed based on reading the log message)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just refactoring. In the cases we run multiple realizations with the mocked binaries, I think it makes sense that each job has its own directory instead of prefixing the files with the job_id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the commit message to include this^

Copy link
Contributor

@berland berland Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If refactoring (and not needed for being able to add tests), it would make sense to have it in a separate commit.

@pytest.mark.usefixtures("copy_poly_case")
def test_queue_options_are_propagated_from_config_to_bsub(monkeypatch):
"""
This end to end test is here to verify that queue_options are correctly propagated all the way from ert config to the cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this could be line formatted, I speculate it extends 88 characters)

job_dir = next(
mock_jobs_dir.iterdir()
) # There is only one realization in this test
complete_command_invocation = (job_dir / "complete_command_invocation").read_text(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same as the capturing_bsub fixture? If it is, maybe consider using that, or is this way better?

)
)
run_cli(ENSEMBLE_EXPERIMENT_MODE, "--disable-monitoring", "poly.ert")
mock_jobs_dir = Path(f"{os.environ.get('PYTEST_TMP_PATH')}/mock_jobs")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it smells a little bit to use this environment variable. Will capturing_bsub solve it or can't you find it from cwd?

job_dir = next(
mock_jobs_dir.iterdir()
) # There is only one realization in this test
complete_command_invocation = (job_dir / "complete_command_invocation").read_text(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check capturing_qsub

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capturing_qsub and capturing_bsub does not actually run what is submitted, so it wouldn't be a end-to-end test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But scheduler/bin/bsub.py is just a mock, so this is not an end-to-end test in any case.

If the real bsub command from LSF gets a breaking change requiring a change in our code, this code will not catch it, so I am unsure if it adds value to this test to use scheduler/bin/bsub.py over capturing_bsub.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I am only interested in the part from ErtConfig to bsub/qsub/sbatch, so I can probably rewrite the test to be an integration test for ertconfig to driver

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some unit tests for this which I could implement for all queue_options like this

@pytest.mark.usefixtures("capturing_bsub")
async def test_submit_with_project_code():
    queue_config_dict = {
        "QUEUE_SYSTEM": "LSF",
        "FORWARD_MODEL": [("FLOW",), ("ECLIPSE",), ("RMS",)],
    }
    queue_config = QueueConfig.from_dict(queue_config_dict)
    driver: LsfDriver = create_driver(queue_config.queue_options)
    await driver.submit(0, "sleep")
    assert f"-P {queue_config.queue_options.project_code}" in Path(
        "captured_bsub_args"
    ).read_text(encoding="utf-8")

But this part might not be accurate to how we do it in ensemble.py, and any breaking changes there would not be picked up by the tests.

queue_config = QueueConfig.from_dict(queue_config_dict)
driver: LsfDriver = create_driver(queue_config.queue_options)

assert parsed_options.get("-q") == expected_queue
assert parsed_options.get("-A") == expected_project_code

# -l was not parsed correctly by getopt, so we read it manually instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need to parse the command line or just rely on certain strings being present in the command?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. Very good!

This commit refactors the mocked binaries to have each job use its own sub-directory instead of prefixing the files with job_id.
This commit adds tests that verify that the queue_options are all propagated correctly to the slurm, openpbs, and lsf queue system, using
our mocked binaries.
@@ -223,7 +239,7 @@ async def test_submit_sets_stderr():


@pytest.mark.usefixtures("capturing_bsub")
async def test_submit_with_realization_memory_with_bsub_capture():
async def test_submit_with_realization_memory_with_bsub_capture(): # JONAK - CAN THIS BE REMOVED?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a unit test, but we are testing the same thing in an integration test. Should we keep both as the integration_tests are not run with just rapid-tests?

@jonathan-eq jonathan-eq requested a review from berland February 27, 2025 12:37
@jonathan-eq jonathan-eq force-pushed the fix-tests branch 3 times, most recently from e0d02ea to cd141a3 Compare February 27, 2025 13:32
@xjules
Copy link
Contributor

xjules commented Feb 27, 2025

General note: it looks to be very complicated. I thought something similar to this test?

def test_tracking(

Which essentially create dedicated models and check that the scheduler got the parameters correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes:misc Automatically categorise as miscellaneous change in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add E2E tests for queue options
4 participants