Create resource templates for MPI and non-MPI jobs #36

Open
2 tasks
Souheil-Yazji opened this issue Dec 18, 2023 · 5 comments
Souheil-Yazji (Contributor) commented Dec 18, 2023

Anatoly demoed template resource config files to us, which users can select in the oms UI. These can be used to offer different sizes for the worker pods.

  • Investigate the OpenM wiki to determine how the templates are created
  • Implement and deploy the method to parse the templates and pass them on to the Job Scheduler (see the sketch after this list)
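
A minimal sketch of the parsing side, assuming a hypothetical JSON template format (a name plus CPU/memory per worker pod) chosen purely for illustration; the actual template format used by oms may differ:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// workerTemplate is a hypothetical resource template a user could pick in the oms UI.
type workerTemplate struct {
	Name   string `json:"name"`   // e.g. "small", "large"
	CPU    string `json:"cpu"`    // CPU per worker pod, e.g. "2"
	Memory string `json:"memory"` // memory per worker pod, e.g. "4Gi"
}

// loadTemplates reads all templates from a JSON file so they can be offered in
// the UI and later passed on to the job scheduler.
func loadTemplates(path string) ([]workerTemplate, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var ts []workerTemplate
	if err := json.Unmarshal(data, &ts); err != nil {
		return nil, err
	}
	return ts, nil
}

func main() {
	ts, err := loadTemplates("worker-templates.json") // hypothetical file name
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, t := range ts {
		fmt.Printf("%s: cpu=%s mem=%s\n", t.Name, t.CPU, t.Memory)
	}
}
```
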
KrisWilliamson (Contributor) commented Jan 5, 2024

In an email sent 19 December 2023, Anatoly said the following.


Please see attached all /etc/ directory files and other JSON files which oms can create for a model run. What is included:

- mpi.ModelName.template.txt: I use it to create the mpirun command line;
- run-options.ModelName.anything.json: UI default settings for the Model Run tab; I do not use it on the server side, only in the UI, where the user can change all those default values;
- compute-start.sh and compute-stop.sh: scripts to start and stop back-end servers; there is an Azure version of the start/stop scripts in our GitHub.

There is another set of JSON files which you may be able to use. Those files are created if oms is using "jobs" to manage model run queue(s); they are located in job/ sub-directories. There is also a job/job.ini (attached) where the back-end servers are described (server name, number of CPU cores, and memory size).

If oms is using "jobs", then a model run request coming from the UI goes into the queue and a JSON file is created in the job/queue directory. An example queue file is attached: 2023_12_19_14_07_51_546-#-localhost_4044-#-OncoSimX-lung-#-f3611d0b5add6342fcb0f258ac4b38da-#-mpi-#-cpu-#-12-#-mem-#-0-#-20220951.json. The file name is very long because I use the name parts to schedule the model run without reading the file itself.

After a model run job is selected to run, I calculate which back-end servers to use and start them using etc/compute-start.sh. I also create an ini file for mpirun (attached as host-2023_12_19_14_07_51_546-localhost_4044.ini). After that I build the mpirun command line from the etc/mpi..... template, delete the queue JSON file, and create an active/...json file, attached as 2023_12_19_14_07_51_546-#-localhost_4044-#-OncoSimX-lung-#-f3611d0b5add6342fcb0f258ac4b38da-#-mpi-#-cpu-#-12-#-mem-#-0-#-190568.json. The file name and content are slightly different from the job/queue JSON file.

When the model run is completed, I move that JSON file to job/history, attached as: 2023_12_19_14_07_51_546-#-localhost_4044-#-OncoSimX-lung-#-f3611d0b5add6342fcb0f258ac4b38da-#-2023_12_19_14_08_22_977-#-success.json

I don't think you want to use the job/active or job/history files, but I can certainly make the job/queue files available for you.

One more thing for you to consider, which you may already know: the mpirun command line is not enough; I also create files with the model run description and the model run Markdown notes.

Regards,
Anatoly

etc.zip
OncoSimX-lung-#-f3611d0b5add6342fcb0f258ac4b38da-#-190568.json.txt
OncoSimX-lung-#-f3611d0b5add6342fcb0f258ac4b38da-#-mpi-#-cpu-#-12-#-mem-#-0-#-20220951.json.txt
OncoSimX-lung-#-f3611d0b5add6342fcb0f258ac4b38da-#-success.json.txt

host-2023_12_19_14_07_51_546-localhost_4044.ini.txt
job.ini.txt
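
For reference, a minimal sketch (not Anatoly's actual implementation) of how a Go controller might split one of those job/queue file names into its scheduling fields, assuming the "-#-" separator and the field order visible in the attached example:

```go
package main

import (
	"fmt"
	"strings"
)

// queueJob holds the scheduling fields encoded in a job/queue file name.
// The field order is inferred from the attached example and may not match
// the real oms layout exactly.
type queueJob struct {
	Stamp  string // submission time stamp
	Host   string // oms instance, e.g. localhost_4044
	Model  string // model name
	Digest string // model digest
	Cpu    string // requested CPU cores
	Mem    string // requested memory
}

// parseQueueName splits a "-#-" delimited queue file name into its parts.
func parseQueueName(name string) (queueJob, error) {
	name = strings.TrimSuffix(name, ".json")
	p := strings.Split(name, "-#-")
	if len(p) < 9 {
		return queueJob{}, fmt.Errorf("unexpected queue file name: %s", name)
	}
	// p: [stamp, host, model, digest, "mpi", "cpu", nCpu, "mem", nMem, position]
	return queueJob{Stamp: p[0], Host: p[1], Model: p[2], Digest: p[3], Cpu: p[6], Mem: p[8]}, nil
}

func main() {
	j, err := parseQueueName("2023_12_19_14_07_51_546-#-localhost_4044-#-OncoSimX-lung-#-f3611d0b5add6342fcb0f258ac4b38da-#-mpi-#-cpu-#-12-#-mem-#-0-#-20220951.json")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", j)
}
```
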

KrisWilliamson (Contributor) commented:

Related to "Implement OpenM++ MPI job controller using Go", in that the new Go controller will need to use this information to launch jobs.

chuckbelisle changed the title from "Create resource templates for MPI jobs" to "Create resource templates for MPI and non-MPI jobs" on Jan 10, 2024
KrisWilliamson (Contributor) commented:

According to the OpenM++ wiki, the ini files are used to shorten the command-line arguments.

e.g. running a model directly: > modelOne.exe -ini small.ini

  • Another use is in launching cloud jobs; the MPI options can also be saved in an ini file.
  • A job.ini file is read into the UI to provide default settings for the server.
  • A job queue can be set up using an ini file to queue and run jobs, where the ini file contains the server settings.

And so on.

Other than job.ini, I could find no references to the ini files being used in conjunction with the UI.

Given this is not how we intend to use OpenM++ in AAW, there seems to be no point in implementing these ini files directly.

  • A possible future issue/feature: ini files could be saved from the UI settings, with a button in the UI to re-use saved settings.

Souheil-Yazji (Contributor, Author) commented:

The objective of this issue isn't to copy the work done by OpenM; it's to use a similar approach to provide OpenM users with choices (actually templates behind the scenes) so they can select the resource scale of their MPI job workers. I.e., instead of using the JSON file to select how many MPI threads are needed or whether or not to use an ini file, it can be used to specify how much CPU/memory to set for the workers.
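
A sketch of that idea, assuming the workers are Kubernetes pods and using hypothetical template names and values; this mapping is illustrative only, not an agreed format:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// workerSizes maps a user-selected template name to the CPU/memory requested
// for each MPI worker pod. The names and values are hypothetical examples.
var workerSizes = map[string]corev1.ResourceRequirements{
	"small": {
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("1"),
			corev1.ResourceMemory: resource.MustParse("2Gi"),
		},
	},
	"large": {
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("8"),
			corev1.ResourceMemory: resource.MustParse("16Gi"),
		},
	},
}

func main() {
	// The controller would copy the selected requirements into the MPI job's
	// worker pod spec before submitting it to the scheduler.
	req := workerSizes["large"]
	fmt.Println(req.Requests[corev1.ResourceCPU], req.Requests[corev1.ResourceMemory])
}
```
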

KrisWilliamson (Contributor) commented:

Should this be part of Jacek's work on the Go controller?
Related to #41?
