[Epic] Implement OpenM++ MPI job controller using GO #24

Open
2 of 8 tasks
Souheil-Yazji opened this issue Nov 28, 2023 · 4 comments
Assignees
Labels
kind/epic Project epic that contains tasks kind/feature feature request

Comments

@Souheil-Yazji
Contributor

Souheil-Yazji commented Nov 28, 2023

Continuation of #19
Relates to #13

Requirements

  • routing all POST MPIJob requests to this controller
  • controller successfully creates MPIJob manifests and submits them to k8s
  • a response code is propagated up to the requester

Implementation

<Any nuances encountered during implementation to be explained, don't forget in-line comments>

Testing

<Test cases ran, should cover all edge cases>

Deployment

Other Notes

@chuckbelisle
Contributor

chuckbelisle commented Dec 12, 2023

Was not worked on during this sprint, moving to the next one.

@jacek-dudek
Collaborator

I did background reading on Go fundamentals: core data types, control flow, the concurrency model, organization of source files, importing packages, and building projects.

I looked into how JSON files are converted into Go data types and back.
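
As a rough illustration of that JSON round trip, here is a small sketch using the standard `encoding/json` package. The `RunRequest` struct and its field names are made up for the example and are not taken from the OpenM++ code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RunRequest is a hypothetical payload; struct tags map Go fields to JSON keys.
type RunRequest struct {
	ModelName string            `json:"modelName"`
	Processes int               `json:"processes"`
	Env       map[string]string `json:"env,omitempty"` // omitted from output when empty
}

// decodeRunRequest converts JSON bytes into a Go value.
func decodeRunRequest(data []byte) (RunRequest, error) {
	var req RunRequest
	err := json.Unmarshal(data, &req)
	return req, err
}

// encodeRunRequest converts a Go value back into a JSON string.
func encodeRunRequest(req RunRequest) (string, error) {
	out, err := json.Marshal(req)
	return string(out), err
}

func main() {
	req, err := decodeRunRequest([]byte(`{"modelName":"demo","processes":4}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.ModelName, req.Processes) // demo 4

	out, err := encodeRunRequest(req)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"modelName":"demo","processes":4}
}
```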

The parameter-passing model confused me at first, so I spent extra time on it. Go passes all arguments by value; passing by reference is done explicitly with pointers. BUT some core data types (slices, maps) are small descriptors that point to an underlying data structure holding the actual data. When a slice or map is passed as an argument, the descriptor is copied by value, but the underlying data is NOT copied, so element writes made by the callee are visible to the caller. In that sense maps and slices behave much like references. One caveat: changes to the descriptor itself, such as growing a slice with append or reassigning the variable, are not seen by the caller unless a pointer is passed or the new value is returned.

I still need to review the material on concurrency, on sharing data between goroutines via channels, and on the recommended patterns and anti-patterns.
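
A short sketch of that behaviour, with function names invented for the example. It shows which changes made inside a callee reach the caller and which do not.

```go
package main

import "fmt"

// setFirst writes through the copied slice header into the shared backing array.
func setFirst(s []int) { s[0] = 99 }

// appendLocal grows only the callee's copy of the slice header; the caller's length is unchanged.
func appendLocal(s []int) { s = append(s, 4) }

// appendVia uses an explicit pointer, so the caller's slice header is updated.
func appendVia(s *[]int) { *s = append(*s, 4) }

// addKey mutates the map data shared by caller and callee.
func addKey(m map[string]int) { m["added"] = 1 }

// resetLocal rebinds only the callee's copy of the map variable.
func resetLocal(m map[string]int) { m = map[string]int{} }

func main() {
	s := []int{1, 2, 3}
	setFirst(s)
	appendLocal(s)
	fmt.Println(s) // [99 2 3]

	appendVia(&s)
	fmt.Println(s) // [99 2 3 4]

	m := map[string]int{}
	addKey(m)
	resetLocal(m)
	fmt.Println(m) // map[added:1]
}
```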

After that I started looking into the client-go library. I learned how to authenticate a Go application from inside the cluster. The package "k8s.io/client-go/rest" builds the client configuration from the default service account credentials that Kubernetes mounts into deployed containers.

After creating the config object we can use it to build the Kubernetes client set for the cluster. Then we query the client set to obtain specific resource collections.

What I'm trying to figure out now is how the collection of REST endpoints served by the Kubernetes API maps onto client-go code. There is a core package named "k8s.io/client-go/kubernetes", which exports a function (NewForConfig) that returns the client set for the cluster.

There are also packages such as "k8s.io/api/apps/v1" and "k8s.io/api/core/v1" that correspond to specific API groups and export data structures mirroring the specification manifests for those resources. We need to import these to create resource specifications programmatically.

But some resource types, custom resources in particular, don't come with a corresponding Go package in client-go (one that would provide these resource-specific data structure definitions). In that case we need to use the dynamic client package, "k8s.io/client-go/dynamic", which lets us define and interact with arbitrary resource types, such as an MPIJob, generically.

So one thing I need to do is find out whether the Kubeflow training operator or MPI operator provides an MPIJob-specific Go package. If it doesn't, I need to look into the dynamic resource client.
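
To make the in-cluster setup a little more concrete, here is a standard-library-only sketch of the inputs that `rest.InClusterConfig()` relies on: the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables and the service account files mounted into the pod. The helper name `apiServerURL` is hypothetical, and this is an illustration of the mechanism rather than a replacement for client-go.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Files mounted into pods by default for the pod's service account.
const (
	tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"
	caPath    = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
)

// apiServerURL (hypothetical helper) derives the API server address from the
// environment variables that Kubernetes injects into every container.
func apiServerURL(getenv func(string) string) (string, error) {
	host := getenv("KUBERNETES_SERVICE_HOST")
	port := getenv("KUBERNETES_SERVICE_PORT")
	if host == "" || port == "" {
		return "", errors.New("not running in a cluster: KUBERNETES_SERVICE_HOST/PORT not set")
	}
	return "https://" + net.JoinHostPort(host, port), nil
}

func main() {
	// A fake environment keeps the example runnable outside a cluster;
	// inside a pod you would pass os.Getenv instead.
	fake := map[string]string{
		"KUBERNETES_SERVICE_HOST": "10.0.0.1",
		"KUBERNETES_SERVICE_PORT": "443",
	}
	url, err := apiServerURL(func(k string) string { return fake[k] })
	if err != nil {
		panic(err)
	}
	fmt.Println("API server:", url)
	fmt.Println("bearer token file:", tokenPath)
	fmt.Println("CA bundle file:", caPath)
}
```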

@jacek-dudek
Collaborator

Linking to branches with work in progress:
StatCan/aaw-kubeflow-containers#565
https://github.com/StatCan/openmpp/tree/openmpp-24

@chuckbelisle chuckbelisle added the kind/epic Project epic that contains tasks label Jan 10, 2024
@chuckbelisle chuckbelisle changed the title Implement OpenM++ MPI job controller using GO [Epic] Implement OpenM++ MPI job controller using GO Jan 10, 2024
@chuckbelisle
Contributor

Updated this issue to be an Epic and edited description in order to define tasks.
