
[WIP/Draft] Add upload_artifact_and_register_model method to python client #761

Draft · wants to merge 2 commits into base: main from feat/python-upload-and-register

Conversation

Crazyglue (Contributor)

Description

Adds a convenience method to the python client for a common operation: uploading a model that exists locally on the filesystem and registering it in the model registry.
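As a hedged sketch of how the proposed method might be called: the method name comes from the PR title, the parameter names (`artifact_local_path`, `destination_uri`, `upload_client_params`) come from the review discussion, and the client class below is a stub so the example is self-contained, not the actual client API.

```python
# Hypothetical usage sketch; ModelRegistryStub stands in for the real
# model-registry client, and the parameter names are taken from this PR's
# review discussion rather than a released API.
class ModelRegistryStub:
    def upload_artifact_and_register_model(
        self, name, artifact_local_path, destination_uri, upload_client_params=None
    ):
        # A real implementation would upload the local file to destination_uri
        # (S3 or OCI) and then register the stored artifact as a model.
        return {"name": name, "uri": destination_uri}

client = ModelRegistryStub()
model = client.upload_artifact_and_register_model(
    name="mnist",
    artifact_local_path="/tmp/model.onnx",
    destination_uri="s3://models/mnist/model.onnx",
    upload_client_params={"region_name": "us-east-1"},
)
```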

How Has This Been Tested?

Merge criteria:

  • All the commits have been signed-off (To pass the DCO check)
  • The commits have meaningful messages; the author will squash them after approval, or in the case of a manual merge will ask for a squash merge.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Code changes follow the kubeflow contribution guidelines.

If you have UI changes

  • The developer has added tests or explained why testing cannot be added.
  • Included any necessary screenshots or gifs if it was a UI change.
  • Verify that UI/UX changes conform to the UX guidelines for Kubeflow.


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tarilabs for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Crazyglue Crazyglue force-pushed the feat/python-upload-and-register branch 2 times, most recently from fb011b2 to a603953 Compare February 4, 2025 18:49
…eholder functions

Signed-off-by: Eric Dobroveanu <edobrove@redhat.com>
@Crazyglue Crazyglue force-pushed the feat/python-upload-and-register branch from a603953 to 8090a03 Compare February 4, 2025 19:03
@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Feb 5, 2025
Signed-off-by: Eric Dobroveanu <edobrove@redhat.com>
@Crazyglue Crazyglue force-pushed the feat/python-upload-and-register branch from 409964d to 6ebe337 Compare February 5, 2025 15:27
@tarilabs (Member) left a comment


Thank you for sharing this as a draft @Crazyglue!

I think this will be helpful to ease the work of the MLOps engineer and data scientist personas, who otherwise typically need to perform these operations manually.

Any general thoughts on a plug-in mechanism for people who want to override an implementation?
I.e. we are likely to offer a default with boto3 as an extra for _upload_to_s3(); what mechanism do we want to suggest to other devs who may want to implement their own tooling for S3? (For future extensibility; it doesn't have to be in this PR.)

Comment on lines +208 to +211
```python
if is_s3_uri(destination_uri):
    self._upload_to_s3(artifact_local_path, destination_uri, upload_client_params["region_name"])
elif is_oci_uri(destination_uri):
    self._upload_to_oci(artifact_local_path, destination_uri)
```
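For reference, the `is_s3_uri` / `is_oci_uri` helpers this dispatch relies on could be as simple as a URI-scheme check. This is a self-contained sketch, not the PR's actual implementation:

```python
from urllib.parse import urlparse

def is_s3_uri(uri: str) -> bool:
    # Matches URIs of the form s3://bucket/key
    return urlparse(uri).scheme == "s3"

def is_oci_uri(uri: str) -> bool:
    # Matches URIs of the form oci://registry.example.com/repo:tag
    return urlparse(uri).scheme == "oci"
```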
Member

I concur that the flow is decided based on the destination; I'm just thinking we could add functions here to check that the required configuration parameters (i.e. environment variables) are present.

Crazyglue (Contributor, Author)

I was thinking that the individual upload methods would check for those, since they instantiate their respective clients (boto3 vs skopeo).

I haven't done it in this PR yet, but the upload_client_params mapping will actually need to contain either the boto3 params or the skopeo params, and that mapping is passed to both upload methods. Those methods can then determine whether they have enough information to instantiate a client and connect/perform the upload.

@Crazyglue (Contributor, Author)

> Any general thought of a plug-in mechanism if people want to override an implementation? ie we are likely to offer a default with boto3 as an extra for the _upload_to_s3(), what mechanism do we want to suggest to other Devs who may want to implement their own tooling for S3 ? (for future extensibility, doens't have to be in this PR)

To be honest I'm not familiar enough with Python to know the "pythonic" way of doing this, haha. But perhaps we can have a kwarg to provide a boto3 or skopeo client as an optional parameter to this method: if provided, use it; otherwise instantiate a client. After client creation/resolution we can perform the upload.

It might be worth considering another option, lower level than the above, which is to create a separate StorageClient class that provides an interface on top of the boto3/skopeo client to perform an .upload().

@tarilabs any thoughts on what the right level of abstraction might be in this case? If the only two storage options are S3 and OCI, then I'm not sure the added complexity is worth it.
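One common "pythonic" shape for the StorageClient idea is a typing.Protocol, which gives callers a pluggable interface without requiring inheritance. Everything below is a hypothetical sketch of that option, not code from this PR:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class StorageClient(Protocol):
    """Hypothetical interface for pluggable upload backends (S3, OCI, ...)."""

    def upload(self, artifact_local_path: str, destination_uri: str) -> str:
        """Upload the local file and return the final artifact URI."""
        ...

class InMemoryStorageClient:
    """Toy backend used only to show the protocol is satisfied structurally."""

    def __init__(self):
        self.uploaded = []

    def upload(self, artifact_local_path: str, destination_uri: str) -> str:
        self.uploaded.append((artifact_local_path, destination_uri))
        return destination_uri
```

A caller-supplied object with a matching `upload` method would then work wherever a StorageClient is expected, which is one answer to the extensibility question above.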


```python
def _upload_to_oci(
    self,
    artifact_local_path: str,
    # … (remainder of the signature truncated in the review excerpt)
```
Member

One thing to consider here is how to pass/customize the "base image" of the modelcar being built. Do we want that to be an explicit parameter, or a default from a config which could be overridden if needed?
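The two options aren't mutually exclusive: an explicit parameter can fall back to a configurable default. A minimal sketch, where the default value and the environment-variable name are both hypothetical (neither is defined by this PR):

```python
import os
from typing import Optional

# Hypothetical library default; not defined anywhere in this PR.
DEFAULT_MODELCAR_BASE_IMAGE = "busybox:stable"

def resolve_modelcar_base_image(base_image: Optional[str] = None) -> str:
    """Explicit argument wins, then an environment override, then the default."""
    return (
        base_image
        or os.environ.get("MODEL_REGISTRY_MODELCAR_BASE_IMAGE")
        or DEFAULT_MODELCAR_BASE_IMAGE
    )
```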

@tarilabs (Member)

tarilabs commented Feb 5, 2025

> @tarilabs any thoughts on what the right level of abstraction might be in this case? if the two storage options are just s3 and oci, then im not sure the added complexity is worth it

Considering this is to support an internal implementation which will likely need further iterations/refinement, I wouldn't "prematurely over-engineer" it :)
As long as it is confined within a module, we can always refactor later as we discover more details on the use cases.
