Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow '_ensure_save_result' to be skipped with a flag in any place it is called. #513

Closed
GeraldIr opened this issue Dec 4, 2023 · 10 comments

Comments

@GeraldIr
Copy link
Member

GeraldIr commented Dec 4, 2023

Often I find myself being unable to start jobs that have different result nodes which are incompatible with save_result and there has to be some workaround implemented for handling these edge cases.

I'd like to propose an optional flag for methods such as create_job and download which allows ensure save result to be skipped so that no superfluous save_result node gets added.

I can also create a PR for this if a change like this is approved.

@jdries
Copy link
Collaborator

jdries commented Dec 4, 2023 via email

@GeraldIr
Copy link
Member Author

GeraldIr commented Dec 4, 2023

We are currently using save_ml_model which stores results of random forest regression, we debated including this functionality in save_result but decided the outputs were different enough to justify its own process (and this might in the future be the case for more things, like UDFs).

I think for more advanced users having the option would be a benefit, and making it optional would save the users relying on it currently from any trouble.

@soxofaan
Copy link
Member

soxofaan commented Dec 7, 2023

is save_ml_model the only use case you're talking about? Because we can easily whitelist that process alongside save_result in the logic.

@soxofaan
Copy link
Member

soxofaan commented Dec 7, 2023

ah no wait, save_ml_model is already whitelisted in MlModel.create_job() so I'm not sure I fully understand the problem here.

can you provide some examples in code snippets?

@ValentinaHutter
Copy link

I would like to skip the save_result for some of the sen2like processing and now created a PR to do so: #627 Not sure, if I miss something, I could not find a whitelist for save_ml_model to use it as an example.

@jdries
Copy link
Collaborator

jdries commented Sep 23, 2024

Side note: if you do connection.create_job(your_cube.flat_graph()), no save_result will be added. This is the easy way around.
A better way forward would be to fetch the schema for the last process, and only add save_resultif the output is a datacube.
Now the thing is, sen2like claims the output is a datacube, so as a user, I would expect that I actually need to add a save_resultprocess, otherwise I won't get valid output. Also the process description does not mention anything in this regards.
@ValentinaHutter Is that process metadata correct?

@ValentinaHutter
Copy link

Thanks for pointing this out, then I will use the create job with the flat graph for now!
The reason for this is that sen2like processor natively generates a Sentinel2 like .SAFE folder holding all the results. In order to make use of other openeo processes on top of that, we then load the data from the .SAFE afterwards. So, when we only want to generate the .SAFE output, we would not need a save_result process afterwards. This is why the return value of sen2like is a datacube. This is described in the process metadata.

@soxofaan
Copy link
Member

In order to make use of other openeo processes on top of that, we then load the data from the .SAFE afterwards. So, when we only want to generate the .SAFE output, we would not need a save_result process afterwards.

I'm not completely following: if you don't have a save_result in your first job, you don't really have result assets in your job results so how can you reference to that cube in another job?

Isn't it better to define a "SAFE" file format in your capabilities to be used in this case?

@soxofaan
Copy link
Member

I could not find a whitelist for save_ml_model to use it as an example.

FYI: I was referring to this part in create_job of the MlModel class:

if pg.result_node().process_id not in {"save_ml_model"}:
_log.warning("Process graph has no final `save_ml_model`. Adding it automatically.")
pg = pg.save_ml_model()

@soxofaan
Copy link
Member

added auto_add_save_result option to download/create_job/execute_batch with cfd9f1a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants