persisted memoization (i.e. saving @mo.cache values) #3471
Comments
One (potential? hopefully?) difference is that these other solutions seem to have trouble caching pandas DataFrames consistently and correctly (see joblib/joblib#1611). This is obviously a pretty common use case in a notebook environment. Is this handled correctly in marimo's cache implementation(s)?
I've been promising a cache update, and here's the branch: For the pandas (`pd`) case, this is something we can explicitly test against, and maybe squeeze into this branch. Some of the hashing mechanism is inspired by joblib. I think marimo should catch this, since it refers directly to the

Related to this conversation is this discussion: Where we decided
I just looked at this code again: the answer is better than joblib's, but still not ideal. marimo has two modes of hashing: `ContentAddressed` (a hash of the variable's content) and `ExecutionContext` (a hash of the code run to produce the variable, used when we cannot get a clean content hash). If a DataFrame contains only numerical data, the raw, contiguous memory behind it is used for hashing. If it does not (e.g. the DataFrame contains objects), hashing falls back to `ExecutionContext`. More explicit handling could be done, and I don't think column names contribute to the hash in this case.
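To illustrate the two hashing modes described in the previous comment, here is a toy, stdlib-only sketch. It is not marimo's implementation: a "DataFrame" is modeled as a hypothetical dict mapping column names to either `array.array` (numeric data with a contiguous buffer) or a plain `list` (object data), and the helper names `content_hash`, `execution_context_hash`, and `cache_key` are made up for this example. It also reproduces the caveat that column names do not enter the content hash.

```python
import array
import hashlib


def content_hash(columns):
    """ContentAddressed-style hash: raw bytes of numeric columns, names ignored."""
    h = hashlib.sha256()
    for values in columns.values():
        if not isinstance(values, array.array):
            return None  # object data: no clean content hash available
        h.update(values)  # array.array exposes its contiguous memory buffer
    return h.hexdigest()


def execution_context_hash(source_code):
    """ExecutionContext-style hash: the code that produced the value."""
    return hashlib.sha256(source_code.encode("utf-8")).hexdigest()


def cache_key(columns, source_code):
    # Prefer hashing the data itself; fall back to hashing the producing code.
    key = content_hash(columns)
    if key is not None:
        return ("ContentAddressed", key)
    return ("ExecutionContext", execution_context_hash(source_code))


numeric_a = {"price": array.array("d", [1.0, 2.5])}
numeric_b = {"cost": array.array("d", [1.0, 2.5])}  # same data, renamed column
mixed = {"name": ["a", "b"]}

print(cache_key(numeric_a, "df = load()")[0])  # ContentAddressed
print(cache_key(numeric_a, "x") == cache_key(numeric_b, "y"))  # True: names ignored
print(cache_key(mixed, "df = load()")[0])  # ExecutionContext
```

Because only the raw bytes are hashed, two frames that differ only in column names collide, which is the kind of gap "more explicit handling" would need to close.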
## 📝 Summary

Fixes #2653, #3471

## 🔍 Description of Changes

Enables `persistent_cache` to be used as a decorator for functions, and `cache` to be used as a context block, e.g.:

```python
import marimo as mo

@mo.persistent_cache
def expensive_function_written_to_disk():
    ...

# or

with mo.cache("expensive_block_in_memory") as c:
    ...
```

`cache` is also used as the general entry point for custom "Loaders".

---

The breadth of the API makes the implementation a bit hairy, but I think that if it's a smooth experience for the user, then it's worth it.

@akshayka
Description
I'd like to have the control offered by the standard memoization interface of the `mo.cache` decorator, but have the values persist across kernel restarts (as with `marimo.persistent_cache`).

There are a few third-party options that do something similar (at least for pure functions):
- `joblib.Memory`
- file_archive
How does marimo's built-in caching compare to these options?
AFAICT the main difference would be that `mo.cache` takes closed-over dependent variables and/or source code changes into account in the same way `mo.persistent_cache` does, right? Are there any differences between the caching rules of the two marimo interfaces, or is it just whether values are persisted? What is the thinking in having the style of cache interface (function memoization vs. block-based with a context manager) also dictate whether the cache is persisted to disk or not?

Suggested solution
Possibly just adding a `persisted` flag to `mo.cache`?

I'm not sure of all the details of how caching is implemented internally, but might it make sense to reuse one of these external libs rather than fully reinventing the wheel?
Alternative
Recommend using one of the existing third-party libs.
Additional context
No response