
persisted memoization (ie saving @mo.cache values) #3471

Closed
gabrielgrant opened this issue Jan 16, 2025 · 3 comments · Fixed by #3550
Labels
enhancement New feature or request

Comments

@gabrielgrant
Contributor

gabrielgrant commented Jan 16, 2025

Description

I'd like to have the control offered by the standard memoization interface of the mo.cache decorator, but have the values persist across kernel restarts (as with marimo.persistent_cache).
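For concreteness, a rough sketch of how the two interfaces look today (assuming the current split where `mo.cache` is a decorator and `mo.persistent_cache` is a context manager; the bodies are placeholders):

```python
import marimo as mo

# In-memory memoization: results are reused within a session,
# but lost on kernel restart.
@mo.cache
def expensive_computation(x):
    ...

# Disk-backed caching of a block: values survive kernel restarts.
with mo.persistent_cache("expensive_block"):
    result = expensive_computation(42)
```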

There are a few third-party options that do something similar (at least for pure functions), e.g. joblib's `Memory` and `diskcache`.

How does marimo's built-in caching compare to these options?

AFAICT the main difference would be that mo.cache takes closed-over dependent variables and/or source code changes into account, in the same way mo.persistent_cache does, right? Are there any differences between the caching rules of the two marimo interfaces, or do they differ only in whether values are persisted? What is the thinking behind having the style of cache interface (function memoization vs. block-based-with-context-manager) also dictate whether the cache is persisted to disk?

Suggested solution

Possibly just adding a "persisted" flag to mo.cache?
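A rough sketch of what that could look like (the `persisted` flag is hypothetical and not part of the current API):

```python
import marimo as mo

# Hypothetical API: same memoization interface as mo.cache,
# but with an opt-in flag to persist values to disk.
@mo.cache(persisted=True)
def expensive_computation(x):
    ...
```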

I'm not sure of all the details of how caching is implemented internally, but it might make sense to reuse one of these external libs to avoid fully reinventing the wheel?

Alternative

Recommend using one of the existing third-party libs

Additional context

No response

gabrielgrant added the enhancement (New feature or request) label on Jan 16, 2025
@gabrielgrant
Contributor Author

gabrielgrant commented Jan 17, 2025

One (potential? hopefully?) difference is that these other solutions seem to have issues with consistently/correctly caching Pandas DataFrames:

joblib/joblib#1611
grantjenks/python-diskcache#314

This is obviously a pretty common use-case in a notebook environment. Is this handled correctly in marimo's cache implementation(s)?
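For illustration (not marimo-specific, just pandas' own hashing utility), the tricky part is that numeric frames hash cleanly from their raw contents, while object-dtype columns force per-object hashing, which is where the linked issues come up:

```python
import pandas as pd
from pandas.util import hash_pandas_object

# Purely numeric data: a stable, content-based hash is straightforward.
numeric = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
print(hash_pandas_object(numeric).sum())

# An object-dtype column has to hash individual Python objects, which is
# where third-party cache-key schemes tend to get inconsistent.
mixed = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", None]})
print(hash_pandas_object(mixed).sum())
```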

@dmadisetti
Collaborator

I've been promising a cache update and here's the branch:

#3480

I think for the pandas case, this is something we can explicitly test against, and maybe squeeze into this branch. Some of the hashing mechanism is inspired by joblib. I think marimo should catch this, since the hash refers directly to the underlying data.

I think I'd like to group some of these changes together to prevent repeated cache misses on version updates.

Related to this conversation is this discussion:

#2653

where we decided @persistent_cache should be a drop-in for @cache.

@dmadisetti
Collaborator

Just looked at this code again, and the answer is better than joblib, but still not ideal.

marimo has two modes of hashing: ContentAddressed (which hashes the content of the variable) and ExecutionContext (which hashes the code run to produce the variable, used in cases where we cannot get a clean content hash).

If a dataframe is all numerical data, then the raw, contiguous memory behind the df is used for hashing. If it is not (the df contains objects), then the fallback is ExecutionContext.

More explicit handling could be done, and I don't think column names contribute to the hash in this case.
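A simplified sketch of that two-mode idea (not marimo's actual implementation; the function names and the fallback signalling are made up for illustration):

```python
import hashlib
from typing import Optional

import numpy as np
import pandas as pd


def content_hash(df: pd.DataFrame) -> Optional[str]:
    """ContentAddressed-style hash: only attempted when the frame is purely
    numeric, so the raw contiguous memory can be hashed directly.
    (As noted above, column names do not contribute to this hash.)"""
    if any(dtype == object for dtype in df.dtypes):
        return None  # object columns: no reliable content hash
    data = np.ascontiguousarray(df.to_numpy())
    return hashlib.sha256(data.tobytes()).hexdigest()


def cache_key(df: pd.DataFrame, producing_code: str) -> str:
    """Fall back to an ExecutionContext-style hash (the code that produced
    the value) when the content itself cannot be hashed cleanly."""
    return content_hash(df) or hashlib.sha256(producing_code.encode()).hexdigest()
```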

dmadisetti linked a pull request on Jan 23, 2025 that will close this issue
akshayka pushed a commit that referenced this issue on Jan 24, 2025
## 📝 Summary

fixes #2653 #3471

## 🔍 Description of Changes

Enables `persistent_cache` to be used as a decorator for functions, and
`cache` to be used as a context block, e.g.

```python
@mo.persistent_cache
def expensive_function_written_to_disk():
    ...

# or

with mo.cache("expensive_block_in_memory") as c:
    ...
```

`cache` is also used as the general entry point for custom "Loaders".

---

The breadth of the API makes the implementation a bit hairy, but I think
that if it's a smooth experience for the user then it's worth it.
