-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache polish work - including the introduction of cache policies to the new top level Cache object #3129
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
…ignore_input_vars` Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Code Review Agent Run #e01a83Actionable Suggestions - 6
Review Details
|
Changelist by BitoThis pull request implements the following key changes.
|
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3129 +/- ##
==========================================
- Coverage 77.90% 76.10% -1.80%
==========================================
Files 217 205 -12
Lines 22178 21758 -420
Branches 2768 2820 +52
==========================================
- Hits 17277 16559 -718
- Misses 4115 4377 +262
- Partials 786 822 +36 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Code Review Agent Run #6b0cb9Actionable Suggestions - 0Additional Suggestions - 10
Review Details
|
@@ -343,7 +346,35 @@ def launch_dynamically(): | |||
:param pickle_untyped: Boolean that indicates if the task allows unspecified data types. | |||
""" | |||
|
|||
def wrapper(fn: Callable[P, Any]) -> PythonFunctionTask[T]: | |||
def wrapper(fn: Callable[P, FuncOut]) -> PythonFunctionTask[T]: | |||
nonlocal cache, cache_serialize, cache_version, cache_ignore_input_vars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid nonlocal
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you expand a bit on why?
We need to modify these variables inside the wrapper
function and in python variable assignment implies the creation of a new variable unless such variable is declared as nonlocal.
cache_ignore_input_vars = cache.get_ignored_inputs() | ||
cache = True | ||
|
||
if cache_serialize is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's weird to have cache_version
being set above, but then immediately overwritten. One needs to look a few lines above to see that these parameters can not be set.
Can we at least have a check here to make it obvious that this is only used for is for the bool case?
is_bool_cache = isinstance(cache, bool) and cache is True and cache_version is None
if is_bool_cache:
cache = Cache(...)
if isinstance(cache, Cache):
....
if is_bool_cache:
if cache_serialize is None:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TaskMetadata
dataclass does not take Optional
for the cache-related fields, which means that we either (1) modify those types in the dataclass declaration or (2) set default values prior to invoking the constructor. I decided to go with (2) because having those fields as Optional means that I'd have to check for the nil case everywhere (which is pretty annoying).
Also worth noting that this initialization covers the common case where the cache is not used at all (so not only the positive bool case).
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Code Review Agent Run #27f658Actionable Suggestions - 0Review Details
|
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Code Review Agent Run #1f41e5Actionable Suggestions - 1
Review Details
|
Tracking issue
flyteorg/flyte#6143
Why are the changes needed?
Over the years we received some feedback complaining about the discoverability of the cache-related functionality, this is finally being addressed in this PR. This PR also opens the door for a more complex caching mechanism to be applied to each
@task
.What changes were proposed in this pull request?
We create a new object called
Cache
which is groups all of the caching functionality, including the ability to set the version, cache serialization, the ability to ignore certain inputs in the cache key calculation (via settingcache_ignore_input_vars
), and the set of cache policies applied. Here's the full definition ofCache
:The first three fields are well known and map directly to existing cache-related concepts, now grouped in a single object. As for
policies
andsalt
, those are new concepts, the idea is to provide more complicated caching behavior based on the function definition. ACachePolicy
is a protocol that specifies how versions are calculated. It exposes a single method calledget_version
that's called to produce a value, that we callversion
.The
Cache
object applies the policies in the order in which they are defined to generate the final cache version. The process of generating the final value involves concatenating each version produced by a policy and then using SHA-256 to produce the final hexdigest (i.e. 64 hex characters) value.Also worth mentioning that we require the definition of policies in case a
Cache
object is used.All cache-related parameters are now officially deprecated (although they will keep being supported, potentially for a really long time). In order to make the migration seamless, the
cache
parameter in the@task
decorator is nowUnion[bool, Cache]
and mapping the old names to the new fields inCache
should be fairly simple.As a final design decision, we also added a hook to help define a list of default policies. This might suit the more advanced users in case they need to maintain a central list of cache policies somewhere.
How was this patch tested?
Unit tests and local sandbox.
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link
Summary by Bito
This PR implements a comprehensive refactoring of Flyte tasks' caching functionality, transitioning from explicit cache parameters to a kwargs-based approach with a new Cache class. The changes include enhanced pod template support, improved cache policy validation, and authentication system refinements. The implementation features updated error messaging for cache policy requirements and modified task decorator for pod template parameter handling, while maintaining backward compatibility through policy-based version generation.Unit tests added: True
Estimated effort to review (1-5, lower is better): 5