-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LIU-448: Improve PyFunc robustness to erroneous graph input. #308
base: master
Are you sure you want to change the base?
Conversation
- Also updated simple_functions.py string2json to improve _it's_ robustness to bad input.
Reviewer's Guide by SourceryThis pull request improves the robustness of the PyFunc application and its auxiliary functions by addressing issues with handling erroneous graph input and default values. The changes include modifications to how function defaults are managed and merged, new helper methods for positional argument initialization, enhanced error handling when evaluating and mapping arguments, and refactoring of tests to ensure edge cases (especially around the string2json function) are properly handled. Minor formatting and logging improvements were also applied in several modules. Updated Class Diagram for PyFunc EnhancementsclassDiagram
class PyFunc {
- fn_defaults: dict
- func_defaults: dict
- _applicationArgs: dict
- argsig
- func
- parameters: dict
- outputs
+ _mixin_func_defaults()
+ _init_fn_defaults()
+ _clean_applicationArgs()
+ _initialise_positional_args(argsDict: dict) : dict
+ _get_arg_info(arg) : tuple
+ _init_appArgs(pargsDict: dict, keyargsDict: dict, posargs: list) : list
+ _ports2args(posargs, pargsDict, keyargsDict) : dict
+ initialize_with_func_code()
+ initialize(**kwargs)
+ run()
+ _match_parser(output_drop)
+ write_results(result)
+ generate_recompute_data()
}
note for PyFunc "Updated defaults merging using update() instead of direct assignment to improve robustness"
note for PyFunc "Added helper methods _initialise_positional_args and _get_arg_info for better argument mapping"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @myxie - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding more specific error handling or logging within the
_initialise_positional_args
function to provide better insights into potential issues. - The changes to
_init_fn_defaults
could be simplified by directly assigningself.fn_defaults = {k: p.default if p.default != inspect._empty else None for k, p in self.argsig.parameters.items()}
.
Here's what I looked at during the review
- 🟡 General issues: 3 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
if isinstance(self._applicationArgs[kw]["value"], | ||
bool or self._applicationArgs[kw]["value"] or | ||
self._applicationArgs[kw]["precious"], ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: The isinstance call uses a non-standard expression for its type parameter.
Using 'bool or self._applicationArgs[kw]["value"] or self._applicationArgs[kw]["precious"]' as the type argument in isinstance is non-idiomatic and likely not doing what’s intended. The second argument for isinstance should be a type (or a tuple of types), not a dynamic expression. Please review the logic and apply a clearer check for whether the value is a boolean or fulfills the desired condition.
for arg in pargsDict: | ||
pargsDict[arg] = self.fn_defaults[arg] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (bug_risk): Overwriting application arguments with function defaults.
The current implementation unconditionally replaces every value in pargsDict (and similarly in keyargsDict) with the corresponding value from fn_defaults. This means any explicitly provided value may be lost. Please ensure this behavior is intended; if it is meant to fill in missing values, consider merging rather than overriding the existing ones.
@@ -144,7 +144,11 @@ def identify_named_ports( | |||
if value is None: | |||
value = "" # make sure we are passing NULL drop events | |||
if key in positionalArgs: | |||
encoding = DropParser(positionalPortArgs[key]['encoding']) | |||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Logger warning message formatting in the exception block.
The warning message uses 'No encoding set for %key: possible default'. It appears the placeholder '%key' is a literal string rather than a proper formatting key. Consider using structured formatting with '%s' or f-string formatting to clearly indicate which key is affected.
if kw in self._applicationArgs: # these are the preferred ones now | ||
if isinstance( | ||
self._applicationArgs[kw]["value"], | ||
bool | ||
or self._applicationArgs[kw]["value"] | ||
or self._applicationArgs[kw]["precious"], | ||
): | ||
if isinstance(self._applicationArgs[kw]["value"], | ||
bool or self._applicationArgs[kw]["value"] or | ||
self._applicationArgs[kw]["precious"], ): | ||
# only transfer if there is a value or precious is True | ||
self._applicationArgs.pop(kw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Merge nested if conditions (merge-nested-ifs
)
if kw in self._applicationArgs: # these are the preferred ones now | |
if isinstance( | |
self._applicationArgs[kw]["value"], | |
bool | |
or self._applicationArgs[kw]["value"] | |
or self._applicationArgs[kw]["precious"], | |
): | |
if isinstance(self._applicationArgs[kw]["value"], | |
bool or self._applicationArgs[kw]["value"] or | |
self._applicationArgs[kw]["precious"], ): | |
# only transfer if there is a value or precious is True | |
self._applicationArgs.pop(kw) | |
if kw in self._applicationArgs and isinstance(self._applicationArgs[kw]["value"], | |
bool or self._applicationArgs[kw]["value"] or | |
self._applicationArgs[kw]["precious"], ): | |
self._applicationArgs.pop(kw) | |
Explanation
Too much nesting can make code difficult to understand, and this is especiallytrue in Python, where there are no brackets to help out with the delineation of
different nesting levels.
Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two if
conditions can be combined using
and
is an easy win.
param_arg = self.parameters.get(arg, None) | ||
if param_arg: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression
)
param_arg = self.parameters.get(arg, None) | |
if param_arg: | |
if param_arg := self.parameters.get(arg, None): |
encoding = self._applicationArgs[arg].get("encoding", "dill") | ||
|
||
return value, encoding | ||
|
||
def _init_appArgs(self, pargsDict: dict, keyargsDict: dict, posargs: list) -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): We've found these issues:
- Replace f-string with no interpolated values with string (
remove-redundant-fstring
) - Merge dictionary updates via the union operator [×3] (
dict-assign-update-to-union
) - Low code quality found in PyFuncApp._init_appArgs - 18% (
low-code-quality
)
Explanation
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines. - Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@awicenec may you please review this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as part of this PR we should also drop the input_parser and output_parser parameters from the doxygen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good, but I would certainly also drop the input_parser and output_parser parameters from the doxygen.
JIRA Ticket
LIU-448
Type
[ ] Feature (addition)[ ] DocumentationProblem/Issue
The introduction of per-port encoding (#300) in has caused the side-effect of 'aggressive' correctness in the engine - previously working graphs, that were not explicitly correct but which the old engine code was able to parse by brute-force, are no longer working. This leads to bad exceptions because the code is expecting data that does not exist, and has not initialised all elements in PyFuncApp.
This PR addresses the outstanding issues and ensures that if the graph fails, it fails in a controlled manner.
Solution
arg:{value, encoding
dictionary structure to remove chance of attempting to access "encoding" when it doesn't exist_initAppArgs
andnamed_port_utils
.Checklist
[ ] Documentation addedSummary by Sourcery
Update PyFunc application to handle invalid graph input and improve robustness by merging function default values with graph-provided values.
Bug Fixes:
Enhancements:
string2json
function insimple_functions.py
to handle bad input.Tests:
string2json
function.