Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIU-448: Improve PyFunc robustness to erroneous graph input. #308

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

myxie
Copy link
Collaborator

@myxie myxie commented Feb 5, 2025

JIRA Ticket

LIU-448

Type

  • [ ] Feature (addition)
  • Bug fix
  • Refactor (change)
  • [ ] Documentation

Problem/Issue

The introduction of per-port encoding (#300) in has caused the side-effect of 'aggressive' correctness in the engine - previously working graphs, that were not explicitly correct but which the old engine code was able to parse by brute-force, are no longer working. This leads to bad exceptions because the code is expecting data that does not exist, and has not initialised all elements in PyFuncApp.

This PR addresses the outstanding issues and ensures that if the graph fails, it fails in a controlled manner.

Solution

  • Initialise the expected argument values for the function in the order of:
    • Function default (if given; None otherwise)
    • ApplicationArgs defaults (if given; None otherwise)
  • Stores all argument (positional or keywords) in the arg:{value, encoding dictionary structure to remove chance of attempting to access "encoding" when it doesn't exist
  • Restructured _initAppArgs and named_port_utils.
  • Added regression tests that go from graph-to-engine.
  • Also updated simple_functions.py string2json to improve it's robustness to bad input.

Checklist

  • Unittests added
  • [ ] Documentation added
    • Bug fixes.

Summary by Sourcery

Update PyFunc application to handle invalid graph input and improve robustness by merging function default values with graph-provided values.

Bug Fixes:

  • Fixed an issue where PyFunc was not robust to erroneous graph input.

Enhancements:

  • Improved the robustness of the string2json function in simple_functions.py to handle bad input.

Tests:

  • Added a test case for the string2json function.

- Also updated simple_functions.py string2json to improve _it's_ robustness to bad input.
Copy link
Contributor

sourcery-ai bot commented Feb 5, 2025

Reviewer's Guide by Sourcery

This pull request improves the robustness of the PyFunc application and its auxiliary functions by addressing issues with handling erroneous graph input and default values. The changes include modifications to how function defaults are managed and merged, new helper methods for positional argument initialization, enhanced error handling when evaluating and mapping arguments, and refactoring of tests to ensure edge cases (especially around the string2json function) are properly handled. Minor formatting and logging improvements were also applied in several modules.

Updated Class Diagram for PyFunc Enhancements

classDiagram
    class PyFunc {
      - fn_defaults: dict
      - func_defaults: dict
      - _applicationArgs: dict
      - argsig
      - func
      - parameters: dict
      - outputs
      + _mixin_func_defaults()
      + _init_fn_defaults()
      + _clean_applicationArgs()
      + _initialise_positional_args(argsDict: dict) : dict
      + _get_arg_info(arg) : tuple
      + _init_appArgs(pargsDict: dict, keyargsDict: dict, posargs: list) : list
      + _ports2args(posargs, pargsDict, keyargsDict) : dict
      + initialize_with_func_code()
      + initialize(**kwargs)
      + run()
      + _match_parser(output_drop)
      + write_results(result)
      + generate_recompute_data()
    }
    note for PyFunc "Updated defaults merging using update() instead of direct assignment to improve robustness"
    note for PyFunc "Added helper methods _initialise_positional_args and _get_arg_info for better argument mapping"
Loading

File-Level Changes

Change Details Files
Enhanced PyFunc default values handling and argument processing.
  • Updated _mixin_func_defaults to merge default dictionaries using update() instead of overwriting.
  • Initialized and populated the fn_defaults dictionary in _init_fn_defaults to properly set up default values.
  • Added helper methods (_initialise_positional_args and _get_arg_info) for improved processing of positional and keyword arguments.
  • Modified _init_appArgs and run methods to correctly update and map application arguments using default values.
  • Cleaned up formatting and logging statements for clarity.
daliuge-engine/dlg/apps/pyfunc.py
Improved error handling and robustness in JSON and drop specifications.
  • Enhanced the string2json function to handle empty string input by defaulting to an empty JSON string.
  • Adjusted logging and exception management in functions that evaluate and map input arguments.
  • Refactored conditionals to ensure proper use of encoding and default parser behavior in output parsing.
daliuge-engine/dlg/apps/simple_functions.py
Refactored test suite and added new test cases for edge scenarios.
  • Introduced a helper function (create_and_run_graph_spec_from_lgt) to reduce duplicated code when initializing and executing graph specs in tests.
  • Added new tests for the string2json functionality to cover scenarios with missing, empty, and incorrect parameters.
  • Refactored existing tests in the ports encoding test file to use the new helper and improve clarity.
daliuge-engine/test/test_named_port_apps.py
Improved error handling in port identification.
  • Wrapped the DropParser encoding retrieval in a try-except block in identify_named_ports to catch ValueError and warn about missing encoding.
daliuge-engine/dlg/named_port_utils.py
Fixed graph loader behavior to properly handle drop specifications.
  • Updated conditionals in graph_loader to correctly pop the last element when it has an rmode flag or is empty, ensuring proper graph construction.
daliuge-engine/dlg/graph_loader.py
Adjusted session initialization for graph deployment.
  • Modified the deployment function to wrap graph values in a list, ensuring correct behavior when passing the graph to the loader.
daliuge-engine/dlg/manager/session.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!
  • Generate a plan of action for an issue: Comment @sourcery-ai plan on
    an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@myxie myxie changed the title LIU-448: Improve PyFunc robustness to erroneous graph input. (WIP) LIU-448: Improve PyFunc robustness to erroneous graph input. Feb 5, 2025
@myxie myxie changed the title (WIP) LIU-448: Improve PyFunc robustness to erroneous graph input. LIU-448: Improve PyFunc robustness to erroneous graph input. Feb 10, 2025
@myxie myxie marked this pull request as ready for review February 10, 2025 03:21
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @myxie - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding more specific error handling or logging within the _initialise_positional_args function to provide better insights into potential issues.
  • The changes to _init_fn_defaults could be simplified by directly assigning self.fn_defaults = {k: p.default if p.default != inspect._empty else None for k, p in self.argsig.parameters.items()}.
Here's what I looked at during the review
  • 🟡 General issues: 3 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +361 to +363
if isinstance(self._applicationArgs[kw]["value"],
bool or self._applicationArgs[kw]["value"] or
self._applicationArgs[kw]["precious"], ):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: The isinstance call uses a non-standard expression for its type parameter.

Using 'bool or self._applicationArgs[kw]["value"] or self._applicationArgs[kw]["precious"]' as the type argument in isinstance is non-idiomatic and likely not doing what’s intended. The second argument for isinstance should be a type (or a tuple of types), not a dynamic expression. Please review the logic and apply a clearer check for whether the value is a boolean or fulfills the desired condition.

Comment on lines +651 to +652
for arg in pargsDict:
pargsDict[arg] = self.fn_defaults[arg]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (bug_risk): Overwriting application arguments with function defaults.

The current implementation unconditionally replaces every value in pargsDict (and similarly in keyargsDict) with the corresponding value from fn_defaults. This means any explicitly provided value may be lost. Please ensure this behavior is intended; if it is meant to fill in missing values, consider merging rather than overriding the existing ones.

@@ -144,7 +144,11 @@ def identify_named_ports(
if value is None:
value = "" # make sure we are passing NULL drop events
if key in positionalArgs:
encoding = DropParser(positionalPortArgs[key]['encoding'])
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Logger warning message formatting in the exception block.

The warning message uses 'No encoding set for %key: possible default'. It appears the placeholder '%key' is a literal string rather than a proper formatting key. Consider using structured formatting with '%s' or f-string formatting to clearly indicate which key is affected.

Comment on lines 360 to 365
if kw in self._applicationArgs: # these are the preferred ones now
if isinstance(
self._applicationArgs[kw]["value"],
bool
or self._applicationArgs[kw]["value"]
or self._applicationArgs[kw]["precious"],
):
if isinstance(self._applicationArgs[kw]["value"],
bool or self._applicationArgs[kw]["value"] or
self._applicationArgs[kw]["precious"], ):
# only transfer if there is a value or precious is True
self._applicationArgs.pop(kw)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Merge nested if conditions (merge-nested-ifs)

Suggested change
if kw in self._applicationArgs: # these are the preferred ones now
if isinstance(
self._applicationArgs[kw]["value"],
bool
or self._applicationArgs[kw]["value"]
or self._applicationArgs[kw]["precious"],
):
if isinstance(self._applicationArgs[kw]["value"],
bool or self._applicationArgs[kw]["value"] or
self._applicationArgs[kw]["precious"], ):
# only transfer if there is a value or precious is True
self._applicationArgs.pop(kw)
if kw in self._applicationArgs and isinstance(self._applicationArgs[kw]["value"],
bool or self._applicationArgs[kw]["value"] or
self._applicationArgs[kw]["precious"], ):
self._applicationArgs.pop(kw)


ExplanationToo much nesting can make code difficult to understand, and this is especially
true in Python, where there are no brackets to help out with the delineation of
different nesting levels.

Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two if conditions can be combined using
and is an easy win.

Comment on lines +376 to +377
param_arg = self.parameters.get(arg, None)
if param_arg:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change
param_arg = self.parameters.get(arg, None)
if param_arg:
if param_arg := self.parameters.get(arg, None):

encoding = self._applicationArgs[arg].get("encoding", "dill")

return value, encoding

def _init_appArgs(self, pargsDict: dict, keyargsDict: dict, posargs: list) -> list:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): We've found these issues:


Explanation

The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

  • Reduce the function length by extracting pieces of functionality out into
    their own functions. This is the most important thing you can do - ideally a
    function should be less than 10 lines.
  • Reduce nesting, perhaps by introducing guard clauses to return early.
  • Ensure that variables are tightly scoped, so that code using related concepts
    sits together within the function rather than being scattered.

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@coveralls
Copy link

coveralls commented Feb 10, 2025

Coverage Status

coverage: 78.717% (+0.1%) from 78.596%
when pulling 38c46a0 on LIU-448
into 927717d on master.

@myxie
Copy link
Collaborator Author

myxie commented Feb 10, 2025

@awicenec may you please review this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as part of this PR we should also drop the input_parser and output_parser parameters from the doxygen.

Copy link
Contributor

@awicenec awicenec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good, but I would certainly also drop the input_parser and output_parser parameters from the doxygen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants