LIU-448: Improve PyFunc robustness to erroneous graph input. #308

myxie · 2025-02-05T10:46:08Z

JIRA Ticket

Type

~~[ ] Feature (addition)~~
Bug fix
Refactor (change)
~~[ ] Documentation~~

Problem/Issue

The introduction of per-port encoding (#300) in has caused the side-effect of 'aggressive' correctness in the engine - previously working graphs, that were not explicitly correct but which the old engine code was able to parse by brute-force, are no longer working. This leads to bad exceptions because the code is expecting data that does not exist, and has not initialised all elements in PyFuncApp.

This PR addresses the outstanding issues and ensures that if the graph fails, it fails in a controlled manner.

Solution

Initialise the expected argument values for the function in the order of:
- Function default (if given; None otherwise)
- ApplicationArgs defaults (if given; None otherwise)
Stores all argument (positional or keywords) in the arg:{value, encoding dictionary structure to remove chance of attempting to access "encoding" when it doesn't exist
Restructured _initAppArgs and named_port_utils.
Added regression tests that go from graph-to-engine.
Also updated simple_functions.py string2json to improve it's robustness to bad input.

Checklist

Unittests added
~~[ ] Documentation added~~
- Bug fixes.

Summary by Sourcery

Update PyFunc application to handle invalid graph input and improve robustness by merging function default values with graph-provided values.

Bug Fixes:

Fixed an issue where PyFunc was not robust to erroneous graph input.

Enhancements:

Improved the robustness of the string2json function in simple_functions.py to handle bad input.

Tests:

Added a test case for the string2json function.

- Also updated simple_functions.py string2json to improve _it's_ robustness to bad input.

sourcery-ai · 2025-02-05T10:46:12Z

Reviewer's Guide by Sourcery

This pull request improves the robustness of the PyFunc application and its auxiliary functions by addressing issues with handling erroneous graph input and default values. The changes include modifications to how function defaults are managed and merged, new helper methods for positional argument initialization, enhanced error handling when evaluating and mapping arguments, and refactoring of tests to ensure edge cases (especially around the string2json function) are properly handled. Minor formatting and logging improvements were also applied in several modules.

Updated Class Diagram for PyFunc Enhancements

classDiagram
    class PyFunc {
      - fn_defaults: dict
      - func_defaults: dict
      - _applicationArgs: dict
      - argsig
      - func
      - parameters: dict
      - outputs
      + _mixin_func_defaults()
      + _init_fn_defaults()
      + _clean_applicationArgs()
      + _initialise_positional_args(argsDict: dict) : dict
      + _get_arg_info(arg) : tuple
      + _init_appArgs(pargsDict: dict, keyargsDict: dict, posargs: list) : list
      + _ports2args(posargs, pargsDict, keyargsDict) : dict
      + initialize_with_func_code()
      + initialize(**kwargs)
      + run()
      + _match_parser(output_drop)
      + write_results(result)
      + generate_recompute_data()
    }
    note for PyFunc "Updated defaults merging using update() instead of direct assignment to improve robustness"
    note for PyFunc "Added helper methods _initialise_positional_args and _get_arg_info for better argument mapping"

File-Level Changes

Change	Details	Files
Enhanced PyFunc default values handling and argument processing.	Updated _mixin_func_defaults to merge default dictionaries using update() instead of overwriting. Initialized and populated the fn_defaults dictionary in _init_fn_defaults to properly set up default values. Added helper methods (_initialise_positional_args and _get_arg_info) for improved processing of positional and keyword arguments. Modified _init_appArgs and run methods to correctly update and map application arguments using default values. Cleaned up formatting and logging statements for clarity.	`daliuge-engine/dlg/apps/pyfunc.py`
Improved error handling and robustness in JSON and drop specifications.	Enhanced the string2json function to handle empty string input by defaulting to an empty JSON string. Adjusted logging and exception management in functions that evaluate and map input arguments. Refactored conditionals to ensure proper use of encoding and default parser behavior in output parsing.	`daliuge-engine/dlg/apps/simple_functions.py`
Refactored test suite and added new test cases for edge scenarios.	Introduced a helper function (create_and_run_graph_spec_from_lgt) to reduce duplicated code when initializing and executing graph specs in tests. Added new tests for the string2json functionality to cover scenarios with missing, empty, and incorrect parameters. Refactored existing tests in the ports encoding test file to use the new helper and improve clarity.	`daliuge-engine/test/test_named_port_apps.py`
Improved error handling in port identification.	Wrapped the DropParser encoding retrieval in a try-except block in identify_named_ports to catch ValueError and warn about missing encoding.	`daliuge-engine/dlg/named_port_utils.py`
Fixed graph loader behavior to properly handle drop specifications.	Updated conditionals in graph_loader to correctly pop the last element when it has an rmode flag or is empty, ensuring proper graph construction.	`daliuge-engine/dlg/graph_loader.py`
Adjusted session initialization for graph deployment.	Modified the deployment function to wrap graph values in a list, ensuring correct behavior when passing the graph to the loader.	`daliuge-engine/dlg/manager/session.py`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!
Generate a plan of action for an issue: Comment @sourcery-ai plan on
an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey @myxie - I've reviewed your changes - here's some feedback:

Overall Comments:

Consider adding more specific error handling or logging within the _initialise_positional_args function to provide better insights into potential issues.
The changes to _init_fn_defaults could be simplified by directly assigning self.fn_defaults = {k: p.default if p.default != inspect._empty else None for k, p in self.argsig.parameters.items()}.

Here's what I looked at during the review

🟡 General issues: 3 issues found
🟢 Security: all looks good
🟢 Testing: all looks good
🟢 Complexity: all looks good
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2025-02-10T03:22:19Z

daliuge-engine/dlg/apps/pyfunc.py

+                if isinstance(self._applicationArgs[kw]["value"],
+                        bool or self._applicationArgs[kw]["value"] or
+                        self._applicationArgs[kw]["precious"], ):


issue: The isinstance call uses a non-standard expression for its type parameter.

Using 'bool or self._applicationArgs[kw]["value"] or self._applicationArgs[kw]["precious"]' as the type argument in isinstance is non-idiomatic and likely not doing what’s intended. The second argument for isinstance should be a type (or a tuple of types), not a dynamic expression. Please review the logic and apply a clearer check for whether the value is a boolean or fulfills the desired condition.

sourcery-ai · 2025-02-10T03:22:19Z

daliuge-engine/dlg/apps/pyfunc.py

+        for arg in pargsDict:
+            pargsDict[arg] = self.fn_defaults[arg]


question (bug_risk): Overwriting application arguments with function defaults.

The current implementation unconditionally replaces every value in pargsDict (and similarly in keyargsDict) with the corresponding value from fn_defaults. This means any explicitly provided value may be lost. Please ensure this behavior is intended; if it is meant to fill in missing values, consider merging rather than overriding the existing ones.

sourcery-ai · 2025-02-10T03:22:20Z

daliuge-engine/dlg/named_port_utils.py

@@ -144,7 +144,11 @@ def identify_named_ports(
        if value is None:
            value = ""  # make sure we are passing NULL drop events
        if key in positionalArgs:
-            encoding = DropParser(positionalPortArgs[key]['encoding'])
+            try:


nitpick: Logger warning message formatting in the exception block.

The warning message uses 'No encoding set for %key: possible default'. It appears the placeholder '%key' is a literal string rather than a proper formatting key. Consider using structured formatting with '%s' or f-string formatting to clearly indicate which key is affected.

sourcery-ai · 2025-02-10T03:22:20Z

daliuge-engine/dlg/apps/pyfunc.py

            if kw in self._applicationArgs:  # these are the preferred ones now
-                if isinstance(
-                    self._applicationArgs[kw]["value"],
-                    bool
-                    or self._applicationArgs[kw]["value"]
-                    or self._applicationArgs[kw]["precious"],
-                ):
+                if isinstance(self._applicationArgs[kw]["value"],
+                        bool or self._applicationArgs[kw]["value"] or
+                        self._applicationArgs[kw]["precious"], ):
                    # only transfer if there is a value or precious is True
                    self._applicationArgs.pop(kw)


suggestion (code-quality): Merge nested if conditions (merge-nested-ifs)

Suggested change

if kw in self._applicationArgs: # these are the preferred ones now

if isinstance(

self._applicationArgs[kw]["value"],

bool

or self._applicationArgs[kw]["value"]

or self._applicationArgs[kw]["precious"],

):

if isinstance(self._applicationArgs[kw]["value"],

bool or self._applicationArgs[kw]["value"] or

self._applicationArgs[kw]["precious"], ):

# only transfer if there is a value or precious is True

self._applicationArgs.pop(kw)

if kw in self._applicationArgs and isinstance(self._applicationArgs[kw]["value"],

bool or self._applicationArgs[kw]["value"] or

self._applicationArgs[kw]["precious"], ):

self._applicationArgs.pop(kw)

Explanation
Too much nesting can make code difficult to understand, and this is especially
true in Python, where there are no brackets to help out with the delineation of
different nesting levels.

Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two if conditions can be combined using
and is an easy win.

daliuge-engine/dlg/apps/pyfunc.py

sourcery-ai · 2025-02-10T03:22:20Z

daliuge-engine/dlg/apps/pyfunc.py

+            param_arg = self.parameters.get(arg, None)
+            if param_arg:


suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change

param_arg = self.parameters.get(arg, None)

if param_arg:

if param_arg := self.parameters.get(arg, None):

sourcery-ai · 2025-02-10T03:22:20Z

daliuge-engine/dlg/apps/pyfunc.py

+        encoding = self._applicationArgs[arg].get("encoding", "dill")
+
+        return value, encoding
+
    def _init_appArgs(self, pargsDict: dict, keyargsDict: dict, posargs: list) -> list:


issue (code-quality): We've found these issues:

Replace f-string with no interpolated values with string (remove-redundant-fstring)

Merge dictionary updates via the union operator [×3] (dict-assign-update-to-union)

Low code quality found in PyFuncApp._init_appArgs - 18% (low-code-quality)

Explanation

The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.

Reduce nesting, perhaps by introducing guard clauses to return early.

Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

coveralls · 2025-02-10T03:48:34Z

coverage: 78.717% (+0.1%) from 78.596%
when pulling 38c46a0 on LIU-448
into 927717d on master.

myxie · 2025-02-10T05:26:12Z

@awicenec may you please review this PR?

awicenec · 2025-03-19T07:04:21Z

daliuge-engine/dlg/apps/pyfunc.py

I think as part of this PR we should also drop the input_parser and output_parser parameters from the doxygen.

awicenec

The changes look good, but I would certainly also drop the input_parser and output_parser parameters from the doxygen.

LIU-448: Improve PyFunc robustness to erroneous graph input.

f469ca9

- Also updated simple_functions.py string2json to improve _it's_ robustness to bad input.

myxie changed the title ~~LIU-448: Improve PyFunc robustness to erroneous graph input.~~ (WIP) LIU-448: Improve PyFunc robustness to erroneous graph input. Feb 5, 2025

LIU-448: Resolve unittest errors and introduce test_string2json test.

892f96f

myxie changed the title ~~(WIP) LIU-448: Improve PyFunc robustness to erroneous graph input.~~ LIU-448: Improve PyFunc robustness to erroneous graph input. Feb 10, 2025

myxie marked this pull request as ready for review February 10, 2025 03:21

sourcery-ai bot reviewed Feb 10, 2025

View reviewed changes

LIU-448: Comit or-if-exp identify sourcery suggestion

37897fb

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

myxie assigned awicenec Feb 10, 2025

Merge branch 'master' of https://github.com/icrar/daliuge into LIU-448

38c46a0

awicenec reviewed Mar 19, 2025

View reviewed changes

awicenec approved these changes Mar 19, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LIU-448: Improve PyFunc robustness to erroneous graph input. #308

LIU-448: Improve PyFunc robustness to erroneous graph input. #308

myxie commented Feb 5, 2025 •

edited by jira bot

Loading

sourcery-ai bot commented Feb 5, 2025 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

sourcery-ai bot Feb 10, 2025

sourcery-ai bot Feb 10, 2025

sourcery-ai bot Feb 10, 2025

sourcery-ai bot Feb 10, 2025

sourcery-ai bot Feb 10, 2025

sourcery-ai bot Feb 10, 2025

coveralls commented Feb 10, 2025 •

edited

Loading

myxie commented Feb 10, 2025

awicenec Mar 19, 2025

awicenec left a comment

	param_arg = self.parameters.get(arg, None)
	if param_arg:
	if param_arg := self.parameters.get(arg, None):

LIU-448: Improve PyFunc robustness to erroneous graph input. #308

Are you sure you want to change the base?

LIU-448: Improve PyFunc robustness to erroneous graph input. #308

Conversation

myxie commented Feb 5, 2025 • edited by jira bot Loading

JIRA Ticket

Type

Problem/Issue

Solution

Checklist

Summary by Sourcery

sourcery-ai bot commented Feb 5, 2025 • edited Loading

Reviewer's Guide by Sourcery

Updated Class Diagram for PyFunc Enhancements

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

Choose a reason for hiding this comment

sourcery-ai bot Feb 10, 2025

Choose a reason for hiding this comment

sourcery-ai bot Feb 10, 2025

Choose a reason for hiding this comment

sourcery-ai bot Feb 10, 2025

Choose a reason for hiding this comment

sourcery-ai bot Feb 10, 2025

Choose a reason for hiding this comment

sourcery-ai bot Feb 10, 2025

Choose a reason for hiding this comment

sourcery-ai bot Feb 10, 2025

Choose a reason for hiding this comment

coveralls commented Feb 10, 2025 • edited Loading

myxie commented Feb 10, 2025

awicenec Mar 19, 2025

Choose a reason for hiding this comment

awicenec left a comment

Choose a reason for hiding this comment

myxie commented Feb 5, 2025 •

edited by jira bot

Loading

sourcery-ai bot commented Feb 5, 2025 •

edited

Loading

coveralls commented Feb 10, 2025 •

edited

Loading