feat: capture errors that shouldnt cause the whole plan to be discarded during network resolution #444

jhonasiv · 2024-12-06T13:34:36Z

Depends on:

feat: capture errors during network generation instead of raising them rock-gazebo/drivers-transformer#13

We introduced a ResolutionError to mark errors that should not be raised
during the new plan network resolution, since raising them causes the whole
transaction to fail. Instead, we capture them and fail the deployment of the
specific tasks that caused them.

wvmcastro

One thing is not clear to me: when deploying the composition below, suppose there is an error with the deployment of second and on_error: commit is set. Will the deployer still deploy the first and second children?

class Parent < Compositions
  add Model1, as: "first"
  add Model2, as: "second"
  add Model3, as: "third"
  ...
end

lib/syskit/cli/doc/gen.rb

lib/syskit/network_generation/system_network_deployer.rb

lib/syskit/network_generation/system_network_generator.rb

jhonasiv · 2025-02-03T19:05:42Z

One thing is not clear to me: when deploying the composition below, suppose there is an error with the deployment of second and on_error: commit is set. Will the deployer still deploy the first and second children?
class Parent < Compositions
  add Model1, as: "first"
  add Model2, as: "second"
  add Model3, as: "third"
  ...
end

If a child fails the deployment validation, the child itself and all of its parents fail as well, and therefore the other children of those parents also fail. PS: on_error: commit is only used by the profile_assertions tests, so under normal circumstances the deployer does what I just described. With on_error: commit, however, the whole transaction is committed, which leads to all present errors being propagated at the end of the network generation process.
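To illustrate the propagation rule described in this reply, here is a minimal sketch in plain Ruby. Node, fail_upwards and deployable? are hypothetical names made up for the example, not Syskit classes or methods, and the real deployer works on plan tasks rather than on a simple tree.

```ruby
# Hypothetical stand-in for a task in the plan; not a Syskit class.
Node = Struct.new(:name, :parent, :children, :failed) do
  def add(name)
    child = Node.new(name, self, [], false)
    children << child
    child
  end

  # Mark this node and all of its ancestors as failed.
  def fail_upwards
    node = self
    while node
      node.failed = true
      node = node.parent
    end
  end

  # A node cannot be deployed if it or any of its ancestors failed.
  def deployable?
    node = self
    while node
      return false if node.failed

      node = node.parent
    end
    true
  end
end

parent = Node.new("Parent", nil, [], false)
first = parent.add("first")
second = parent.add("second")
third = parent.add("third")
unrelated = Node.new("Other", nil, [], false)

# "second" fails validation: Parent fails too, which takes the siblings down.
second.fail_upwards
deployed = [first, second, third, unrelated].select(&:deployable?).map(&:name)
```

In this sketch only the unrelated tree remains deployable, which matches the answer above: the failure of one child is not isolated from its siblings, but it is isolated from the rest of the plan.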

This adapts the exceptions that previously referred to multiple tasks to the error capture system, which requires each exception to refer to a single task. The error messages were adapted to a single-task context, but unfortunately they can still be a bit too verbose.

feature flag This creates the structures that should be used to capture errors. One can retain the previous behavior by using the RaiseErrorHandler, or use the error capture via the ResolutionErrorHandler. The feature flag is meant to be used to switch between the two options.
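As a rough sketch of how the two handlers could be swapped by a flag: only the class names RaiseErrorHandler and ResolutionErrorHandler come from the commit message. The register and errors methods, and the error_handler_for helper, are assumptions made for this illustration and may not match the actual interface.

```ruby
# Hypothetical sketch: the #register / #errors interface is assumed.
class RaiseErrorHandler
  # Previous behavior: the first error aborts the whole resolution.
  def register(error)
    raise error
  end

  def errors
    []
  end
end

class ResolutionErrorHandler
  attr_reader :errors

  def initialize
    @errors = []
  end

  # New behavior: remember the error and let the resolution continue.
  def register(error)
    @errors << error
  end
end

# Stand-in for the feature flag check (hypothetical helper).
def error_handler_for(capture_errors:)
  capture_errors ? ResolutionErrorHandler.new : RaiseErrorHandler.new
end

handler = error_handler_for(capture_errors: true)
handler.register(ArgumentError.new("no deployment for task A"))
handler.register(ArgumentError.new("no deployment for task B"))
captured = handler.errors.map(&:message)
```

With the flag off, the same register call raises immediately, so callers do not need to know which mode is active.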

… them Through the resolution error handler, capture the errors during the network generation. They are then propagated as failed events for each individual task that caused the error. The previous behavior is kept when the capture_errors_during_network_resolution feature flag is off, in which case any exception is raised immediately. Some behaviors were slightly changed to work with both the old and new features, especially for unit tests.

test/test/test_network_manipulation.rb

errors Whenever resolution errors are reported, it will emit failed for each individual task AND raise PartialNetworkResolution since the profile assertions require an exception to fail

The error capture introduces a slight change in the engine interface, so we need to adapt these so they keep working as before.

Now that we can capture errors, we have to return more information about the result of applying the generated network to the plan. This is done by changing the return value of #syskit_apply_resolution_results to a struct that carries the errors and the planned instances.
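A minimal sketch of what such a result object could look like. ApplyResult is a hypothetical name, and the fields (fulfilled, errors, instances) are taken from the diff excerpts quoted in the review; the real structure may carry more or differently shaped data.

```ruby
# Sketch only: a plain value object standing in for the returned struct.
class ApplyResult
  attr_reader :fulfilled, :errors, :instances

  def initialize(fulfilled:, errors: [], instances: {})
    @fulfilled = fulfilled
    @errors = errors
    @instances = instances
  end
end

result = ApplyResult.new(
  fulfilled: false,
  errors: [RuntimeError.new("cannot deploy second")],
  instances: { "requirement_a" => "task_a" }
)

# Callers can now inspect both what failed and what was planned.
failed_messages = result.errors.map(&:message)
planned = result.instances.keys
```

Returning one object instead of nil or a bare exception lets the caller handle partial success without relying on the return type to tell the cases apart.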

doudou · 2025-02-28T18:27:57Z

Rakefile

 core early_deploy: true
+core capture_errors_during_network_resolution: true


I think we should run with both activated as well ... no ?

doudou · 2025-02-28T18:31:00Z

lib/syskit/network_generation/engine.rb

-                validate_deployed_network: (true if Syskit.conf.early_deploy?),
-                early_deploy: Syskit.conf.early_deploy?
+                validate_deployed_network: Syskit.conf.early_deploy?,
+                early_deploy: false,


Why the change from Syskit.conf.early_deploy?, especially given that you kept it for validate_deployed_network?

It would be better to make early_deploy the default for validate_deployed_network:

early_deploy: Syskit.conf.early_deploy?, validate_deployed_network: early_deploy
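The suggestion relies on the fact that a Ruby keyword argument default may refer to an earlier parameter. A small sketch, where resolve_options is a hypothetical method and false stands in for Syskit.conf.early_deploy?:

```ruby
# Hypothetical method; only the two keyword names come from the review.
# validate_deployed_network follows early_deploy unless given explicitly.
def resolve_options(early_deploy: false, validate_deployed_network: early_deploy)
  { early_deploy: early_deploy,
    validate_deployed_network: validate_deployed_network }
end

defaults = resolve_options
early = resolve_options(early_deploy: true)
override = resolve_options(early_deploy: true, validate_deployed_network: false)
```

Defaults are evaluated left to right at call time, so validate_deployed_network sees whatever value early_deploy ended up with for that call.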

doudou · 2025-02-28T18:34:00Z

lib/syskit/network_generation/engine.rb

+
+                if on_error == :save
+                    errors.each do |e|
+                        handle_resolution_exception(e, on_error: on_error)


It really does not make sense to call this more than once.

Also, the (very little) difference in logic between handle_resolution_errors and handle_resolution_exception looks suspicious.

doudou · 2025-02-28T18:38:41Z

lib/syskit/network_generation/engine.rb

                )

+                unless resolution_errors.empty? || cleanup_resolution_errors


I find that unless(complex logic) is very hard to read ... I'd keep it for simple logic (only one boolean)

Here, if !resolution_errors.empty? && !cleanup_resolution_errors
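The two forms are equivalent by De Morgan's law. As a quick sanity check, the sketch below compares them over every combination of the two conditions; with_unless and with_if are hypothetical helpers that take plain booleans in place of resolution_errors.empty? and cleanup_resolution_errors.

```ruby
# Original form from the diff.
def with_unless(errors_empty, cleanup)
  ran = false
  ran = true unless errors_empty || cleanup
  ran
end

# Suggested form from the review.
def with_if(errors_empty, cleanup)
  ran = false
  ran = true if !errors_empty && !cleanup
  ran
end

combinations = [true, false].product([true, false])
equivalent = combinations.all? { |e, c| with_unless(e, c) == with_if(e, c) }
```

So the change is purely about readability: the body runs only when there are errors and they were not cleaned up.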

doudou · 2025-02-28T18:39:31Z

lib/syskit/network_generation/engine.rb

+                elsif on_error == :commit
+                    work_plan.commit_transaction
+                else
+                    errors.each { |err| real_plan.execution_engine.add_error(err) }


I'm not very happy that you modify real_plan from here

Conclusion: should be removed.

doudou · 2025-02-28T19:02:46Z

lib/syskit/test/network_manipulation.rb

+                        resolution_errors.each do |error|
+                            t = error.planned_task
+                            expect_execution do
+                                t.failed_event.emit(error.original_exceptions)
+                            end
+                        end


It would be better to invert this, that is, run the loop inside the expect_execution.

And replace expect_execution with execute: expect_execution without a to block actually does nothing.
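A sketch of the shape being suggested, with the loop inside a single execute block. So that it runs outside a Roby test case, everything here is stubbed: StubHarness, StubEvent, StubTask and StubError are hypothetical stand-ins, and the stubbed execute only counts calls and yields, which is much simpler than the real test helper.

```ruby
# Stand-ins so the sketch runs on its own; none of these classes exist
# in Roby or Syskit.
class StubHarness
  attr_reader :execute_calls

  def initialize
    @execute_calls = 0
  end

  # Counts how many execution blocks were requested, then runs the block.
  def execute
    @execute_calls += 1
    yield
  end
end

StubEvent = Struct.new(:emitted) do
  def emit(context)
    emitted << context
  end
end
StubTask = Struct.new(:failed_event)
StubError = Struct.new(:planned_task, :original_exceptions)

task_a = StubTask.new(StubEvent.new([]))
task_b = StubTask.new(StubEvent.new([]))
resolution_errors = [
  StubError.new(task_a, ["missing deployment"]),
  StubError.new(task_b, ["conflicting deployments"])
]

harness = StubHarness.new
# One execute block around the whole loop, instead of one block per error.
harness.execute do
  resolution_errors.each do |error|
    error.planned_task.failed_event.emit(error.original_exceptions)
  end
end
```

All emissions then happen in one pass rather than one pass per error.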

doudou · 2025-02-28T19:04:51Z

lib/syskit/test/network_manipulation.rb

@@ -227,31 +238,32 @@ def syskit_deploy_normalized_placeholder_tasks(
                    .to { emit(*not_running.map(&:start_event)) }

                resolve_options = Hash[on_error: :commit].merge(resolve_options)
-                begin
+                resolution_errors = execute do


Why the need for execute here ?

doudou · 2025-02-28T19:07:27Z

test/network_generation/test_system_network_generator.rb

+                            # formatted = formatted.gsub(/<id:\d+>/, "<id:X>")
+                            #                      .gsub(/Class:0x[0-9a-f]+/,
+                            #                            "Class:0xXXXXXX")


doudou · 2025-02-28T19:07:34Z

test/network_generation/test_system_network_generator.rb

+                        # assert_equal expected,
+                        #              formatted.gsub(/<id:\d+>/, "<id:X>")
+                        #                       .gsub(/Class:0x[0-9a-f]+>:0x[0-9a-f]+>/,
+                        #                             "Class:0xXXXXXX>:0xXXXXXX>")


doudou · 2025-02-28T19:08:22Z

lib/syskit/runtime/apply_requirement_modifications.rb

                    t.success_event.emit
                end
-                nil
+                resolution_apply_result.errors.group_by(&:planning_task).each do |task, e|
+                    task.failed_event.emit e.flat_map(&:original_exception)


task is not guaranteed to be running here. Might be from a previous deploy.
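One possible way to address this, sketched with stubs: check running? before emitting and set aside the errors whose planning task is no longer running. StubPlanningTask and CapturedError are hypothetical, and what should happen to the skipped errors is left open here because the review does not say.

```ruby
# Hypothetical stand-in for a planning task; not a Syskit class.
class StubPlanningTask
  attr_reader :name, :failures

  def initialize(name, running:)
    @name = name
    @running = running
    @failures = []
  end

  def running?
    @running
  end
end

CapturedError = Struct.new(:planning_task, :original_exception)

current = StubPlanningTask.new("current", running: true)
previous = StubPlanningTask.new("previous", running: false)
errors = [
  CapturedError.new(current, "e1"),
  CapturedError.new(current, "e2"),
  CapturedError.new(previous, "e3")
]

skipped = []
errors.group_by(&:planning_task).each do |task, task_errors|
  # A task from a previous deployment may already be finished.
  unless task.running?
    skipped << task.name
    next
  end

  # Stand-in for task.failed_event.emit(...).
  task.failures.concat(task_errors.map(&:original_exception))
end
```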

doudou · 2025-02-28T20:00:41Z

lib/syskit/runtime/apply_requirement_modifications.rb

                ensure
                    syskit_current_resolution_keepalive.discard_transaction
                    @syskit_current_resolution = nil
                end

-                running_requirement_tasks.each do |t|
+                resolution_apply_result.instances.each_key do |t|


Instead of finishing InstanceRequirementsTask when the resolution finishes, my proposition would be to:

- have an intermediate event resolution_success, which gets emitted here instead of success
- change the logic that finds what needs to be deployed to only pick the running InstanceRequirementsTask
- change the logic that finds whether there are new requirements to check if there are running InstanceRequirementsTask without the resolution_success event emitted (resolution_success_event.emitted?)

Logic in runtime/apply_deployment_tasks_from_plan.rb
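The selection logic in this proposal can be sketched as follows. StubRequirementTask, tasks_to_deploy and new_requirements? are hypothetical names for the illustration, and the boolean resolution_success_emitted? stands in for resolution_success_event.emitted? on the real task.

```ruby
# Hypothetical stand-in for InstanceRequirementsTask.
class StubRequirementTask
  attr_reader :name

  def initialize(name, running:, resolved:)
    @name = name
    @running = running
    @resolved = resolved
  end

  def running?
    @running
  end

  def resolution_success_emitted?
    @resolved
  end
end

# What needs to be deployed: every task that is still running.
def tasks_to_deploy(tasks)
  tasks.select(&:running?)
end

# New requirements exist if a running task was never resolved.
def new_requirements?(tasks)
  tasks.any? { |t| t.running? && !t.resolution_success_emitted? }
end

resolved = StubRequirementTask.new("resolved", running: true, resolved: true)
pending = StubRequirementTask.new("pending", running: true, resolved: false)
finished = StubRequirementTask.new("finished", running: false, resolved: true)

to_deploy = tasks_to_deploy([resolved, pending, finished]).map(&:name)
needs_resolution = new_requirements?([resolved, pending, finished])
up_to_date = new_requirements?([resolved, finished])
```

Under this scheme a task stays running after its resolution succeeds, so it keeps contributing to the deployed network without triggering another resolution.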

doudou · 2025-02-28T20:20:56Z

lib/syskit/runtime/apply_requirement_modifications.rb

            rescue ::Exception => e # rubocop:disable Lint/RescueException
                running_requirement_tasks.each do |t|
                    t.failed_event.emit(e)
                end
-                e
+                NetworkGeneration::SystemNetworkPlanApplyResult.new(
+                    fulfilled: false, errors: [e]


You should be returning a SystemNetworkPlanApplyResult that has the same instances as before the resolution, and a proper list of errors attached to the new ones.

doudou · 2025-02-28T20:23:17Z

lib/syskit/test/network_manipulation.rb

                        tasks_to_instanciate.map(&:planning_task),
                        validate_generated_network: false,
                        early_deploy: false
                    )
+                    unless resolution_errors.empty?
+                        resolution_errors.each do |error|
+                            t = error.planned_task


planning_task ?

doudou · 2025-02-28T20:24:38Z

lib/syskit/network_generation/engine.rb

+                elsif on_error == :commit
+                    work_plan.commit_transaction
+                else
+                    errors.each { |err| real_plan.execution_engine.add_error(err) }


Conclusion: should be removed.

doudou · 2025-02-28T20:25:50Z

test/features/capture_errors.rb

+Syskit.conf.early_deploy = true
+Syskit.conf.capture_errors_during_network_resolution = true


Decouple the two flags, and run 4 test suites (plain core, with early deploy, with capture errors and with both)
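The four suites correspond to every combination of the two flags. A small sketch of how the combinations could be enumerated; the flag names come from this PR, while how each combination is turned into a test task in the Rakefile is not shown and would be an assumption.

```ruby
FLAGS = %i[early_deploy capture_errors_during_network_resolution].freeze

# Every on/off combination of the two flags: plain core, each flag alone,
# and both together.
suites = [false, true].product([false, true]).map do |values|
  FLAGS.zip(values).to_h
end
```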

jhonasiv requested review from doudou and wvmcastro December 6, 2024 13:34

jhonasiv self-assigned this Dec 6, 2024

jhonasiv force-pushed the capture-errors branch 3 times, most recently from d0b83eb to 63070d9 Compare December 6, 2024 17:17

jhonasiv force-pushed the capture-errors branch 2 times, most recently from 72722c1 to d070e92 Compare December 23, 2024 21:12

jhonasiv force-pushed the capture-errors branch from 0121c5a to a55ca32 Compare January 29, 2025 19:14

jhonasiv changed the title ~~[WIP] feat: capture errors that shouldnt cause the whole plan to be discarded during network resolution~~ feat: capture errors that shouldnt cause the whole plan to be discarded during network resolution Jan 29, 2025

jhonasiv force-pushed the capture-errors branch from a55ca32 to b66c087 Compare January 31, 2025 14:52

wvmcastro reviewed Feb 3, 2025

jhonasiv force-pushed the capture-errors branch 2 times, most recently from 571f5dd to 0eafc30 Compare February 4, 2025 13:41

jhonasiv mentioned this pull request Feb 4, 2025

feat: capture errors during network generation instead of raising them rock-gazebo/drivers-transformer#13

jhonasiv added 2 commits February 24, 2025 15:58

jhonasiv force-pushed the capture-errors branch from 0eafc30 to 25b6040 Compare February 24, 2025 19:36

jhonasiv force-pushed the capture-errors branch from 25b6040 to 0db91ad Compare February 24, 2025 19:43

jhonasiv commented Feb 24, 2025

jhonasiv force-pushed the capture-errors branch from 0db91ad to bc251de Compare February 24, 2025 20:05

jhonasiv added 2 commits February 25, 2025 10:39

fix: adapt network manipulation to deal with possible captured

109eee6

errors Whenever resolution errors are reported, it will emit failed for each individual task AND raise PartialNetworkResolution since the profile assertions require an exception to fail

fix: adapt doc and gui code after error capture

4eeefa1

The error capture introduces a slight change in the engine interface, so we need to adapt these so they keep working as before.

jhonasiv force-pushed the capture-errors branch 2 times, most recently from 729b796 to aa31a18 Compare February 25, 2025 14:08

jhonasiv force-pushed the capture-errors branch from aa31a18 to 3b564b1 Compare February 26, 2025 13:54

doudou reviewed Feb 28, 2025
