Error handling
In Guflow, errors are reported by two mechanisms:
- Error events
- Exceptions
Error events: Error events are recorded in the history events of a workflow. An error event is associated either with a scheduled item (timer, activity, lambda function or child workflow) or with the workflow itself. For example, if an activity sends a failed response to Amazon SWF, it is recorded as an "activity failed event" in the workflow history. You can return a custom workflow action for the "activity failed event" as shown below:
    [WorkflowDescription("1.0")]
    public class TranscodeWorkflow : Workflow
    {
        public TranscodeWorkflow()
        {
            // In the following example, DownloadActivity is rescheduled on failure.
            ScheduleActivity<DownloadActivity>().OnFailure(e => Reschedule(e));
        }
    }
Similarly, errors can also be reported by workflow-specific events. A workflow event is not associated with any scheduled item, and you use WorkflowEventAttribute to handle it.
    [WorkflowDescription("1.0")]
    public class TranscodeWorkflow : Workflow
    {
        public TranscodeWorkflow()
        {
            ....
        }

        // Ignores the signal failed event.
        [WorkflowEvent(EventName.SignalFailed)]
        public WorkflowAction FailedToSendSignal() => Ignore;
    }
Note: The default action for any failure or timeout event is to fail the workflow immediately. You need to handle these events if you want to take a custom action.
Exceptions: Use the standard .NET try-catch approach to handle API-specific errors. For hosting-related exceptions, follow the approach described in the rest of this section.
Guflow supports the Retry, Unhandled and Continue (RUC) model to handle hosting-related exceptions. To understand the RUC model, first look at how hosted workflows are executed. When execution is started, WorkflowHost enters a loop and performs the steps shown in the following pseudocode:
    while (NotStopped)
    {
        // Step 1
        var decisionTask = Poll for decision task
        // Step 2
        var decisions = Execute new events against a workflow
        // Step 3
        Send decisions to Amazon SWF
    }
Exceptions can occur at any of the three steps above, and you get the opportunity to handle them at each step. WorkflowHost provides the APIs shown in the following example to register an error handler for each step:
    using (var host = domain.Host(new[] { new OrderWorkflow() }))
    {
        // Register the error handler for polling errors.
        host.OnPollingError(HandlePollingError);
        // Register the error handler for execution errors. It also acts as a fallback handler,
        // which means any unhandled polling or response error is also forwarded to this handler.
        host.OnError(HandleGenericError);
        // Register the handler for errors encountered when sending the response to Amazon SWF.
        host.OnResponseError(HandleResponseError);
        host.StartExecution();
        Console.WriteLine("Press any key to terminate");
        Console.ReadKey();
    }

    private ErrorAction HandlePollingError(Error e)
    {
        // Return one of the following actions depending on the error:
        // ErrorAction.Continue, ErrorAction.Retry or ErrorAction.Unhandled.
        return ErrorAction.Continue;
    }

    private ErrorAction HandleGenericError(Error e)
    {
        // Return one of the following actions depending on the error:
        // ErrorAction.Continue, ErrorAction.Retry or ErrorAction.Unhandled.
        return ErrorAction.Continue;
    }

    private ErrorAction HandleResponseError(Error e)
    {
        // Return one of the following actions depending on the error:
        // ErrorAction.Continue, ErrorAction.Retry or ErrorAction.Unhandled.
        return ErrorAction.Continue;
    }
Now let us look at how the different ErrorActions affect the execution:
ErrorAction.Continue: It acts much like a continue statement in a loop. It skips the remaining part of the current iteration of WorkflowHost's execution loop and starts again from the beginning of the loop.
ErrorAction.Retry: It re-executes the same step.
ErrorAction.Unhandled: Except for step 2, it forwards the error to the generic error handler registered using the WorkflowHost.OnError API. If the exception remains unhandled even by the generic error handler, WorkflowHost is faulted and no longer polls Amazon SWF for new history events.
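The control flow described by the three actions above can be sketched as a small, self-contained simulation. This is not Guflow's implementation: HostLoopSketch, its members and the local ErrorAction enum are hypothetical stand-ins written only for this illustration. It also assumes that a step-specific handler which is not registered behaves as if it returned Unhandled, so the error falls back to the generic handler.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for illustration only; NOT Guflow's ErrorAction type.
public enum ErrorAction { Continue, Retry, Unhandled }

// Hypothetical model of a poll -> execute -> respond loop with RUC handling.
public sealed class HostLoopSketch
{
    // Assumption: unregistered step handlers defer to the generic handler.
    public Func<Exception, ErrorAction> OnPollingError = _ => ErrorAction.Unhandled;
    public Func<Exception, ErrorAction> OnResponseError = _ => ErrorAction.Unhandled;
    public Func<Exception, ErrorAction> OnError = _ => ErrorAction.Continue;

    public bool Faulted { get; private set; }
    public List<string> Trace { get; } = new List<string>();

    public void Run(int iterations, Action<int> poll, Action<int> execute, Action<int> respond)
    {
        for (var i = 0; i < iterations && !Faulted; i++)
        {
            // A false result abandons the rest of this iteration (Continue or fault).
            if (!RunStep("poll", () => poll(i), OnPollingError, false)) continue;
            if (!RunStep("execute", () => execute(i), OnError, true)) continue;
            RunStep("respond", () => respond(i), OnResponseError, false);
        }
    }

    private bool RunStep(string name, Action step, Func<Exception, ErrorAction> handler, bool isGeneric)
    {
        while (true)
        {
            try
            {
                step();
                Trace.Add(name + ":ok");
                return true;
            }
            catch (Exception e)
            {
                var action = handler(e);
                // Unhandled polling/response errors are forwarded to the generic handler.
                if (action == ErrorAction.Unhandled && !isGeneric) action = OnError(e);
                Trace.Add(name + ":" + action);
                if (action == ErrorAction.Retry) continue;           // re-execute the same step
                if (action == ErrorAction.Unhandled) Faulted = true; // the loop stops polling
                return false;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var failOnce = true;
        var host = new HostLoopSketch { OnResponseError = _ => ErrorAction.Retry };
        host.Run(1,
            poll: _ => { },
            execute: _ => { },
            respond: _ => { if (failOnce) { failOnce = false; throw new TimeoutException(); } });
        Console.WriteLine(string.Join(", ", host.Trace));
        // prints "poll:ok, execute:ok, respond:Retry, respond:ok"
    }
}
```

In this sketch Retry repeats only the failing step, Continue drops the rest of the iteration, and an error left Unhandled by the generic handler marks the loop as faulted so that no further iteration runs.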
Some of the reasons that can cause exceptions at step 1 or step 3 are:
- Network interruption
- The workflow is abruptly terminated or timed out
Exceptions in step 2 are raised when executing the workflow. Ideally, you should handle any workflow-related exception in your workflow class. However, you can also choose to handle workflow exceptions using the RUC model.
Similarly, the activity host supports the Retry, Unhandled and Continue (RUC) model. Like WorkflowHost, ActivityHost runs an execution loop, shown in the following pseudocode:
    while (NotStopped)
    {
        // Step 1: poll for a new activity task
        var activityTask = Poll for new activity task
        // Step 2: execute the activity
        var activityResponse = Execute activity for activityTask and start heartbeat if enabled
        // Step 3: send the activity response to Amazon SWF
        Send activityResponse
    }
Just as with workflows, you can use the ActivityHost APIs to register the error handlers, as shown in the example below:
    using (var host = domain.Host(activities))
    {
        host.OnPollingError(HandlePollingError);
        host.OnError(HandleGenericError);
        host.OnResponseError(HandleResponseError);
    }
The generic error handler, registered using ActivityHost.OnError, receives error notifications for any heartbeat error and any unhandled activity exception. However, I would advise handling heartbeat and activity exceptions in the activity itself instead of in the generic error handler. The following example clarifies this further:
    [ActivityDescription("1.0")]
    public class ChargeCustomerActivity : Activity
    {
        public ChargeCustomerActivity()
        {
            // Register the error handler for heartbeat errors.
            Hearbeat.OnError(HandleHearbeatError);
        }

        [Execute]
        public async Task<ActivityResponse> Execute(ChargeInput input)
        {
            try
            {
                var token = service.ChargeCustomer(input);
                return Complete(token);
            }
            catch (PaymentException e)
            {
                // Handle the activity exception here and report a failure to Amazon SWF.
                Log.Result(e);
                return Fail("Fail to process payment", e.Reason);
            }
        }
    }
Default error action: By default, both ActivityHost and WorkflowHost return ErrorAction.Continue in response to any error, which seems to play well with the fault-tolerant features of Amazon SWF. However, you can easily change this behaviour as shown below:
    using (var host = domain.Host(new[] { new TranscodeWorkflow() }))
    {
        host.OnError(e => ErrorAction.Unhandled);
        ...
    }
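Between the default Continue and a blanket Unhandled, a handler can also decide per error. The following is a minimal sketch of one such policy, assuming you want to retry transient failures a bounded number of times and treat everything else as unhandled. BoundedRetryPolicy and the local ErrorAction enum are hypothetical stand-ins, not Guflow types, and which exceptions count as transient is an assumption of this example. To use something like it from a registered handler you would need to adapt it to Guflow's Error type, whose members are not shown on this page.

```csharp
using System;
using System.IO;

// Local stand-in for illustration only; NOT Guflow's ErrorAction type.
public enum ErrorAction { Continue, Retry, Unhandled }

// Hypothetical policy: retry transient errors up to maxRetries times, then
// skip the current loop iteration; report every other error as unhandled.
public sealed class BoundedRetryPolicy
{
    private readonly int _maxRetries;
    private int _retriesUsed;

    public BoundedRetryPolicy(int maxRetries)
    {
        _maxRetries = maxRetries;
    }

    public ErrorAction Decide(Exception e)
    {
        // Assumption: timeouts and I/O failures are worth retrying.
        var transient = e is TimeoutException || e is IOException;
        if (!transient) return ErrorAction.Unhandled;

        if (_retriesUsed < _maxRetries)
        {
            _retriesUsed++;
            return ErrorAction.Retry;
        }

        // Retry budget exhausted: reset it and move on to the next iteration.
        _retriesUsed = 0;
        return ErrorAction.Continue;
    }
}

public static class Program
{
    public static void Main()
    {
        var policy = new BoundedRetryPolicy(2);
        Console.WriteLine(policy.Decide(new TimeoutException()));          // Retry
        Console.WriteLine(policy.Decide(new TimeoutException()));          // Retry
        Console.WriteLine(policy.Decide(new TimeoutException()));          // Continue
        Console.WriteLine(policy.Decide(new InvalidOperationException())); // Unhandled
    }
}
```

Bounding the retries keeps a persistent failure from pinning the host on one step, while Unhandled is reserved for errors that retrying is unlikely to fix.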
Faulted Host: An unhandled exception causes the workflow/activity host to become faulted. A faulted host no longer polls for new work. You can get a notification when a host becomes faulted, as shown in the following example:
    using (var host = domain.Host(new[] { new TranscodeWorkflow() }))
    {
        host.OnError(e => ErrorAction.Unhandled);
        host.OnFault += (s, e) => Environment.FailFast("What will I do now?", e.Exception);
        ...
    }