Retry message failures #189

koola · 2025-02-18T09:00:04Z

This PR is a continuation of #181. It introduces two new options, WithRetries and WithMaxRetries, and aims to retry messages that fail because the actor panics.

The retry logic is as follows:

pid := e.SpawnFunc(func(c *Context) {}, "foo", WithRetries(2), WithMaxRetries(2), WithMaxRestarts(1))

Each message that panics is retried up to two times (WithRetries(2)).
If the second retry also panics, an ActorUnprocessableMessageEvent containing the message is broadcast to the event stream and the message is dropped from the buffer.
After two consecutive messages exhaust their retries (4 retries in total), max retries are reached (WithMaxRetries(2)).
The actor is then restarted to restore its state, and processing resumes with the messages left in the buffer, except those that were unprocessable.
The retry counters are reset. If another two consecutive messages exhaust their retries, a second restart would be required, but max restarts (WithMaxRestarts(1)) have been reached, so the actor terminates and cleans up.
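The counting described above can be sketched as a small standalone simulation. This is not the PR's implementation: the `policy` struct, `run`, and `try` are hypothetical names, and the sketch assumes that "retried up to two times" means one initial attempt plus two retries, and that a successfully processed message resets the consecutive-failure count.

```go
package main

import (
	"fmt"
	"strings"
)

// policy mirrors the three options from the description. The field names are
// hypothetical and are not taken from the PR's source.
type policy struct {
	retries     int // retries allowed per message (WithRetries)
	maxRetries  int // consecutive unprocessable messages before a restart (WithMaxRetries)
	maxRestarts int // restarts allowed before the actor terminates (WithMaxRestarts)
}

// outcome records what happened while draining the buffer.
type outcome struct {
	processed     []string
	unprocessable []string
	restarts      int
	terminated    bool
}

// try runs the handler once and reports whether it returned without panicking.
func try(handle func(string), msg string) (ok bool) {
	defer func() {
		if r := recover(); r != nil {
			ok = false
		}
	}()
	handle(msg)
	return true
}

// run drains msgs through handle and applies the retry policy.
func run(p policy, msgs []string, handle func(string)) outcome {
	var out outcome
	consecutive := 0 // consecutive messages that exhausted their retries
	for _, m := range msgs {
		ok := false
		// One initial attempt plus up to p.retries retries (assumption).
		for attempt := 0; attempt <= p.retries && !ok; attempt++ {
			ok = try(handle, m)
		}
		if ok {
			out.processed = append(out.processed, m)
			consecutive = 0
			continue
		}
		// The real actor would broadcast an unprocessable-message event here.
		out.unprocessable = append(out.unprocessable, m)
		consecutive++
		if consecutive < p.maxRetries {
			continue
		}
		if out.restarts >= p.maxRestarts {
			out.terminated = true // no restarts left: stop and clean up
			return out
		}
		out.restarts++ // restart, then resume with the remaining messages
		consecutive = 0
	}
	return out
}

func main() {
	p := policy{retries: 2, maxRetries: 2, maxRestarts: 1}
	msgs := []string{"ok1", "bad1", "bad2", "ok2", "bad3", "bad4", "ok3"}
	out := run(p, msgs, func(m string) {
		if strings.HasPrefix(m, "bad") {
			panic("cannot process " + m)
		}
	})
	fmt.Printf("%+v\n", out)
}
```

In this run, bad1 and bad2 trigger the single allowed restart, ok2 is then processed, and bad3 and bad4 exhaust the restart budget, so the simulated actor terminates before it reaches ok3.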

Usage without retries:

pid := e.SpawnFunc(func(c *Context) {}, "foo", WithRetries(0), WithMaxRestarts(1))

Each message that panics is dropped and the actor is restarted, as per the original logic.

Included in this PR is a test that covers the full retry logic as well as amended tests.

All other tests pass.

mapogolions · 2025-02-23T15:40:25Z

@koola
Thank you for revisiting this topic—it's quite interesting. In my view, a minimalist approach, like Erlang’s, is best. Once a message is retrieved from the mailbox, it’s up to the user to handle its processing and any errors. If the handler panics, it should fail, and the system should continue with the next message. Users should be free to decide their fault-tolerance strategies, as the causes of panics can vary. A one-size-fits-all approach seems unnecessary.

In the most basic implementation, it could look like this.

e.SpawnFunc(func(*Context) {
	defer func() {
		// recover logic here
	}()
})

Of course, this is just one perspective.
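For illustration, the user-side approach could be fleshed out roughly as follows. This is a minimal sketch that stands apart from the framework: `withRecover` and its `onPanic` callback are hypothetical helpers, not part of the library, and the handler takes a plain string rather than a *Context so that the example is self-contained.

```go
package main

import "fmt"

// withRecover wraps a message handler so that a panic is turned into an error
// passed to onPanic, and the caller can move on to the next message.
// Both names are hypothetical; they only illustrate the defer/recover pattern.
func withRecover(handle func(msg string), onPanic func(msg string, err error)) func(msg string) {
	return func(msg string) {
		defer func() {
			if r := recover(); r != nil {
				onPanic(msg, fmt.Errorf("handler panicked: %v", r))
			}
		}()
		handle(msg)
	}
}

func main() {
	safe := withRecover(
		func(msg string) {
			if msg == "boom" {
				panic("cannot process")
			}
			fmt.Println("handled:", msg)
		},
		func(msg string, err error) {
			// The user decides here: log, retry, forward elsewhere, or drop.
			fmt.Printf("dropped %q: %v\n", msg, err)
		},
	)
	for _, m := range []string{"hello", "boom", "world"} {
		safe(m)
	}
}
```

The fault-tolerance strategy lives entirely in the `onPanic` callback, which is the point of the minimalist argument: the framework does not need to know why the handler failed.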

koola · 2025-02-25T02:13:06Z

> Once a message is retrieved from the mailbox, it’s up to the user to handle its processing and any errors. If the handler panics, it should fail, and the system should continue with the next message. Users should be free to decide their fault-tolerance strategies, as the causes of panics can vary. A one-size-fits-all approach seems unnecessary.

This is still the case with this PR, since any unprocessable messages are left to the user to handle. What this PR adds is a more fault-tolerant alternative to the default, which is to silently drop messages and move on: the very definition of a one-size-fits-all approach, and one I wasn't happy with, tbh.

koola · 2025-02-25T02:33:14Z

Also, after thinking about it, your implementation adds more abstraction and further complexity. The actor already has an event stream that broadcasts events, which imo should include unprocessable messages, so that there is only one place to subscribe and a single source of truth per actor. I think this opens up more options than the basic implementation given.
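To make the single-subscription idea concrete, here is a rough standalone sketch. It does not use the framework's event stream API: `eventStream`, `Subscribe`, `Publish`, and `unprocessableMessage` are hypothetical stand-ins, with `unprocessableMessage` playing the role of the ActorUnprocessableMessageEvent from this PR.

```go
package main

import "fmt"

// unprocessableMessage is a hypothetical stand-in for the PR's
// ActorUnprocessableMessageEvent; the fields are assumptions.
type unprocessableMessage struct {
	Actor   string
	Message any
}

// eventStream is a deliberately tiny, synchronous publish/subscribe hub.
type eventStream struct {
	subs []func(event any)
}

// Subscribe registers fn to receive every published event.
func (s *eventStream) Subscribe(fn func(event any)) {
	s.subs = append(s.subs, fn)
}

// Publish delivers event to all subscribers in registration order.
func (s *eventStream) Publish(event any) {
	for _, fn := range s.subs {
		fn(event)
	}
}

func main() {
	var stream eventStream
	var deadLetters []any

	// One subscription sees every event; the type switch picks out the
	// unprocessable messages and the user decides what to do with them.
	stream.Subscribe(func(event any) {
		switch ev := event.(type) {
		case unprocessableMessage:
			deadLetters = append(deadLetters, ev.Message)
		}
	})

	stream.Publish(unprocessableMessage{Actor: "foo", Message: "bad1"})
	stream.Publish("some other event")
	fmt.Println("dead letters:", deadLetters)
}
```

With this shape, a user who wants a dead-letter queue, metrics, or a custom retry only needs the one subscription, while users who do nothing keep the default behaviour.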

add message retries

8e7d0ba
