Use `nproc` for clarity and add a unit test to verify that the message causing the crash is dropped #181

mapogolions · 2025-01-25T12:08:30Z

Use nproc for clarity and add a unit test to verify that the message causing the crash is dropped

anthdm · 2025-01-25T17:08:50Z

Interesting. I will look into this today. Thanks!!

anthdm · 2025-01-29T15:36:13Z

I see what you mean here. But is dropping the message that caused the crash the correct behaviour? Maybe it's the internal state that is corrupt, and that has nothing to do with the message that was being processed when the crash happened. Maybe after a restart that restores the actor from its state in the DB, the message that crashed it would no longer cause a crash.

What are your thoughts?

@tprifti

mapogolions · 2025-01-29T19:17:54Z

It is important to understand that the algorithm has not changed at all. I have merely reordered the variables to explain why the increment of nproc at the end of the loop and its usage in recovery caused some tests to freeze. The added test will successfully pass on the code from the master branch. The current implementation (master branch) follows a minimalist paradigm, which assumes that if a message is read but cannot be processed, we simply move on to the next one.

tprifti · 2025-01-29T22:49:22Z

actor/process.go

+			// Processed 'nproc' messages successfully, then encountered a crash on the next message.
+			// After restart, processing begins with the message following the one that caused the crash.
+			// 'nrecv' represents the total number of successfully received messages, including the one that caused the crash.
+			nrecv := nproc + 1


I think this line of code is problematic. If the actor receives a message that causes a crash, we should not skip that message.
Example: the actor receives a new message -> stores data in the DB.
If the database connection fails: panic -> recover() -> retry.

We do not want to skip that message.

The current implementation on the master branch already behaves this way: it skips a message that failed to process.

Commit 391bed4: Use nproc for clarity and add a unit test to verify that the message causing the crash is dropped

anthdm requested a review from perbu January 29, 2025 18:56

tprifti suggested changes Jan 29, 2025

koola mentioned this pull request Feb 18, 2025

Retry message failures #189

Open
