-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use nproc
for clarity and add a unit test to verify that the message causing the crash is dropped
#181
base: master
Are you sure you want to change the base?
Conversation
…causing the crash is dropped
Interesting. Thanks will look into this today. Thanks!! |
I see what you mean here. But is dropping the message that caused the crash the correct behaviour? Maybe its an internal state that is corrupt that has nothing to do with the message that has just been processed and caused the crash. Maybe a restart and restoring the actor back from its state in the DB, could not crash anymore on the message that crashed it. What are your thoughts? |
It is important to understand that the algorithm has not changed at all. I have merely reordered the variables to explain why the increment of nproc at the end of the loop and its usage in recovery caused some tests to freeze. The added test will successfully pass on the code from the master branch. The current implementation (master branch) follows a minimalist paradigm, which assumes that if a message is read but cannot be processed, we simply move on to the next one. |
// Processed 'nproc' messages successfully, then encountered a crash on the next message. | ||
// After restart, processing begins with the message following the one that caused the crash. | ||
// 'nrecv' represents the total number of successfully received messages, including the one that caused the crash. | ||
nrecv := nproc + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line of code is problematic. If the actor receives a message that will cause a crash, we should not skip that message.
Example: Actor receives a new message -> store data in DB.
If database connection fails, panic -> recover() -> retry
We do not want to skip that message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation from the master branch already does this. Skips a message that failed to process.
Use
nproc
for clarity and add a unit test to verify that the message causing the crash is dropped