Node in singleton cluster never becomes leader #40
Comments
I'm pretty sure this is a liveness bug (and thus an issue outside the scope of election safety, which is guaranteed). What happens is that the singleton node never manages to elect itself leader - it waits forever for a The The original Go implementation of Raft uses a general loop for the Candidate state that first sends all necessary
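For readers following along, here is a toy model of the failure mode described above. It is only a sketch in Python, not the Verdi Raft code, and every name in it (`Candidate`, `quorum`, `start_election_reply_driven`, `start_election_with_check`) is made up for illustration. It assumes the candidate records its own vote internally and only re-checks for a majority when a vote reply arrives, which in a singleton cluster never happens.

```python
from dataclasses import dataclass, field


def quorum(cluster_size: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return cluster_size // 2 + 1


@dataclass
class Candidate:
    # Hypothetical model of a Raft candidate; not taken from Verdi Raft.
    name: str
    peers: list  # names of the other nodes; empty for a singleton cluster
    votes: set = field(default_factory=set)
    is_leader: bool = False

    def cluster_size(self) -> int:
        return len(self.peers) + 1

    def maybe_become_leader(self) -> None:
        if len(self.votes) >= quorum(self.cluster_size()):
            self.is_leader = True

    def start_election_reply_driven(self) -> list:
        """Record the self-vote; the quorum is only checked when a reply arrives."""
        self.votes = {self.name}
        return [(peer, "RequestVote") for peer in self.peers]

    def start_election_with_check(self) -> list:
        """Same as above, but also check the quorum right after the self-vote."""
        outgoing = self.start_election_reply_driven()
        self.maybe_become_leader()
        return outgoing

    def on_vote_granted(self, voter: str) -> None:
        """Handle a granted vote from a peer and re-check the quorum."""
        self.votes.add(voter)
        self.maybe_become_leader()
```

With no peers, `start_election_reply_driven` sends no messages, so `on_vote_granted` is never called and the node stays a candidate even though its single vote is already a majority. The variant that checks right after the self-vote becomes leader immediately. Whether such a check can be added without disturbing the existing proofs is a separate question that this sketch does not address.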
@palmskog thanks, that sounds like the problem. Seeing that the code isn't going to be used in production any time soon (ever?) I agree that it's probably not worth fixing unless it's easy. But I think it's worth pointing out the limitation in the README so that others don't run into it (running the most trivial case is a natural first thing to try). As a hack, I tried making the node send a vote request to itself instead of marking down its self-vote internally, but that messed with some proof:
and similarly it failed when I tried to get hackier and let the node record its self-vote immediately, but not record that it had voted (so that it would still message itself and trigger the receiver). Well, that too didn't work:
At that point I gave up (but haven't tried your suggestion). For the hypothetical case of this being production-ready code, it would really have to work though. Systems built on top of the database can't run certain smoke or acceptance tests without a running cluster, and one shouldn't have to orchestrate multiple nodes just to get something to talk to.
@tschottdorf I can't speak for the original Verdi team, but production-ready code is generally not a goal in academic projects, nor encouraged to any significant degree by stakeholders (funding agencies, thesis committees, etc.). With that said, Verdi Raft has accumulated more software engineering effort than most similar projects (possibly excluding Microsoft's IronFleet), for example (1) extensive use of continuous integration, (2) automated dependency management via OPAM, (3) unit tests and integration tests for unverified code, (4) cluster deployment and management automation via Capistrano ( My feeling is that changing In the meantime, I created a tentative branch where singleton clusters are prevented from starting up at the command-line level.
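A guard of the kind mentioned above could look roughly like the following. This is a hypothetical Python sketch, not the code on that branch; the function name `check_cluster_size` and the error message are assumptions made for illustration only.

```python
def check_cluster_size(node_names):
    """Reject cluster configurations with fewer than two distinct nodes.

    Hypothetical helper: it refuses to start a singleton cluster, since
    such a cluster would never elect a leader, and returns the sorted
    list of distinct node names otherwise.
    """
    unique = set(node_names)
    if len(unique) < 2:
        raise ValueError(
            "refusing to start: a cluster needs at least two distinct nodes, "
            f"got {len(unique)}"
        )
    return sorted(unique)
```

Failing fast at startup turns a silent hang into an explicit error, which is the point of this kind of workaround. It does not make single-node operation work.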
@palmskog I agree, that's why I pointed it out in the first place - wouldn't have bothered if this didn't look like it'd been pushed fairly far out of the bounds of pure academia already. I'll close this issue, but I'll keep your above suggestion for a proper fix in mind. Could be a good exercise unless it spirals out of control.
I'm going to re-open this issue since the current solution is unsatisfactory in the long term. Nobody has any cycles to fix this properly right now, but we might get around to it in the future and will happily review any PRs with proposed solutions. Thanks to @tschottdorf for reporting this.
I'm trying to run the benchmarks against a single-node system:
The client logged above is the following invocation:
I haven't dug deeper but I did verify that I can run the benchmarks against a three-node cluster (everything running on the same machine). So, perhaps I'm silly or there is a problem with the edge case of a single-node system.