SIGABRT when opening mail #284
Comments
Does it happen when you open a gazillion threads and, while purebred is still calculating the length of the list, you open the first or second mail?
@romanofski good question; worth checking that. Did you encounter it?
Yeah, almost reliably. Steps:
@romanofski OK awesome, that's good info! I've got some time off this week, I'll try and sort it out :)
Info from further investigation:
Yes. I can reproduce those observations too.
I won't have time to investigate further for a few weeks... I fear I will need to instrument the heck out of hs-notmuch to find out what's going on.
Dumping some links about Xapian thread safety here. Probably not related to this issue but I just stumbled across them and don't want to forget about them:
Without saying I'm giving up on finding and understanding the cause, I'm pondering whether to implement the following to see if it makes the error disappear:
I'm suspicious that having multiple RW database handles is a factor in this bug, but I don't have any hard evidence. The writes are currently performed in the Brick event loop, so they are certainly serialised, but maybe something isn't being GC'd promptly enough (or at all). Then again, the fact that this only happens during the background result traversal does weigh against this theory somewhat.
update 1: using a single writer thread, holding a single r/w database handle, does dispel the issue. But the changes only get written when the database gets closed (at termination of the whole program), so there's not much point to the change. Moving to per-event
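For anyone following along, here is a minimal sketch of the "single writer thread holding a single handle" idea. It is not the purebred or hs-notmuch code: `FakeDb`, `Request`, `startWriter`, `submit` and `addTag` are all hypothetical names, and the "database" is just an `IORef` standing in for a real r/w handle. It only illustrates the shape: every write is sent over a channel to one thread, which is the only code that ever touches the handle.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for a read/write database handle.
newtype FakeDb = FakeDb (IORef [String])

-- A write request: an action to run against the handle, plus an MVar
-- the writer fills once the action has been applied.
data Request = Request (FakeDb -> IO ()) (MVar ())

-- Start the single writer thread. It owns the handle; nothing else
-- is supposed to use it directly.
startWriter :: FakeDb -> IO (Chan Request)
startWriter db = do
  queue <- newChan
  let loop = do
        Request act done <- readChan queue
        act db
        putMVar done ()
        loop
  _ <- forkIO loop
  pure queue

-- Submit a write and block until the writer thread has applied it.
submit :: Chan Request -> (FakeDb -> IO ()) -> IO ()
submit queue act = do
  done <- newEmptyMVar
  writeChan queue (Request act done)
  takeMVar done

-- Hypothetical write operation: prepend a tag to the fake database.
addTag :: String -> FakeDb -> IO ()
addTag tag (FakeDb ref) = modifyIORef' ref (tag :)

main :: IO ()
main = do
  ref <- newIORef []
  queue <- startWriter (FakeDb ref)
  mapM_ (submit queue . addTag) ["inbox", "unread"]
  readIORef ref >>= print
```

Note that this sketch does no error handling: if an action throws, the writer thread dies and later calls to `submit` block. It also says nothing about when changes are flushed, which is the problem described in update 1.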
@frasertweedale sounds like a tough nut to crack. Out of curiosity, is the problem triggered because of GC or the absence of it? If I understood your comment right, it's because data is GC'd which shouldn't have been garbage collected (yet)?
I don't think that's it, but it may be somehow related. I'm really still no closer to working this out :)
I just thought, maybe it is to do with the thread safety of talloc? One very quickly finds that talloc is not thread safe: https://talloc.samba.org/talloc/doc/html/libtalloc__threads.html.
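If thread safety of an underlying C library turns out to be a factor, one common mitigation is to funnel every call into it through a single lock. Below is a rough sketch of that pattern using only `base`. `unsafeBump` is a hypothetical stand-in for a non-thread-safe library call (a read followed by a write), and `withLibLock` and `runWorkers` are made-up names; nothing here is taken from hs-notmuch.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Monad (forM, replicateM_)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical stand-in for a call into a library that is not thread
-- safe: a separate read and write, which can lose updates if two
-- threads interleave.
unsafeBump :: IORef Int -> IO ()
unsafeBump ref = do
  n <- readIORef ref
  writeIORef ref $! n + 1

-- Run an action while holding the library-wide lock.
withLibLock :: MVar () -> IO a -> IO a
withLibLock lock act = withMVar lock (\_ -> act)

-- Fork some workers that all go through the lock, wait for them to
-- finish, and return the final counter value.
runWorkers :: Int -> Int -> IO Int
runWorkers workers bumps = do
  lock <- newMVar ()
  ref <- newIORef 0
  dones <- forM [1 .. workers] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ bumps (withLibLock lock (unsafeBump ref))
      putMVar done ()
    pure done
  mapM_ takeMVar dones
  readIORef ref

main :: IO ()
main = runWorkers 4 1000 >>= print
```

One caveat for our case: finalizers attached to foreign pointers run whenever the GC decides, not under a lock like this, so a lock around explicit calls alone might not cover every access to the library.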
Maybe this could help too (if you haven't used it already): https://github.com/bgamari/ghc-debug
Added this to our 1.0 milestone since, well, we can't release 1.0 with this plaguing us. Once I've chewed through the remaining two issues, I'll see if I can help with this in any way.
@frasertweedale how about we put a build flag around the lazy loading feature? I haven't checked how tricky/cumbersome that would be, but it would help us release a first version, give us some exposure and perhaps get us help from others. It could still be activated at compile time for peeps who want to use and hack on it.
This puts a build flag around the lazy vector feature. The motivation is to make the first release, get some publicity and hopefully get more people interested in hacking on Purebred, and possibly helping to fix the bug. Relates to #284
Can confirm this still happens with GHC 9.0.
@frasertweedale do you reckon we might need to look at a different architecture for how we communicate with notmuch? I remember one of the crazy ideas we had was using a messaging system. But it seems wrestling the memory is a tough nut to crack.
Describe the bug
purebred terminated with SIGABRT when I opened a mail.
To Reproduce
Not reliably reproducible. May be related to something being GC'd that we need to hang on to.