
SIGABRT when opening mail #284

Open
frasertweedale opened this issue Apr 5, 2019 · 17 comments

@frasertweedale
Member

frasertweedale commented Apr 5, 2019

Describe the bug
purebred terminated with SIGABRT when I opened a mail.

Apr 05 12:26:35 T470s systemd[1]: Started Process Core Dump (PID 19268/UID 0).
Apr 05 12:26:36 T470s systemd-coredump[19269]: Process 19235 (purebred-linux-) of user 1000 dumped core.
                                               
    Stack trace of thread 19243:
    #0  0x00007fbb790aaeab raise (libc.so.6)
    #1  0x00007fbb790955b9 abort (libc.so.6)
    #2  0x00007fbb7a315931 talloc_abort.cold.19 (libtalloc.so.2)
    #3  0x00007fbb7a31604d _talloc_steal_loc.cold.47 (libtalloc.so.2)
    #4  0x00000000006a00f3 n/a (/home/ftweedal/.cache/purebred/purebred-linux-x86_64)
Apr 05 12:26:36 T470s abrtd[8533]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ccpp-2019-01-22-10:24:24.235958-10186'
Apr 05 12:26:37 T470s abrt-notification[19333]: Process 19235 (purebred-linux-x86_64) crashed in _talloc_set_destructor.cold.20()

To Reproduce
Not reliably reproducible. May be related to something being GC'd that we need to hang on to.

@frasertweedale changed the title from "segv when opening mail" to "SIGABRT when opening mail" on Apr 6, 2019
@romanofski
Member

Does it happen when you open a search with a gazillion threads and, while purebred is still calculating the length of the list, you open the first or second mail?

@frasertweedale
Member Author

@romanofski good question; worth checking that. Did you encounter it?

@romanofski
Member

Yeah, pretty much reliably. Steps:

  1. Search for *
  2. Start opening the first one or two mails, b00m.

@frasertweedale
Member Author

@romanofski OK awesome, that's good info! I've got some time off this week, I'll try and sort it out :)

@frasertweedale
Member Author

Info from further investigation:

  • It only occurs while the background thread is traversing the spine of the thread list to determine the number of threads (see the sketch after this list)

  • If you wait for the "num threads" update to complete (background thread is done), then execute another large search, the problem can occur (i.e. it is not restricted to the initial search)

  • The problem occurs upon opening the second unread mail (i.e. with tag "unread", or whatever nmNewTag is). You can precede or interleave opening of non-unread mails without triggering the abort.

  • This "state" (so to speak) does not persist across searches. If you start a long search, open one unread mail, background spine traversal completes, then open a second unread mail, the problem does not occur. Similarly, executing a second long search "starts over"; two unread mails must be opened during the spine traversal of the second search.

@romanofski
Member

Yes. I can reproduce those observations too.

@frasertweedale
Member Author

I won't have time to investigate further for a few weeks... I fear I will need to instrument the heck out of hs-notmuch to find out what's going on.
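
If it comes to that, one throwaway way to instrument is wrapping the suspicious calls with eventlog markers (a generic sketch, not hs-notmuch's API; run the binary with +RTS -l and inspect the eventlog):

    import Debug.Trace (traceEventIO)

    -- Emit begin/end markers around an IO action so the last marker before
    -- the abort points at the offending call.
    traced :: String -> IO a -> IO a
    traced label action = do
      traceEventIO ("begin " ++ label)
      r <- action
      traceEventIO ("end " ++ label)
      pure r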

@frasertweedale
Member Author

Dumping some links about Xapian thread safety here. Probably not related to this issue but I just stumbled across them and don't want to forget about them:

@frasertweedale
Member Author

frasertweedale commented Jul 13, 2019

Without saying I'm giving up on finding and understanding the cause, I'm pondering whether to implement the following to see if it makes the error disappear:

  1. Spawn a thread to perform Database write actions. Writes are performed only by this thread.
  2. When we want to modify the database (e.g. edit tags, index file), send a message to that thread and it will do it on our behalf. Relevant message types/constructors would have to be defined (rough sketch below).
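
A rough sketch of that design, with made-up message constructors and names (not purebred's actual API), using a plain Chan as the mailbox:

    import Control.Concurrent (forkIO)
    import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
    import Control.Monad (forever)

    -- Hypothetical write requests; the real constructors would mirror the
    -- operations purebred performs (edit tags, index a file, ...).
    data DbWrite
      = SetTags FilePath [String]
      | IndexFile FilePath

    -- Fork the single writer thread. It is the only code that ever touches
    -- the read/write database handle; everyone else just sends messages.
    startWriter :: (DbWrite -> IO ()) -> IO (Chan DbWrite)
    startWriter perform = do
      chan <- newChan
      _ <- forkIO (forever (readChan chan >>= perform))
      pure chan

The Brick event handler would then call, say, writeChan chan (SetTags path tags) instead of touching the database itself.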

I'm suspicious that having multiple RW database handles is a factor in this bug. But I don't have any hard evidence. The writes are currently performed in the Brick event loop, so they are certainly serialised, but maybe something isn't being GC'd promptly enough (or at all). Then again, the fact this only happens during the background result traversal does weigh against this theory somewhat.

update 1: using a single writer thread holding a single r/w database handle does make the issue go away. But the changes only get written when the database gets closed (at termination of the whole program), so there's not much point to the change. Moving to per-event withDatabase in the single writer thread, such that each r/w database handle is closed when GC'd, results in the same problem (which is not all that surprising).
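
For the per-event variant, a hedged sketch building on the writer-thread code above (the withDatabase argument stands in for an assumed bracket-style open/close helper, not necessarily hs-notmuch's real signature):

    -- Re-open and close the r/w handle around every write request, instead
    -- of holding one handle open for the lifetime of the program.
    writerLoop :: ((db -> IO ()) -> IO ()) -> Chan DbWrite -> (db -> DbWrite -> IO ()) -> IO ()
    writerLoop withDatabase chan perform =
      forever (readChan chan >>= \msg -> withDatabase (\db -> perform db msg))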

@romanofski
Member

@frasertweedale sounds like a tough nut to crack. Out of curiosity, is the problem triggered by GC or by the absence of it? If I understood your comment right, it's because data is GC'd that shouldn't have been garbage collected (yet)?

@frasertweedale
Member Author

I don't think that's it, but it may be somehow related. I'm really still no closer to working this out :)

@frasertweedale
Member Author

I just thought, maybe it has to do with the thread safety of talloc? One very quickly finds that talloc is not thread safe: https://talloc.samba.org/talloc/doc/html/libtalloc__threads.html.

@romanofski
Member

Maybe this could help too (if you haven't used it already): https://github.com/bgamari/ghc-debug

@romanofski added this to the 1.0 milestone on Jun 23, 2020
@romanofski
Member

Added this to our 1.0 milestone, since, well, we can't release 1.0 with this still plaguing us. Once I've chewed through the remaining two issues, I'll see if I can help with this in any way.

@romanofski
Member

@frasertweedale what about putting a build flag around the lazy loading feature? I haven't checked how tricky/cumbersome that would be, but it would help us release a first version, give us some exposure and perhaps help from others? Then it could still be activated at compile time for peeps who want to use and hack on it?
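
If we go that way, a minimal sketch of the gating (assuming a cabal flag that adds -DLAZY_VECTOR to cpp-options when enabled; the flag name and module are made up for illustration):

    {-# LANGUAGE CPP #-}
    module SearchGate (useLazyVector) where

    -- Compile-time switch for the lazy-vector code path.
    useLazyVector :: Bool
    #ifdef LAZY_VECTOR
    useLazyVector = True   -- lazy, on-demand loading (the code path this issue is about)
    #else
    useLazyVector = False  -- eager loading; avoids the concurrent spine traversal
    #endif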

romanofski added a commit that referenced this issue Jun 29, 2021
This puts a build flag around the lazy vector feature. Motivation behind
is to make the first release, get some publicity and hopefully get more
people interested to hack on Purebred. Possibly help with fixing the
bug.

Relates to #284
romanofski added a commit that referenced this issue Sep 27, 2021
This puts a build flag around the lazy vector feature. Motivation behind
is to make the first release, get some publicity and hopefully get more
people interested to hack on Purebred. Possibly help with fixing the
bug.

Relates to #284
@romanofski modified the milestones: 1.0 → Future Feature on Sep 28, 2021
@frasertweedale
Member Author

Can confirm this still happens with GHC 9.0.

@romanofski
Copy link
Member

@frasertweedale do you reckon we might need to look at a different architecture here for how we communicate with notmuch? I remember the crazy ideas we had were something like using a messaging system. But it seems wrestling with the memory management is a tough nut to crack.
