Re: core dumps during signal handling

Danek Duvall Fri, 10 Jun 2016 09:19:42 -0700

On Thu, Jun 09, 2016 at 10:53:16PM +0200, Bram Moolenaar wrote:

> Danek Duvall wrote:
> 
> > I've been chasing down a bug in vim, where killing vim (usually via SIGHUP
> > or SIGTERM) causes it to get a SIGSEGV and dump core.  I'm doing my work on
> > Solaris, but I can get it to do the same thing on Ubuntu 14.04, so it's not
> > strictly a Solaris thing.
> 
> Hmm, killing Vim is likely to cause damage.  Vim tries to exit cleanly,
> but that is not so easy.
> 
> > Most reproducibly (for me, at least), it seems to be due to the BufWinLeave
> > autocommand that fugitive sets up.  That would make sense to me -- running
> > autocommands in a signal handler seems to me like it would be inherently
> > difficult to get right, and certainly not generically possible.
> 
> Does this work perhaps happen on the signal stack?  It's not unlikely it
> just runs out of stack.


Bingo.  We're running on the alternate stack, but it's just not big
enough.  Here's the stack from a recent core:

    0000000649eb7170 get_func_tv+0x17()
    0000000649eb7210 eval7+0x399()
    0000000649eb72f0 eval6+0x30()
    0000000649eb7440 eval5+0x2d()
    0000000649eb75b0 eval4+0x2b()
    0000000649eb76b0 eval3+0x2b()
    0000000649eb77c0 eval2+0x30()
    0000000649eb7810 eval1+0x24()
    0000000649eb7a40 get_func_tv+0xa5()
    0000000649eb7ae0 eval7+0x399()
    0000000649eb7bc0 eval6+0x30()
    0000000649eb7d10 eval5+0x2d()
    0000000649eb7e80 eval4+0x2b()
    0000000649eb7f80 eval3+0x2b()
    0000000649eb8090 eval2+0x30()
    0000000649eb80e0 eval1+0x24()
    0000000649eb8310 get_func_tv+0xa5()
    0000000649eb83b0 eval7+0x399()
    0000000649eb8490 eval6+0x30()
    0000000649eb85e0 eval5+0x2d()
    0000000649eb8750 eval4+0x2b()
    0000000649eb8850 eval3+0x2b()
    0000000649eb8960 eval2+0x30()
    0000000649eb89b0 eval1+0x24()
    0000000649eb8a30 ex_execute+0x80()
    0000000649eb8ba0 do_one_cmd+0x1b1b()
    0000000649eb9170 do_cmdline+0x9b4()
    0000000649eb9340 apply_autocmds_group+0x82c()
    0000000649eb9360 apply_autocmds+0x1e()
    0000000649eb9380 vim`getout+0xba()
    0000000649eb9390 libc.so.1`__sighndlr+6()
    0000000649eb9430 libc.so.1`call_user_handler+0x2f1()
    0000000649eb9460 libc.so.1`sigacthandler+0xde(f, 0, 649eb9480)
    ffff80d396e4ec60 libc.so.1`__pollsys+0xa()
    ffff80d396e4ecf0 libc.so.1`pselect+0x193()
    ffff80d396e4ed10 libc.so.1`select+0x6b()
    ffff80d396e52d80 RealWaitForChar+0x200()

So it's clearly on the alternate stack, but the difference between the base
of that stack and the top is well over the default SIGSTKSZ (8192).  I'm
not sure how to get the information from the core that it died because it
blew the stack, but it seems entirely reasonable.  I set ss_size to 10 *
SIGSTKSZ, and it worked.

It's not the case for all the cores I have, though.  For instance,

    0000000de03ee050 libc.so.1`_ndoprnt_s+0x1a()
    0000000de03ee080 libc.so.1`_ndoprnt+0x12()
    0000000de03ee140 libc.so.1`vsnprintf+0xad()
    0000000de03ee210 libc.so.1`vasprintf+0x43()
    0000000de03ee2f0 libc.so.1`asprintf+0x9c()
    0000000de03ee360 libc.so.1`checkit+0x3f()
    0000000de03ee410 libc.so.1`fallback+0x28e()
    0000000de03ee500 libc.so.1`locale_fallback+0x126()
    0000000de03eeb30 libc.so.1`_real_gettext_u_l+0x538()
    0000000de03eeb90 libc.so.1`_real_gettext_u+0x49()
    0000000de03eebc0 libc.so.1`gettext+0x6b()
    0000000de03eec20 auto_next_pat+0x154()
    0000000de03eed50 apply_autocmds_group+0x5ec()
    0000000de03eeda0 apply_autocmds+0x47()
    0000000de03eedf0 vim`getout+0x12e()
    0000000de03eee10 preserve_exit+0x105()
    0000000de03eee40 vim`deathtrap+0x202()
    0000000de03eee50 libc.so.1`__sighndlr+6()
    0000000de03eeef0 libc.so.1`call_user_handler+0x2f1()
    0000000de03eef20 libc.so.1`sigacthandler+0xde(f, 0, de03eef40)

has a much smaller stack, well within the 8k.  But it's calling functions
that aren't async-signal-safe (I think): gettext() ends up in asprintf(),
which allocates memory.  I also have a bunch of stacks that have
nfa_regmatch and mch_breakcheck at the top with a <8k stack.  I'm not sure
what's going on there.  And I'm not sure how I got into those situations.

> > The thought I have was to do the classic thing where the signal handler
> > sets a global flag and returns immediately.
> > [ ... ]
> 
> This might work better when actually waiting for a character, but it
> makes Vim completely ignore the signal when it's looping somewhere.
> Which would be the reason to send it a signal in the first place.

True.

> We could set a flag that indicates Vim is waiting for a character, and
> the signal will be handled there.  Still need to find a way to wake up
> that loop.  And the problem would still exist when not waiting for a
> character (which I would assume is the more common situation where you
> want to kill Vim).  Thus it won't help much.
> 
> It's possible to at least ignore autocommands when receiving a deadly
> signal.  Hmm, how does any autocommand gets triggered anyway?

That might work.  getout() calls apply_autocmds(), several times.
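
A toy model of that idea, to make it concrete.  None of this is Vim's
real code: the toy_* names, the flag, and the choice of events are
assumptions for illustration only.

```c
#include <signal.h>

/* Would be set by the deadly-signal handler before the exit path runs. */
static volatile sig_atomic_t toy_in_deadly_signal = 0;

/* Counts how many autocommand events were actually dispatched. */
static int toy_autocmds_run = 0;

/* Stand-in for apply_autocmds(): does nothing during a deadly signal. */
static void toy_apply_autocmds(const char *event)
{
    (void)event;
    if (toy_in_deadly_signal)
        return;
    toy_autocmds_run++;
}

/* Stand-in for getout(), which triggers autocommands several times. */
static void toy_getout(void)
{
    toy_apply_autocmds("BufWinLeave");
    toy_apply_autocmds("VimLeave");
}
```

With the flag clear, toy_getout() dispatches both events; with it set,
none run, so no script evaluation happens on the signal stack.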

Danek
