[ 
https://issues.apache.org/jira/browse/MESOS-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-873:
----------------------------------

    Summary: Crash in os::killtree on Mavericks  (was: Crash in os::killtree on Mavericks )

> Crash in os::killtree on Mavericks
> ----------------------------------
>
>                 Key: MESOS-873
>                 URL: https://issues.apache.org/jira/browse/MESOS-873
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>         Environment: Mac OS X Mavericks
>            Reporter: Niklas Quarfot Nielsen
>            Assignee: Benjamin Hindman
>             Fix For: 0.19.0
>
>
> This is a crash we experienced on a Mavericks installation. We haven't been 
> able to reproduce it on other machines since, but managed to capture core 
> files from the crashes.
> Here is the stack trace from the crashing thread:
>   thread #2: tid = 0x0001, 0x0000000106816de5 mesos-executor`os::process(int) + 4133, stop reason = signal SIGSTOP
>     frame #0: 0x0000000106816de5 mesos-executor`os::process(int) + 4133
>     frame #1: 0x000000010681734c mesos-executor`os::processes() + 316
>     frame #2: 0x0000000106817752 mesos-executor`os::killtree(int, int, bool, bool) + 66
>     frame #3: 0x0000000106819748 mesos-executor`mesos::internal::CommandExecutorProcess::shutdown(mesos::ExecutorDriver*) + 200
>     frame #4: 0x000000010798be70
>     frame #5: 0x000000010798be60
>     frame #6: 0x0000000106b21c20 libmesos-0.16.0.dylib`process::Event::~Event() + 32
>     frame #7: 0x90c307894810c083
> The reported stop reason is wrong (all threads in the core file are 
> reported as stopped). 
> Here is a snippet of the disassembly of the failing frame:
>    0x106817306:  je     0x106817460               ; os::processes() + 592
>    0x10681730c:  movq   16(%rsp), %rax
>    0x106817311:  movq   296(%rsp), %rbx
>    0x106817319:  leaq   16(%rax), %r14
>    0x10681731d:  leaq   128(%rsp), %rax
>    0x106817325:  addq   $8, %r14
>    0x106817329:  movq   %rax, 24(%rsp)
>    0x10681732e:  leaq   384(%rsp), %rbp
>    0x106817336:  cmpq   %rbx, %r14
>    0x106817339:  je     0x106817530               ; os::processes() + 800
>    0x10681733f:  movl   32(%rbx), %esi
>    0x106817342:  movq   24(%rsp), %rdi
>    0x106817347:  callq  0x10681d5a0               ; symbol stub for: os::process(int)
> -> 0x10681734c:  movl   128(%rsp), %esi
>    0x106817353:  testl  %esi, %esi
>    0x106817355:  jne    0x1068173e0               ; os::processes() + 464
>    0x10681735b:  movq   136(%rsp), %rsi
>    0x106817363:  movq   %rbp, %rdi
>    0x106817366:  callq  0x10681d58e               ; symbol stub for: os::Process::Process(os::Process const&)
>    0x10681736b:  movl   $112, %edi
>    0x106817370:  callq  0x10681d9e4               ; symbol stub for: operator new(unsigned long)
> While investigating the crash live in lldb, we came to suspect that using 
> sysctl to get the argument count was probably the reason for the crash, but 
> we still have no way to validate this.
> We can dig further into the core dump if you have any suspected reasons for 
> the failure or suggestions on where to look.
> Also, since we haven't been able to reproduce the crash, if we don't hear of 
> any others with the same problem, we can probably mark this as won't fix.
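
On the sysctl suspicion in the description above: the cause was not confirmed in this report, so the following is only a sketch of one way such a crash could arise and be guarded against. It assumes the argument count is read from a buffer filled by sysctl with KERN_PROCARGS2 (not stated in the report), and that the buffer starts with an int argc, followed by the NUL-terminated executable path, optional NUL padding, and then the argument strings. parseProcargs is a hypothetical helper, not the Mesos implementation; it refuses to walk past the end of the buffer when argc and the data disagree.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not Mesos code). Parses a buffer assumed to be laid
// out as: [int argc][exec path '\0'][optional '\0' padding][argv[0] '\0']...
// Returns false, leaving *args untouched, if the buffer is truncated or if
// argc promises more strings than the buffer actually contains.
bool parseProcargs(
    const char* data, std::size_t size, std::vector<std::string>* args)
{
  int argc = 0;
  if (data == nullptr || args == nullptr || size < sizeof(argc)) {
    return false;
  }
  std::memcpy(&argc, data, sizeof(argc));
  if (argc < 0) {
    return false;
  }

  std::size_t pos = sizeof(argc);

  // Skip the executable path; it must be NUL-terminated inside the buffer.
  const void* nul = std::memchr(data + pos, '\0', size - pos);
  if (nul == nullptr) {
    return false;
  }
  pos = static_cast<const char*>(nul) - data;

  // Skip the terminator and any NUL padding that precedes argv[0].
  while (pos < size && data[pos] == '\0') {
    ++pos;
  }

  std::vector<std::string> parsed;
  for (int i = 0; i < argc; ++i) {
    if (pos >= size) {
      return false; // argc claims more arguments than the buffer holds.
    }
    nul = std::memchr(data + pos, '\0', size - pos);
    if (nul == nullptr) {
      return false; // Unterminated argument string.
    }
    const std::size_t end = static_cast<const char*>(nul) - data;
    parsed.emplace_back(data + pos, end - pos);
    pos = end + 1;
  }

  args->swap(parsed);
  return true;
}
```

A caller would pass the buffer and the length reported back by sysctl, and treat a false result as "arguments unavailable" for that pid rather than dereferencing anything further.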



--
This message was sent by Atlassian JIRA
(v6.2#6252)
