[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

Bryan Davis changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bda...@wikimedia.org

--- Comment #16 from Bryan Davis ---
Cross post from https://bugzilla.wikimedia.org/show_bug.cgi?id=68413#c8 :

Updated beta servers to hhvm-luasandbox 2.0-3 build, changed config back to luasandbox and restarted hhvm fcgi container. Still seeing crashes:

Host: deployment-mediawiki01
ProcessID: 27248
ThreadID: 7fad89bff700
ThreadPID: 27750
Name: unknown program
Type: Segmentation fault
Runtime: hhvm
Version: heads/wikimedia-0-g8b842db4e2db664a9b4d543047ae154a6dd59de6
DebuggerCount: 0

# 0  virtual thunk to boost::exception_detail::clone_impl >::rethrow() const at /usr/bin/hhvm:0
# 1  lua_sethook at /usr/lib/x86_64-linux-gnu/liblua5.1-c++.so.0:0
# 2  HPHP::Extension::moduleInfo(HPHP::Array&) at /usr/lib/hphp/extensions/20140702/luasandbox.so:0
# 3  timer_sigev_thread at /build/buildd/eglibc-2.19/rt/../nptl/sysdeps/unix/sysv/linux/timer_routines.c:66
# 4  start_thread at /build/buildd/eglibc-2.19/nptl/pthread_create.c:312
# 5  clone at /build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113

$ hhvm --version
HipHop VM 3.3.0-dev (rel)
Compiler: heads/wikimedia-0-g8b842db4e2db664a9b4d543047ae154a6dd59de6
Repo schema: ce469da81c1d8ec23f3a4aa889afadad8df5a759

$ dpkg -l|grep ^ii|awk '{printf "%-20s %s\n", $2, $3}'|grep hhvm
hhvm                 3.1+20140723-1+wmf1
hhvm-dev             3.1+20140723-1+wmf1
hhvm-fss             1.1-2
hhvm-luasandbox      2.0-3
hhvm-wikidiff2       1.3-2

--
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

Giuseppe Lavagetto changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |PATCH_TO_REVIEW
                 CC|                            |glavage...@wikimedia.org
     Ever confirmed|0                           |1

--- Comment #15 from Giuseppe Lavagetto ---
(In reply to Tim Starling from comment #14)
> On hhvm.256.io, I have installed a LuaSandbox statically linked against Lua
> compiled as C++. With this build, the reduced test case exits cleanly with
> "Fatal error: unknown exception". That's quite a nice failure mode, and
> deals with all Lua errors thrown from unprotected functions, not just OOMs.

https://gerrit.wikimedia.org/r/#/c/149001 did the trick for the package: the latest package version, hhvm-luasandbox 2.0-3, links against the C++ version of the library. It is being tested now, AFAIK.
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #14 from Tim Starling ---
On hhvm.256.io, I have installed a LuaSandbox statically linked against Lua compiled as C++. With this build, the reduced test case exits cleanly with "Fatal error: unknown exception". That's quite a nice failure mode, and deals with all Lua errors thrown from unprotected functions, not just OOMs.
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #13 from Tim Starling ---
Created attachment 16028
  --> https://bugzilla.wikimedia.org/attachment.cgi?id=16028&action=edit
Reduced test case

I used a recursive table "a = {a,0}" to force Lua to do the small allocations necessary to reliably take the memory usage up to within a short distance of the limit. Then by having PHP attempt to re-enter Lua after the resulting OOM, a segfault reliably results.
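
To make the arithmetic behind this test case concrete, here is a minimal C++ sketch of a limit check with a "slop" allowance, using the figures from comment #12 (usage within 36 bytes of the limit, a 40-byte request from lua_pushcclosure(), slop forced to zero while the in_lua flag is set). This is not the luasandbox allocator: the names SandboxMemory, kSlopBytes and allocation_allowed are hypothetical, and the rule "slop applies only when Lua is not on the stack" is an assumption read off the description in that comment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the state a memory-limited allocator consults.
struct SandboxMemory {
    std::size_t limit;   // configured memory limit in bytes
    std::size_t usage;   // bytes currently allocated
    bool in_lua;         // true while Lua is anywhere on the call stack
};

// The "1 MB slop" mentioned in comment #12.
constexpr std::size_t kSlopBytes = 1024 * 1024;

// Returns true if a request for `size` more bytes would be granted.
// Assumption: slop is only available when Lua is not on the stack.
bool allocation_allowed(const SandboxMemory& m, std::size_t size) {
    assert(m.usage <= m.limit + kSlopBytes);  // sanity check on the model
    const std::size_t slop = m.in_lua ? 0 : kSlopBytes;
    return m.usage + size <= m.limit + slop;
}
```

With in_lua set and 36 bytes of headroom, a 40-byte request is refused; the same request with in_lua clear fits inside the slop. In the nested PHP-to-Lua case the refusal happens inside an unprotected call, which is the situation the test case is built to reach.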
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #12 from Tim Starling ---
Also, because Lua is in the stack, the in_lua flag is set, so "slop" is zero. That is to say, the hack intended to fix bug 59130 is disabled. The Lua userspace takes the usage to within 36 bytes of the limit, and lua_pushcclosure() tries to allocate 40, so after the error is handled, it's assured that re-entering the same state will cause an unprotected OOM.

in_lua and in_php are both flags indicating the contents of the stack, not the immediate caller. Maybe we need a third flag which indicates that the immediate caller is unprotected.

Also, I think the 1 MB slop might be a bit too conservative. Since the result of an OOM is a crash, we should probably just disable the memory limit entirely for unprotected calls.

A third thing we should do is to compile our own Lua library in C++. This might not be convenient for all reusers, but it causes Lua to use exceptions for error propagation, instead of setjmp/longjmp, which should be safer.

#if defined(__cplusplus)
/* C++ exceptions */
#define LUAI_THROW(L,c) throw(c)
#define LUAI_TRY(L,c,a) try { a } catch(...) \
    { if ((c)->status == 0) (c)->status = -1; }

Note that it does catch and discard all exceptions, even ones from HHVM -- I noticed this during design work for https://github.com/facebook/hhvm/pull/1986 . The Lua VM is not exception-safe, so it has a choice between discarding exceptions or crashing. That's one of the reasons why EZC catches and saves exceptions: to prevent them from breaking Lua. If we were integrating Lua as a non-EZC HHVM extension, we would still need that catch and save layer, equivalent to PR 1986.
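
The catch-and-save idea can be sketched roughly as follows. This is an illustration only, not code from EZC, HHVM or Lua: FakeLuaJmp, lua_style_try, SavedException and run_and_report are hypothetical names. The sketch assumes a try/catch(...) of the same shape as the LUAI_TRY macro quoted above, so any exception crossing it is reduced to a status code. The wrapper therefore stores the host exception before it reaches that catch and re-raises it once control is back outside Lua.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the record whose status field LUAI_TRY updates.
struct FakeLuaJmp { int status = 0; };

// Same shape as LUAI_TRY in a C++ build: every exception thrown by `body`
// is swallowed and only a status code survives.
template <typename F>
int lua_style_try(F body) {
    FakeLuaJmp c;
    try { body(); } catch (...) { if (c.status == 0) c.status = -1; }
    return c.status;
}

// Catch-and-save layer: host callbacks run through call_host() so the
// original exception is stored before the Lua-side catch can discard it.
struct SavedException {
    std::exception_ptr pending;

    template <typename F>
    void call_host(F host_callback) {
        try {
            host_callback();
        } catch (...) {
            pending = std::current_exception();
            // Stand-in for raising an ordinary Lua error.
            throw std::runtime_error("host callback failed");
        }
    }

    // Call after control has returned from Lua to the host.
    void rethrow_if_pending() {
        if (pending) {
            std::exception_ptr e = pending;
            pending = nullptr;
            std::rethrow_exception(e);
        }
    }
};

// Runs a failing host callback "inside Lua" and returns the message of the
// exception recovered afterwards.
std::string run_and_report() {
    SavedException saver;
    int status = lua_style_try([&] {
        saver.call_host([] { throw std::logic_error("original host error"); });
    });
    assert(status == -1);  // Lua only saw a generic failure
    try {
        saver.rethrow_if_pending();
    } catch (const std::logic_error& e) {
        return e.what();
    }
    return "no exception";
}
```

Calling run_and_report() yields the original message even though the Lua-style catch discarded the exception that actually crossed it. Without the saving step, only the status code -1 would be left.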
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #11 from Tim Starling ---
Nice work. A Lua error in an unprotected function like lua_pushcclosure() could indeed cause a longjmp to unwind HHVM's stack. If Lua was not in the stack when this happened, it would call the panic hook (luasandbox_panic) and we would get an informative error message, like bug 59130. But it is in the stack, so it tries to longjmp back to the last lua_pcall().
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #10 from Brett Simmers ---
I don't know anything about the internals of the Zend engine. Is it possible that this is happening with PHP5 as well but everything happens to work out anyway? I'll try it tomorrow but it's possible that in a release build of HHVM without asserts on, things could mostly appear to work.
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #9 from Brett Simmers ---
After some fun times with gdb, I'm almost certain this is luasandbox related, though I'm not yet sure what the root cause is. I was seeing what looked like some C++ frames disappearing without the relevant destructors being called, so I suspected something was going wrong with a longjmp somewhere and went with that.

http://pastebin.com/mKeqJYLZ is a stacktrace I took right before everything went off the rails. Notice that ExecutionContext::invokeFunc and luasandbox_call_helper both appear twice, indicating that PHP called into Lua, which called back into PHP, which is now calling into Lua again (this is expected behavior, right?). I previously stepped through one longjmp that didn't escape the call to luasandbox_call_helper in frame #5, but the one about to happen doesn't go so well.

I stepped through this longjmp up until it restored the previous stack and took another backtrace: http://pastebin.com/6UgHygPs. Notice that it appears to have jumped from the nested invocation of Lua to the outer one, skipping over all the HHVM frames in between the two. This explains how we eventually end up in the outermost invokeFunc frame with the VM state still looking like the nested frame: the nested VM state is supposed to be popped by a destructor in invokeFunc that never ran.

I'm done for today but the best theory I have so far is that there's some global setjmp/longjmp buffer in liblua that's being used improperly. Is liblua supposed to handle reentrancy like this? I noticed that lua_pushcclosure() is in the backtrace for the problematic longjmp but not for the previous one; maybe that's related?
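
The effect described here, cleanup in intermediate frames being skipped by a longjmp, can be reproduced in a few lines. The sketch below is an illustration only, with hypothetical names (g_vm_depth, DepthGuard and so on), and does not come from HHVM or liblua. The longjmp half uses a plain counter with manual push and pop, because jumping over objects with non-trivial destructors is undefined behaviour in C++. The exception half uses a destructor-based guard, in the way comment #9 describes a destructor in invokeFunc popping the nested VM state.

```cpp
#include <csetjmp>
#include <stdexcept>

int g_vm_depth = 0;          // hypothetical stand-in for the nested VM state
std::jmp_buf g_outer_entry;  // jump target recorded by the outer entry point

// Error path that escapes with longjmp, as a C build of Lua would.
void raise_error_longjmp() { std::longjmp(g_outer_entry, 1); }

// Manual push/pop around a call that never returns normally.
void nested_call_manual_cleanup() {
    ++g_vm_depth;            // push nested state
    raise_error_longjmp();   // control leaves here and does not come back
    --g_vm_depth;            // pop is skipped
}

// Returns the depth left behind after the longjmp-based error.
int depth_after_longjmp() {
    g_vm_depth = 0;
    if (setjmp(g_outer_entry) == 0) {
        nested_call_manual_cleanup();
    }
    return g_vm_depth;
}

// RAII guard: pops the nested state in its destructor.
struct DepthGuard {
    DepthGuard() { ++g_vm_depth; }
    ~DepthGuard() { --g_vm_depth; }
};

// Same error, propagated as a C++ exception instead.
void nested_call_exception() {
    DepthGuard guard;
    throw std::runtime_error("simulated Lua error");
}

// Returns the depth left behind after the exception-based error.
int depth_after_exception() {
    g_vm_depth = 0;
    try { nested_call_exception(); } catch (const std::runtime_error&) {}
    return g_vm_depth;
}
```

depth_after_longjmp() leaves the counter at 1, which corresponds to ending up in the outer frame with the nested state still in place. depth_after_exception() leaves it at 0, because stack unwinding runs the guard's destructor.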
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #8 from Tim Starling ---
...that being $iteratorNode. The TypedValue* for $iteratorNode was consistent enough for me to set a watch on it, which showed it being overwritten with garbage by translator-asm-helpers.S, probably the stack write on line 52. That seemed pretty JIT-specific, so I tried running it without the JIT, and it instead asserted in the ~ScopeGuardImpl of ExecutionContext::invokeFunc(), when invoking an unnamed function (presumably the eval.php command line).

Before the write by translator-asm-helpers.S, $iteratorNode had a PPNode_Hash_Array in it, as expected. There's nothing obviously linking this to Scribunto, at present.
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #7 from Tim Starling ---
It is asserting while checking local number 7 of PPFrame_Hash::expand().
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #6 from Tim Starling ---
Yes, that. There are various other ways to do it, but that one lets you modify the input text before you run it, which is often useful for isolating parser-related bugs. However, I haven't found any reduced input text yet -- it seems to depend on execution history.
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #5 from Ori Livneh ---
(In reply to Brett Simmers from comment #4)
> How can I parse a page in CLI mode?

cd /srv/mediawiki
php maintenance/eval.php
> $title = Title::newFromText('San Francisco');
> $text = Revision::newFromTitle($title)->getText();
> $wgParser->parse($text, $title, new ParserOptions);
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #4 from Brett Simmers ---
I tried with that applied earlier today and it still crashed (I've been messing with the contents of that working dir recently). I guess I didn't notice that it was a different crash. How can I parse a page in CLI mode?
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #3 from Tim Starling ---
The backtrace suggests https://github.com/facebook/hhvm/pull/3121 , which I see you don't have the fix for in /home/bsimmers/hhvm-dbg .

When I parse that page in CLI mode using /usr/local/bin/hhvm, I reliably get an assertion from JIT::checkFrame:

hhvm: /srv/hhvm-dev/hphp/runtime/base/ref-data.h:118: HPHP::Cell* HPHP::RefData::tv(): Assertion `m_magic == Magic::kMagic' failed.

#4 0x00cd81a1 in tv (this=) at /srv/hhvm-dev/hphp/runtime/base/ref-data.h:118
#5 HPHP::tvIsPlausible (tv=...) at /srv/hhvm-dev/hphp/runtime/base/tv-helpers.cpp:82
#6 0x014bc05b in HPHP::JIT::checkFrame (fp=0x7fffdf97ea70, sp=, checkLocals=checkLocals@entry=true) at /srv/hhvm-dev/hphp/runtime/vm/jit/translator-runtime.cpp:715
#7 0x014bdcd7 in HPHP::JIT::traceCallback (fp=0x7fffdf97ea70, sp=0x7fffdf97e920, pcOff=10060, rip=0x3ea11c0) at /srv/hhvm-dev/hphp/runtime/vm/jit/translator-runtime.cpp:728
#8 0x03ea11d9 in ?? ()
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

Andre Klapper changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |hiphop
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

Max Semenik changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #15964|application/octet-stream    |text/plain
          mime type|                            |
                 CC|                            |maxsem.w...@gmail.com
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

Ori Livneh changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |o...@wikimedia.org

--- Comment #2 from Ori Livneh ---
Created attachment 15964
  --> https://bugzilla.wikimedia.org/attachment.cgi?id=15964&action=edit
hhvm.256.io:~bsimmers/lua.log
[Bug 68196] HHVM crash loading San Francisco page
https://bugzilla.wikimedia.org/show_bug.cgi?id=68196

--- Comment #1 from Brett Simmers ---
~bsimmers/lua.log has a much more useful-looking stacktrace than what we were previously seeing.