[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 --- Comment #10 from Gerrit Notification Bot gerritad...@wikimedia.org --- Change 159011 merged by jenkins-bot: Make timer/profiler be thread-safe https://gerrit.wikimedia.org/r/159011 -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 Ori Livneh o...@wikimedia.org changed: What|Removed |Added Status|PATCH_TO_REVIEW |RESOLVED Resolution|--- |FIXED -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 Gerrit Notification Bot gerritad...@wikimedia.org changed: What|Removed |Added Status|NEW |PATCH_TO_REVIEW -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 --- Comment #9 from Gerrit Notification Bot gerritad...@wikimedia.org --- Change 159011 had a related patch set uploaded by Tim Starling: [WIP] Make timer/profiler be thread-safe https://gerrit.wikimedia.org/r/159011 -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 --- Comment #8 from Ori Livneh o...@wikimedia.org --- Created attachment 16332 -- https://bugzilla.wikimedia.org/attachment.cgi?id=16332action=edit Small helper script for reproducing the bug Fires off multiple requests to the enwiki API to parse [[en:Barack Obama]]. Only usable from the WMF cluster. -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 Ori Livneh o...@wikimedia.org changed: What|Removed |Added Priority|Normal |High CC||aschulz4...@gmail.com See Also||https://bugzilla.wikimedia. ||org/show_bug.cgi?id=68413 Assignee|wikibugs-l@lists.wikimedia. |tstarl...@wikimedia.org |org | Summary|Lua takes a lot more CPU|The way LuaSandbox measures |time with HHVM jobrunners |CPU time is not thread-safe --- Comment #1 from Ori Livneh o...@wikimedia.org --- The issue is that LuaSandbox is not thread-safe. The figure you're seeing conflates the CPU time of several concurrent threads of execution. * LuaSandbox uses CLOCK_PROCESS_CPUTIME_ID to measure time, but CLOCK_PROCESS_CPUTIME_ID measures CPU time consumed by all threads of the calling process. * LuaSandbox requests to be notified with a SIGEV_SIGNAL upon timer expiration, but at any point in time, only only one such signal is queued per timer per process. * LuaSandbox uses the sigprocmask() system call to block signals, but its behavior in a multithreaded process is unspecified. pthread_sigmask() must be used instead. * When the normal or emergency timers are stopped, any queued SIGEV_SIGNAL for the entire process is flushed. * When multiple threads are executing concurrently, the timer for the entire process is repeatedly re-armed (i.e., the exiration is reset). * Confusingly, LUASANDBOX_G(signal_handler_installed) is thread-local, not global. This means every thread calls luasandbox_timer_install_handler. The latter caches the old handler in another thread-local variable and reinstalls it when LuaSandbox is deactivating. When LuaSandbox is active in multiple threads, individual threads end up re-installing each other's signal handlers, keeping each other alive. The above applies specifically to the normal and emergency timers. The profiling feature has its own bag of thread-safety / reentrancy issues, which I propose we deal with in the context of bug 68413 rather than this bug. -- You are receiving this mail because: You are the assignee for the bug. You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 --- Comment #2 from Jackmcbarn jackmcbarn+w...@gmail.com --- I notice that it's not just Lua time usage that's bad; core's CPU time usage is also really high. Is that problem related to this one? -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 --- Comment #3 from Ori Livneh o...@wikimedia.org --- (In reply to Jackmcbarn from comment #2) I notice that it's not just Lua time usage that's bad; core's CPU time usage is also really high. Is that problem related to this one? Yes. The fact that concurrent threads reinstall each other's signal handlers means that requests can make each other linger, increasing the backend response time. Your question prompted me to take a look at ParserOutput::getTimes. I found that it also has a separate bug which, like this one, stems from the assumptions it makes about the underlying runtime's threading model. Filed as bug 70227. -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 Jackmcbarn jackmcbarn+w...@gmail.com changed: What|Removed |Added Keywords|performance | --- Comment #4 from Jackmcbarn jackmcbarn+w...@gmail.com --- Removing performance keyword, since this is a bug in the timer rather than anything actually being slow. -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 --- Comment #5 from Victor Vasiliev vasi...@gmail.com --- So, is there more to this bug than replacing CLOCK_PROCESS_CPUTIME_ID with CLOCK_THREAD_CPUTIME_ID? -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 --- Comment #6 from Ori Livneh o...@wikimedia.org --- Created attachment 16326 -- https://bugzilla.wikimedia.org/attachment.cgi?id=16326action=edit A naive fix that doesn't work (In reply to Victor Vasiliev from comment #5) So, is there more to this bug than replacing CLOCK_PROCESS_CPUTIME_ID with CLOCK_THREAD_CPUTIME_ID? Yes. See the attached patch, which doesn't work. (It segfaults in luasandbox_timer_set_one_time). There are things which should be thread-local but aren't. It's going to take some very careful work to fix this. -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177 --- Comment #7 from Ori Livneh o...@wikimedia.org --- Two ideas: * In a multithreaded environment, it might be a lot saner to use the timerfd_*[1] family of system calls, which operate on timers that notify via file descriptors rather than signals. But this is a Linux-specific feature, so it would come at the cost of portability. * In an ideal world, LuaSandbox would use high-level timing and code profiling interfaces provided by HHVM, such as Timer::GetRusageMicros[2], cpuCyles[3], etc. [1]: http://man7.org/linux/man-pages/man2/timerfd_create.2.html [2]: https://github.com/facebook/hhvm/blob/master/hphp/util/timer.cpp#L108 [3]: https://github.com/facebook/hhvm/blob/master/hphp/util/cycles.h#L29 -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l