[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-09-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

--- Comment #10 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 159011 merged by jenkins-bot:
Make timer/profiler be thread-safe

https://gerrit.wikimedia.org/r/159011

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-09-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

Ori Livneh o...@wikimedia.org changed:

   What|Removed |Added

 Status|PATCH_TO_REVIEW |RESOLVED
 Resolution|--- |FIXED

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-09-07 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

Gerrit Notification Bot gerritad...@wikimedia.org changed:

   What|Removed |Added

 Status|NEW |PATCH_TO_REVIEW

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-09-07 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

--- Comment #9 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 159011 had a related patch set uploaded by Tim Starling:
[WIP] Make timer/profiler be thread-safe

https://gerrit.wikimedia.org/r/159011

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-08-31 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

--- Comment #8 from Ori Livneh o...@wikimedia.org ---
Created attachment 16332
  -- https://bugzilla.wikimedia.org/attachment.cgi?id=16332action=edit
Small helper script for reproducing the bug

Fires off multiple requests to the enwiki API to parse [[en:Barack Obama]].
Only usable from the WMF cluster.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-08-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

Ori Livneh o...@wikimedia.org changed:

   What|Removed |Added

   Priority|Normal  |High
 CC||aschulz4...@gmail.com
   See Also||https://bugzilla.wikimedia.
   ||org/show_bug.cgi?id=68413
   Assignee|wikibugs-l@lists.wikimedia. |tstarl...@wikimedia.org
   |org |
Summary|Lua takes a lot more CPU|The way LuaSandbox measures
   |time with HHVM jobrunners   |CPU time is not thread-safe

--- Comment #1 from Ori Livneh o...@wikimedia.org ---
The issue is that LuaSandbox is not thread-safe. The figure you're seeing
conflates the CPU time of several concurrent threads of execution.

* LuaSandbox uses CLOCK_PROCESS_CPUTIME_ID to measure time, but
CLOCK_PROCESS_CPUTIME_ID measures CPU time consumed by all threads of the
calling process.

* LuaSandbox requests to be notified with a SIGEV_SIGNAL upon timer expiration,
but at any point in time, only only one such signal is queued per timer per
process.  

* LuaSandbox uses the sigprocmask() system call to block signals, but its
behavior in a multithreaded process is unspecified.  pthread_sigmask() must be
used instead.

* When the normal or emergency timers are stopped, any queued SIGEV_SIGNAL for
the entire process is flushed. 

* When multiple threads are executing concurrently, the timer for the entire
process is repeatedly re-armed (i.e., the exiration is reset).

* Confusingly, LUASANDBOX_G(signal_handler_installed) is thread-local, not
global. This means every thread calls luasandbox_timer_install_handler.  The
latter caches the old handler in another thread-local variable and reinstalls
it when LuaSandbox is deactivating. When LuaSandbox is active in multiple
threads, individual threads end up re-installing each other's signal handlers,
keeping each other alive.

The above applies specifically to the normal and emergency timers. The
profiling feature has its own bag of thread-safety / reentrancy issues, which I
propose we deal with in the context of bug 68413 rather than this bug.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-08-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

--- Comment #2 from Jackmcbarn jackmcbarn+w...@gmail.com ---
I notice that it's not just Lua time usage that's bad; core's CPU time usage
is also really high. Is that problem related to this one?

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-08-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

--- Comment #3 from Ori Livneh o...@wikimedia.org ---
(In reply to Jackmcbarn from comment #2)
 I notice that it's not just Lua time usage that's bad; core's CPU time
 usage is also really high. Is that problem related to this one?

Yes. The fact that concurrent threads reinstall each other's signal handlers
means that requests can make each other linger, increasing the backend response
time.

Your question prompted me to take a look at ParserOutput::getTimes. I found
that it also has a separate bug which, like this one, stems from the
assumptions it makes about the underlying runtime's threading model. Filed as
bug 70227.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-08-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

Jackmcbarn jackmcbarn+w...@gmail.com changed:

   What|Removed |Added

   Keywords|performance |

--- Comment #4 from Jackmcbarn jackmcbarn+w...@gmail.com ---
Removing performance keyword, since this is a bug in the timer rather than
anything actually being slow.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-08-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

--- Comment #5 from Victor Vasiliev vasi...@gmail.com ---
So, is there more to this bug than replacing CLOCK_PROCESS_CPUTIME_ID with
CLOCK_THREAD_CPUTIME_ID?

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-08-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

--- Comment #6 from Ori Livneh o...@wikimedia.org ---
Created attachment 16326
  -- https://bugzilla.wikimedia.org/attachment.cgi?id=16326action=edit
A naive fix that doesn't work

(In reply to Victor Vasiliev from comment #5)
 So, is there more to this bug than replacing CLOCK_PROCESS_CPUTIME_ID with
 CLOCK_THREAD_CPUTIME_ID?

Yes. See the attached patch, which doesn't work. (It segfaults in
luasandbox_timer_set_one_time). There are things which should be thread-local
but aren't. It's going to take some very careful work to fix this.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 70177] The way LuaSandbox measures CPU time is not thread-safe

2014-08-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

--- Comment #7 from Ori Livneh o...@wikimedia.org ---
Two ideas:

* In a multithreaded environment, it might be a lot saner to use the
timerfd_*[1] family of system calls, which operate on timers that notify via
file descriptors rather than signals. But this is a Linux-specific feature, so
it would come at the cost of portability.

* In an ideal world, LuaSandbox would use high-level timing and code profiling
interfaces provided by HHVM, such as Timer::GetRusageMicros[2], cpuCyles[3],
etc.

[1]: http://man7.org/linux/man-pages/man2/timerfd_create.2.html
[2]: https://github.com/facebook/hhvm/blob/master/hphp/util/timer.cpp#L108
[3]: https://github.com/facebook/hhvm/blob/master/hphp/util/cycles.h#L29

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l