Fwd: uprobed multithreaded app serializes in signal-handling code

Jim Keniston Wed, 23 Jan 2008 16:27:42 -0800

To be notified of breakpoint and single-step traps, uprobes currently
uses a utrace report_signal callback.  Per the data below, maybe we need
to intercept the trap before it turns into a signal -- preferably via a
new utrace event callback.  I think this is already on Roland's TODO
list (and it was discussed on the July 23, 2007 conference call).  Anyway, here's
some motivation.

Jim Keniston
-------- Forwarded Message --------
From: jkenisto at us dot ibm dot com
Subject: [Bug uprobes/5660] New: uprobed multithreaded app serializes in
signal-handling code
Date: 23 Jan 2008 00:49:07 -0000

Uprobing a multithreaded app on an x86_64 SMP system shows serious
serialization of the threads in the kernel's signal-handling code.
In the app in question, the child threads just call a dummy function
repeatedly; the uprobes module probes the dummy function's entry point.
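For anyone trying to reproduce this, a workload of the shape described above can be sketched in C with POSIX threads. This is a hypothetical reconstruction, not the app from the bug report: dummy_func(), run_workload(), struct worker, and the iteration counts are illustrative names and parameters. The uprobes module that would place a probe on dummy_func()'s entry point is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the app's dummy function; the uprobes module
 * would probe this function's entry point. noinline keeps each call a
 * real call, so every iteration would be one probe hit. */
__attribute__((noinline)) void dummy_func(void)
{
    __asm__ volatile("" ::: "memory");  /* empty body the compiler must keep */
}

/* Per-thread state: how many calls to make and how many were made. */
struct worker {
    pthread_t tid;
    long iterations;
    long calls;
};

static void *worker_main(void *p)
{
    struct worker *w = p;
    for (long i = 0; i < w->iterations; i++) {
        dummy_func();  /* with the probe installed, each call traps */
        w->calls++;
    }
    return NULL;
}

/* The calling (parent) thread starts nthreads child threads that each call
 * dummy_func() `iterations` times, joins them, and returns the total number
 * of calls, or -1 if allocation or thread creation fails. */
long run_workload(int nthreads, long iterations)
{
    assert(nthreads > 0 && iterations >= 0);
    struct worker *ws = calloc((size_t)nthreads, sizeof(*ws));  /* zeroes calls */
    if (ws == NULL)
        return -1;

    int started = 0;
    for (; started < nthreads; started++) {
        ws[started].iterations = iterations;
        if (pthread_create(&ws[started].tid, NULL, worker_main, &ws[started]) != 0)
            break;
    }

    long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(ws[i].tid, NULL);
        total += ws[i].calls;
    }
    free(ws);
    return started == nthreads ? total : -1;
}
```

Each thread counts into its own struct worker, so the sketch itself adds no shared lock; any serialization seen while it runs under a probe would come from the kernel side.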

Here's a summary of data reported by oprofile.  It shows
that with more than one thread running, utrace_get_signal(),
get_signal_to_deliver(), and force_sig_info() are the top three
consumers of CPU time.  I'm guessing that the threads are serializing
on task_struct->sighand->siglock (which is shared among tasks of the
same process).

#CPUs: 4
                    pct (rank)         pct (rank)             pct (rank)
threads usec/iter** utrace_get_signal  get_signal_to_deliver  force_sig_info
1*        4.4       12.2% (1)           2.4% (13)             < 1%
1         4.0       12.0% (1)           3.5% (7)              < 1%
2         9.2       21.4% (1)          13.2% (2)               5.7% (3)
3        19.0       30.9% (1)          24.4% (2)              13.5% (3)
4        29.7       36.7% (1)          25.6% (2)              14.4% (3)
*single-thread program -- no parent thread
** Divide by #threads to get usec per probe hit.
Percentages are of total kernel+user time.
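Applying the ** footnote's conversion to the table gives the per-hit cost directly. The baseline here is the multithreaded program's one-thread row; the single-thread program's row gives 4.4 usec.

```latex
t_{\text{hit}} = \frac{t_{\text{iter}}}{n_{\text{threads}}}
\qquad
\begin{aligned}
n = 1:&\quad 4.0 / 1 = 4.0\ \mu\text{s} \\
n = 2:&\quad 9.2 / 2 = 4.6\ \mu\text{s} \\
n = 3:&\quad 19.0 / 3 \approx 6.3\ \mu\text{s} \\
n = 4:&\quad 29.7 / 4 \approx 7.4\ \mu\text{s}
\end{aligned}
```

If the threads did not interfere with each other, the per-hit cost would be expected to stay near the one-thread value. Instead it grows by roughly 1.9x at four threads. That fits the siglock serialization guess, though these numbers alone don't pinpoint which lock is contended.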

I have no particular reason to think that this problem is specific
to x86_64.  I've observed poor scaling on multithreaded apps before,
but never got around to pointing oprofile at it.  I was hoping it was
something we could fix in uprobes. :-|
