A more practical question: is there a way to replace the Disruptor wait strategy? I suspect the issue is the ginormous number of threads vying to be woken up, and running the Disruptor queues with SleepingWaitStrategy seems like it should alleviate this pain. However, it looks like the ability to set the wait strategy was removed in the transition from Clojure to Java between versions 0.10.x and 1.0.x.
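
To make the difference concrete, here's a minimal sketch at the raw LMAX Disruptor level (my own illustration against the Disruptor 3.x DSL, not Storm's internal DisruptorQueue wrapper, which doesn't seem to expose the strategy in 1.x). BlockingWaitStrategy parks idle consumers on a lock/condition, so publishes can mean kernel-level wake-ups, while SleepingWaitStrategy spins, yields, then briefly parks and re-checks, so publishers never have to signal anyone:

    import com.lmax.disruptor.BlockingWaitStrategy;
    import com.lmax.disruptor.SleepingWaitStrategy;
    import com.lmax.disruptor.dsl.Disruptor;
    import com.lmax.disruptor.dsl.ProducerType;
    import java.util.concurrent.Executors;

    public class WaitStrategySketch {
        // Simple event type carried through the ring buffer.
        static class Event {
            Object payload;
        }

        public static void main(String[] args) {
            // BlockingWaitStrategy: idle consumers block on a condition
            // variable, so each publish may need a kernel wake-up.
            Disruptor<Event> blocking = new Disruptor<>(
                    Event::new, 1024, Executors.defaultThreadFactory(),
                    ProducerType.MULTI, new BlockingWaitStrategy());

            // SleepingWaitStrategy: idle consumers spin, yield, then park
            // very briefly and poll again; publishers never signal anyone.
            Disruptor<Event> sleeping = new Disruptor<>(
                    Event::new, 1024, Executors.defaultThreadFactory(),
                    ProducerType.MULTI, new SleepingWaitStrategy());

            blocking.handleEventsWith((event, seq, endOfBatch) -> { /* consume */ });
            sleeping.handleEventsWith((event, seq, endOfBatch) -> { /* consume */ });

            blocking.start();
            sleeping.start();
        }
    }

If I remember correctly, pre-1.0 releases exposed exactly this knob via the topology.disruptor.wait.strategy config entry, which is what I was hoping to find again.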
On Sun, Jun 11, 2017 at 3:32 PM, Roee Shenberg <[email protected]> wrote:

> Our Storm cluster (1.0.2) is running many Trident topologies, each one
> local to a single worker, with each supervisor having 10 worker slots.
> Every slot runs a copy of the same topology with a different
> configuration, the topology being a fairly fat Trident topology
> (~300 threads per topology, totalling >3000 threads on the machine).
>
> A quick htop showed a grim picture of most CPU time being spent in the
> kernel:
>
> [image: Inline image 1]
> (note: running as root inside a Docker container)
>
> Here's an example top summary line:
>
> %Cpu(s): 39.4 us, 51.1 sy, 0.0 ni, 8.6 id, 0.0 wa, 0.0 hi, 0.1 si, 0.8 st
>
> This suggests actual kernel time waste, not I/O, irqs, etc., so I ran
> sudo strace -cf -p 2466 to get a feel for what's going on:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  86.84 3489.872073       14442    241646     27003 futex
>  10.69  429.437949      271453      1582           epoll_wait
>   1.88   75.608000      361761       209       108 restart_syscall
>   0.29   11.722287       46889       250           recvfrom
>   0.12    4.736911          92     51379           gettimeofday
>   0.08    3.173336          12    254162           clock_gettime
>   0.06    2.234660        4373       511           poll
> ...
>
> I don't understand whether threads that are simply blocking are counted
> (in which case this is a worthless measure) or not.
>
> I ran jvmtop to get some runtime data out of one of the topologies (well,
> a few, but they were all roughly the same):
>
>  TID NAME                            STATE          CPU     TOTALCPU  BLOCKEDBY
>  203 Thread-27-$mastercoord-bg0-exe  RUNNABLE       57.55%   8.54%
>  414 RMI TCP Connection(3)-172.17.0  RUNNABLE        8.03%   0.01%
>   22 disruptor-flush-trigger         TIMED_WAITING   3.79%   4.49%
>   51 heartbeat-timer                 TIMED_WAITING   0.80%   1.66%
>  328 disruptor-flush-task-pool       TIMED_WAITING   0.61%   0.84%
> ...
>
> So just about all of the time is spent inside the Trident master
> coordinator.
>
> My theory is that the ridiculous thread count is causing the kernel to
> work extra-hard on all those futex calls (e.g. when waking a thread
> blocking on a futex).
>
> I'm very uncertain regarding this; the best I can say is that the overhead
> is related more to the number of topologies than to what the topology is
> doing (when running 1 topology on the same worker, CPU use is less than
> 1/10th of what it is with 10 topologies), and there are a *lot* of
> threads on the system (>3000).
>
> Any advice, suggestions for additional diagnostics, or ideas as to the
> cause? Operationally, we're planning to move to smaller instances with
> fewer slots per supervisor to work around this issue, but I'd rather
> resolve it without changing our cluster's makeup entirely.
>
> Thanks,
>
> Roee
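
As a side note, one way to test the "it's purely the thread count" theory in isolation would be a stand-alone reproducer along these lines (class name and numbers are mine, nothing to do with the actual topologies): it parks a few thousand threads on one blocking queue and bounces a single token between them, so almost everything it does is futex wait/wake under the hood:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class FutexLoadSketch {
        public static void main(String[] args) throws Exception {
            int threads = args.length > 0 ? Integer.parseInt(args[0]) : 3000;
            // Capacity-1 queue holding a single token; every take()/put()
            // parks and wakes threads via a lock/condition, which the JVM
            // implements with futex on Linux.
            BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1);

            for (int i = 0; i < threads; i++) {
                Thread t = new Thread(() -> {
                    try {
                        while (true) {
                            Object token = queue.take();
                            queue.put(token);
                        }
                    } catch (InterruptedException ignored) {
                        // exit quietly when interrupted
                    }
                });
                t.setDaemon(true);
                t.start();
            }

            queue.put(new Object()); // start the token bouncing
            Thread.sleep(Long.MAX_VALUE);
        }
    }

Running it with roughly the per-machine thread count (java FutexLoadSketch 3000) and pointing the same strace -cf at it should show whether %sy and the futex share climb the same way; if they do, the overhead is mostly about how many threads the kernel has to park and wake, independent of what Trident itself is doing.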
