A more practical question: is there a way to replace the disruptor wait
strategy? I suspect the issue is the ginormous number of threads vying to be
woken up, and running the disruptor queue with SleepingWaitStrategy seems like
it should alleviate this pain, but it looks like the ability to set the wait
strategy was removed in the transition from Clojure to Java in versions
0.10.x->1.0.x.
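
For reference, this is roughly how I would expect to configure it on the older
versions, assuming the old topology.disruptor.wait.strategy key still worked
there (the key name and its behaviour on 0.10.x are assumptions on my part):

    // Sketch only: pre-1.0.x Storm (backtype.storm packages), assuming the
    // topology.disruptor.wait.strategy key is still honoured on that version.
    import backtype.storm.Config;

    public class DisruptorWaitStrategyConfig {
        public static Config withSleepingWaitStrategy() {
            Config conf = new Config();
            // Trade a bit of wake-up latency for far fewer futex wake-ups.
            conf.put("topology.disruptor.wait.strategy",
                     "com.lmax.disruptor.SleepingWaitStrategy");
            return conf;
        }
    }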

On Sun, Jun 11, 2017 at 3:32 PM, Roee Shenberg <[email protected]> wrote:

> Our Storm cluster (1.0.2) is running many Trident topologies, each one
> local to a single worker, with each supervisor having 10 worker slots.
> Every slot runs a copy of the same topology with a different configuration,
> and the topology is a fairly fat Trident topology (~300 threads per
> topology, totalling >3000 threads on the machine).
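>
> To make the setup concrete, here is a minimal sketch of how each copy gets
> submitted (not our actual code; the variant.config key is made up for
> illustration):
>
>     import org.apache.storm.Config;
>     import org.apache.storm.StormSubmitter;
>     import org.apache.storm.generated.StormTopology;
>
>     public class SubmitVariants {
>         // Each configuration variant becomes its own single-worker topology,
>         // so a supervisor with 10 slots runs 10 independent copies.
>         public static void submit(String name, StormTopology topology,
>                                   String variantConfig) throws Exception {
>             Config conf = new Config();
>             conf.setNumWorkers(1);                     // keep the topology local to one worker
>             conf.put("variant.config", variantConfig); // hypothetical per-copy setting
>             StormSubmitter.submitTopology(name, conf, topology);
>         }
>     }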
>
> A quick htop showed a grim picture of most CPU time being spent in the
> kernel:
> [image: htop screenshot]
> (note: running as root inside a Docker container)
>
> Here's an example top summary line:
>
> %Cpu(s): 39.4 us, 51.1 sy,  0.0 ni,  8.6 id,  0.0 wa,  0.0 hi,  0.1 si,  0.8 st
>
> This suggests actual kernel-time waste rather than I/O, IRQs, etc., so I ran
> sudo strace -cf -p 2466 to get a feel for what's going on:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  86.84 3489.872073       14442    241646     27003 futex
>  10.69  429.437949      271453      1582           epoll_wait
>   1.88   75.608000      361761       209       108 restart_syscall
>   0.29   11.722287       46889       250           recvfrom
>   0.12    4.736911          92     51379           gettimeofday
>   0.08    3.173336          12    254162           clock_gettime
>   0.06    2.234660        4373       511           poll
> ...
>
> I don't understand whether strace counts time for threads that are simply
> blocked (in which case this is a worthless measure) or not.
>
> I ran jvmtop to get some runtime data out of one of the topologies (well,
> a few, but they were all roughly the same):
>
>   TID  NAME                            STATE             CPU  TOTALCPU  BLOCKEDBY
>   203  Thread-27-$mastercoord-bg0-exe  RUNNABLE       57.55%     8.54%
>   414  RMI TCP Connection(3)-172.17.0  RUNNABLE        8.03%     0.01%
>    22  disruptor-flush-trigger         TIMED_WAITING   3.79%     4.49%
>    51  heartbeat-timer                 TIMED_WAITING   0.80%     1.66%
>   328  disruptor-flush-task-pool       TIMED_WAITING   0.61%     0.84%
> ...
>
> So just about all of the CPU time is spent inside the Trident master
> coordinator.
>
> My theory is that the ridiculous thread count is causing the kernel to
> work extra hard on all those futex calls (e.g. when waking a thread
> blocked on a futex).
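>
> To illustrate the mechanism I have in mind, here is a toy (not Storm code):
> thousands of threads parked on JVM synchronization primitives become futex
> waiters on Linux, so every wake-up is a futex() call the kernel has to
> arbitrate:
>
>     import java.util.concurrent.CountDownLatch;
>
>     public class FutexDemo {
>         public static void main(String[] args) throws InterruptedException {
>             CountDownLatch latch = new CountDownLatch(1);
>             for (int i = 0; i < 3000; i++) {
>                 new Thread(() -> {
>                     try {
>                         latch.await();    // parks the thread -> futex wait on Linux
>                     } catch (InterruptedException ignored) {
>                     }
>                 }).start();
>             }
>             Thread.sleep(60_000);         // attach strace -cf here and watch futex counts
>             latch.countDown();            // waking all the waiters shows up as futex() time
>         }
>     }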
>
> I'm very uncertain about this; the best I can say is that the overhead
> tracks the number of topologies rather than what each topology is doing
> (when running just 1 topology on the same machine, CPU use is less than
> 1/10th of what it is with 10), and that there are a *lot* of threads on the
> system (>3000).
>
> Any advice, suggestions for additional diagnostics, or ideas as to the
> cause? Operationally, we're planning to move to smaller instances with fewer
> slots per supervisor to work around this issue, but I'd rather resolve it
> without changing our cluster's makeup entirely.
>
> Thanks,
>
> Roee
>
