When I say co-locate, what I have seen in my experiments is the following:

If the number of executors can be served by the workers of a single
node, the scheduler spawns all the executors in that node's workers. I
have also observed that the default scheduler tries to fill up one node
before provisioning an additional one for the topology.

Going back to your sentence "and the executors should be evenly
distributed over all available workers": I have to say that I do not
see that often in my experiments. In fact, I often come across workers
handling 2-3 executors/tasks while others do nothing. Am I missing
something? Or is it just a coincidence in my experiments?

Thank you,
Nick



2015-09-02 17:38 GMT-04:00 Matthias J. Sax <[email protected]>:

> I agree. The load is not high.
>
> About higher latencies: how many ackers did you configure? As a rule of
> thumb, there should be one acker per executor. If you have fewer ackers
> and an increasing number of executors, this might cause the increased
> latency, as the ackers could become a bottleneck.
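>
> As a minimal sketch (assuming the backtype.storm namespace of the
> Storm 0.9.x line; the topology name and the count of 40 are
> placeholders for your setup), the acker count is set via the topology
> configuration:
>
>     Config conf = new Config();
>     // rule of thumb: one acker per executor
>     conf.setNumAckers(40);
>     StormSubmitter.submitTopology("mytopology", conf,
>                                   builder.createTopology());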
>
> What do you mean by "trying to co-locate tasks and executors as much as
> possible"? Tasks are logical units of work that are processed by
> executors (which are threads). Furthermore (as far as I know), the
> default scheduler assigns tasks and executors evenly to the available
> workers. In your case, as you set the number of tasks equal to the
> number of executors, each executor processes a single task, and the
> executors should be evenly distributed over all available workers.
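>
> A sketch of how that 1:1 mapping is declared (the bolt id, MyBolt, and
> the count of 4 are placeholders):
>
>     // parallelism hint = number of executors; setNumTasks = number of
>     // tasks. With both set to 4, each executor runs exactly one task.
>     builder.setBolt("my-bolt", new MyBolt(), 4).setNumTasks(4);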
>
> However, you are right: intra-worker channels are "cheaper" than
> inter-worker channels. In order to exploit this, you should use
> local-or-shuffle grouping instead of shuffle. The disadvantage of
> local-or-shuffle might be missing load balancing; shuffle always
> ensures good load balancing.
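>
> A sketch of the two alternatives (component ids and MyBolt are
> placeholders; you would pick one wiring, not both):
>
>     // plain shuffle: even load balancing, but tuples may cross workers
>     builder.setBolt("bolt", new MyBolt(), 4)
>            .shuffleGrouping("spout");
>
>     // local-or-shuffle: prefers executors in the same worker process,
>     // falling back to shuffle only if none are available locally
>     builder.setBolt("bolt", new MyBolt(), 4)
>            .localOrShuffleGrouping("spout");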
>
>
> -Matthias
>
>
>
> On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
> > Well, my input load is 4 streams at 4000 tuples per second, and each
> > tuple is about 128 bytes long (roughly 2 MB/s in total). Therefore, I
> > do not think my load is too much for my hardware.
> >
> > No, I am running only this topology in my cluster.
> >
> > For some reason, when I set the task-to-executor ratio to 1, my
> > topology does not hang at all. The strange thing now is that I see
> > higher latency with more executors, and I am trying to figure this
> > out. Also, I see that the default scheduler is trying to co-locate
> > tasks and executors as much as possible. Is this true? If yes, is it
> > because the intra-worker latencies are much lower than the
> > inter-worker latencies?
> >
> > Thanks,
> > Nick
> >
> > 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <[email protected]>:
> >
> >     So (for each node) you have 4 cores available for 1 supervisor
> >     JVM and 2 worker JVMs that execute up to 5 threads each (if the
> >     40 executors are distributed evenly over all workers). Thus,
> >     about 12 threads for 4 cores. Of course, Storm starts a few more
> >     threads within each worker/supervisor.
> >
> >     If your load is not huge, this might be sufficient. However, with
> >     a high data rate, it might become problematic.
> >
> >     One more question: do you run a single topology in your cluster,
> >     or multiple? Storm isolates topologies for fault-tolerance
> >     reasons. Thus, a single worker cannot process executors from
> >     different topologies. If you run out of workers, a topology might
> >     not start up completely.
> >
> >     -Matthias
> >
> >
> >
> >     On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
> >     > Hello Matthias and thank you for your reply. See my answers below:
> >     >
> >     > - I have 4 supervisor nodes in my AWS cluster of m4.xlarge
> >     >   instances (4 cores per node). On top of that, I have 3 more
> >     >   nodes for ZooKeeper and Nimbus.
> >     > - 2 workers per supervisor node
> >     > - The task number for each bolt ranges from 1 to 4, and I use a
> >     >   1:1 task-to-executor assignment.
> >     > - The number of executors in total for the topology ranges from
> >     >   14 to 41.
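> >     >
> >     > For reference, a sketch of how I set the worker count in the
> >     > topology config (8 = 4 supervisor nodes x 2 worker slots):
> >     >
> >     >     Config conf = new Config();
> >     >     conf.setNumWorkers(8);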
> >     >
> >     > Thanks,
> >     > Nick
> >     >
> >     > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <[email protected]>:
> >     >
> >     >     Without any exception/error message, it is hard to tell.
> >     >
> >     >     What is your cluster setup?
> >     >       - Hardware, i.e., number of cores per node?
> >     >       - How many nodes/supervisors are available?
> >     >       - Configured number of workers for the topology?
> >     >       - What is the number of tasks for each spout/bolt?
> >     >       - What is the number of executors for each spout/bolt?
> >     >
> >     >     -Matthias
> >     >
> >     >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
> >     >     > Hello all,
> >     >     >
> >     >     > I am working on a project in which I submit a topology
> >     >     > to my Storm cluster, but for some reason, some of my
> >     >     > tasks do not start executing.
> >     >     >
> >     >     > I can see that the above is happening because every bolt
> >     >     > I have needs to connect to an external server and
> >     >     > register with a service. However, some of the bolts do
> >     >     > not seem to connect.
> >     >     >
> >     >     > I have to say that the number of tasks I have is larger
> >     >     > than the number of workers in my cluster. Also, I checked
> >     >     > my worker log files, and I see that the workers that do
> >     >     > not register are also not writing some initialization
> >     >     > messages I have them print at startup.
> >     >     >
> >     >     > Any idea why this is happening? Can it be because my
> >     >     > resources are not enough to start all of the tasks?
> >     >     >
> >     >     > Thank you,
> >     >     > Nick
> >     >
> >     >
> >     >
> >     >
> >     > --
> >     > Nikolaos Romanos Katsipoulakis,
> >     > University of Pittsburgh, PhD candidate
> >
> >
> >
> >
> > --
> > Nikolaos Romanos Katsipoulakis,
> > University of Pittsburgh, PhD candidate
>
>


-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate
