Maybe an analogy will help: think of spout executors as people wrapping presents, and spout tasks as tables where presents can be wrapped.
If you have 10 tasks and 1 executor, then you have 10 tables and 1 person. The person will wrap a present at one table, then go to the next, wrap a present, and so on. If you have 10 tasks and 10 executors, then you have 1 person at each table. Adding spout tasks to handle I/O blocking will not help unless you use asynchronous I/O against multiple sources. Personally, I find it easier to reason about more executors that block synchronously.

On May 14, 2015 9:25 AM, <rajesh_kall...@dellteam.com> wrote:

> *Dell - Internal Use - Confidential*
>
> Nathan,
>
> Can you explain in a little more detail what you mean by "When you have
> more tasks than executors, the spout thread does the same logic, it just
> does it for more tasks during its main loop"? I thought the spout thread
> emits tuples based on the max spout pending and how quickly the
> downstream bolts are processing the incoming tuples.
>
> +1 for setting the number of tasks of a bolt to a higher number so that
> you can rebalance later on based on need.
>
> The other time I would consider having more than 1 task per executor
> thread is when the task is I/O intensive and spends most of its time
> waiting on responses instead of being CPU bound.
>
> *From:* Nathan Leung [mailto:ncle...@gmail.com]
> *Sent:* Thursday, May 14, 2015 8:05 AM
> *To:* yang...@bupt.edu.cn
> *Cc:* user
> *Subject:* Re: Re: How much is the overhead of time to deploy a system on Storm?
>
> I would expect that it depends on how many executors you have. In Storm,
> an executor corresponds to an OS thread, while a task is more of a
> logical unit of work. The only situation where I would personally use
> more tasks than executors is if I wanted to over-provision the tasks so
> that I can rebalance to use more executors in the future (you cannot
> change the number of tasks in a rebalance).
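[Editor's note: the executor/task relationship described above can be sketched in plain Java with no Storm dependency. The `Task` class and the round-robin loop below are illustrative stand-ins for Storm internals, not Storm APIs; they just show one thread (executor) servicing several logical units of work (tasks) in its main loop.]

```java
import java.util.ArrayList;
import java.util.List;

public class ExecutorTaskSketch {
    // A "task" is a logical unit of work -- a table in the analogy above.
    static class Task {
        int wrapped = 0;
        void nextTuple() { wrapped++; } // wrap one present at this table
    }

    public static void main(String[] args) {
        // 10 tasks, 1 executor: one thread visits every table in turn.
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) tasks.add(new Task());

        // The executor's main loop: each pass calls nextTuple() on every task.
        int passes = 3;
        for (int p = 0; p < passes; p++) {
            for (Task t : tasks) {
                t.nextTuple();
            }
        }

        // All tasks share the single thread equally: no extra parallelism.
        for (Task t : tasks) {
            System.out.println(t.wrapped);
        }
    }
}
```

The point of the sketch: adding tasks does not add threads, so total throughput is bounded by the one executor, which is why over-provisioning tasks only pays off after a rebalance adds executors.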
> When you have more tasks than executors, the spout thread does the same
> logic, it just does it for more tasks during its main loop. I'm not sure
> why that would increase your per-thread throughput.

On May 13, 2015 10:13 PM, "yang...@bupt.edu.cn" <yang...@bupt.edu.cn> wrote:

> hi, Nathan
>
> Actually, I tried many ways to make my program fit Storm:
>
> 1. a 'while(true)' in nextTuple()
> 2. executing n times in one nextTuple() call
>
> I don't need to batch messages, because what I really care about is the
> processing speed (the emit phase is not the bottleneck).
>
> I want to mention this: I only created one single spout task on one
> machine node.
>
> I also read some papers on Storm evaluation, and they applied some
> parallelism to some extent. So I tried adding some parallelism (10 tasks
> per executor per node), and I got a pretty good result (the same
> throughput as the Java program).
>
> I wonder if this is the design pattern we should pick in Storm?
>
> ------------------------------
> yang...@bupt.edu.cn
>
> *From:* Nathan Leung <ncle...@gmail.com>
> *Date:* 2015-05-12 20:57
> *To:* user <user@storm.apache.org>
> *Subject:* Re: How much is the overhead of time to deploy a system on Storm?
>
> I'm not very surprised. See, for example, published single-machine
> benchmarks (IIRC 1.6 million tuples/s is the official figure from Nathan
> Marz, though that figure is a little old). This is spout-to-bolt and
> matches my observations for trivial cases. With some processing logic
> and only one spout, I can see how it's lower.
>
> You can reduce the overhead by batching your work differently, e.g. by
> doing more work in each call to nextTuple().
>
> On May 12, 2015 4:56 AM, "Matthias J. Sax" <mj...@informatik.hu-berlin.de> wrote:
>
> Can you share your code?
>
> Do you process a single tuple each time nextTuple() is called? If a
> spout does not emit anything, Storm applies a waiting penalty to avoid
> busy waiting. That might slow down your code.
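[Editor's note: the batching suggestion ("do more work in each call to nextTuple") can be illustrated without Storm. The sketch below is a plain-Java model, under the assumption that each nextTuple() invocation pays a fixed per-call cost (framework bookkeeping, loop overhead); `runCalls` and its cost unit are hypothetical, not Storm measurements.]

```java
public class BatchingSketch {
    // Model: each nextTuple() call pays one unit of fixed overhead,
    // then emits up to batchSize tuples. Returns total overhead paid.
    static long runCalls(int totalTuples, int batchSize) {
        long overheadUnits = 0;
        int emitted = 0;
        while (emitted < totalTuples) {
            overheadUnits += 1; // fixed cost paid once per nextTuple() call
            for (int i = 0; i < batchSize && emitted < totalTuples; i++) {
                emitted++; // emit one tuple inside this call
            }
        }
        return overheadUnits;
    }

    public static void main(String[] args) {
        // One tuple per call: the per-call overhead is paid 1,000,000 times.
        System.out.println(runCalls(1_000_000, 1));
        // 100 tuples per call: the same overhead is paid only 10,000 times.
        System.out.println(runCalls(1_000_000, 100));
    }
}
```

This is why emitting n tuples per nextTuple() call (option 2 in the message above) can recover much of the standalone program's throughput: the framework overhead is amortized over the batch.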
> You can configure the waiting strategy:
> https://storm.apache.org/2012/09/06/storm081-released.html
>
> -Matthias
>
> On 05/12/2015 09:31 AM, Daniel Compton wrote:
>
> I'm also interested in the answers to this question, but to add to the
> discussion, take a look at
> http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html .
> I suspect Storm is still introducing coordination overhead even when
> running on a single machine.
>
> On Tue, 12 May 2015 at 1:39 pm yang...@bupt.edu.cn
> <mailto:yang...@bupt.edu.cn> wrote:
>
> Hi, and thanks.
>
> I'm working on a parallel algorithm that counts massive numbers of items
> in data streams. Previous research on parallelizing this algorithm
> focused on multi-core CPUs; however, I want to take advantage of Storm.
>
> Processing latency is extremely important for this algorithm, and I did
> some evaluation of its performance.
>
> First, I implemented the algorithm in Java (one thread, no parallelism)
> and measured its performance: it could process 3 million items per
> second.
>
> Second, I wrapped this implementation of the algorithm in Storm (just
> one spout doing the processing) and measured its performance: it could
> process only 0.75 million items per second. I changed my implementation
> a little to fit Storm's structure, but in the end the performance is
> still not good....
>
> P.S. I didn't take network overhead into consideration, because I run
> the program in a single spout node so there is no emit or transfer. (So
> I don't care how Storm emits messages between nodes for now.) The
> program in the spout is actually doing the same thing as the standalone
> one. (I just copied the program into the nextTuple() method with some
> necessary changes.)
>
> 1. Is the degradation (to 1/4 of the speed) inevitable?
> 2. What caused the degradation?
> 3.
How can I reduce the degradation?
>
> Thank you all.
>
> ------------------------------------------------------------------------
> yang...@bupt.edu.cn <mailto:yang...@bupt.edu.cn>
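[Editor's note: the waiting penalty Matthias mentions can also be sketched without Storm. The model below is loosely inspired by Storm's configurable spout wait strategy, but `WaitStrategy`, `emptyEmit`, and the poll loop are illustrative names, not the real API; the sleep is replaced by a counter so the run is deterministic.]

```java
public class WaitStrategySketch {
    // Stand-in for a spout wait strategy: invoked whenever a call to
    // nextTuple() emits nothing, to avoid busy-waiting on an empty source.
    interface WaitStrategy { void emptyEmit(); }

    public static void main(String[] args) {
        final int[] sleeps = {0};
        // In a real system this would be something like Thread.sleep(1);
        // here we just count how often the penalty would be applied.
        WaitStrategy sleepStrategy = () -> sleeps[0]++;

        // Source with gaps: only every 4th poll actually has data.
        int polls = 20;
        int emitted = 0;
        for (int i = 0; i < polls; i++) {
            boolean hasData = (i % 4 == 0);
            if (hasData) {
                emitted++;                 // nextTuple() emitted a tuple
            } else {
                sleepStrategy.emptyEmit(); // penalty on an empty nextTuple()
            }
        }
        System.out.println(emitted);
        System.out.println(sleeps[0]);
    }
}
```

A spout that processes one item per nextTuple() and sometimes comes up empty pays this penalty often, which is one plausible source of the 4x slowdown discussed above; batching work per call reduces the number of empty emits.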