Maybe an analogy will help: think of spout executors as people wrapping presents, and spout tasks as tables where presents can be wrapped.
If you have 10 tasks and 1 executor, then you have 10 tables and 1 person. The person will wrap a present at one table, then go to the next, wrap a present, and so on. If you have 10 tasks and 10 executors, then you have 1 person at each table. Adding spout tasks to handle I/O blocking will not help unless you use asynchronous I/O against multiple sources. Personally, I find it easier to reason about more executors that block synchronously.

On May 14, 2015 9:25 AM, <rajesh_kall...@dellteam.com> wrote:

> *Dell - Internal Use - Confidential*
>
> Nathan,
>
> Can you explain in a little more detail what you mean by "When you have
> more tasks than executors, the spout thread does the same logic, it just
> does it for more tasks during its main loop"? I thought the spout thread
> emits tuples based on the max spout pending and how quickly the
> downstream bolts are processing the incoming tuples.
>
> +1 for setting the number of tasks of a bolt to a higher number so that
> you can rebalance later on based on need.
>
> The other time I would consider having more than 1 task per executor
> thread is when the task is I/O intensive and spends most of its time
> waiting on responses instead of being CPU bound.
>
> *From:* Nathan Leung [mailto:ncle...@gmail.com]
> *Sent:* Thursday, May 14, 2015 8:05 AM
> *To:* yang...@bupt.edu.cn
> *Cc:* user
> *Subject:* Re: Re: How much is the overhead of time to deploy a system on Storm?
>
> I would expect that it depends on how many executors you have. In Storm,
> an executor corresponds to an OS thread, while a task is more of a
> logical unit of work. The only situation where I would personally use
> more tasks than executors is if I wanted to over-provision the tasks so
> that I can rebalance to use more executors in the future (you cannot
> change the number of tasks in a rebalance).
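[Editor's note: the executor/task relationship described above can be sketched in plain Java with no Storm dependency. The `Task` class and the round-robin loop below are illustrative stand-ins for Storm internals, not Storm APIs; they just show one thread (executor) servicing several logical units of work (tasks) in its main loop.]

```java
import java.util.ArrayList;
import java.util.List;

public class ExecutorTaskSketch {
    // A "task" is a logical unit of work -- a table in the analogy above.
    static class Task {
        int wrapped = 0;
        void nextTuple() { wrapped++; } // wrap one present at this table
    }

    public static void main(String[] args) {
        // 10 tasks, 1 executor: one thread visits every table in turn.
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) tasks.add(new Task());

        // The executor's main loop: each pass calls nextTuple() on every task.
        int passes = 3;
        for (int p = 0; p < passes; p++) {
            for (Task t : tasks) {
                t.nextTuple();
            }
        }

        // All tasks share the single thread equally: no extra parallelism.
        for (Task t : tasks) {
            System.out.println(t.wrapped);
        }
    }
}
```

The point of the sketch: adding tasks does not add threads, so total throughput is bounded by the one executor, which is why over-provisioning tasks only pays off after a rebalance adds executors.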
> When you have more tasks than executors, the spout thread does the same
> logic, it just does it for more tasks during its main loop. I'm not sure
> why that would increase your per-thread throughput.

On May 13, 2015 10:13 PM, "yang...@bupt.edu.cn" <yang...@bupt.edu.cn> wrote:

> hi, Nathan
>
> Actually, I tried many ways to make my program fit Storm:
>
> 1. a 'while(true)' in nextTuple()
> 2. executing n times in one nextTuple() call
>
> I don't need to batch messages, because what I really care about is the
> processing speed (the emit phase is not the bottleneck).
>
> I want to mention this: I only created one single spout task on one
> machine node.
>
> I also read some papers on Storm evaluation, and they applied some
> parallelism to some extent. So I tried adding some parallelism (10 tasks
> per executor per node), and I got a pretty good result (the same
> throughput as the Java program).
>
> I wonder if this is the design pattern we should pick in Storm?
>
> ------------------------------
> yang...@bupt.edu.cn
>
> *From:* Nathan Leung <ncle...@gmail.com>
> *Date:* 2015-05-12 20:57
> *To:* user <user@storm.apache.org>
> *Subject:* Re: How much is the overhead of time to deploy a system on Storm?
>
> I'm not very surprised. See, for example, published single-machine
> benchmarks (IIRC 1.6 million tuples/s is the official figure from Nathan
> Marz, though that figure is a little old). This is spout-to-bolt and
> matches my observations for trivial cases. With some processing logic
> and only one spout, I can see how it's lower.
>
> You can reduce the overhead by batching your work differently, e.g. by
> doing more work in each call to nextTuple().
>
> On May 12, 2015 4:56 AM, "Matthias J. Sax" <mj...@informatik.hu-berlin.de> wrote:
>
> Can you share your code?
>
> Do you process a single tuple each time nextTuple() is called? If a
> spout does not emit anything, Storm applies a waiting penalty to avoid
> busy waiting. That might slow down your code.
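[Editor's note: the batching suggestion ("do more work in each call to nextTuple") can be illustrated without Storm. The sketch below is a plain-Java model, under the assumption that each nextTuple() invocation pays a fixed per-call cost (framework bookkeeping, loop overhead); `runCalls` and its cost unit are hypothetical, not Storm measurements.]

```java
public class BatchingSketch {
    // Model: each nextTuple() call pays one unit of fixed overhead,
    // then emits up to batchSize tuples. Returns total overhead paid.
    static long runCalls(int totalTuples, int batchSize) {
        long overheadUnits = 0;
        int emitted = 0;
        while (emitted < totalTuples) {
            overheadUnits += 1; // fixed cost paid once per nextTuple() call
            for (int i = 0; i < batchSize && emitted < totalTuples; i++) {
                emitted++; // emit one tuple inside this call
            }
        }
        return overheadUnits;
    }

    public static void main(String[] args) {
        // One tuple per call: the per-call overhead is paid 1,000,000 times.
        System.out.println(runCalls(1_000_000, 1));
        // 100 tuples per call: the same overhead is paid only 10,000 times.
        System.out.println(runCalls(1_000_000, 100));
    }
}
```

This is why emitting n tuples per nextTuple() call (option 2 in the message above) can recover much of the standalone program's throughput: the framework overhead is amortized over the batch.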
> You can configure the waiting strategy:
> https://storm.apache.org/2012/09/06/storm081-released.html
>
> -Matthias
>
> On 05/12/2015 09:31 AM, Daniel Compton wrote:
>
> I'm also interested in the answers to this question, but to add to the
> discussion, take a look at
> http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html .
> I suspect Storm is still introducing coordination overhead even when
> running on a single machine.
>
> On Tue, 12 May 2015 at 1:39 pm yang...@bupt.edu.cn
> <mailto:yang...@bupt.edu.cn> wrote:
>
> Hi, and thanks.
>
> I'm working on a parallel algorithm that counts massive numbers of items
> in data streams. Previous research on parallelizing this algorithm
> focused on multi-core CPUs; however, I want to take advantage of Storm.
>
> Processing latency is extremely important for this algorithm, and I did
> some evaluation of its performance.
>
> First, I implemented the algorithm in Java (one thread, no parallelism)
> and measured its performance: it could process 3 million items per
> second.
>
> Second, I wrapped this implementation of the algorithm in Storm (just
> one spout doing the processing) and measured its performance: it could
> process only 0.75 million items per second. I changed my implementation
> a little to fit Storm's structure, but in the end the performance is
> still not good....
>
> P.S. I didn't take network overhead into consideration, because I run
> the program in a single spout node so there is no emit or transfer. (So
> I don't care how Storm emits messages between nodes for now.) The
> program in the spout is actually doing the same thing as the standalone
> one. (I just copied the program into the nextTuple() method with some
> necessary changes.)
>
> 1. Is the degradation (to 1/4 of the speed) inevitable?
> 2. What caused the degradation?
> 3.
How can I reduce the degradation?
>
> Thank you all.
>
> ------------------------------------------------------------------------
> yang...@bupt.edu.cn <mailto:yang...@bupt.edu.cn>
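[Editor's note: the waiting penalty Matthias mentions can also be sketched without Storm. The model below is loosely inspired by Storm's configurable spout wait strategy, but `WaitStrategy`, `emptyEmit`, and the poll loop are illustrative names, not the real API; the sleep is replaced by a counter so the run is deterministic.]

```java
public class WaitStrategySketch {
    // Stand-in for a spout wait strategy: invoked whenever a call to
    // nextTuple() emits nothing, to avoid busy-waiting on an empty source.
    interface WaitStrategy { void emptyEmit(); }

    public static void main(String[] args) {
        final int[] sleeps = {0};
        // In a real system this would be something like Thread.sleep(1);
        // here we just count how often the penalty would be applied.
        WaitStrategy sleepStrategy = () -> sleeps[0]++;

        // Source with gaps: only every 4th poll actually has data.
        int polls = 20;
        int emitted = 0;
        for (int i = 0; i < polls; i++) {
            boolean hasData = (i % 4 == 0);
            if (hasData) {
                emitted++;                 // nextTuple() emitted a tuple
            } else {
                sleepStrategy.emptyEmit(); // penalty on an empty nextTuple()
            }
        }
        System.out.println(emitted);
        System.out.println(sleeps[0]);
    }
}
```

A spout that processes one item per nextTuple() and sometimes comes up empty pays this penalty often, which is one plausible source of the 4x slowdown discussed above; batching work per call reduces the number of empty emits.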