Actually that figure is from a Nathan Marz tweet, but he also cites the million mark here: http://nathanmarz.com/blog/storms-1st-birthday.html
When I saw this type of throughput it was with a canned example that I created solely for testing throughput. Also it was run on pretty beefy hardware, so YMMV.

On May 13, 2015 9:24 AM, "Jeffery Maass" <maas...@gmail.com> wrote:

> Nathan:
>
> Where can I find this?
> "See for example published single machine benchmarks"
>
> Thank you for your time!
>
> +++++++++++++++++++++
> Jeff Maass <maas...@gmail.com>
> linkedin.com/in/jeffmaass
> stackoverflow.com/users/373418/maassql
> +++++++++++++++++++++
>
>
> On Tue, May 12, 2015 at 7:57 AM, Nathan Leung <ncle...@gmail.com> wrote:
>
>> I'm not very surprised. See for example published single machine
>> benchmarks (IIRC 1.6 million tuples/s is the official figure from Nathan
>> Marz, though that figure is a little old). This is spout to bolt and
>> matches my observations for trivial cases. With some processing logic
>> and only one spout I can see how it's lower.
>>
>> You can reduce the overhead by batching your work differently, e.g. by
>> doing more work in each call to nextTuple.
>>
>> On May 12, 2015 4:56 AM, "Matthias J. Sax" <mj...@informatik.hu-berlin.de> wrote:
>>
>>> Can you share your code?
>>>
>>> Do you process a single tuple each time nextTuple() is called? If a
>>> spout does not emit anything, Storm applies a waiting penalty to avoid
>>> busy waiting. That might slow down your code.
>>>
>>> You can configure the waiting strategy:
>>> https://storm.apache.org/2012/09/06/storm081-released.html
>>>
>>> -Matthias
>>>
>>>
>>> On 05/12/2015 09:31 AM, Daniel Compton wrote:
>>> > I'm also interested in the answers to this question, but to add to
>>> > the discussion, take a look at
>>> > http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html.
>>> > I suspect Storm is still introducing coordination overhead even
>>> > running on a single machine.
>>> > On Tue, 12 May 2015 at 1:39 pm yang...@bupt.edu.cn
>>> > <mailto:yang...@bupt.edu.cn> wrote:
>>> >
>>> > Hi, and thanks.
>>> >
>>> > I'm working on a parallel algorithm for counting massive numbers of
>>> > items in data streams. Previous research on parallelizing this
>>> > algorithm focused on multi-core CPUs; however, I want to take
>>> > advantage of Storm.
>>> >
>>> > Processing latency is extremely important for this algorithm, so I
>>> > did some evaluation of its performance.
>>> >
>>> > First, I implemented the algorithm in Java (one thread, with no
>>> > parallelism) and got this performance: it could process 3 million
>>> > items per second.
>>> >
>>> > Second, I wrapped this implementation of the algorithm in Storm
>>> > (just one Spout to process) and got this performance: it could
>>> > process only 0.75 million items per second. I changed my
>>> > implementation a little to fit Storm's structure, but in the end
>>> > the performance is still not good.
>>> >
>>> > P.S. I didn't take network overhead into consideration because I
>>> > just run the program in the single Spout node, so there is no emit
>>> > or transfer. (So I don't care how Storm emits messages between
>>> > nodes for now.) The program in the Spout is actually doing the same
>>> > thing as the former one. (I just copied the program into the
>>> > nextTuple() method with some necessary changes.)
>>> >
>>> > 1. Is the degradation (1/4 of the speed) inevitable?
>>> > 2. What causes the degradation?
>>> > 3. How can I reduce the degradation?
>>> >
>>> > Thank you all.
>>> >
>>> > yang...@bupt.edu.cn <mailto:yang...@bupt.edu.cn>
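Nathan's batching suggestion above can be sketched as follows. This is a minimal, self-contained illustration of the idea only: the class, `nextTupleBatch`, and `BATCH_SIZE` are hypothetical names, not Storm's actual ISpout API, and a real spout would call `collector.emit(...)` inside the loop rather than return a list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of the batching idea from the thread: instead of handling one
// item per nextTuple() call, drain up to BATCH_SIZE items per call so the
// per-call framework overhead (method dispatch, wait-strategy checks) is
// amortized over many items.
public class BatchingSpoutSketch {
    static final int BATCH_SIZE = 100;

    private final Queue<Integer> pending = new ArrayDeque<>();

    BatchingSpoutSketch(int items) {
        for (int i = 0; i < items; i++) pending.add(i);
    }

    // One "nextTuple" call processes a whole batch instead of a single item.
    List<Integer> nextTupleBatch() {
        List<Integer> batch = new ArrayList<>(BATCH_SIZE);
        for (int i = 0; i < BATCH_SIZE && !pending.isEmpty(); i++) {
            batch.add(pending.poll()); // in a real spout: collector.emit(...)
        }
        return batch;
    }

    public static void main(String[] args) {
        BatchingSpoutSketch spout = new BatchingSpoutSketch(1000);
        int calls = 0, emitted = 0;
        List<Integer> batch;
        while (!(batch = spout.nextTupleBatch()).isEmpty()) {
            calls++;
            emitted += batch.size();
        }
        // 1000 items handled in 10 calls rather than 1000 single-item calls.
        System.out.println("calls=" + calls + " emitted=" + emitted);
    }
}
```

If each call to nextTuple() costs a roughly fixed framework overhead, moving from 1 item per call to 100 items per call shrinks that overhead per item by two orders of magnitude, which is why batching can recover much of the single-thread throughput described above.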