Nathan: Where can I find this? "See for example published single machine benchmarks"
Thank you for your time!

+++++++++++++++++++++
Jeff Maass <maas...@gmail.com>
linkedin.com/in/jeffmaass
stackoverflow.com/users/373418/maassql
+++++++++++++++++++++

On Tue, May 12, 2015 at 7:57 AM, Nathan Leung <ncle...@gmail.com> wrote:

> I'm not very surprised. See for example published single machine
> benchmarks (iirc 1.6 million tuples / s is the official figure from Nathan
> Marz though that figure is a little old). This is spout to bolt and matches
> my observations for trivial cases. With some processing logic and only one
> spout I can see how it's lower.
>
> You can reduce the overhead by batching your work differently, e.g. by doing
> more work in each call to nextTuple.
>
> On May 12, 2015 4:56 AM, "Matthias J. Sax" <mj...@informatik.hu-berlin.de>
> wrote:
>
>> Can you share your code?
>>
>> Do you process a single tuple each time nextTuple() is called? If a
>> spout does not emit anything, Storm applies a waiting penalty to avoid
>> busy waiting. That might slow down your code.
>>
>> You can configure the waiting strategy:
>> https://storm.apache.org/2012/09/06/storm081-released.html
>>
>> -Matthias
>>
>>
>> On 05/12/2015 09:31 AM, Daniel Compton wrote:
>> > I'm also interested in the answers to this question, but to add to the
>> > discussion, take a look at
>> > http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
>> > I suspect Storm is still introducing coordination overhead even running
>> > on a single machine.
>> >
>> > On Tue, 12 May 2015 at 1:39 pm yang...@bupt.edu.cn wrote:
>> >
>> >     Hi, and thanks.
>> >
>> >     I'm working on a parallel algorithm that counts massive numbers of
>> >     items in data streams. Previous research on parallelizing this
>> >     algorithm focused on multi-core CPUs; however, I want to take
>> >     advantage of Storm.
>> >
>> >     Processing latency is extremely important for this algorithm, and I
>> >     did some evaluation of the performance.
>> >
>> >     First, I implemented the algorithm in Java (one thread, with no
>> >     parallelism) and measured its performance: it could process 3 million
>> >     items per second.
>> >
>> >     Second, I wrapped this implementation of the algorithm in Storm (just
>> >     one spout doing the processing) and measured its performance: it could
>> >     process only 0.75 million items per second. I changed my
>> >     implementation a little to adapt it to Storm's structure, but in the
>> >     end the performance is still not good.
>> >
>> >     P.S. I didn't take network overhead into consideration because I
>> >     run the program in a single spout node, so nothing is emitted or
>> >     transferred (so I don't care how Storm emits messages between
>> >     nodes for now). The program in the spout does the same thing as
>> >     the standalone one (I just copied the program into the
>> >     nextTuple() method with some necessary changes).
>> >
>> >     1. Is the degradation (to 1/4 of the speed) inevitable?
>> >     2. What causes the degradation?
>> >     3. How can I reduce the degradation?
>> >
>> >     Thank you all.
>> >
>> >     ------------------------------------------------------------------------
>> >     yang...@bupt.edu.cn
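
For readers following the thread, here is a minimal sketch of the batching idea Nathan mentions (doing more work in each call to nextTuple). It is not Storm code and is not the original poster's implementation: the class `BatchedSpoutSketch`, its `batchSize` parameter, and the `drain()` loop are hypothetical stand-ins. They only mirror the shape of a spout whose nextTuple() is called repeatedly by a framework loop, so that whatever fixed cost the framework adds per call is paid once per batch rather than once per item. The counting logic is a plain hash-map counter assumed for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch with no Storm dependency. All names here are hypothetical.
class BatchedSpoutSketch {
    private final long[] items;   // assumed in-memory input stream
    private final int batchSize;  // items handled per nextTuple() call
    private final Map<Long, Long> counts = new HashMap<>();
    private int cursor = 0;

    BatchedSpoutSketch(long[] items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.items = items;
        this.batchSize = batchSize;
    }

    // Processes up to batchSize items and returns how many were handled.
    // A return value of 0 means the input is exhausted.
    int nextTuple() {
        int end = Math.min(cursor + batchSize, items.length);
        int handled = end - cursor;
        for (; cursor < end; cursor++) {
            counts.merge(items[cursor], 1L, Long::sum);
        }
        return handled;
    }

    // Calls nextTuple() until the input is exhausted, as a framework loop
    // would, and returns the number of calls that did any work.
    int drain() {
        int calls = 0;
        while (nextTuple() > 0) {
            calls++;
        }
        return calls;
    }

    long countOf(long item) {
        return counts.getOrDefault(item, 0L);
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 1000;
        }
        // Same work in every run; only the number of nextTuple() calls changes.
        for (int batch : new int[] {1, 100, 10_000}) {
            BatchedSpoutSketch s = new BatchedSpoutSketch(data, batch);
            System.out.println("batchSize=" + batch + " calls=" + s.drain());
        }
    }
}
```

The counts are identical for every batch size; what changes is how many times the driver loop has to enter nextTuple(). Whether this recovers the gap reported above would need to be measured inside an actual topology.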