IEEE Xplore Document - Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming
Streaming data processing has been gaining attention due to its application into a wide range of scenarios. To s...

Benchmarking Streaming Computation Engines at Yahoo!
(Yahoo Storm Team in alphabetical order) Sanket Chintapalli, Derek Dagit, Bobby Evans, Reza Farivar, Tom Graves,...

On Friday, November 4, 2016 6:05 AM, Dominik Safaric <dominiksafa...@gmail.com> wrote:

1- What do you mean "able to control message size"? Is it the max-pending-spout parameter?

By using, for example, Kafka as the source of the benchmark topology, you can control the size of the messages you produce, i.e. their length in bytes. Why would you want to do this? Because some performance characteristics, such as throughput, are related to message size.

Is there any published benchmark like this old one here:

To my knowledge, no. However, we at the Web Information Systems research group of Delft University of Technology are currently benchmarking several streaming engines (including Storm) as part of an empirical study. If you'd like to hear more about the insights gathered so far, feel free to email me.

On 4 Nov 2016, at 10:02, Walid Aljoby <walid_alj...@yahoo.com> wrote:

Thank you Dominik. I have two more points, please.
1- What do you mean "able to control message size"? Is it the max-pending-spout parameter?
2- Is there any published benchmark like this old one here: https://github.com/stormprocessor/storm-benchmark/commit/22bd17a81020ceef71ed73168ac89d3f8eaf61e2

Best Regards,
Walid

From: Dominik Safaric <dominiksafa...@gmail.com>
To: Walid Aljoby <walid_alj...@yahoo.com>
Cc: "user@storm.apache.org" <user@storm.apache.org>; "d...@storm.apache.org" <d...@storm.apache.org>
Sent: Friday, November 4, 2016 4:53 PM
Subject: Re: Storm benchmarks

Well, this depends on which aspects you want to measure. You may, for example, define a topology consisting of a spout that receives byte arrays from Kafka, a bolt that transforms them, and a sink that outputs them. The nice thing is that you'd be able to control the size of the messages. In addition, if you care about performance in conjunction with stateful operations such as aggregations, your topology might look like, for example, the WordCount topology.

Regards,
Dominik

On 4 Nov 2016, at 09:50, Walid Aljoby <walid_alj...@yahoo.com> wrote:

Hi Dominik,

Many thanks for the details. Actually, I am looking for a set of topologies for my test.

Thank you again,
--
Regards
Walid

From: Dominik Safaric <dominiksafa...@gmail.com>
To: user@storm.apache.org; Walid Aljoby <walid_alj...@yahoo.com>
Cc: "d...@storm.apache.org" <d...@storm.apache.org>
Sent: Friday, November 4, 2016 4:41 PM
Subject: Re: Storm benchmarks

Hi Walid,

You may benchmark Storm's performance in terms of, for example, throughput and end-to-end latency. In addition, the investigation could also cover variations in configuration settings, such as the parallelism, the message size, and the intra-worker and inter-worker buffer sizes, some of which have a profound effect on the performance of Storm.
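As a rough sketch of how throughput and end-to-end latency could be measured for a controlled message size, the plain-Python code below (no Storm or Kafka APIs) builds fixed-size payloads that a producer could write to Kafka, each carrying a sequence number and a send timestamp, and then summarizes (send, receive) timestamp pairs recorded at the sink. The 16-byte header layout and all function names are hypothetical choices for this sketch. Note that message size here is a property of the payload itself; Storm's `topology.max.spout.pending` setting, by contrast, caps the number of un-acked tuples in flight per spout task. End-to-end latency is only meaningful if the sender and sink clocks are synchronized, or if both run on the same host.

```python
from __future__ import annotations

import statistics
import struct

# Hypothetical header layout for this sketch: big-endian 8-byte sequence
# number followed by an 8-byte send timestamp in nanoseconds.
HEADER = struct.Struct(">QQ")


def make_payload(seq: int, size_bytes: int, sent_ns: int) -> bytes:
    """Build a message of exactly size_bytes bytes carrying seq and send time."""
    if size_bytes < HEADER.size:
        raise ValueError(f"size_bytes must be at least {HEADER.size}")
    header = HEADER.pack(seq, sent_ns)
    # Zero padding makes every message the same length, so message size
    # becomes a controlled variable of the benchmark.
    return header + bytes(size_bytes - HEADER.size)


def parse_payload(payload: bytes) -> tuple[int, int]:
    """Return (seq, sent_ns) from a payload built by make_payload."""
    return HEADER.unpack_from(payload, 0)


def summarize(samples: list[tuple[int, int]]) -> dict[str, float]:
    """Summarize (sent_ns, received_ns) pairs recorded at the sink.

    Throughput is messages per second over the window from the first send
    to the last receive; latency is the per-message end-to-end delay in ms.
    Requires at least two samples.
    """
    latencies_ms = sorted((recv - sent) / 1e6 for sent, recv in samples)
    window_s = (max(r for _, r in samples) - min(s for s, _ in samples)) / 1e9
    # method="inclusive" keeps percentile estimates within the observed range.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "messages": len(samples),
        "throughput_msg_per_s": len(samples) / window_s,
        "p50_ms": statistics.median(latencies_ms),
        "p99_ms": cuts[98],
        "max_ms": latencies_ms[-1],
    }
```

Repeating such a run while varying `size_bytes` (or the parallelism and buffer settings) gives one summary per configuration, which can then be compared side by side.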
There are already a few benchmarks of Storm's performance, such as:
https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2014/04/Streams-and-Storm-April-2014-Final.pdf

In addition, you may want to take a look at the academic papers "Storm@Twitter" and "Twitter Heron: Stream Processing at Scale", which describe, among other things, certain performance aspects of Storm that might be helpful when designing the benchmark.

Regards,
Dominik

On 4 Nov 2016, at 09:36, Walid Aljoby <walid_alj...@yahoo.com> wrote:

Hi Everyone,

Could anyone please tell me what the common benchmarks for testing Storm performance are?

Thank you,
--
Regards
WA
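For the stateful case mentioned in the thread, a WordCount topology typically chains a sentence spout, a split bolt and a count bolt, with words routed to the count bolt's tasks through a fields grouping so that each task owns a disjoint set of words and can keep its counts locally. The plain-Python simulation below sketches that data flow and the partitioned state; it uses no Storm APIs, and all function names are hypothetical.

```python
from __future__ import annotations

import zlib
from collections import Counter
from typing import Iterable


def split_bolt(sentence: str) -> list[str]:
    """Stateless step: emit one lower-cased word per whitespace-separated token."""
    return [word.lower() for word in sentence.split()]


def fields_grouping(word: str, num_tasks: int) -> int:
    """Pick the count task for a word (hypothetical stand-in for Storm's routing).

    A stable hash guarantees the same word always reaches the same task, so
    each task's counts are complete for the words it owns. CRC32 is used only
    because Python's built-in str hash is randomized per process; this is not
    how Storm itself computes the target task.
    """
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def run_word_count(sentences: Iterable[str], num_count_tasks: int = 4) -> list[Counter]:
    """Simulate spout -> split bolt -> count bolt, keeping per-task state."""
    task_state = [Counter() for _ in range(num_count_tasks)]
    for sentence in sentences:  # the "spout" emits one sentence per tuple
        for word in split_bolt(sentence):
            task_state[fields_grouping(word, num_count_tasks)][word] += 1
    return task_state


def merged_counts(task_state: list[Counter]) -> Counter:
    """Combine per-task counts; each word appears in exactly one task."""
    total: Counter = Counter()
    for state in task_state:
        total.update(state)
    return total
```

Changing `num_count_tasks` changes how the words, and therefore the state, are partitioned, which mirrors changing the count bolt's parallelism in a real topology.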