Interesting that the paper was written by IBM people defending an IBM
product. Not saying that it's biased or anything...

Nathan, I agree that windowing is better served as an abstraction layer on
top of a generic stream processing platform. Personally, I appreciate that
Storm deals with clustering, distributed state, fault tolerance, and
threading, so all I have to think about is processing the tuples received
in a bolt (and I don't even have to worry about my algorithms being thread
safe). This is not the case in InfoSphere Streams.

I also appreciate that with Storm I'm not limited to a single language/API
for processing my streams.

Some former coworkers and I were integrating Java with InfoSphere Streams a
couple of years ago and actually found Java to be faster in many cases than
the C++ counterparts. I will say InfoSphere's windowing abstractions are very
well thought out. In fact, I've been working on a similar solution, largely
based on their design: https://github.com/calrissian/flowmix
It's still largely experimental, but as Nathan put it, it's been holding up
well enough for my use cases on the clusters where it is deployed.
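
To make the "layer on top" idea concrete, here is a minimal sketch of a
count-based sliding window with separate eviction and trigger policies. It
is only an illustration: the class and parameter names (CountWindow,
evictCount, triggerCount) are made up for this example and are not the
flowmix or InfoSphere API. It assumes a single thread calls add(), which is
the same assumption you get to make inside a bolt.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical count-based sliding window; names are illustrative only. */
final class CountWindow<T> {
    private final int evictCount;    // eviction policy: keep at most this many items
    private final int triggerCount;  // trigger policy: fire every N added items
    private final Consumer<List<T>> onTrigger;
    private final Deque<T> items = new ArrayDeque<>();
    private int sinceTrigger = 0;

    CountWindow(int evictCount, int triggerCount, Consumer<List<T>> onTrigger) {
        if (evictCount <= 0 || triggerCount <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        this.evictCount = evictCount;
        this.triggerCount = triggerCount;
        this.onTrigger = onTrigger;
    }

    /** Call once per incoming item; assumes a single calling thread. */
    void add(T item) {
        items.addLast(item);
        if (items.size() > evictCount) {
            items.removeFirst();     // evict the oldest item
        }
        if (++sinceTrigger == triggerCount) {
            sinceTrigger = 0;
            // Hand the callback a copy so it cannot mutate the window.
            onTrigger.accept(new ArrayList<>(items));
        }
    }
}
```

The platform underneath only has to deliver items one at a time; the window
itself needs nothing from it, which is why it works as a separate layer.
Time-based policies could follow the same shape, though they would need a
clock or tick source that this sketch leaves out.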




On Mon, May 12, 2014 at 11:28 AM, Nathan Leung <ncle...@gmail.com> wrote:

> a couple thoughts
>
> 1) IBM Streams is certainly more mature, as it's been in development for a
> longer amount of time and Storm is not even at release 1.0 yet.  Though I
> am not familiar with SPL, it would also make sense that it's faster to
> implement in, as it is a higher-level abstraction.
>
> 2) Operator fusion will allow more efficiency in passing data between
> steps in your flow, as localOrShuffleGrouping will still need to go over
> the disruptor, whereas operator fusion, from what I understand, basically
> passes the pointer directly.  As fast as the disruptor is (I've seen
> benchmarks of millions of messages passed / s), it can't compete with
> passing data directly to the next step (cost: a few instructions).  The
> downside of this is that your flow always needs to be created and compiled
> before you can execute it.  Something like a rebalance will require a
> recompile of your stream.  Building a topology dynamically is possible in
> Storm (though it's not a feature that is really exposed out of the box),
> but not in IBM Streams.
>
> 3) They took 1 month to optimize Storm, but I suspect some of this work was
> unnecessary.  Python?  For a benchmark?  Also, uniform message distribution
> by size feels like a premature optimization.  I can understand that they
> would want to explore all avenues to account for a performance difference,
> but in many (most?) practical cases this would not be necessary.  I can
> sympathize on other points.  Tuning the message buffers of storm requires
> pretty specific understanding of the system.  Also if you run out of heap
> and/or have to tune GC, then... yeah.  Not fun.  This would be true for any
> java app though.
>
> 4) I'm not sure they really took language differences seriously enough.
>  I've written certain algorithms in Java that (based on similar algorithms
> that I implemented separately in C++) I would suspect are close to an order
> of magnitude slower just because I ran them in Java.  While I haven't dug
> into this deeply (for example by using an identical algorithm for both Java
> and C++), consider a HashMap indexed by a primitive type.  In Java, these
> are separate objects stored in an array of references.  In C++ these are
> stored sequentially in an array.  C++ allows direct key access in the array
> (as opposed to going through the reference), and is also potentially much
> friendlier with the cache.  Just because the JVM is healthy does not mean
> it's going to perform like C++ for all applications.  I suppose you could
> then argue that for best performance Storm is more or less limited to the
> JVM, but I choose not to consider that point here for brevity.  Note this
> is not to say that it's impossible to write fast code in Java (see
> previously mentioned disruptor).  I would just argue that it's a good bit
> harder.
>
> 5) I'm not sure I buy their argument that application logic costs are
> unlikely to mask the differences in framework performance.  This depends
> very heavily on your application.  If you're hitting external data sources
> a lot (e.g. memcache or database) then that will certainly mask a good
> portion of the difference.  Maybe part of this argument is a C++ vs Java
> difference, in which case I'm somewhat more inclined to agree.
>
> 6) From a business perspective, the question changes from "is it faster?"
> to "what does it cost to support the throughput that we need?" which is a
> very different question.  In many cases storm performs well enough.
>
>
> On Mon, May 12, 2014 at 9:02 AM, John Welcher <jpwelc...@gmail.com> wrote:
>
>> Hi
>>
>> Streams also costs 40,000 USD, while Storm is free.
>>
>> John
>>
>>
>> On Mon, May 12, 2014 at 3:49 AM, Klausen Schaefersinho <
>> klaus.schaef...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I found an interesting comparison of IBM Streams and Storm:
>>>
>>> https://www.ibmdw.net/streamsdev/2014/04/22/streams-apache-storm/
>>>
>>> It also includes an interesting comparison between ZeroMQ and Netty
>>> performance.
>>>
>>>
>>> Cheers,
>>>
>>> Klaus
>>>
>>
>>
>
