Hi Stephan,

That's good to know. We will hit that throughput easily, and our computation graph has a lot of chaining like this right now, so I think it's safe to minimize the chaining for now.
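For reference, here is a minimal sketch of what "collapsing the code into one function" means for the filter + map case discussed in this thread. It uses plain Java collections rather than the Flink API, so the class and method names (`FilterMapVsFlatMap`, `filterThenMap`, `filterAndMap`, `fused`) and the even-number predicate are illustrative assumptions only. In a Flink job the corresponding calls would be `filter(...).map(...)` versus a single `flatMap(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

public class FilterMapVsFlatMap {

    // Two-step version: a filter stage followed by a map stage.
    public static List<Integer> filterThenMap(List<Integer> events) {
        return events.stream()
                .filter(e -> e % 2 == 0)   // keep even values only (hypothetical predicate)
                .map(e -> e * 10)          // transform the surviving values
                .collect(Collectors.toList());
    }

    // Fused version: one function decides whether to emit and what to emit,
    // in the style of a flatMap that produces zero or one output per input.
    public static void filterAndMap(Integer event, Consumer<Integer> out) {
        if (event % 2 == 0) {
            out.accept(event * 10);
        }
    }

    // Drives the fused function over all events and gathers its output.
    public static List<Integer> fused(List<Integer> events) {
        List<Integer> result = new ArrayList<>();
        for (Integer e : events) {
            filterAndMap(e, result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> events = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(filterThenMap(events)); // prints [20, 40, 60]
        System.out.println(fused(events));         // prints [20, 40, 60]
    }
}
```

Both versions produce the same output. The two-step form states the intent more clearly, while the fused form avoids the hand-off between two operators.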
Thanks a lot for this, Stephan.

Cheers

On Thu, Sep 3, 2015 at 7:20 PM, Stephan Ewen <se...@apache.org> wrote:

> In a set of benchmarks a while back, we found that the chaining mechanism
> has some overhead right now, because of its abstraction. The abstraction
> creates iterators for each element and makes it hard for the JIT to
> specialize on the operators in the chain.
>
> For purely local chains at full speed, this overhead is observable (it can
> decrease throughput from 25 million elements per core to 15-20 million
> elements per core). If your job does not reach that throughput, or is I/O
> bound, source bound, etc., it does not matter.
>
> If you care about super high performance, collapsing the code into one
> function helps.
>
> On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Hi Gyula,
>>
>> Thanks for your response. It seems I will use filter and map for now, as
>> that really makes the intention clear and is not a big performance hit.
>>
>> Thanks again.
>>
>> Cheers
>>
>> On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>>> Hey Welly,
>>>
>>> If you call filter and map one after the other like you mentioned, these
>>> operators will be chained and executed as if they were running in the same
>>> operator.
>>> The only small performance overhead comes from the fact that the output
>>> of the filter will be copied before passing it as input to the map to keep
>>> immutability guarantees (but no serialization/deserialization will happen).
>>> Copying might be practically free depending on your data type, though.
>>>
>>> If you are using operators that don't make use of the immutability of
>>> inputs/outputs (i.e., you don't hold references to those values), then you
>>> can disable copying altogether by calling
>>> env.getConfig().enableObjectReuse(), in which case they will have exactly
>>> the same performance.
>>>
>>> Cheers,
>>> Gyula
>>>
>>> Welly Tambunan <if05...@gmail.com> wrote (on Thu, Sep 3, 2015, 4:33):
>>>
>>>> Hi All,
>>>>
>>>> I would like to filter some items from the event stream. I think there
>>>> are two ways of doing this.
>>>>
>>>> One is the regular pipeline filter(...).map(...). We can also use
>>>> flatMap to do both in the same operator.
>>>>
>>>> Is there any performance improvement if we use flatMap, given that it
>>>> would be done in one operator instance?
>>>>
>>>> Cheers
>>>>
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>

--
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>