We will definitely also try to get the chaining overhead down a bit. BTW: To reach this kind of throughput, you need sources that can produce very fast...
On Fri, Sep 4, 2015 at 12:20 AM, Welly Tambunan <if05...@gmail.com> wrote:

> Hi Stephan,
>
> That's good information to know. We will hit that throughput easily. Our
> computation graph has a lot of chaining like this right now.
> I think it's safe to minimize the chain right now.
>
> Thanks a lot for this Stephan.
>
> Cheers
>
> On Thu, Sep 3, 2015 at 7:20 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> In a set of benchmarks a while back, we found that the chaining mechanism
>> has some overhead right now, because of its abstraction. The abstraction
>> creates iterators for each element and makes it hard for the JIT to
>> specialize on the operators in the chain.
>>
>> For purely local chains at full speed, this overhead is observable (it can
>> decrease throughput from 25 million elements per core to 15-20 million
>> elements per core). If your job does not reach that throughput, or is
>> I/O bound, source bound, etc., it does not matter.
>>
>> If you care about super high performance, collapsing the code into one
>> function helps.
>>
>> On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan <if05...@gmail.com> wrote:
>>
>>> Hi Gyula,
>>>
>>> Thanks for your response. Seems I will use filter and map for now, as
>>> that one really makes the intention clear and is not a big performance hit.
>>>
>>> Thanks again.
>>>
>>> Cheers
>>>
>>> On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra <gyula.f...@gmail.com>
>>> wrote:
>>>
>>>> Hey Welly,
>>>>
>>>> If you call filter and map one after the other like you mentioned,
>>>> these operators will be chained and executed as if they were running in the
>>>> same operator.
>>>> The only small performance overhead comes from the fact that the output
>>>> of the filter will be copied before passing it as input to the map to keep
>>>> immutability guarantees (but no serialization/deserialization will happen).
>>>> Copying might be practically free depending on your data type, though.
>>>>
>>>> If you are using operators that don't make use of the immutability of
>>>> inputs/outputs (i.e. you don't hold references to those values), then you
>>>> can disable copying altogether by calling
>>>> env.getConfig().enableObjectReuse(), in which case they will have exactly
>>>> the same performance.
>>>>
>>>> Cheers,
>>>> Gyula
>>>>
>>>> Welly Tambunan <if05...@gmail.com> wrote (on Thu, Sep 3, 2015, 4:33):
>>>>
>>>>> Hi All,
>>>>>
>>>>> I would like to filter some items from the event stream. I think there
>>>>> are two ways of doing this.
>>>>>
>>>>> Using the regular pipeline filter(...).map(...). We can also use
>>>>> flatMap for doing both in the same operator.
>>>>>
>>>>> Any performance improvement if we are using flatMap? As that will be
>>>>> done in one operator instance.
>>>>>
>>>>> Cheers
>>>>>
>>>>> --
>>>>> Welly Tambunan
>>>>> Triplelands
>>>>>
>>>>> http://weltam.wordpress.com
>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
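
To make the filter(...).map(...) versus flatMap comparison from the original question concrete, here is a minimal plain-Java sketch. It deliberately uses java.util.stream and a hypothetical filterAndMap helper instead of the Flink DataStream API, so the class name, the method names, and the even-number predicate are assumptions made purely for illustration. It only shows that the two formulations emit the same results; it does not reproduce the chaining mechanism or the throughput figures discussed in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Flink.
class ChainVsFused {

    // Two separate steps, analogous to filter(...).map(...).
    static List<Integer> chained(List<Integer> events) {
        return events.stream()
                .filter(e -> e % 2 == 0)   // keep even values only
                .map(e -> e * 10)          // transform the values that passed
                .collect(Collectors.toList());
    }

    // One flatMap-style function: emits zero or one result per input,
    // so the filter and the map are collapsed into a single call.
    static void filterAndMap(Integer event, Consumer<Integer> out) {
        if (event % 2 == 0) {
            out.accept(event * 10);
        }
    }

    // Drives the fused function over all inputs and collects what it emits.
    static List<Integer> fused(List<Integer> events) {
        List<Integer> results = new ArrayList<>();
        for (Integer e : events) {
            filterAndMap(e, results::add);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> events = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(chained(events)); // [20, 40, 60]
        System.out.println(fused(events));   // [20, 40, 60]
    }
}
```

In a Flink job, the fused variant would roughly correspond to a single flatMap function, and the chained variant to filter followed by map. Which one is preferable depends on the trade-off raised in the thread: clearer intent versus the chaining overhead that only matters at very high per-core throughput.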