Hi all, we came across some interesting behaviour today. We enabled object reuse on a streaming job that looks like this:
stream = env.addSource(source)
stream.map(mapFnA).addSink(sinkA)
stream.map(mapFnB).addSink(sinkB)

Operator chaining is enabled, so the optimizer fuses all of these operations into a single task, and the same object reference gets passed to both mapFnA and mapFnB. This makes sense once I think about the internal implementation, but it still came as a bit of a surprise, since the object reuse docs (for batch; there are no official ones for streaming, right?) don't really deal with splitting a DataSet/DataStream.

I guess my case is *technically* covered by the documented warning that it is unsafe to reuse an object that has already been collected; in this case, though, the reuse is "hidden" behind the stream definition DSL.

Is this the expected behaviour? Is object reuse for DataStreams encouraged at all, or is it more of a "hidden beta" feature until FLIP-21 is officially finished?

Best,
Urs

--
Urs Schönenberger - urs.schoenenber...@tngtech.com
TNG Technology Consulting GmbH, Beta-Straße 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Dr. Robert Dahlke, Gerhard Müller
Sitz: Unterföhring * Amtsgericht München * HRB 135082
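P.S. Here is a minimal sketch (plain Java, no Flink) of the hazard described above: a single reused record instance is handed to two "branches" in sequence, so if the first one mutates the record, the second observes the mutated value rather than what the source emitted. The names (Event, mapFnA, mapFnB) are hypothetical stand-ins mirroring the job topology, not real Flink API.

```java
import java.util.function.Consumer;

public class ObjectReuseSketch {
    // A mutable record type, standing in for whatever the source emits.
    static final class Event {
        int value;
    }

    // Simulates the chained fan-out: both branches receive the SAME reference.
    static int run() {
        Event reused = new Event();                         // the single reused instance
        Consumer<Event> mapFnA = e -> e.value += 1;         // branch A mutates its input
        int[] seenByB = new int[1];
        Consumer<Event> mapFnB = e -> seenByB[0] = e.value; // branch B only reads

        reused.value = 42;     // "source" writes 42 into the reused object
        mapFnA.accept(reused); // branch A runs first in the chain
        mapFnB.accept(reused); // branch B now sees 43, not the original 42
        return seenByB[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 43
    }
}
```

As far as I know, the usual workarounds are to leave object reuse disabled for such topologies, or to have mutating user functions work on a defensive copy of the input.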