Hi all,

We came across some interesting behaviour today.
We enabled object reuse on a streaming job that looks like this:

stream = env.addSource(source)
stream.map(mapFnA).addSink(sinkA)
stream.map(mapFnB).addSink(sinkB)
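
Object reuse is switched on the usual way via the ExecutionConfig
(env, source, the map functions and the sinks above are of course just
placeholders):

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
// With object reuse enabled, Flink skips the copies it would otherwise
// make when handing records to chained downstream operators.
env.getConfig().enableObjectReuse();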

Operator chaining is enabled, so the whole pipeline gets fused into a
single task that runs in one slot.
The same object reference gets passed to both mapFnA and mapFnB. This
makes sense once I think about the internal implementation, but it
still came as a bit of a surprise, since the object reuse docs (for
batch - there are no official ones for streaming, right?) don't really
cover the case of fanning out a DataSet/DataStream into several
consumers. I guess my case is *technically* covered by the documented
warning that it is unsafe to reuse an object that has already been
collected, but here the reuse is "hidden" behind the stream definition
DSL.

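To make that concrete, here is a stripped-down illustration of the
effect. Event, source and the sinks are hypothetical stand-ins, and the
two anonymous MapFunctions play the roles of mapFnA/mapFnB:

// Needs org.apache.flink.api.common.functions.MapFunction and
// org.apache.flink.streaming.api.datastream.DataStream.

// A mutable POJO record type (made up for this example).
public static class Event {
    public String payload;
    public Event() {}
}

DataStream<Event> stream = env.addSource(source);

// Branch A mutates the record in place.
stream.map(new MapFunction<Event, Event>() {
    @Override
    public Event map(Event e) {
        e.payload = e.payload.toUpperCase();
        return e;
    }
}).addSink(sinkA);

// Branch B is handed the very same Event instance, because both maps
// are chained to the source and object reuse is enabled. Whichever
// branch runs second therefore sees the other branch's mutation; a
// defensive copy at the top of one branch would avoid that.
stream.map(new MapFunction<Event, Event>() {
    @Override
    public Event map(Event e) {
        return e;
    }
}).addSink(sinkB);
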
Is this the expected behaviour? Is object reuse for DataStreams
encouraged at all, or is it more of a "hidden beta" feature until
FLIP-21 is officially finished?

Best,
Urs

-- 
Urs Schönenberger - urs.schoenenber...@tngtech.com

TNG Technology Consulting GmbH, Beta-Straße 13a, 85774 Unterföhring
Managing Directors: Henrik Klagges, Dr. Robert Dahlke, Gerhard Müller
Registered office: Unterföhring * Amtsgericht München * HRB 135082
