Hi Kristoff,

synchronization across operators is not easy to achieve.

If one needs to wait until a sink has processed some element, it requires that a sink participates in the pipeline. So it is not located as a "leaf" operator but location somewhere in the middle.

So your idea to call MQ directly in processFunction1 sounds like a reasonable solution to me. Maybe it is possible to wrap the original code somehow. It could require to go one level deeper in the DataStream API (using a custom stream transformation and operator instead of ProcessFunction).

Another idea that comes to my mind is that you use the checkpoint barrier as a synchronization tool. I'm not familiar how the MQ sink works, but if you can ensure that the side output is written out in the next checkpoint. You could leverage an interface like `org.apache.flink.runtime.state.CheckpointListener`.

I hope others might come up with a better idea.

Regards,
Timo

On 14.04.20 23:59, KristoffSC wrote:
Hi all,
I have a special use case that I'm not sure how I can fulfill.

The use case is:
I have my main business processing pipe line that has a MQ source,
processFunction1, processFunction2  and MQ sink

PocessFunction1 apart from processing the main business message is also
emitting some side effects using side outputs. Those side outputs are send
to SideOutputMqSink that sends them to the queue.

The requirement is that PocessFunction1 must not send out the main business
message further to processFunction2 until side output from processFunction1
is send to the queue via SideOutputMqSink.

In general I don't have to use side outputs, although I do some extra
processing on them before sending to the sink so having sideOutput stream is
nice to have. Never the less, the key requirement is that we should wait
with further processing until side utput is send to the queue.

I could achieve it in a way that my processFunction1 in processElement
method will call MQ directly before sending out the main message, although I
dot like that idea.

I was thinking is there a way to have a Sink function that would be also a
FlatMap function?

The best solution would be to be able to process two streams (main and side
effect) in some nice way but with some barrier, so the main pipeline will
wait until side output is send.
Both streams can be keyed.




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Reply via email to