Hi,

Sorry if this was already asked.

For performance reasons (streaming as well as batch) I'd like to "group" 
messages (let's say in batches of 1000) before sending them to my sink (Kafka, 
but mainly ES) so that I have less per-message overhead.

I've seen the "countWindow" operation, but if I'm not mistaken the parallelism 
of such an operation is 1. Moreover, I'd need some kind of "timeout" (send the 
current batch to the next operator after 5s if it has not reached 1000 messages 
by then).

I could also create a flatMap "String to List<String>" that accumulates 
messages until it reaches 1000 and then sends them to the output. However, that 
does not solve the timeout issue (I'm not sure I could call out.collect() from a 
Timer thread), and even more importantly I'm afraid it would break the 
exactly-once guarantee in case of a crash (Flink could not know that I was 
stacking messages; I could just as well be filtering them).
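
To make the idea concrete, here is a rough sketch of the buffering logic I have 
in mind, in plain Java with no Flink API involved. BatchBuffer and its method 
names are made up for illustration. The caller passes the clock in explicitly, 
so something else would still have to call pollExpired() periodically, and the 
sketch does nothing about the exactly-once concern: the pending items would 
still have to be saved and restored somehow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not a Flink class): releases items by count or by age. */
class BatchBuffer<T> {
    private final int maxSize;        // e.g. 1000 messages
    private final long maxAgeMillis;  // e.g. 5000 ms
    private List<T> current = new ArrayList<>();
    private long firstItemTime;       // arrival time of the oldest pending item

    BatchBuffer(int maxSize, long maxAgeMillis) {
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Adds an item; returns a full batch once maxSize is reached, else an empty list. */
    List<T> add(T item, long nowMillis) {
        if (current.isEmpty()) {
            firstItemTime = nowMillis;
        }
        current.add(item);
        return current.size() >= maxSize ? drain() : Collections.emptyList();
    }

    /** Returns the pending batch if its oldest item has waited maxAgeMillis, else an empty list. */
    List<T> pollExpired(long nowMillis) {
        if (!current.isEmpty() && nowMillis - firstItemTime >= maxAgeMillis) {
            return drain();
        }
        return Collections.emptyList();
    }

    /** Copy of the items not yet emitted; this is what would be lost in a crash. */
    List<T> pending() {
        return new ArrayList<>(current);
    }

    private List<T> drain() {
        List<T> batch = current;
        current = new ArrayList<>();
        return batch;
    }
}
```

With new BatchBuffer<String>(1000, 5000L), add() would hand back a batch on the 
1000th message, and pollExpired() would hand back a smaller one if the oldest 
pending message has been waiting for 5s.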

My Sink could also create the chunks, with its own timer / counter, but I'm 
also afraid that this would break the exactly-once guarantee, since in case of 
a crash there is no way Flink would know whether a message was really sent or 
only stacked...

Is there a proper way to do what I want?

Thanks in advance,

Gwenhaël PASQUIERS
