Hi,

For Flink 1.8 (and 1.9) the only thing you can do is try to limit the amount 
of data buffered between the nodes (check the Flink network configuration [1] 
for the number of buffers and/or buffer pool sizes). This can reduce the 
maximum throughput (but only if the network transfer is a significant cost, 
for example if your records are extremely quick to process), but it will speed 
up checkpointing under back pressure.
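
For example, in flink-conf.yaml you can lower the number of floating buffers 
per input gate and the overall network memory fraction. The keys below exist 
in 1.8/1.9, but the concrete values are only illustrative and you would have 
to tune them for your job (too few buffers can severely hurt throughput):

  # exclusive buffers per incoming channel (default: 2)
  taskmanager.network.memory.buffers-per-channel: 2
  # floating buffers shared by all channels of an input gate (default: 8)
  taskmanager.network.memory.floating-buffers-per-gate: 2
  # fraction of TaskManager memory reserved for network buffers (default: 0.1)
  taskmanager.network.memory.fraction: 0.05

Fewer (and smaller) buffer pools mean fewer in-flight records that a 
checkpoint barrier has to queue behind, which is why this trades throughput 
for faster checkpoints.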

There are some plans to address this, and there may be some improvements in 
Flink 1.10.

If your job is completely stalled because of an outage, then I don’t think 
that you can do much right now: checkpoint barriers flow with the data, so 
even a single buffered record that cannot be processed will prevent the 
checkpoints from making progress. We might try to address this, but that’s 
further down the road.

Piotrek

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html

> On 30 Jul 2019, at 17:50, Mohammad Hosseinian <mohammad.hossein...@id1.de> 
> wrote:
> 
> Hi, 
> 
> I'm still facing the same issue under 1.8. Our pipeline uses end-to-end
> exactly-once semantics, which means the consumer program cannot read the
> messages until they are committed. So in case of an outage, the whole
> runtime delay is passed on to the next stream-processing application and
> creates an even larger delay in our processing pipeline. Is there any way to
> force the checkpoint to complete even under backpressure? 
> 
> Thank you in advance. 
> 
> Regards, 
> -- 
> Mohammad Hosseinian
> Software Developer
> Information Design One AG
> 
