Re: External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-09-04 Thread Stefan Richter
Hi Jared, I just wanted to follow up on this problem that you reported. Are there any new insights about this problem from your debugging efforts, and does it still exist for you? Best, Stefan > On 09.07.2017 at 18:37, Jared Stehler wrote: > > We are

Re: External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-07-09 Thread Jared Stehler
We are using the rocksDB state backend. We had not activated incremental checkpointing, but in the course of debugging this, we ended up doing so, and also moving back to S3 from EFS as it appeared that EFS was introducing large latencies. I will attempt to provide some profiler data as we are

Re: External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-07-03 Thread Ufuk Celebi
On Mon, Jul 3, 2017 at 12:02 PM, Stefan Richter wrote: > Another thing that could be really helpful, if possible, can you attach a > profiler/sampling to your job manager and figure out the hotspot methods > where most time is spent? This would be very helpful as a

Re: External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-07-03 Thread Stefan Richter
Hi, I have two quick questions about this problem report: 1) Which state backend are you using? 2) In case you are using RocksDB, did you also activate incremental checkpointing when moving to Flink 1.3? Another thing that could be really helpful, if possible, can you attach a

External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-06-29 Thread Jared Stehler
We’re seeing our external checkpoints directory grow in an unbounded fashion… after upgrading to Flink 1.3. We are using Flink-Mesos. In 1.2 (HA standalone mode), we saw (correctly) that only the latest external checkpoint was being retained (i.e., respecting state.checkpoints.num-retained
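The retention behaviour Jared describes for 1.2 (only the most recent external checkpoints are kept, as configured by `state.checkpoints.num-retained`) amounts to a bounded FIFO: once a new checkpoint completes, the oldest ones beyond the limit are discarded. The sketch below is a hypothetical Python model of that expectation only, not Flink's actual implementation; `CheckpointStore` and the `chk-N` names are made up for illustration.

```python
from collections import deque


class CheckpointStore:
    """Hypothetical model of checkpoint retention (not Flink code).

    Keeps at most `num_retained` completed checkpoints; older ones are
    moved to `discarded`, standing in for deleting their files.
    """

    def __init__(self, num_retained: int = 1) -> None:
        if num_retained < 1:
            raise ValueError("num_retained must be at least 1")
        self.num_retained = num_retained
        self.retained: deque = deque()  # oldest checkpoint on the left
        self.discarded: list = []

    def add(self, checkpoint_id: str) -> None:
        # Register the newly completed checkpoint, then drop the oldest
        # ones until we are back within the retention limit.
        self.retained.append(checkpoint_id)
        while len(self.retained) > self.num_retained:
            self.discarded.append(self.retained.popleft())


if __name__ == "__main__":
    store = CheckpointStore(num_retained=1)
    for i in range(1, 6):
        store.add(f"chk-{i}")
    print(list(store.retained), store.discarded)
```

With `num_retained=1`, only `chk-5` remains after five checkpoints. The bug reported in this thread is that, after the 1.3 upgrade, the equivalent of `discarded` never happened, so the directory kept growing.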