Oh geez, checkmarks = checkpoints... sorry. What I mean by stale "checkpoints" is checkpoints that should be reaped by "state.checkpoints.num-retained: 3".
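
For reference, here is roughly where that retention setting fits on the job side. This is only a minimal sketch with assumed values (the 60s interval, the class name and the cleanup mode are illustrative, not taken from our actual job); "state.checkpoints.num-retained" itself lives in flink-conf.yaml.

import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetupSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Periodic checkpoints; state.checkpoints.num-retained: 3 (from flink-conf.yaml)
        // should cap how many completed checkpoints stay under state.checkpoints.dir.
        env.enableCheckpointing(60_000); // assumed 60s interval, for illustration only

        // Externalized checkpoints: DELETE_ON_CANCELLATION removes them when the job is
        // cancelled, RETAIN_ON_CANCELLATION keeps them on disk after cancellation.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION);

        // ... sources/sinks would be defined here, then env.execute("job");
    }
}
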
What is happening is that the directories
 - state.checkpoints.dir: file:///opt/ha/49/checkpoints
 - high-availability.storageDir: file:///opt/ha/49/ha
are growing with every checkpoint and I'm running out of disk space.

On Tue, Sep 24, 2019 at 4:55 AM Biao Liu <mmyy1...@gmail.com> wrote:

> Hi Clay,
>
> Sorry I don't get your point. I'm not sure what the "stale checkmarks"
> exactly means. The HA storage and checkpoint directory left after shutting
> down cluster?
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
> On Tue, 24 Sep 2019 at 03:12, Clay Teeter <clay.tee...@maalka.com> wrote:
>
>> I'm trying to get my standalone cluster to remove stale checkmarks.
>>
>> The cluster is composed of a single job and task manager backed by
>> rocksdb with high availability.
>>
>> The configuration on both the job and task manager is:
>>
>> state.backend: rocksdb
>> state.checkpoints.dir: file:///opt/ha/49/checkpoints
>> state.backend.incremental: true
>> state.checkpoints.num-retained: 3
>> jobmanager.heap.size: 1024m
>> taskmanager.heap.size: 2048m
>> taskmanager.numberOfTaskSlots: 24
>> parallelism.default: 1
>> high-availability.jobmanager.port: 6123
>> high-availability.zookeeper.path.root: ********_49
>> high-availability: zookeeper
>> high-availability.storageDir: file:///opt/ha/49/ha
>> high-availability.zookeeper.quorum: ******t:2181
>>
>> Both machines have access to /opt/ha/49 and /opt/ha/49/checkpoints via
>> NFS and are owned by the flink user. Also, there are no errors that I can
>> find.
>>
>> Does anyone have any ideas that I could try?
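
(For completeness, the state backend is set up along these lines. This is again only a sketch: the checkpoint URI is copied from the config above; the class name and everything else are assumed, not how the actual job is written.)

import java.io.IOException;

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSketch {
    public static void main(String[] args) throws IOException {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB backend with incremental checkpoints, matching
        // state.backend: rocksdb and state.backend.incremental: true.
        // The URI mirrors state.checkpoints.dir from the config above.
        env.setStateBackend(new RocksDBStateBackend("file:///opt/ha/49/checkpoints", true));
    }
}
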