Re: Flink job manager doesn't remove stale checkmarks

Fabian Hueske Wed, 25 Sep 2019 08:47:02 -0700

Hi,

You enabled incremental checkpoints.
This means that parts of older checkpoints that did not change since the
last checkpoint are not removed because they are still referenced by the
incremental checkpoints.
Flink will automatically remove them once they are not needed anymore.


Are you sure that the size of your application's state is not growing too
large?

Best, Fabian

Am Di., 24. Sept. 2019 um 10:47 Uhr schrieb Clay Teeter <
clay.tee...@maalka.com>:

> Oh geez,  checkmarks  = checkpoints... sorry.
>
> What i mean by stale "checkpoints" are checkpoints that should be reaped
> by: "state.checkpoints.num-retained: 3".
>
> What is happening is that directories:
>   - state.checkpoints.dir: file:///opt/ha/49/checkpoints
>   - high-availability.storageDir: file:///opt/ha/49/ha
> are growing with every checkpoint and i'm running out of disk space.
>
> On Tue, Sep 24, 2019 at 4:55 AM Biao Liu <mmyy1...@gmail.com> wrote:
>
>> Hi Clay,
>>
>> Sorry I don't get your point. I'm not sure what the "stale checkmarks"
>> exactly means. The HA storage and checkpoint directory left after shutting
>> down cluster?
>>
>> Thanks,
>> Biao /'bɪ.aʊ/
>>
>>
>>
>> On Tue, 24 Sep 2019 at 03:12, Clay Teeter <clay.tee...@maalka.com> wrote:
>>
>>> I'm trying to get my standalone cluster to remove stale checkmarks.
>>>
>>> The cluster is composed of a single job and task manager backed by
>>> rocksdb with high availability.
>>>
>>> The configuration on both the job and task manager are:
>>>
>>> state.backend: rocksdb
>>> state.checkpoints.dir: file:///opt/ha/49/checkpoints
>>> state.backend.incremental: true
>>> state.checkpoints.num-retained: 3
>>> jobmanager.heap.size: 1024m
>>> taskmanager.heap.size: 2048m
>>> taskmanager.numberOfTaskSlots: 24
>>> parallelism.default: 1
>>> high-availability.jobmanager.port: 6123
>>> high-availability.zookeeper.path.root: ********_49
>>> high-availability: zookeeper
>>> high-availability.storageDir: file:///opt/ha/49/ha
>>> high-availability.zookeeper.quorum: ******t:2181
>>>
>>> Both machines have access to /opt/ha/49 and /opt/ha/49/checkpoints via
>>> NFS and are owned by the flink user.  Also, there are no errors that i can
>>> find.
>>>
>>> Does anyone have any ideas that i could try?
>>>
>>>

Re: Flink job manager doesn't remove stale checkmarks

Reply via email to