I see, I'll try turning off incremental checkpoints to see if that helps.
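A minimal flink-conf.yaml sketch of that change (keeping the retention setting from the config quoted below), so each checkpoint is self-contained and retention can reap the old ones:

```yaml
# Sketch only: disable incremental checkpointing so every checkpoint is
# self-contained; state.checkpoints.num-retained should then bound the
# on-disk checkpoint history instead of older chunks staying referenced.
state.backend: rocksdb
state.backend.incremental: false
state.checkpoints.num-retained: 3
```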

Re: disk space, I could see a scenario with my application where I could get
10,000+ checkpoints if the checkpoints are additive.  I'll let you know
what I see.
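In the meantime, a quick sketch for watching that growth: Flink writes completed checkpoints as chk-<N> directories one level below the per-job directory, so counting them shows whether retention is keeping up (the default root path here is taken from the config quoted below):

```shell
#!/bin/sh
# Count completed checkpoint directories (Flink names them chk-<N>,
# one level below the per-job directory) under a checkpoint root.
# With num-retained: 3 and non-incremental checkpoints, this count
# should stay near 3 per job rather than growing without bound.
CKPT_ROOT="${CKPT_ROOT:-/opt/ha/49/checkpoints}"
find "$CKPT_ROOT" -maxdepth 2 -type d -name 'chk-*' 2>/dev/null | wc -l
```

Running it periodically (e.g. under `watch`) should make it obvious whether old chk-* directories are actually being removed.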

Thanks!
Clay


On Wed, Sep 25, 2019 at 5:40 PM Fabian Hueske <fhue...@gmail.com> wrote:

> Hi,
>
> You enabled incremental checkpoints.
> This means that parts of older checkpoints that have not changed since the
> last checkpoint are not removed, because they are still referenced by later
> incremental checkpoints.
> Flink will automatically remove them once they are not needed anymore.
>
> Are you sure that the size of your application's state is not growing too
> large?
>
> Best, Fabian
>
> On Tue., Sept. 24, 2019 at 10:47 AM Clay Teeter <
> clay.tee...@maalka.com> wrote:
>
>> Oh geez,  checkmarks  = checkpoints... sorry.
>>
>> What I mean by stale "checkpoints" is checkpoints that should be reaped
>> by "state.checkpoints.num-retained: 3".
>>
>> What is happening is that directories:
>>   - state.checkpoints.dir: file:///opt/ha/49/checkpoints
>>   - high-availability.storageDir: file:///opt/ha/49/ha
>> are growing with every checkpoint, and I'm running out of disk space.
>>
>> On Tue, Sep 24, 2019 at 4:55 AM Biao Liu <mmyy1...@gmail.com> wrote:
>>
>>> Hi Clay,
>>>
>>> Sorry, I don't get your point. I'm not sure what "stale checkmarks"
>>> means. Do you mean the HA storage and checkpoint directories left behind
>>> after shutting down the cluster?
>>>
>>> Thanks,
>>> Biao /'bɪ.aʊ/
>>>
>>>
>>>
>>> On Tue, 24 Sep 2019 at 03:12, Clay Teeter <clay.tee...@maalka.com>
>>> wrote:
>>>
>>>> I'm trying to get my standalone cluster to remove stale checkmarks.
>>>>
>>>> The cluster is composed of a single job and task manager backed by
>>>> rocksdb with high availability.
>>>>
>>>> The configuration on both the job and task manager are:
>>>>
>>>> state.backend: rocksdb
>>>> state.checkpoints.dir: file:///opt/ha/49/checkpoints
>>>> state.backend.incremental: true
>>>> state.checkpoints.num-retained: 3
>>>> jobmanager.heap.size: 1024m
>>>> taskmanager.heap.size: 2048m
>>>> taskmanager.numberOfTaskSlots: 24
>>>> parallelism.default: 1
>>>> high-availability.jobmanager.port: 6123
>>>> high-availability.zookeeper.path.root: ********_49
>>>> high-availability: zookeeper
>>>> high-availability.storageDir: file:///opt/ha/49/ha
>>>> high-availability.zookeeper.quorum: ******t:2181
>>>>
>>>> Both machines have access to /opt/ha/49 and /opt/ha/49/checkpoints via
>>>> NFS, and both directories are owned by the flink user.  Also, there are
>>>> no errors that I can find.
>>>>
>>>> Does anyone have any ideas that I could try?
>>>>
>>>>
