I see, I'll try turning off incremental checkpoints to see if that helps. Re: disk space, I could see a scenario with my application where I could get 10,000+ checkpoints if the checkpoints are additive. I'll let you know what I see.
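For reference, turning incremental checkpoints off is a one-line change to the same config shown further down in this thread (a minimal flink-conf.yaml fragment; the key is the one already quoted below):

```yaml
# Each checkpoint becomes fully self-contained, so old chk-<N>
# directories are no longer referenced by newer checkpoints and
# state.checkpoints.num-retained can reap them.
state.backend.incremental: false
```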
Thanks!
Clay

On Wed, Sep 25, 2019 at 5:40 PM Fabian Hueske <fhue...@gmail.com> wrote:

> Hi,
>
> You enabled incremental checkpoints.
> This means that parts of older checkpoints that did not change since the
> last checkpoint are not removed because they are still referenced by the
> incremental checkpoints.
> Flink will automatically remove them once they are not needed anymore.
>
> Are you sure that the size of your application's state is not growing too
> large?
>
> Best, Fabian
>
> Am Di., 24. Sept. 2019 um 10:47 Uhr schrieb Clay Teeter <clay.tee...@maalka.com>:
>
>> Oh geez, checkmarks = checkpoints... sorry.
>>
>> What I mean by stale "checkpoints" are checkpoints that should be reaped
>> by: "state.checkpoints.num-retained: 3".
>>
>> What is happening is that the directories:
>> - state.checkpoints.dir: file:///opt/ha/49/checkpoints
>> - high-availability.storageDir: file:///opt/ha/49/ha
>> are growing with every checkpoint and I'm running out of disk space.
>>
>> On Tue, Sep 24, 2019 at 4:55 AM Biao Liu <mmyy1...@gmail.com> wrote:
>>
>>> Hi Clay,
>>>
>>> Sorry, I don't get your point. I'm not sure what the "stale checkmarks"
>>> exactly means. The HA storage and checkpoint directory left after shutting
>>> down the cluster?
>>>
>>> Thanks,
>>> Biao /'bɪ.aʊ/
>>>
>>> On Tue, 24 Sep 2019 at 03:12, Clay Teeter <clay.tee...@maalka.com> wrote:
>>>
>>>> I'm trying to get my standalone cluster to remove stale checkmarks.
>>>>
>>>> The cluster is composed of a single job manager and task manager backed
>>>> by rocksdb with high availability.
>>>>
>>>> The configuration on both the job and task manager is:
>>>>
>>>> state.backend: rocksdb
>>>> state.checkpoints.dir: file:///opt/ha/49/checkpoints
>>>> state.backend.incremental: true
>>>> state.checkpoints.num-retained: 3
>>>> jobmanager.heap.size: 1024m
>>>> taskmanager.heap.size: 2048m
>>>> taskmanager.numberOfTaskSlots: 24
>>>> parallelism.default: 1
>>>> high-availability.jobmanager.port: 6123
>>>> high-availability.zookeeper.path.root: ********_49
>>>> high-availability: zookeeper
>>>> high-availability.storageDir: file:///opt/ha/49/ha
>>>> high-availability.zookeeper.quorum: ******t:2181
>>>>
>>>> Both machines have access to /opt/ha/49 and /opt/ha/49/checkpoints via
>>>> NFS and are owned by the flink user. Also, there are no errors that I
>>>> can find.
>>>>
>>>> Does anyone have any ideas that I could try?
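One quick way to sanity-check retention from the shell is to count the chk-<N> directories under the checkpoint path: Flink creates one per retained checkpoint, under the job ID. A minimal sketch (using a temp directory to stand in for state.checkpoints.dir, since the real path here is /opt/ha/49/checkpoints):

```shell
#!/bin/sh
# Stand-in for state.checkpoints.dir; replace with the real path.
CKPT_DIR=$(mktemp -d)

# Simulate a job ("somejobid" is a placeholder) that has accumulated
# five checkpoint directories:
mkdir -p "$CKPT_DIR/somejobid/chk-1" "$CKPT_DIR/somejobid/chk-2" \
         "$CKPT_DIR/somejobid/chk-3" "$CKPT_DIR/somejobid/chk-4" \
         "$CKPT_DIR/somejobid/chk-5"

# With state.checkpoints.num-retained: 3 and no incremental
# back-references, you would expect at most 3 chk-* directories per job;
# a steadily growing count suggests retention is not reaping them.
COUNT=$(find "$CKPT_DIR" -maxdepth 2 -type d -name 'chk-*' | wc -l)
echo "retained checkpoint dirs: $COUNT"

# Total disk usage of the checkpoint directory:
du -sh "$CKPT_DIR"
```

Running this periodically against the real directory (and the high-availability.storageDir) would show whether growth comes from checkpoint directories themselves or from shared incremental state files.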