Oh geez, checkmarks = checkpoints... sorry. What I mean by stale "checkpoints" is checkpoints that should be reaped by "state.checkpoints.num-retained: 3".
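
For reference, here is roughly where that retention setting fits on the job side. This is only a minimal sketch with assumed values (the 60s interval, the class name and the cleanup mode are illustrative, not taken from our actual job); "state.checkpoints.num-retained" itself lives in flink-conf.yaml.

import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetupSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Periodic checkpoints; state.checkpoints.num-retained: 3 (from flink-conf.yaml)
        // should cap how many completed checkpoints stay under state.checkpoints.dir.
        env.enableCheckpointing(60_000); // assumed 60s interval, for illustration only

        // Externalized checkpoints: DELETE_ON_CANCELLATION removes them when the job is
        // cancelled, RETAIN_ON_CANCELLATION keeps them on disk after cancellation.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION);

        // ... sources/sinks would be defined here, then env.execute("job");
    }
}
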
What is happening is that the directories
 - state.checkpoints.dir: file:///opt/ha/49/checkpoints
 - high-availability.storageDir: file:///opt/ha/49/ha
are growing with every checkpoint and I'm running out of disk space.

On Tue, Sep 24, 2019 at 4:55 AM Biao Liu <mmyy1...@gmail.com> wrote:

> Hi Clay,
>
> Sorry I don't get your point. I'm not sure what the "stale checkmarks"
> exactly means. The HA storage and checkpoint directory left after shutting
> down cluster?
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
> On Tue, 24 Sep 2019 at 03:12, Clay Teeter <clay.tee...@maalka.com> wrote:
>
>> I'm trying to get my standalone cluster to remove stale checkmarks.
>>
>> The cluster is composed of a single job and task manager backed by
>> rocksdb with high availability.
>>
>> The configuration on both the job and task manager is:
>>
>> state.backend: rocksdb
>> state.checkpoints.dir: file:///opt/ha/49/checkpoints
>> state.backend.incremental: true
>> state.checkpoints.num-retained: 3
>> jobmanager.heap.size: 1024m
>> taskmanager.heap.size: 2048m
>> taskmanager.numberOfTaskSlots: 24
>> parallelism.default: 1
>> high-availability.jobmanager.port: 6123
>> high-availability.zookeeper.path.root: ********_49
>> high-availability: zookeeper
>> high-availability.storageDir: file:///opt/ha/49/ha
>> high-availability.zookeeper.quorum: ******t:2181
>>
>> Both machines have access to /opt/ha/49 and /opt/ha/49/checkpoints via
>> NFS and are owned by the flink user. Also, there are no errors that I can
>> find.
>>
>> Does anyone have any ideas that I could try?
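
(For completeness, the state backend is set up along these lines. This is again only a sketch: the checkpoint URI is copied from the config above; the class name and everything else are assumed, not how the actual job is written.)

import java.io.IOException;

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSketch {
    public static void main(String[] args) throws IOException {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB backend with incremental checkpoints, matching
        // state.backend: rocksdb and state.backend.incremental: true.
        // The URI mirrors state.checkpoints.dir from the config above.
        env.setStateBackend(new RocksDBStateBackend("file:///opt/ha/49/checkpoints", true));
    }
}
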