Re: Does flink support retries on checkpoint write failures

2020-01-29 Thread wvl
Forgive my lack of knowledge - I'm a bit out of my league here. But I was wondering: if we allow e.g. 1 checkpoint to fail, and the cause of that failure somehow led to a record being lost (e.g. a RocksDB exception / taskmanager crash / etc.), would there then be no Source rewind to the last successful checkpoint …
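For context, a minimal sketch of how checkpoint-failure tolerance is configured (the `setTolerableCheckpointFailureNumber` method is from Flink's `CheckpointConfig` API, available since 1.9; the interval value is an assumption):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointToleranceExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60 seconds (assumed interval for illustration).
        env.enableCheckpointing(60_000);

        // Tolerate one failed checkpoint before the job itself fails over.
        env.getCheckpointConfig().setTolerableCheckpointFailureNumber(1);
    }
}
```

Note that a *tolerated* checkpoint failure does not by itself lose records: the failed checkpoint is simply discarded, and if the job later fails for any reason, sources rewind to the last checkpoint that did complete.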

Re: Memory constrains running Flink on Kubernetes

2019-08-05 Thread wvl
compaction_style=kCompactionStyleLevel Are these options somehow not applied or overridden? On Mon, Jul 29, 2019 at 4:42 PM wvl wrote: > Excellent. Thanks for all the answers so far. > > So there was another issue I mentioned which we made some progress gaining > insight into, namely our metaspace …
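One way to rule out the options being overridden is to set the compaction style programmatically. A sketch using the `OptionsFactory` interface from the Flink 1.8-era RocksDB state backend (`flink-statebackend-rocksdb`); the class name is made up for illustration:

```java
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.CompactionStyle;
import org.rocksdb.DBOptions;

// Forces level compaction on every column family, regardless of what
// predefined options or flink-conf.yaml settings were applied earlier.
public class LevelCompactionFactory implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // No DB-level changes; pass the current options through.
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        // Programmatic equivalent of compaction_style=kCompactionStyleLevel.
        return currentOptions.setCompactionStyle(CompactionStyle.LEVEL);
    }
}
```

This factory would then be registered on the backend, e.g. `rocksDBStateBackend.setOptions(new LevelCompactionFactory())`.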

Re: Memory constrains running Flink on Kubernetes

2019-07-29 Thread wvl
…g by default, and even >> if you have only one slot per taskmanager, there might exist many RocksDB >> instances within that TM due to many operators with keyed state running. >> >> Apart from the theoretical analysis, you'd better open the RocksDB native >> metrics or track the memory …
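The native metrics mentioned above are enabled per-metric in flink-conf.yaml. A sketch with a few of the metrics documented for Flink 1.8 (each adds some per-query overhead to RocksDB, so enable only what you need):

```yaml
# flink-conf.yaml -- expose selected RocksDB native metrics per state backend
state.backend.rocksdb.metrics.estimate-num-keys: true
state.backend.rocksdb.metrics.cur-size-all-mem-tables: true
state.backend.rocksdb.metrics.estimate-table-readers-mem: true
state.backend.rocksdb.metrics.estimate-pending-compaction-bytes: true
```

With these set, the values show up in Flink's metrics system per operator/column family, which helps attribute container memory growth to memtables, index/filter blocks, or pending compactions.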

Re: Memory constrains running Flink on Kubernetes

2019-07-25 Thread wvl
e-oom-behavior >> [2] >> https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks >> [3] >> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#rocksdb-native-metrics >> >> Best >> Yun Tang >>

Memory constrains running Flink on Kubernetes

2019-07-23 Thread wvl
Hi, We're running a relatively simple Flink application that uses a bunch of state in RocksDB on Kubernetes. During the course of development and going to production, we found that we were often running into memory issues made apparent by Kubernetes OOMKilled and Java OOM log events. In order to …
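Since RocksDB allocates off-heap, the usual starting point on Flink 1.8 is to leave headroom between the JVM heap and the container limit. A sketch of the relevant legacy config keys (values are assumptions; these keys were replaced by the unified memory model in Flink 1.10):

```yaml
# flink-conf.yaml -- rough memory budgeting for a containerized TaskManager
taskmanager.heap.size: 2048m           # requested TM memory; heap is derived from this
containerized.heap-cutoff-ratio: 0.25  # fraction cut off the heap for off-heap use
                                       # (RocksDB, native buffers, metaspace)
```

If the cutoff is too small for the RocksDB working set, the container exceeds its Kubernetes limit and is OOMKilled even though the JVM heap itself looks healthy.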