What were the errors you received that caused you to increase the amount of memory you provided to RocksDB?

On 05/02/2022 07:12, Salva Alcántara wrote:
I have a job running on Flink 1.14.3 (Java 11) that uses RocksDB as the state backend. The problem is that the job requires an amount of memory pretty close to the overall state size.

Indeed, to keep it stable (and capable of taking snapshots), this is what I'm using:

- 4 TMs with 30 GB of RAM and 7 CPUs
- Everything is run on top of Kubernetes on AWS using nodes with 32 GB of RAM and locally attached SSD disks (M5ad instances for what it's worth)

I have these settings in place:

```
state.backend: rocksdb
state.backend.incremental: 'true'
state.backend.rocksdb.localdir: /opt/flink/rocksdb  # SSD-backed volume (see below)
state.backend.rocksdb.memory.managed: 'true'
state.backend.rocksdb.predefined-options: FLASH_SSD_OPTIMIZED
taskmanager.memory.managed.fraction: '0.9'
taskmanager.memory.framework.off-heap.size: 512mb
taskmanager.numberOfTaskSlots: '4'  # job parallelism: 16
```
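
For reference, assuming taskmanager.memory.process.size ends up at roughly 30g (I'm not sure exactly how the pod's 30 GB maps to the Flink process size), with the 1.14 defaults the managed memory, and hence the RocksDB budget, works out roughly like this per TM:

```
# Rough sketch assuming taskmanager.memory.process.size ≈ 30g and Flink 1.14 defaults
# JVM overhead        = min(max(0.1 * 30g, 192mb), 1gb)  = 1g
# JVM metaspace       = 256mb (default)
# Total Flink memory  ≈ 30g - 1g - 256mb                 ≈ 28.75g
# Managed memory      ≈ 0.9 * 28.75g                     ≈ 25.9g  (shared by RocksDB)
# Per slot (4 slots)  ≈ 25.9g / 4                        ≈ 6.5g of RocksDB budget per slot
```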

Together with this volume definition for the RocksDB local directory:

```
  - name: rocksdb-volume
    volume:
      emptyDir:
        sizeLimit: 100Gi
      name: rocksdb-volume
    volumeMount:
      mountPath: /opt/flink/rocksdb
```
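
(In plain Kubernetes pod-spec terms, assuming a standard pod template rather than the operator-specific syntax above, I believe this amounts to the following; the container name is just an example:)

```
volumes:
  - name: rocksdb-volume
    emptyDir:
      sizeLimit: 100Gi
containers:
  - name: flink-taskmanager        # example container name
    volumeMounts:
      - name: rocksdb-volume
        mountPath: /opt/flink/rocksdb
```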

This provides plenty of disk space for each task manager. With those settings the job runs smoothly, and in particular there is a relatively big memory margin. The problem is that memory consumption slowly increases, and that with a smaller memory margin snapshots fail. I have tried reducing the number of task managers, but I need 4. The same goes for the amount of RAM: I have tried giving e.g. 16 GB instead of 30 GB, but the problem is the same. Another setting that has worked for us is 8 TMs with 16 GB of RAM each, but again, that adds up to the same amount of memory overall as the current settings. Even with that amount of memory, I can see that memory keeps growing and will probably lead to a bad end...

Also, the latest snapshot took around 120 GB, so as you can see I am using an amount of RAM similar to the total state size, which defeats the whole purpose of using a disk-based state backend (RocksDB) plus local SSDs.

Is there an effective way of limiting the memory that RocksDB takes to what is available on the running pods? Nothing I have found or tried so far has worked. Theoretically, the images I am using have `jemalloc` in place for memory allocation, which should avoid the memory fragmentation issues observed with malloc in the past.
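
For reference, this is a sketch (assuming Flink 1.14 defaults; the sizes are placeholders, not tuned values) of the options I understand should bound RocksDB explicitly and expose where the memory actually goes:

```
# Sketch only; sizes are placeholders, not tuned values.
# Give RocksDB a fixed budget per slot (this overrides the managed-memory share,
# so taskmanager.memory.managed.fraction would probably need to be lowered too).
state.backend.rocksdb.memory.fixed-per-slot: 4gb
state.backend.rocksdb.memory.write-buffer-ratio: '0.5'      # default: share of the budget for write buffers
state.backend.rocksdb.memory.high-prio-pool-ratio: '0.1'    # default: share reserved for index/filter blocks

# Expose RocksDB native metrics to check whether growth happens inside or outside the cache
state.backend.rocksdb.metrics.block-cache-usage: 'true'
state.backend.rocksdb.metrics.block-cache-capacity: 'true'
state.backend.rocksdb.metrics.cur-size-all-mem-tables: 'true'
```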

PS: I posted the issue in SO too: https://stackoverflow.com/questions/70986020/flink-job-requiring-a-lot-of-memory-despite-using-rocksdb-state-backend

