Hi All!

We are encountering an error on a larger stateful job (around 1 TB+ of state)
on restore from a RocksDB checkpoint. The taskmanagers keep crashing with a
segfault coming from the RocksDB native code, which seems to be related to the
FlinkCompactionFilter mechanism.
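
For context, the FlinkCompactionFilter path is only exercised for state that
has TTL with RocksDB compaction-filter cleanup enabled. The descriptor in our
job is set up roughly like the sketch below (state name, TTL value and query
interval are illustrative, not our exact settings):

    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.time.Time;

    // Illustrative sketch only: TTL with compaction-filter cleanup, which is what
    // drives rocksdb::flink::FlinkCompactionFilter / JavaListElementFilter in the
    // stack trace below.
    StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.days(7))
            // refresh the current timestamp from Flink every 1000 processed entries
            .cleanupInRocksdbCompactFilter(1000)
            .build();

    ListStateDescriptor<String> descriptor =
            new ListStateDescriptor<>("events", String.class);
    descriptor.enableTimeToLive(ttlConfig);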

The gist with the full error report:
https://gist.github.com/gyfora/f307aa570d324d063e0ade9810f8bb25

The core part is here:
V  [libjvm.so+0x79478f]  Exceptions::(Thread*, char const*, int, oopDesc*)+0x15f
V  [libjvm.so+0x960a68]  jni_Throw+0x88
C  [librocksdbjni-linux64.so+0x222aa1]  JavaListElementFilter::NextUnexpiredOffset(rocksdb::Slice const&, long, long) const+0x121
C  [librocksdbjni-linux64.so+0x6486c1]  rocksdb::flink::FlinkCompactionFilter::ListDecide(rocksdb::Slice const&, std::string*) const+0x81
C  [librocksdbjni-linux64.so+0x648bea]  rocksdb::flink::FlinkCompactionFilter::FilterV2(int, rocksdb::Slice const&, rocksdb::CompactionFilter::ValueType, rocksdb::Slice const&, std::string*, std::string*) const+0x14a

Has anyone encountered a similar issue before?

Thanks
Gyula