Hi Edward,

From this line in the log: "Caused by: java.io.EOFException", it seems that the state metadata file has been corrupted. I can't confirm it, though. Maybe Stefan knows more details; I'll ping him for you.
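To illustrate why an EOFException points toward truncated or otherwise incomplete serialized bytes, here is a minimal sketch in plain Java, with no Flink or RocksDB involved. The class name TruncatedUtfDemo, its helper methods, and the sample string are hypothetical and only for illustration. It writes a string with DataOutputStream.writeUTF, cuts off the last few bytes, and reads it back with DataInputStream.readUTF, the same JDK call that appears at the bottom of the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical demo class; not part of Flink.
class TruncatedUtfDemo {

    // Serializes a string with writeUTF: a 2-byte length prefix
    // followed by the modified UTF-8 encoded characters.
    static byte[] writeUtf(String value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(value);
        }
        return buffer.toByteArray();
    }

    // Returns true if readUTF runs out of bytes before the
    // announced length has been read.
    static boolean failsWithEof(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readUTF();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] intact = writeUtf("com.xxx.SomePojo");
        // Simulate a value whose tail is missing.
        byte[] truncated = Arrays.copyOf(intact, intact.length - 4);

        System.out.println("intact fails with EOF: " + failsWithEof(intact));
        System.out.println("truncated fails with EOF: " + failsWithEof(truncated));
    }
}
```

This only shows that readUTF throws EOFException when fewer bytes are available than the length prefix announces. It does not show what made the stored value short in your job.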
Thanks, vino.

Edward Rojas <edward.roja...@gmail.com> wrote on Fri, Sep 7, 2018 at 1:22 AM:

> Hello all,
>
> We are running Flink 1.5.3 on Kubernetes with RocksDB as the state backend.
> When performing some load testing we got an /OutOfMemoryError: native memory
> exhausted/, causing the job to fail and be restarted.
>
> After the TaskManager is restarted, the job is recovered from a checkpoint,
> but there seems to be a problem when trying to access the state. We got
> the error in the *onTimer* function, invoked from *onProcessingTime*.
>
> Is it possible that the OOM error caused a corrupted state to be
> checkpointed?
>
> We get exceptions like:
>
> TimerException{java.lang.RuntimeException: Error while retrieving data from RocksDB.}
>     at org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:288)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:277)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:191)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at java.lang.Thread.run(Thread.java:811)
> Caused by: java.lang.RuntimeException: Error while retrieving data from RocksDB.
>     at org.apache.flink.contrib.streaming.state.RocksDBValueState.value(RocksDBValueState.java:89)
>     at com.xxx.ProcessFunction.*onTimer*(ProcessFunction.java:279)
>     at org.apache.flink.streaming.api.operators.KeyedProcessOperator.invokeUserFunction(KeyedProcessOperator.java:94)
>     at org.apache.flink.streaming.api.operators.KeyedProcessOperator.*onProcessingTime*(KeyedProcessOperator.java:78)
>     at org.apache.flink.streaming.api.operators.HeapInternalTimerService.*onProcessingTime*(HeapInternalTimerService.java:266)
>     at org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:285)
>     ... 7 more
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readFully(DataInputStream.java:208)
>     at java.io.DataInputStream.readUTF(DataInputStream.java:618)
>     at java.io.DataInputStream.readUTF(DataInputStream.java:573)
>     at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:381)
>     at org.apache.flink.contrib.streaming.state.RocksDBValueState.value(RocksDBValueState.java:87)
>     ... 12 more
>
> Thanks in advance for any help
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/