Hello,
Our job (Flink 1.12.2) suddenly started failing to restore from a savepoint
because of an unexpected key group. The stack trace appears to come from the
memory (heap) state backend, even though flink-conf.yaml sets
state.backend: rocksdb, as the log confirms:
{"ts":"2021-05-11T15:44:09.004Z","message":"Loading configuration property:
state.backend,
rocksdb","logger_name":"org.apache.flink.configuration.GlobalConfiguration","thread_name":"main","level":"INFO","level_value":20000}
...
{"ts":"2021-05-11T15:45:29.440Z","message":"Xform Voice Ux (1/3)
(8b539f9d1794cd3bcedecaa2082cecf2) switched from RUNNING to FAILED on
10.204.2.162:6122-f7a908 @ gsp-tm-0.gsp-headless.gsp.svc.cluster.local
(dataPort=37741).","logger_name":"org.apache.flink.runtime.executiongraph.ExecutionGraph","thread_name":"flink-akka.actor.default-dispatcher-2","level":"INFO","level_value":20000,"stack_trace":"java.lang.Exception:
Exception while creating StreamOperatorStateContext.\n\tat
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:254)\n\tat
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272)\n\tat
org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:427)\n\tat
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$2(StreamTask.java:543)\n\tat
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)\n\tat
org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:533)\n\tat
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:573)\n\tat
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)\n\tat
org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)\n\tat
java.lang.Thread.run(Thread.java:748)\nCaused by:
org.apache.flink.util.FlinkException: Could not restore keyed state backend for
KeyedProcessOperator_6c9209d0aee06b8c3f8bb113696c4bb2_(1/3) from any of the 1
provided restore options.\n\tat
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)\n\tat
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345)\n\tat
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163)\n\t...
9 common frames omitted\nCaused by:
org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to
restore heap backend\n\tat
org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:115)\n\tat
org.apache.flink.runtime.state.memory.MemoryStateBackend.createKeyedStateBackend(MemoryStateBackend.java:357)\n\tat
org.apache.flink.runtime.state.memory.MemoryStateBackend.createKeyedStateBackend(MemoryStateBackend.java:104)\n\tat
org.apache.flink.runtime.state.StateBackend.createKeyedStateBackend(StateBackend.java:181)\n\tat
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:328)\n\tat
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)\n\tat
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)\n\t...
11 common frames omitted\nCaused by: java.lang.IllegalStateException:
Unexpected key-group in restore.\n\tat
org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)\n\tat
org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:279)\n\tat
org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:172)\n\tat
org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:112)\n\t...
17 common frames omitted\n"}
Thanks,
Alexey