Hello.
We are running Flink 1.20.1 on Kubernetes (AKS). We are consistently seeing the
following problem: both checkpoints and savepoints write only the “_metadata”
file and nothing else. Sometimes this is fine, because all of the state is
contained in that one file. But sometimes “_metadata” holds references to other
files, and those files are not present in storage.
I understand that if the size of the state is below a configured threshold, it
is stored inline in that one file, and if it is larger, it is written out to
additional files. Our state is generally minuscule, so it should always fit
into _metadata, but sometimes when I inspect the _metadata file I see
references to such additional files. Trying to restore from a savepoint or
checkpoint in that condition always fails.
Does anyone know of a reason for this behavior?
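For anyone who wants to repeat the inspection step, below is a minimal sketch of
how the referenced paths can be listed from a downloaded copy of _metadata. It
is a heuristic scan of the raw bytes for strings starting with a given scheme
(wasbs:// by default), not a parser for Flink's metadata format. The function
name referenced_uris and the script name are made up for illustration, and the
scan only finds absolute paths.

```python
import re
import sys


def referenced_uris(metadata_bytes, scheme=b"wasbs://"):
    """Return distinct URI-like strings starting with `scheme`, in order of appearance.

    Heuristic only: it collects runs of printable ASCII that follow the scheme.
    If the byte right after a path happens to be printable, it is included too.
    """
    # Printable, non-space ASCII (0x21-0x7e) following the literal scheme.
    pattern = re.compile(re.escape(scheme) + rb"[\x21-\x7e]+")
    seen = []
    for match in pattern.finditer(metadata_bytes):
        uri = match.group().decode("ascii")
        if uri not in seen:
            seen.append(uri)
    return seen


if __name__ == "__main__":
    # Hypothetical usage: python list_refs.py /path/to/downloaded/_metadata
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as f:
            for uri in referenced_uris(f.read()):
                print(uri)
```

Each printed path can then be checked against the contents of the storage
container. An empty result suggests the state is inline, although relative
references would not be detected by this scan.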
This is our configuration (relevant parts only; I have replaced our storage
account name with the placeholder ${account}):
high-availability.type: kubernetes
high-availability.cluster-id: flink-cluster-session-cluster
high-availability.storageDir: wasbs://flink-storage@${account}.blob.core.windows.net/data
high-availability.jobmanager.port: 6123
state.backend.type: rocksdb
execution.checkpointing.num-retained: 3
execution.checkpointing.savepoint-dir: wasbs://flink-storage@${account}.blob.core.windows.net/flink-savepoints
execution.checkpointing.mode: EXACTLY_ONCE
execution.checkpointing.incremental: true
execution.checkpointing.interval: 60000
execution.checkpointing.timeout: 300000
$internal.flink.version: v1_20
execution.checkpointing.storage: filesystem
execution.checkpointing.dir: wasbs://flink-storage@${account}.blob.core.windows.net/flink-checkpoints
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
execution.checkpointing.min-pause: 5000
execution.target: kubernetes-session
fs.azure.account.keyprovider.${account}.blob.core.windows.net: org.apache.flink.fs.azurefs.EnvironmentVariableKeyProvider
env.java.opts.all: --add-exports=java.base/sun.net.util=ALL-UNNAMED
    --add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED
    --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED
    --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED
    --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED
    --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED
    --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED
    --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED
    --add-opens=java.base/java.lang=ALL-UNNAMED
    --add-opens=java.base/java.net=ALL-UNNAMED
    --add-opens=java.base/java.io=ALL-UNNAMED
    --add-opens=java.base/java.nio=ALL-UNNAMED
    --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
    --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
    --add-opens=java.base/java.text=ALL-UNNAMED
    --add-opens=java.base/java.time=ALL-UNNAMED
    --add-opens=java.base/java.util=ALL-UNNAMED
    --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
    --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
    --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
Nikola.