Hello everyone,
I am running a Flink job on Kubernetes as a standalone HA job cluster. I
recently updated the job with some additional sinks, which I believe has
made the existing checkpoints incompatible with the new version: Flink now
crashes on startup with the following:
 Caused by: java.lang.IllegalStateException: There is no operator for the
state c9b81dfc309f1368ac7efb5864e7b693

So I rolled back the deployment, logged into the pod, created a savepoint,
and then modified my args to add

--allowNonRestoredState
and
-s <savepoint-dir>
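For reference, I triggered the savepoint from inside the jobmanager pod
roughly like this (the job ID and target path below are placeholders, not
my actual values):

```shell
# List running jobs to find the job ID
flink list
# Trigger a savepoint for that job into the S3 savepoint directory
flink savepoint <job-id> s3://<bucket>/savepoints/
```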

but it doesn't look like the standalone cluster is respecting those
arguments. I've searched around but haven't found any solutions. The Docker
image runs docker-entrypoint.sh, and the full arg list is below, copy-pasted
out of my k8s yaml file:

        - job-cluster
        - -Djobmanager.rpc.address=$(SERVICE_NAME)
        - -Djobmanager.rpc.port=6123
        - -Dresourcemanager.rpc.port=6123
        - -Dparallelism.default=$(NUM_WORKERS)
        - -Dblob.server.port=6124
        - -Dqueryable-state.server.ports=6125
        - -Ds3.access-key=$(AWS_ACCESS_KEY_ID)
        - -Ds3.secret-key=$(AWS_SECRET_ACCESS_KEY)
        - -Dhigh-availability=zookeeper
        - -Dhigh-availability.jobmanager.port=50010
        - -Dhigh-availability.storageDir=$(S3_HA)
        - -Dhigh-availability.zookeeper.quorum=$(ZK_QUORUM)
        - -Dstate.backend=filesystem
        - -Dstate.checkpoints.dir=$(S3_CHECKPOINT)
        - -Dstate.savepoints.dir=$(S3_SAVEPOINT)
        - --allowNonRestoredState
        - -s $(S3_SAVEPOINT)
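One thing I noticed afterwards: in a Kubernetes container spec, each item in
the args list becomes a single argv entry, so `- -s $(S3_SAVEPOINT)` is
handed to the entrypoint as one token containing a space ("-s s3://..."),
which the argument parser may not recognize. Splitting the flag and its
value into separate list items might look like this (S3_SAVEPOINT_PATH here
is a hypothetical new env var holding the full path of the specific
savepoint to restore from, since -s expects a concrete savepoint path and
the existing S3_SAVEPOINT is just the directory savepoints are written to):

```yaml
        - --allowNonRestoredState
        - -s
        - $(S3_SAVEPOINT_PATH)
```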

I originally didn't have the last two args; I added them based on various
emails I saw on this list and other Google search results, to no avail.

Thanks
-Matt
