So I've been really stumped on this for a couple of days now.
Some general info:
Flink version 1.12.1, using the k8s HA service. The k8s cluster is self-managed
on AWS. Our checkpoints and savepoints are on S3; I created a new bucket just
for them and granted the proper permissions to the k8s node.
The job manager is working: I can access the UI and upload a job. Looking at
the startup logs, I can see the bucket I configured, with no errors:
2021-01-27 14:46:38,740 INFO org.apache.flink.runtime.blob.FileSystemBlobStore
[] - Creating highly available BLOB storage directory at
s3:/<bucketName>/ha-storage/default/blob
(While there is no error, I can't find that directory in the bucket.)
However, once I submit the job I get an exception. Looking at the job manager
logs, I'm getting an S3 access denied error:
2021-01-27 14:28:08,628 ERROR
org.apache.flink.runtime.blob.BlobServerConnection [] - PUT operation
failed
java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: Access
Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request
ID: 8W0N0T2R4P8P7YBT; S3 Extended Request ID:
B6zBzIoBmzNoJ4bWQE9Ydt65+IN8pyHeJQuTc28AscyG0dSEM3G7WZHutOT2scJ/6WCoOuRi27A=;
Proxy: null), S3 Extended Request ID:
B6zBzIoBmzNoJ4bWQE9Ydt65+IN8pyHeJQuTc28AscyG0dSEM3G7WZHutOT2scJ/6WCoOuRi27A=
So I created a new image based on the Flink image with the AWS CLI installed
and tried some S3 actions as the flink user from a shell:
flink@flink-jobmanager-1-12-f6cf4b5b6-xmkvb:~$ aws s3 ls s3://<bucketName>
flink@flink-jobmanager-1-12-f6cf4b5b6-xmkvb:~$ touch oran.txt
flink@flink-jobmanager-1-12-f6cf4b5b6-xmkvb:~$ aws s3 cp oran.txt s3://<bucketName>/oran.txt
upload: ./oran.txt to s3://houzz-flink-1-12-session-cluster/oran.txt
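
The upload above goes to the bucket root, while the failing PUT targets the HA storage prefix shown in the startup log. A minimal sketch of repeating the check against that exact prefix, and of printing which IAM identity the shell resolves to, could look like the following. The prefix is copied from the log line above; BUCKET, HA_PREFIX and TARGET are illustrative variable names, and the bucket placeholder has to be replaced. It does not verify whether Flink's S3 plugin picks up the same credentials as the CLI.

```shell
# Assumed values: replace <bucketName> with the real bucket.
# The prefix is the one from the "Creating highly available BLOB storage" log line.
BUCKET="<bucketName>"
HA_PREFIX="ha-storage/default/blob"
TARGET="s3://${BUCKET}/${HA_PREFIX}/oran.txt"

if command -v aws >/dev/null 2>&1; then
  # Which IAM identity do the shell's credentials resolve to?
  aws sts get-caller-identity || echo "could not resolve credentials"
  # PUT an object under the same prefix the BlobServer writes to.
  aws s3 cp oran.txt "${TARGET}" || echo "upload to ${TARGET} failed"
  # List the prefix to confirm the object landed there.
  aws s3 ls "s3://${BUCKET}/${HA_PREFIX}/" || echo "listing failed"
else
  echo "aws CLI not found; would upload to ${TARGET}"
fi
```

If the upload to the prefix succeeds with the same identity, bucket or prefix permissions are unlikely to be the problem, and the difference is more likely in how the job manager process obtains its credentials.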
Some more information: we already have an older version of Flink (1.9.1)
running on the same cluster/namespace. It also uses S3 (a different bucket) and
it's working. We used a homebrewed image for that version, but it is closely
based on how the original Flink image is built (no funny business).
Also, the S3 plugin I'm using is flink-s3-fs-presto-1.12.1.jar, enabled via the
ENABLE_BUILT_IN_PLUGINS env variable. I tried using the hadoop one but got an
error message saying it's missing; not sure what's up with that.
So S3 access from the shell is totally working... and here I'm stuck. This
makes zero sense to me, so I thought I should ask on the mailing list.
Thanks for all the help