So I've been really stumped on this for a couple of days now.

Some general info: Flink version 1.12.1, using the k8s HA service. The k8s cluster is self-managed on AWS. Our checkpoints and savepoints are on S3; I created a new bucket just for this and granted the proper permissions to the k8s node.

The job manager is working; I can access the UI and upload a job. Looking at the startup logs, I can see the bucket I set, with no errors:

    2021-01-27 14:46:38,740 INFO org.apache.flink.runtime.blob.FileSystemBlobStore [] - Creating highly available BLOB storage directory at s3:/<bucketName>/ha-storage/default/blob

(While there is no error, I can't find that directory in the bucket.)

However, once I submit the job I get an exception. Looking at the job manager logs, I'm getting an S3 access denied error:

    2021-01-27 14:28:08,628 ERROR org.apache.flink.runtime.blob.BlobServerConnection [] - PUT operation failed
    java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 8W0N0T2R4P8P7YBT; S3 Extended Request ID: B6zBzIoBmzNoJ4bWQE9Ydt65+IN8pyHeJQuTc28AscyG0dSEM3G7WZHutOT2scJ/6WCoOuRi27A=; Proxy: null), S3 Extended Request ID: B6zBzIoBmzNoJ4bWQE9Ydt65+IN8pyHeJQuTc28AscyG0dSEM3G7WZHutOT2scJ/6WCoOuRi27A=

So I created a new image based on the Flink image with the AWS CLI installed, and tried some S3 actions as the flink user from the shell:

    flink@flink-jobmanager-1-12-f6cf4b5b6-xmkvb:~$ aws s3 ls s3://<bucketName>
    flink@flink-jobmanager-1-12-f6cf4b5b6-xmkvb:~$ touch oran.txt
    flink@flink-jobmanager-1-12-f6cf4b5b6-xmkvb:~$ aws s3 cp oran.txt s3://<bucketName>/oran.txt
    upload: ./oran.txt to s3://houzz-flink-1-12-session-cluster/oran.txt

Some more information: we already have an older version of Flink (1.9.1) running on the same cluster/namespace. It also uses S3 (a different bucket) and it's working. We used a homebrewed image for that version, but it is closely based on how the original Flink image is built (no funny business).

Also, the S3 plugin I'm using is flink-s3-fs-presto-1.12.1.jar, enabled via the ENABLE_BUILT_IN_PLUGINS env variable. I tried using the Hadoop one but got an error message saying it's missing; not sure what's up with that.

From the shell, though, S3 access is totally working...
And here I'm stuck. This makes zero sense to me, so I thought I should ask on the mailing list.

Thanks for all the help.