Hello,

I'm trying to deploy my Flink cluster on AWS EKS using Flink's native Kubernetes integration. I want to use S3 as the filesystem for checkpointing, and I'm passing the following options related to flink-s3-fs-presto:

"-Dhive.s3.endpoint": "https://s3.eu-central-1.amazonaws.com"
"-Dhive.s3.iam-role": "arn:aws:iam::xxx:role/s3-flink"
"-Dhive.s3.use-instance-credentials": "true"
"-Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS": "flink-s3-fs-presto-1.13.2.jar"
"-Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS": "flink-s3-fs-presto-1.13.2.jar"
"-Dstate.backend": "rocksdb"
"-Dstate.backend.incremental": "true"
"-Dstate.checkpoints.dir": "s3://bucket/checkpoints/"
"-Dstate.savepoints.dir": "s3://bucket/savepoints/"

But my job fails with:

2021-10-08 11:38:49,771 WARN  org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Could not properly dispose the private states in the pending checkpoint 45 of job 75bdd6fb6e689961ef4e096684e867bc.
com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: JEZ3X8YPDZ2TF4T9; S3 Extended Request ID: u2RBcDpifTnzO4hIOGqgTOKDY+nw6iSeSepd4eYThITCPCpVddIUGMU7jY5DpJBg1LkPuYXiH9c=; Proxy: null), S3 Extended Request ID: u2RBcDpifTnzO4hIOGqgTOKDY+nw6iSeSepd4eYThITCPCpVddIUGMU7jY5DpJBg1LkPuYXiH9c= (Path: s3://bucket/checkpoints/75bdd6fb6e689961ef4e096684e867bc/chk-45)
    at com.facebook.presto.hive.s3.PrestoS3FileSystem.lambda$getS3ObjectMetadata$2(PrestoS3FileSystem.java:573) ~[?:?]
    at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem.getS3ObjectMetadata(PrestoS3FileSystem.java:560) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem.getFileStatus(PrestoS3FileSystem.java:311) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem.directory(PrestoS3FileSystem.java:450) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem.delete(PrestoS3FileSystem.java:427) ~[?:?]
    at org.apache.flink.fs.s3presto.common.HadoopFileSystem.delete(HadoopFileSystem.java:160) ~[?:?]
    at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.delete(PluginFileSystemFactory.java:155) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.state.filesystem.FsCheckpointStorageLocation.disposeOnFailure(FsCheckpointStorageLocation.java:117) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.checkpoint.PendingCheckpoint.discard(PendingCheckpoint.java:588) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.checkpoint.CheckpointsCleaner.lambda$cleanCheckpoint$0(CheckpointsCleaner.java:60) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.checkpoint.CheckpointsCleaner.lambda$cleanup$2(CheckpointsCleaner.java:85) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
    at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
    at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: JEZ3X8YPDZ2TF4T9; S3 Extended Request ID: u2RBcDpifTnzO4hIOGqgTOKDY+nw6iSeSepd4eYThITCPCpVddIUGMU7jY5DpJBg1LkPuYXiH9c=; Proxy: null)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1811) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1395) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1371) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5062) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5008) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1338) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1312) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem.lambda$getS3ObjectMetadata$2(PrestoS3FileSystem.java:563) ~[?:?]
    ... 17 more

I can't figure out whether this is a permission error or a misconfiguration of the Presto S3 plugin. The EKS pod has the following environment variables:

Environment:
  ENABLE_BUILT_IN_PLUGINS:      flink-s3-fs-presto-1.13.2.jar
  FLINK_TM_JVM_MEM_OPTS:        -Xmx536870902 -Xms536870902 -XX:MaxDirectMemorySize=268435458 -XX:MaxMetaspaceSize=268435456
  AWS_DEFAULT_REGION:           eu-central-1
  AWS_REGION:                   eu-central-1
  AWS_ROLE_ARN:                 arn:aws:iam::xxx:role/s3-flink
  AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token

Has anyone managed to deploy Flink on AWS with IAM-based access to S3 for checkpointing? Could you please share a working flink-s3-fs-presto or flink-s3-fs-hadoop plugin configuration that uses IAM authentication to S3?

--
Best,
Denis Nutiu
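One way to narrow down the "permission vs. configuration" question on the pod side is to confirm that the IRSA variables listed above are actually set inside the JobManager/TaskManager container and that the web identity token file they point to is readable. The following is a minimal, hypothetical Java sketch of such a check; the class name IrsaCheck and the method problems are illustrative and not part of Flink or the AWS SDK, and it uses only the JDK. Passing the check only shows that the token is available to the process; it says nothing about which credentials provider the flink-s3-fs-presto plugin actually uses.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic (not part of Flink or the AWS SDK): checks the
// environment variables that EKS injects for IAM Roles for Service Accounts
// (IRSA) and whether the web identity token file can be read.
final class IrsaCheck {

    // The two variables from the pod description that IRSA relies on.
    static final List<String> REQUIRED =
            Arrays.asList("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE");

    // Returns human-readable problems; an empty list means every check passed.
    static List<String> problems(Map<String, String> env) {
        List<String> out = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                out.add(name + " is not set");
            }
        }
        // Only test the file when a path was actually provided.
        String tokenFile = env.get("AWS_WEB_IDENTITY_TOKEN_FILE");
        if (tokenFile != null && !tokenFile.trim().isEmpty()
                && !Files.isReadable(Paths.get(tokenFile))) {
            out.add("token file not readable: " + tokenFile);
        }
        return out;
    }

    public static void main(String[] args) {
        // Inspect the real environment of the current process.
        List<String> found = problems(System.getenv());
        if (found.isEmpty()) {
            System.out.println("IRSA variables set and token file readable");
        } else {
            for (String p : found) {
                System.out.println("problem: " + p);
            }
        }
    }
}
```

If the image contains a JDK, this could be run inside a Flink pod (for example through kubectl exec) with `java IrsaCheck.java` on JDK 11+, or compiled with javac first on older JDKs. The check is deliberately limited to the environment so it does not need the AWS SDK on the classpath.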