Hi Team,

I am currently reviewing Flink CDC (flink:1.20.3-java17) but came across an 
issue with storing checkpoints to S3 storage (VersityGW 1.1.0).

Flink is deployed on Kubernetes using an operator, and I am trying to use the 
S3-compatible storage VersityGW for checkpoint storage.

I have added the Presto S3 plugin (flink-s3-fs-presto-1.20.3.jar) to my image 
as suggested.

I am unable to write checkpoint data to my S3-compatible storage. The error 
logs show status 400 Bad Request and a malformed authorization header (attached).

Flink S3 configuration:

(Our S3 has a custom URL; the team name and domain have been redacted.)

  flinkConfiguration:
    execution.checkpointing.storage: "filesystem"
    execution.checkpointing.dir: "s3p://flink/flink-checkpoints/flink-green"
    execution.checkpointing.savepoint-dir: "s3p://flink/flink-checkpoints/flink-green"
    execution.checkpointing.incremental: true
    s3.endpoint: "https://s3.my-team.domain.com"
    s3.path.style.access: "true"
    s3.access-key: "flink"
    s3.secret-key: "flV4W********GoBkt"
    s3.region: "us-east-1"
    s3.ssl.enabled: "true"

    # Also added Presto-Specific config
    presto.s3.endpoint: "https://s3.my-team.domain.com"
    presto.s3.path.style.access: "true"
    presto.s3.region: "us-east-1"
    presto.s3.credential-provider: org.apache.flink.fs.s3.common.token.DynamicTemporaryAWSCredentialsProvider
    presto.s3.ssl.enabled: "true"


On checking, it looks like the region in the request header sent by the S3 
Presto plugin is being overwritten with "my-team", taken from the S3 URL, 
instead of "us-east-1".

The string that follows "s3." in the S3 endpoint is sent as the region even 
though we have set s3.region: "us-east-1".

VersityGW logs:

versitygw    | <?xml version="1.0" encoding="UTF-8"?>
versitygw    |<Error><Code>AuthorizationHeaderMalformed</Code><Message>The authorization header is malformed; the region &#34;my-team&#34; is wrong; expecting&#34;us-east-1&#34;</Message><Resource></Resource><RequestId></RequestId><HostId></HostId></Error>
versitygw    | 08:40:53 | 400 |     357.908µs | 172.21.0.3 | GET | /flink/ | - | prefix=flink-checkpoints%2Fflink-green%2F66bee72c06094e303ec3a29a2760ba70%2Fchk-2%2F&delimiter=%2F&encoding-type=url

I spun up a local test setup of VersityGW on Docker and noticed that when the 
checkpoint S3 request is sent to an IP address and port, the region in the 
header is correct and there are no errors.
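
Both observations (a hostname endpoint yields "my-team", an IP endpoint yields 
the expected region) would be consistent with the client guessing the signing 
region from the endpoint hostname. The Python sketch below only illustrates 
that suspicion. The regex, the function name, the fallback behaviour, and the 
IP address and port in the usage note are assumptions for illustration; none 
of this is taken from the plugin's or the SDK's source.

```python
import re
from urllib.parse import urlparse

# Assumed pattern (hypothetical): the DNS label that follows "s3." or "s3-"
# in the endpoint hostname is treated as the region.
_REGION_AFTER_S3 = re.compile(r"^(?:.+\.)?s3[.-]([a-z0-9-]+)\.")


def guess_region_from_endpoint(endpoint: str, configured_region: str) -> str:
    """Return the region a hostname-based guess would pick for `endpoint`.

    Falls back to `configured_region` when the hostname does not match the
    assumed pattern (for example, when the endpoint is a bare IP address).
    """
    host = urlparse(endpoint).hostname or ""
    match = _REGION_AFTER_S3.search(host)
    return match.group(1) if match else configured_region
```

Under these assumptions, 
guess_region_from_endpoint("https://s3.my-team.domain.com", "us-east-1") 
returns "my-team", while an endpoint such as "http://172.21.0.2:7070" 
(hypothetical address) has no "s3." label to match and keeps "us-east-1".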

Is this a limitation of the S3 Presto plugin? Any ideas on how this can be 
overcome? Thanks

Regards,
Amith

Attachment: flink-cdc-error.log