Hi Tony,

A while ago, I answered a similar question [1].

You can try increasing this value as appropriate. You can't put this
configuration in flink-conf.yaml; instead, put it in the job's submit
command [2] or in the configuration file you specify.

[1]:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Why-checkpoint-took-so-long-td22364.html#a22375
[2]:
https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/cli.html

Thanks, vino.

On Wed, Aug 29, 2018 at 11:36 AM, Tony Wei <tony19920...@gmail.com> wrote:

> Hi,
>
> I ran into a checkpoint failure caused by an S3 exception.
>
> org.apache.flink.fs.s3presto.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
>> Your socket connection to the server was not read from or written to within
>> the timeout period. Idle connections will be closed. (Service: Amazon S3;
>> Status Code: 400; Error Code: RequestTimeout; Request ID:
>> B8BE8978D3EFF3F5), S3 Extended Request ID:
>> ePKce/MjMFPPNYi90rGdYmDw3blfvi0xR2CcJpCISEgxM92/6JZAU4whpfXeV6SfG62cnts0NBw=
>
>
> The full stack trace and a screenshot are provided in the attachment.
>
> My setting for flink cluster and job:
>
>    - flink version 1.4.0
>    - standalone mode
>    - 4 slots for each TM
>    - presto s3 filesystem
>    - rocksdb statebackend
>    - local ssd
>    - enable incremental checkpoint
>
> There were no unusual messages in the log file besides the exception, and no
> high GC ratio during the checkpoint procedure. Also, 3 of the 4 parts were
> still uploaded successfully on that TM. I didn't find anything that seemed
> related to this failure. Has anyone run into this problem before?
>
> Besides, I also found an issue in another AWS SDK [1] that mentions this S3
> exception as well. One reply said you can passively avoid the problem by
> raising the max client retries config, and I found that config in Presto [2].
> Can I just add s3.max-client-retries: xxx to flink-conf.yaml to configure
> it? If not, how should I override the default value of this configuration?
> Thanks in advance.
>
> Best,
> Tony Wei
>
> [1] https://github.com/aws/aws-sdk-php/issues/885
> [2]
> https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/s3/HiveS3Config.java#L218
>
