Hi Tony,

Maybe you could look at the documentation for this class; it comes from flink-s3-fs-presto.[1]
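For illustration, the entry Tony asked about would look like the following in flink-conf.yaml. This is only a sketch. It assumes that flink-s3-fs-presto forwards `s3.`-prefixed keys from flink-conf.yaml to its bundled Presto S3 client, as it does for the other s3.xxx options mentioned in this thread. This thread does not confirm that Flink 1.4.0 honors this particular key, and the value 10 is an arbitrary example, not a recommendation.

```yaml
# flink-conf.yaml (sketch; assumes s3.* keys are passed through to the Presto S3 client)
# Maximum number of retries the underlying AWS S3 client makes for a failed request
# (meaning assumed from the Presto setting of the same name).
# 10 is an illustrative value only.
s3.max-client-retries: 10
```

In standalone mode flink-conf.yaml is read when the JobManager and TaskManagers start, so those processes would likely need a restart before the S3 filesystem picks up a changed value.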
[1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/api/java/org/apache/hadoop/conf/Configuration.html

Thanks, vino.

Tony Wei <tony19920...@gmail.com> wrote on Wed, Aug 29, 2018 at 2:18 PM:
> Hi Vino,
>
> I thought this config was for the aws s3 client, but this client is inside
> flink-s3-fs-presto. So I guessed I should find a way to pass this config
> to this library.
>
> Best,
> Tony Wei
>
> 2018-08-29 14:13 GMT+08:00 vino yang <yanghua1...@gmail.com>:
>
>> Hi Tony,
>>
>> Sorry, I just saw the timeout. I thought they were similar because they
>> both happened on aws s3.
>> Regarding this setting, isn't "s3.max-client-retries: xxx" set for the
>> client?
>>
>> Thanks, vino.
>>
>> Tony Wei <tony19920...@gmail.com> wrote on Wed, Aug 29, 2018 at 1:17 PM:
>>
>>> Hi Vino,
>>>
>>> Thanks for your quick reply, but I think these two questions are
>>> different. The checkpoint in that question finally finished, but my
>>> checkpoint failed due to an s3 client timeout. You can see from my
>>> screenshot that the checkpoint failed within a short time.
>>>
>>> Regarding the configuration, do you mean passing it as the program's
>>> input arguments? I don't think that will work. At the very least I would
>>> need to find a way to pass it to the s3 filesystem builder in my
>>> program. However, I am asking for help passing it via flink-conf.yaml,
>>> because I use that file for the global s3 filesystem settings, and I
>>> thought there might be a simple way to support this setting like the
>>> other s3.xxx configs.
>>>
>>> Thank you very much for your answer and help.
>>>
>>> Best,
>>> Tony Wei
>>>
>>> 2018-08-29 11:51 GMT+08:00 vino yang <yanghua1...@gmail.com>:
>>>
>>>> Hi Tony,
>>>>
>>>> A while ago, I answered a similar question.[1]
>>>>
>>>> You can try to increase this value appropriately. You can't put this
>>>> configuration in flink-conf.yaml; you can put it in the job's submit
>>>> command[2] or in a configuration file you specify.
>>>>
>>>> [1]:
>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Why-checkpoint-took-so-long-td22364.html#a22375
>>>> [2]:
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/cli.html
>>>>
>>>> Thanks, vino.
>>>>
>>>> Tony Wei <tony19920...@gmail.com> wrote on Wed, Aug 29, 2018 at 11:36 AM:
>>>>
>>>>> Hi,
>>>>>
>>>>> I ran into a checkpoint failure caused by an s3 exception.
>>>>>
>>>>>> org.apache.flink.fs.s3presto.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
>>>>>> Your socket connection to the server was not read from or written to
>>>>>> within the timeout period. Idle connections will be closed. (Service:
>>>>>> Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID:
>>>>>> B8BE8978D3EFF3F5), S3 Extended Request ID:
>>>>>> ePKce/MjMFPPNYi90rGdYmDw3blfvi0xR2CcJpCISEgxM92/6JZAU4whpfXeV6SfG62cnts0NBw=
>>>>>
>>>>> The full stack trace and screenshot are provided in the attachment.
>>>>>
>>>>> My settings for the flink cluster and job:
>>>>>
>>>>> - flink version 1.4.0
>>>>> - standalone mode
>>>>> - 4 slots for each TM
>>>>> - presto s3 filesystem
>>>>> - rocksdb statebackend
>>>>> - local ssd
>>>>> - incremental checkpoints enabled
>>>>>
>>>>> There are no unusual messages besides the exception in the log file,
>>>>> and there was no high GC ratio during the checkpoint procedure. Also, 3
>>>>> of 4 parts were still uploaded successfully on that TM. I didn't find
>>>>> anything that would be related to this failure. Has anyone met this
>>>>> problem before?
>>>>>
>>>>> Besides, I also found an issue in another aws sdk[1] that mentions this
>>>>> s3 exception as well. One reply said you can passively avoid the
>>>>> problem by raising the max client retries config, so I found that
>>>>> config in presto[2]. Can I just add s3.max-client-retries: xxx in
>>>>> flink-conf.yaml to configure it? If not, how should I overwrite the
>>>>> default value of this configuration? Thanks in advance.
>>>>>
>>>>> Best,
>>>>> Tony Wei
>>>>>
>>>>> [1] https://github.com/aws/aws-sdk-php/issues/885
>>>>> [2] https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/s3/HiveS3Config.java#L218
>>>>>
>>>>
>>>
>>
>