Re: Network issue leading to "No pooled slot available"

2020-10-09 Thread Dan Diephouse
Quick update: it appears to work outside my test case too. I have not encountered this issue post update at all.

On Thu, Oct 8, 2020 at 11:15 PM Khachatryan Roman <khachatryan.ro...@gmail.com> wrote:
> Thanks for checking this workaround!
>
> I've created a jira issue [1] to check if AWS SDK

Re: Network issue leading to "No pooled slot available"

2020-10-09 Thread Khachatryan Roman
Thanks for checking this workaround! I've created a Jira issue [1] to check if the AWS SDK version can be upgraded in the Flink distribution.

Regards,
Roman

On Fri, Oct 9, 2020 at 12:54 AM Dan Diephouse wrote:
> Well, I just dropped in the latest Amazon 1.11.878 SDK and now it
> appears to respect

Re: Network issue leading to "No pooled slot available"

2020-10-08 Thread Dan Diephouse
Well, I just dropped in the latest Amazon 1.11.878 SDK and now it appears to respect interrupts in a test case I created (the test fails with the SDK that Flink currently uses). I will try it in a full-fledged Flink environment and report back. On Thu, Oct 8, 2020 at 3:41 PM Dan Diephouse wrote:
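
For readers who want to reproduce this kind of check, here is a minimal sketch of an interrupt test in plain Java. It is not the test case from this thread and it does not call the AWS SDK: `COOPERATIVE` and `SWALLOWING` are hypothetical stand-ins for a client that honours interruption and for one that swallows it and retries. The helper interrupts a worker thread and reports whether the thread exits within a timeout.

```java
final class InterruptCheck {

    /** Runs the task on a new thread, interrupts it, and reports whether it exited within the timeout. */
    static boolean stopsOnInterrupt(Runnable task, long timeoutMillis) throws InterruptedException {
        Thread worker = new Thread(task, "interrupt-check");
        worker.setDaemon(true);      // do not keep the JVM alive if the task never exits
        worker.start();
        Thread.sleep(100);           // give the task time to reach its blocking call
        worker.interrupt();
        worker.join(timeoutMillis);  // wait at most timeoutMillis for the worker to finish
        return !worker.isAlive();
    }

    /** Hypothetical stand-in for a client that honours interruption. */
    static final Runnable COOPERATIVE = () -> {
        try {
            Thread.sleep(60_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and return
        }
    };

    /** Hypothetical stand-in for a client that swallows the interrupt and retries. */
    static final Runnable SWALLOWING = () -> {
        for (int attempt = 0; attempt < 20; attempt++) {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupt ignored; the next attempt starts
            }
        }
    };

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cooperative stops: " + stopsOnInterrupt(COOPERATIVE, 1_000));
        System.out.println("swallowing stops:  " + stopsOnInterrupt(SWALLOWING, 1_000));
    }
}
```

To check a real client, the blocking S3 call would take the place of the stand-in task; a result of `false` would suggest the interrupt is being lost somewhere below the caller.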

Re: Network issue leading to "No pooled slot available"

2020-10-08 Thread Dan Diephouse
Did some digging... it definitely appears that the Amazon SDK is not picking up the interrupt. I will try playing with the connection timeout. Hadoop defaults it to 20 ms, which may be part of the problem. Does anyone have any other ideas? In theory this should be fixed by SDK v2 which

Re: Network issue leading to "No pooled slot available"

2020-10-08 Thread Dan Diephouse
Using the latest - 1.11.2. I would assume the interruption is being ignored in the Hadoop / S3 layer. I was looking at the defaults and (if I understood correctly) the client will retry 20 times, which would explain why it never gets cancelled... On Thu, Oct 8, 2020 at 1:27 AM Khachatryan Roman
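
To illustrate why a retry loop can mask cancellation, here is a minimal sketch of a retry helper that does stop when the thread is interrupted. It is an illustration only, not the Hadoop or AWS SDK retry logic; `InterruptibleRetry`, `call`, and `backoffMillis` are hypothetical names. The loop checks the interrupt flag before each attempt and never treats an `InterruptedException` as a retryable failure.

```java
import java.util.concurrent.Callable;

final class InterruptibleRetry {

    /** Retries op up to maxAttempts times, but gives up as soon as the thread is interrupted. */
    static <T> T call(Callable<T> op, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Thread.interrupted() reads and clears the flag, matching the usual
            // convention when an InterruptedException is thrown.
            if (Thread.interrupted()) {
                throw new InterruptedException("interrupted before attempt " + attempt);
            }
            try {
                return op.call();
            } catch (InterruptedException e) {
                throw e;                     // never retry an interruption
            } catch (Exception e) {
                last = e;                    // remember the failure and back off
                Thread.sleep(backoffMillis); // throws InterruptedException if cancelled while waiting
            }
        }
        throw last;                          // all attempts failed
    }
}
```

A loop that instead catches every exception and moves on to the next attempt would keep running through all 20 attempts even after the task has been cancelled.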

Re: Network issue leading to "No pooled slot available"

2020-10-08 Thread Khachatryan Roman
Hi Dan Diephouse, From the logs you provided, it indeed looks like 1 causes 2 => 3 => 4, where 2 is a bug. It's unclear though where the interruption is ignored (Flink/Hadoop FS/S3 client). What version of Flink are you using? Regards, Roman On Wed, Oct 7, 2020 at 11:16 PM Dan Diephouse

Network issue leading to "No pooled slot available"

2020-10-07 Thread Dan Diephouse
I am now using the S3 StreamingFileSink to send data to an S3 bucket. If/when the network connection has issues, it seems to put Flink into an irrecoverable state. Am I understanding this correctly? Any suggestions on how to troubleshoot / fix? Here is what I'm observing: *1. Network is dropped