Blocked requesting MemorySegment when Segments are available.

2020-06-09 Thread David Maddison
Hi, I keep seeing the following situation where a task is blocked getting a MemorySegment from the pool but the TaskManager is reporting that it has lots of MemorySegments available. I'm completely stumped as to how to debug or what to look at next so any hints/help/advice would be greatly

Re: Automatically Clearing Temporary Directories

2020-03-16 Thread David Maddison
porary volumes in use are dedicated to the TaskManager and
> not shared :-)
> Yes, it is safe in your case.
>
> Best,
> Gary
>
> On Tue, Mar 10, 2020 at 6:39 PM David Maddison
> wrote:
>
>> Hi,
>>
>> When a TaskManager is restarted it can leave behind unr

Automatically Clearing Temporary Directories

2020-03-10 Thread David Maddison
Hi, When a TaskManager is restarted it can leave behind unreferenced BlobServer cache directories in the temporary storage that never get cleaned up. Would it be safe to automatically clear the temporary storage every time a TaskManager is started? (Note: the temporary volumes in use are

Re: Flink Job cluster scalability

2020-01-09 Thread David Maddison
Hi KristoffSC, As Zhu Zhu explained, Flink does not currently auto-scale a Job as new resources become available. Instead, the Job must be stopped via a savepoint and restarted with a new parallelism (the old rescale CLI experiment used to perform this). Making Flink reactive to new resources and

Re: Temporary failure in name resolution on JobManager

2019-12-02 Thread David Maddison
the ttl and have a
> try.
> sun.net.inetaddr.ttl
> sun.net.inetaddr.negative.ttl
>
>
> Best,
> Yang
>
> David Maddison wrote on Fri, Nov 29, 2019 at 6:41 PM:
>
>> I have a Flink 1.7 cluster using the "flink:1.7.2" (OpenJDK build
>> 1.8.0_222-b10) image on Kuberne

Temporary failure in name resolution on JobManager

2019-11-29 Thread David Maddison
I have a Flink 1.7 cluster using the "flink:1.7.2" (OpenJDK build 1.8.0_222-b10) image on Kubernetes. As part of a MasterRestoreHook (for checkpointing) the JobManager needs to communicate with an external security service. This all works well until there's a DNS lookup failure (due to network
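
The reply in this thread points at the JVM DNS cache settings `sun.net.inetaddr.ttl` and `sun.net.inetaddr.negative.ttl`. As a minimal sketch only, the Java snippet below sets the equivalent security properties (`networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl`) programmatically. The class name `DnsCacheTtl`, the helper `configure`, and the values 30 and 0 are assumptions made for illustration. Nothing in the thread confirms that these values resolve the reported failure.

```java
import java.security.Security;

public class DnsCacheTtl {

    // Hypothetical helper: sets the JVM-wide DNS cache lifetimes, in seconds.
    // It should run before the first name lookup, because the JVM reads the
    // cache policy when its address-resolution classes are initialized.
    static void configure(int positiveTtlSeconds, int negativeTtlSeconds) {
        // How long successful lookups are cached.
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        // How long failed lookups are cached; 0 means failures are not cached.
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        // Illustrative values only, not a recommendation from the thread.
        configure(30, 0);
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
        System.out.println(Security.getProperty("networkaddress.cache.negative.ttl"));
    }
}
```

The same effect can usually be had without code changes by passing the system properties named in the reply as JVM flags, for example `-Dsun.net.inetaddr.ttl=30 -Dsun.net.inetaddr.negative.ttl=0`.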