Hi Flavio,

Is this running on YARN or on bare metal? Did you manage to find out where this insanely large parameter is coming from?
Best,
Aljoscha

> On 25. May 2017, at 19:36, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
> Hi to all,
> I think we found the root cause of all the problems. Looking at dmesg there
> was a "crazy" total-vm size associated with the OOM error, a LOT bigger
> than the TaskManager's available memory.
> In our case, the TM had a max heap of 14 GB while the dmesg error was
> reporting a required amount of memory in the order of 60 GB!
>
> [ 5331.992539] Out of memory: Kill process 24221 (java) score 937 or
> sacrifice child
> [ 5331.992619] Killed process 24221 (java) total-vm:64800680kB,
> anon-rss:31387544kB, file-rss:6064kB, shmem-rss:0kB
>
> That definitely wasn't possible using an ordinary JVM (and our TM was
> running without off-heap settings), so we looked at the parameters used to
> run the TM JVM, and indeed there was a really huge amount of memory given to
> MaxDirectMemorySize. To my big surprise, Flink runs a TM with this parameter
> set to 8,388,607T.. does that make any sense??
> Is the importance of this parameter documented anywhere (and why it is
> used in non-off-heap mode as well)? Is it related to network buffers?
> It should also be documented that this parameter should be added to the TM
> heap when reserving memory for Flink (IMHO).
>
> I hope these painful sessions of Flink troubleshooting can be an added
> value sooner or later..
>
> Best,
> Flavio
>
> On Thu, May 25, 2017 at 10:21 AM, Flavio Pompermaier <pomperma...@okkam.it
> <mailto:pomperma...@okkam.it>> wrote:
> I can confirm that after giving less memory to the Flink TM the job was able
> to run successfully.
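[Editor's note: the gap between total-vm and anon-rss in that dmesg line can be extracted mechanically. A minimal sketch, using the sample line quoted above (the field names are standard OOM-killer output):]

```shell
# Extract the kB figures from an OOM-killer log line and convert to GB,
# to compare the virtual size the kernel saw against resident memory.
# Sample line taken from the dmesg output quoted in this thread.
line='Killed process 24221 (java) total-vm:64800680kB, anon-rss:31387544kB, file-rss:6064kB, shmem-rss:0kB'
total_vm_kb=$(echo "$line" | grep -o 'total-vm:[0-9]*' | cut -d: -f2)
anon_rss_kb=$(echo "$line" | grep -o 'anon-rss:[0-9]*' | cut -d: -f2)
echo "total-vm: $((total_vm_kb / 1024 / 1024)) GB"   # virtual size
echo "anon-rss: $((anon_rss_kb / 1024 / 1024)) GB"   # resident size
```

The large gap between virtual and resident size is exactly the kind of discrepancy an oversized MaxDirectMemorySize can produce: direct buffers live outside the heap, so they never count against -Xmx but do count toward total-vm.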
> After almost 2 weeks of pain, we can summarize here our experience with Flink in
> virtualized environments (such as VMWare ESXi):
> - Disable the virtualization "feature" that transfers a VM from a (heavily
>   loaded) physical machine to another one (to balance resource consumption).
> - Check dmesg when a TM dies without logging anything (usually it goes OOM and
>   the OS kills it, but there you can find the log of this event).
> - CentOS 7 on ESXi seems to start swapping VERY early (in my case I see the OS
>   start swapping even with 12 out of 32 GB of memory free)!
>   We're still investigating how this behavior can be fixed: the problem is
>   that it's better not to disable swapping, because otherwise VMWare could
>   start ballooning (which is definitely worse...).
>
> I hope these tips can save someone else's day..
>
> Best,
> Flavio
>
> On Wed, May 24, 2017 at 4:28 PM, Flavio Pompermaier <pomperma...@okkam.it
> <mailto:pomperma...@okkam.it>> wrote:
> Hi Greg, you were right! After typing dmesg I found "Out of memory: Kill
> process 13574 (java)".
> This is really strange because the JVM of the TM is very calm.
> Moreover, there are 7 GB of memory available (out of 32) but somehow the OS
> decides to start swapping and, when it runs out of available swap memory, the
> OS decides to kill the Flink TM :(
>
> Any idea what's going on here?
>
> On Wed, May 24, 2017 at 2:32 PM, Flavio Pompermaier <pomperma...@okkam.it
> <mailto:pomperma...@okkam.it>> wrote:
> Hi Greg,
> I carefully monitored all TM memory with jstat -gcutil and there's no full GC,
> only young collections.
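[Editor's note: for the early-swapping point above, the first knob to inspect is vm.swappiness. A read-only sketch; the value 1 and the file name 90-swappiness.conf are illustrative assumptions, not settings from the thread:]

```shell
# Show how eager the kernel is to swap (0-100 on most kernels;
# higher means anonymous pages get swapped out earlier).
cat /proc/sys/vm/swappiness
# To make it far less eager without disabling swap entirely
# (disabling swap risks triggering VMWare ballooning instead):
#   sudo sysctl vm.swappiness=1
#   echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/90-swappiness.conf
```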
> The initial situation on the dying TM is:
>
>   S0     S1     E      O      M     CCS    YGC   YGCT  FGC   FGCT    GCT
>   0.00 100.00  33.57  88.74  98.42  97.17   159  2.508    1  0.255  2.763
>   0.00 100.00  90.14  88.80  98.67  97.17   197  2.617    1  0.255  2.873
>   0.00 100.00  27.00  88.82  98.75  97.17   234  2.730    1  0.255  2.986
>
> After about 10 hours of processing it is:
>
>   0.00 100.00  21.74  83.66  98.52  96.94  5519 33.011    1  0.255 33.267
>   0.00 100.00  21.74  83.66  98.52  96.94  5519 33.011    1  0.255 33.267
>   0.00 100.00  21.74  83.66  98.52  96.94  5519 33.011    1  0.255 33.267
>
> So I don't think that OOM could be an option.
>
> However, the cluster is running on ESXi vSphere VMs and we have already
> experienced unexpected job crashes because of ESXi moving a heavily loaded VM
> to another (less loaded) physical machine.. I wouldn't be surprised if
> swapping is also handled somehow differently..
> Looking at the Cloudera widgets I see that the crash is usually preceded by an
> intense cpu_iowait period.
> I fear that Flink's unsafe access to memory could be a problem in those
> scenarios. Am I wrong?
>
> Any insight or debugging technique is greatly appreciated.
> Best,
> Flavio
>
>
> On Wed, May 24, 2017 at 2:11 PM, Greg Hogan <c...@greghogan.com
> <mailto:c...@greghogan.com>> wrote:
> Hi Flavio,
>
> Flink handles interrupts, so the only silent killer I am aware of is Linux's
> OOM killer. Are you seeing such a message in dmesg?
>
> Greg
>
> On Wed, May 24, 2017 at 3:18 AM, Flavio Pompermaier <pomperma...@okkam.it
> <mailto:pomperma...@okkam.it>> wrote:
> Hi to all,
> I'd like to know whether memory swapping could cause a taskmanager crash.
> In my cluster of virtual machines I'm seeing this strange behavior in my Flink
> cluster: sometimes, if memory gets swapped, the taskmanager (on that machine)
> dies unexpectedly without any log about the error.
>
> Is that possible or not?
>
> Best,
> Flavio
>
>
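[Editor's note: a quick sanity check on those jstat -gcutil numbers: from the YGC (young GC count) and YGCT (total young GC time, seconds) columns you can compute the average young-GC pause. A sketch using the 10-hour sample line quoted above:]

```shell
# Columns of jstat -gcutil: S0 S1 E O M CCS YGC YGCT FGC FGCT GCT.
# Sample taken from the 10-hour jstat output quoted in this thread.
sample='0.00 100.00 21.74 83.66 98.52 96.94 5519 33.011 1 0.255 33.267'
ygc=$(echo "$sample" | awk '{print $7}')    # young GC count
ygct=$(echo "$sample" | awk '{print $8}')   # total young GC time (s)
fgc=$(echo "$sample" | awk '{print $9}')    # full GC count
awk -v n="$ygc" -v t="$ygct" \
    'BEGIN { printf "avg young GC pause: %.1f ms over %d collections\n", t / n * 1000, n }'
echo "full GCs: $fgc"
```

Sub-10-ms average young pauses and a single full GC over ten hours support the claim that heap pressure was not the killer: whatever memory the OOM killer saw had to come from outside the collector's view (e.g. direct buffers).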