Hi Flavio,

Is this running on YARN or on bare metal? Did you manage to find out where this insanely large parameter is coming from?
Best,
Aljoscha

> On 25. May 2017, at 19:36, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
> Hi to all,
> I think we found the root cause of all the problems. Looking at dmesg there
> was a "crazy" total-vm size associated with the OOM error, a LOT bigger
> than the TaskManager's available memory.
> In our case, the TM had a max heap of 14 GB while the dmesg error was
> reporting a required amount of memory in the order of 60 GB!
>
> [ 5331.992539] Out of memory: Kill process 24221 (java) score 937 or
> sacrifice child
> [ 5331.992619] Killed process 24221 (java) total-vm:64800680kB,
> anon-rss:31387544kB, file-rss:6064kB, shmem-rss:0kB
>
> That definitely wasn't possible using an ordinary JVM (and our TM was
> running without off-heap settings), so we looked at the parameters used to
> run the TM JVM, and indeed there was a really huge amount of memory given to
> MaxDirectMemorySize. To my big surprise, Flink runs a TM with this parameter
> set to 8,388,607T.. does that make any sense??
> Is the importance of this parameter documented anywhere (and why it is
> used in non-off-heap mode as well)? Is it related to network buffers?
> It should also be documented that this parameter should be added to the TM
> heap when reserving memory for Flink (IMHO).
>
> I hope these painful sessions of Flink troubleshooting can be an added
> value sooner or later..
>
> Best,
> Flavio
>
> On Thu, May 25, 2017 at 10:21 AM, Flavio Pompermaier <pomperma...@okkam.it
> <mailto:pomperma...@okkam.it>> wrote:
> I can confirm that after giving less memory to the Flink TM the job was able
> to run successfully.
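[Editor's note: the gap between total-vm and anon-rss in that dmesg line can be extracted mechanically. A minimal sketch, using the sample line quoted above (the field names are standard OOM-killer output):]

```shell
# Extract the kB figures from an OOM-killer log line and convert to GB,
# to compare the virtual size the kernel saw against resident memory.
# Sample line taken from the dmesg output quoted in this thread.
line='Killed process 24221 (java) total-vm:64800680kB, anon-rss:31387544kB, file-rss:6064kB, shmem-rss:0kB'
total_vm_kb=$(echo "$line" | grep -o 'total-vm:[0-9]*' | cut -d: -f2)
anon_rss_kb=$(echo "$line" | grep -o 'anon-rss:[0-9]*' | cut -d: -f2)
echo "total-vm: $((total_vm_kb / 1024 / 1024)) GB"   # virtual size
echo "anon-rss: $((anon_rss_kb / 1024 / 1024)) GB"   # resident size
```

The large gap between virtual and resident size is exactly the kind of discrepancy an oversized MaxDirectMemorySize can produce: direct buffers live outside the heap, so they never count against -Xmx but do count toward total-vm.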
> After almost 2 weeks of pain, we can summarize here our experience with Flink in
> virtualized environments (such as VMWare ESXi):
> - Disable the virtualization "feature" that transfers a VM from a (heavily
>   loaded) physical machine to another one (to balance resource consumption).
> - Check dmesg when a TM dies without logging anything (usually it goes OOM and
>   the OS kills it, but there you can find the log of this event).
> - CentOS 7 on ESXi seems to start swapping VERY early (in my case I see the OS
>   start swapping even with 12 out of 32 GB of memory free)!
>   We're still investigating how this behavior can be fixed: the problem is
>   that it's better not to disable swapping, because otherwise VMWare could
>   start ballooning (which is definitely worse...).
>
> I hope these tips can save someone else's day..
>
> Best,
> Flavio
>
> On Wed, May 24, 2017 at 4:28 PM, Flavio Pompermaier <pomperma...@okkam.it
> <mailto:pomperma...@okkam.it>> wrote:
> Hi Greg, you were right! After typing dmesg I found "Out of memory: Kill
> process 13574 (java)".
> This is really strange because the JVM of the TM is very calm.
> Moreover, there are 7 GB of memory available (out of 32) but somehow the OS
> decides to start swapping and, when it runs out of available swap memory, the
> OS decides to kill the Flink TM :(
>
> Any idea what's going on here?
>
> On Wed, May 24, 2017 at 2:32 PM, Flavio Pompermaier <pomperma...@okkam.it
> <mailto:pomperma...@okkam.it>> wrote:
> Hi Greg,
> I carefully monitored all TM memory with jstat -gcutil and there's no full GC,
> only young collections.
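[Editor's note: for the early-swapping point above, the first knob to inspect is vm.swappiness. A read-only sketch; the value 1 and the file name 90-swappiness.conf are illustrative assumptions, not settings from the thread:]

```shell
# Show how eager the kernel is to swap (0-100 on most kernels;
# higher means anonymous pages get swapped out earlier).
cat /proc/sys/vm/swappiness
# To make it far less eager without disabling swap entirely
# (disabling swap risks triggering VMWare ballooning instead):
#   sudo sysctl vm.swappiness=1
#   echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/90-swappiness.conf
```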
> The initial situation on the dying TM is:
>
>   S0     S1     E      O      M     CCS    YGC   YGCT  FGC   FGCT    GCT
>   0.00 100.00  33.57  88.74  98.42  97.17   159  2.508    1  0.255  2.763
>   0.00 100.00  90.14  88.80  98.67  97.17   197  2.617    1  0.255  2.873
>   0.00 100.00  27.00  88.82  98.75  97.17   234  2.730    1  0.255  2.986
>
> After about 10 hours of processing it is:
>
>   0.00 100.00  21.74  83.66  98.52  96.94  5519 33.011    1  0.255 33.267
>   0.00 100.00  21.74  83.66  98.52  96.94  5519 33.011    1  0.255 33.267
>   0.00 100.00  21.74  83.66  98.52  96.94  5519 33.011    1  0.255 33.267
>
> So I don't think that OOM could be an option.
>
> However, the cluster is running on ESXi vSphere VMs and we have already
> experienced unexpected job crashes because of ESXi moving a heavily loaded VM
> to another (less loaded) physical machine.. I wouldn't be surprised if
> swapping is also handled somehow differently..
> Looking at the Cloudera widgets I see that the crash is usually preceded by an
> intense cpu_iowait period.
> I fear that Flink's unsafe access to memory could be a problem in those
> scenarios. Am I wrong?
>
> Any insight or debugging technique is greatly appreciated.
> Best,
> Flavio
>
>
> On Wed, May 24, 2017 at 2:11 PM, Greg Hogan <c...@greghogan.com
> <mailto:c...@greghogan.com>> wrote:
> Hi Flavio,
>
> Flink handles interrupts, so the only silent killer I am aware of is Linux's
> OOM killer. Are you seeing such a message in dmesg?
>
> Greg
>
> On Wed, May 24, 2017 at 3:18 AM, Flavio Pompermaier <pomperma...@okkam.it
> <mailto:pomperma...@okkam.it>> wrote:
> Hi to all,
> I'd like to know whether memory swapping could cause a taskmanager crash.
> In my cluster of virtual machines I'm seeing this strange behavior in my Flink
> cluster: sometimes, if memory gets swapped, the taskmanager (on that machine)
> dies unexpectedly without any log about the error.
>
> Is that possible or not?
>
> Best,
> Flavio
>
>
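[Editor's note: a quick sanity check on those jstat -gcutil numbers: from the YGC (young GC count) and YGCT (total young GC time, seconds) columns you can compute the average young-GC pause. A sketch using the 10-hour sample line quoted above:]

```shell
# Columns of jstat -gcutil: S0 S1 E O M CCS YGC YGCT FGC FGCT GCT.
# Sample taken from the 10-hour jstat output quoted in this thread.
sample='0.00 100.00 21.74 83.66 98.52 96.94 5519 33.011 1 0.255 33.267'
ygc=$(echo "$sample" | awk '{print $7}')    # young GC count
ygct=$(echo "$sample" | awk '{print $8}')   # total young GC time (s)
fgc=$(echo "$sample" | awk '{print $9}')    # full GC count
awk -v n="$ygc" -v t="$ygct" \
    'BEGIN { printf "avg young GC pause: %.1f ms over %d collections\n", t / n * 1000, n }'
echo "full GCs: $fgc"
```

Sub-10-ms average young pauses and a single full GC over ten hours support the claim that heap pressure was not the killer: whatever memory the OOM killer saw had to come from outside the collector's view (e.g. direct buffers).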