[ https://issues.apache.org/jira/browse/YARN-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518171#comment-16518171 ]
Jim Brennan commented on YARN-8444:
-----------------------------------

The bad value came from /proc/meminfo. It looks like it reported a negative value printed as an unsigned 64-bit decimal: 18446744073709551596 is 2^64 - 20, i.e. -20 reinterpreted as unsigned, which is too big to parse as a long (a Java long tops out at 2^63 - 1).

> NodeResourceMonitor crashes on bad swapFree value
> -------------------------------------------------
>
>                 Key: YARN-8444
>                 URL: https://issues.apache.org/jira/browse/YARN-8444
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.3, 3.0.2
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>
> Saw this on a node that was having difficulty preempting containers. We can't
> have NodeResourceMonitor exiting. The system was above 99% memory used at the
> time, so this may only happen when normal preemption isn't working right, but
> we should fix it since this is a critical monitor of the health of the node.
>
> {noformat}
> 2018-06-04 14:28:08,539 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 110564 for container-id container_e24_1526662705797_129647_01_004791: 2.1 GB of 3.5 GB physical memory used; 5.0 GB of 7.3 GB virtual memory used
> 2018-06-04 14:28:10,622 [Node Resource Monitor] ERROR yarn.YarnUncaughtExceptionHandler: Thread Thread[Node Resource Monitor,5,main] threw an Exception.
> java.lang.NumberFormatException: For input string: "18446744073709551596"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.lang.Long.parseLong(Long.java:592)
>     at java.lang.Long.parseLong(Long.java:631)
>     at org.apache.hadoop.util.SysInfoLinux.readProcMemInfoFile(SysInfoLinux.java:257)
>     at org.apache.hadoop.util.SysInfoLinux.getAvailablePhysicalMemorySize(SysInfoLinux.java:591)
>     at org.apache.hadoop.util.SysInfoLinux.getAvailableVirtualMemorySize(SysInfoLinux.java:601)
>     at org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.getAvailableVirtualMemorySize(ResourceCalculatorPlugin.java:74)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl$MonitoringThread.run(NodeResourceMonitorImpl.java:193)
> 2018-06-04 14:28:30,747 [org.apache.hadoop.util.JvmPauseMonitor$Monitor@226eba67] INFO util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 9330ms
> {noformat}
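The comment's diagnosis (a negative value printed as an unsigned decimal) suggests parsing the digits as unsigned rather than with Long.parseLong. Below is a minimal sketch of that idea, assuming a hypothetical helper `parseMemInfoValue`; it is not Hadoop's `SysInfoLinux` code and not the fix committed for YARN-8444. It uses `Long.parseUnsignedLong` (Java 8+) and, as a further assumption, clamps any value that maps to a negative long to 0, treating a negative free-memory count as "nothing available".

```java
// Hypothetical helper: not Hadoop's SysInfoLinux code or the committed fix.
final class MemInfoParse {
    /**
     * Parses one /proc/meminfo value (the number before "kB").
     * Assumption: a negative kernel counter may be printed as an unsigned
     * 64-bit decimal, so the digits are parsed as unsigned and any value that
     * maps to a negative long is clamped to 0.
     */
    static long parseMemInfoValue(String digits) {
        // parseUnsignedLong accepts 0 .. 2^64 - 1; values >= 2^63 come back as
        // negative longs (two's complement), e.g. 2^64 - 20 -> -20.
        long v = Long.parseUnsignedLong(digits.trim());
        return Math.max(0L, v);
    }

    public static void main(String[] args) {
        String bad = "18446744073709551596"; // value from the stack trace
        try {
            Long.parseLong(bad); // signed parse overflows, as in the log
        } catch (NumberFormatException e) {
            System.out.println("parseLong fails: " + e.getMessage());
        }
        System.out.println(Long.parseUnsignedLong(bad));  // -20
        System.out.println(parseMemInfoValue(bad));       // 0
        System.out.println(parseMemInfoValue("1048576")); // 1048576
    }
}
```

Clamping silently hides the anomaly; a real fix might prefer to log the raw value as well, or to skip the sample entirely.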
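The issue's other point is that one bad reading should not terminate the Node Resource Monitor thread, as the uncaught NumberFormatException does in the log. One common way to get that behaviour is to catch runtime exceptions around each sample and keep the last good value. The class below is a hypothetical sketch: `ResilientSampler` and its `LongSupplier` probe are illustrative names, not Hadoop APIs, and this is not how NodeResourceMonitorImpl is actually written.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: not NodeResourceMonitorImpl's actual code.
final class ResilientSampler {
    private final LongSupplier probe; // reads a metric; may throw on bad input
    private long lastGood = -1L;      // -1 means "no good sample yet"
    private int failures = 0;

    ResilientSampler(LongSupplier probe) {
        this.probe = probe;
    }

    /** Takes one sample. A RuntimeException is counted and the previous value kept. */
    long sample() {
        try {
            lastGood = probe.getAsLong();
        } catch (RuntimeException e) {
            failures++; // a real monitor would log e and carry on
        }
        return lastGood;
    }

    int failures() {
        return failures;
    }

    public static void main(String[] args) {
        // The second input simulates the bad /proc/meminfo value from the log.
        String[] inputs = {"1048576", "18446744073709551596", "2097152"};
        int[] next = {0};
        ResilientSampler s = new ResilientSampler(() -> Long.parseLong(inputs[next[0]++]));
        for (int n = 0; n < inputs.length; n++) {
            System.out.println(s.sample()); // 1048576, 1048576, 2097152
        }
        System.out.println("failures=" + s.failures()); // failures=1
    }
}
```

In a monitoring thread, `sample()` would be called once per interval inside the run loop, so a parse failure costs one stale reading instead of the whole monitor.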