[ https://issues.apache.org/jira/browse/HDFS-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692544#comment-17692544 ]
zinx commented on HDFS-16726:
-----------------------------

[~yuyanlei] We use 2.6.x, upgraded the JDK to 12, and removed the G1RSetRegionEntries parameter. We have not used 3.x before.

> There is a memory-related problem about HDFS namenode
> -----------------------------------------------------
>
>                 Key: HDFS-16726
>                 URL: https://issues.apache.org/jira/browse/HDFS-16726
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>    Affects Versions: 2.7.2
>         Environment: -Xms280G -Xmx280G -XX:MaxDirectMemorySize=10G -XX:MetaspaceSize=128M -server \
>                      -XX:+UseG1GC -XX:+UseStringDeduplication -XX:MaxGCPauseMillis=250 -XX:+UnlockExperimentalVMOptions \
>                      -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \
>                      -XX:G1OldCSetRegionThresholdPercent=1 -XX:G1MixedGCCountTarget=9 -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=4000 \
>                      -XX:ParallelGCThreads=24 -XX:ConcGCThreads=6 -XX:G1RSetRegionEntries=4096 -XX:+AggressiveOpts -XX:+DisableExplicitGC \
>                      -XX:G1HeapWastePercent=9 -XX:G1MixedGCLiveThresholdPercent=85 -XX:InitiatingHeapOccupancyPercent=75 \
>                      -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:+PrintAdaptiveSizePolicy \
>                      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
>                      -Xloggc:$HADOOP_LOG_DIR/namenode.gc.log \
>                      -XX:+HeapDumpOnOutOfMemoryError -XX:ErrorFile=$HADOOP_LOG_DIR/hs_err_pid%p.log -XX:HeapDumpPath=$HADOOP_LOG_DIR \
>                      -Dcom.sun.management.jmxremote \
>                      -Dcom.sun.management.jmxremote.port=9009 \
>                      -Dcom.sun.management.jmxremote.ssl=false \
>                      -Dcom.sun.management.jmxremote.authenticate=false \
>                      $HADOOP_NAMENODE_OPTS
>            Reporter: Yanlei Yu
>            Priority: Critical
>
> In the cluster, the memory usage of the NameNode exceeds the -Xmx setting (-Xmx = 280GB).
> The actual memory usage of the NameNode is 479GB.
> Output via pmap:
>          Address Perm   Offset Device Inode      Size       Rss       Pss Referenced Anonymous Swap Locked Mapping
>     2b42f0000000 rw-p 00000000  00:00     0 294174720 293756960 293756960  293756960 293756960    0      0
>         01e21000 rw-p 00000000  00:00     0 195245456 195240848 195240848  195240848 195240848    0      0 [heap]
>     2b897c000000 rw-p 00000000  00:00     0   9246724   9246724   9246724    9246724   9246724    0      0
>     2b8bb0905000 rw-p 00000000  00:00     0   1781124   1754572   1754572    1754572   1754572    0      0
>     2b8936000000 rw-p 00000000  00:00     0   1146880   1002084   1002084    1002084   1002084    0      0
>     2b42db652000 rwxp 00000000  00:00     0     57792     55252     55252      55252     55252    0      0
>     2b42ec12a000 rw-p 00000000  00:00     0     25696     24700     24700      24700     24700    0      0
>     2b42ef25b000 rw-p 00000000  00:00     0      9988      8972      8972       8972      8972    0      0
>     2b8c1d467000 rw-p 00000000  00:00     0      9216      8204      8204       8204      8204    0      0
>     2b8d6f8db000 rw-p 00000000  00:00     0      7160      6228      6228       6228      6228    0      0
> The first mapping corresponds to the -Xmx Java heap, while [heap] (the glibc malloc arena) is unusually large, so a native memory leak is suspected!
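The RSS-vs-Xmx comparison above can be scripted from inside the JVM. Below is a minimal, Linux-only sketch (not part of the original report; the class name `RssCheck` is hypothetical) that reads `VmRSS` from `/proc/self/status` and prints it next to the configured maximum heap:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class RssCheck {

    // Parse VmRSS (resident set size, in kB) of this process from
    // /proc/self/status. Linux-only.
    static long readRssKb() throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get("/proc/self/status"))) {
            return lines.filter(l -> l.startsWith("VmRSS:"))
                        .map(l -> Long.parseLong(l.replaceAll("\\D+", "")))
                        .findFirst()
                        .orElse(0L);
        }
    }

    public static void main(String[] args) throws IOException {
        long maxHeapBytes = Runtime.getRuntime().maxMemory(); // roughly -Xmx
        long rssKb = readRssKb();
        System.out.println("max heap (bytes) = " + maxHeapBytes);
        System.out.println("VmRSS   (kB)     = " + rssKb);
        // RSS far above -Xmx (479GB vs 280GB in this report) means the
        // excess lives outside the Java heap, e.g. the glibc [heap] arena.
    }
}
```

Sampling this periodically makes the divergence between the Java heap and total RSS visible long before the process reaches 1.7x of -Xmx.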
>
> * [heap] is associated with malloc.
> After enabling Native Memory Tracking (queried via jcmd) in the test environment, we found that the malloc part of the Internal section grew significantly while a client was writing a gz file (-Xmx = 40g in the test environment; the Internal area was 900MB before the client started writing):
> Total: reserved=47276MB, committed=47070MB
> -                 Java Heap (reserved=40960MB, committed=40960MB)
>                             (mmap: reserved=40960MB, committed=40960MB)
>
> -                     Class (reserved=53MB, committed=52MB)
>                             (classes #7423)
>                             (malloc=1MB #17053)
>                             (mmap: reserved=52MB, committed=52MB)
>
> -                    Thread (reserved=2145MB, committed=2145MB)
>                             (thread #2129)
>                             (stack: reserved=2136MB, committed=2136MB)
>                             (malloc=7MB #10673)
>                             (arena=2MB #4256)
>
> -                      Code (reserved=251MB, committed=45MB)
>                             (malloc=7MB #10661)
>                             (mmap: reserved=244MB, committed=38MB)
>
> -                        GC (reserved=2307MB, committed=2307MB)
>                             (malloc=755MB #525664)
>                             (mmap: reserved=1552MB, committed=1552MB)
>
> -                  Compiler (reserved=8MB, committed=8MB)
>                             (malloc=8MB #8852)
>
> -                  Internal (reserved=1524MB, committed=1524MB)
>                             (malloc=1524MB #323482)
>
> -                    Symbol (reserved=12MB, committed=12MB)
>                             (malloc=10MB #91715)
>                             (arena=2MB #1)
>
> -    Native Memory Tracking (reserved=16MB, committed=16MB)
>                             (tracking overhead=15MB)
> It is clear that the Internal malloc grows significantly while the client writes, and does not shrink after the client stops writing.
>
> Through perf, we found some relevant symbols while the client was writing:
> Children   Self  Comm  Shared Ob  Symbol
>     0.05%  0.00% java  libzip.so  [.] Java_java_util_zip_ZipFile_getEntry
>     0.02%  0.00% java  libzip.so  [.]
>            Java_java_util_zip_Inflater_inflateBytes
> Therefore, we suspect that the client's compressed-write path may have a native memory leak.
>
> Using jcmd, we located the call chain reaching Java_java_util_zip_Inflater_inflateBytes:
> "ExtensionRefresher" #59 daemon prio=5 os_prio=0 tid=0x000000002419d000 nid=0x69df runnable [0x00002b319d7a0000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.util.zip.Inflater.inflateBytes(Native Method)
> 	at java.util.zip.Inflater.inflate(Inflater.java:259)
> 	- locked <0x00002b278f7b9da8> (a java.util.zip.ZStreamRef)
> 	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:133)
> 	at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
> 	at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
> 	at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
> 	at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
> 	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
> 	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
> 	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
> 	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> 	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> 	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> 	at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> 	at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> 	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
> 	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2594)
> 	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2582)
> 	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2656)
> 	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2606)
> 	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2519)
> 	- locked <0x00002b3114eb4a98> (a org.apache.hadoop.conf.Configuration)
> 	at org.apache.hadoop.conf.Configuration.get(Configuration.java:1091)
> 	at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
> 	at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1546)
> 	at org.apache.hadoop.util.WhiteListFileManager.refresh(WhiteListFileManager.java:176)
> 	- locked <0x00002b2d6fe06a28> (a java.lang.Class for org.apache.hadoop.util.WhiteListFileManager)
> 	at org.apache.hadoop.util.ExtensionManager$2.run(ExtensionManager.java:70)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
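The refresh thread above repeatedly re-parses configuration resources through an InflaterInputStream. Each Inflater holds zlib state in native memory (accounted under NMT's "Internal"), and that memory is released only by Inflater.end(); if the stream is never closed, reclamation waits on GC finalization, which may lag far behind on a large, mostly idle heap. A minimal sketch of the pattern and its deterministic fix (illustrative only, not the actual Hadoop code; the class name `InflaterLeakDemo` is hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class InflaterLeakDemo {

    // Compress-then-decompress round trip. The important part is the
    // try-with-resources on InflaterInputStream: close() calls end() on
    // the stream's internal Inflater, releasing the zlib buffers that
    // live in native memory (NMT "Internal"). Without it, that native
    // memory is reclaimed only if/when the GC finalizes the Inflater.
    static String roundTrip(String payload) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(compressed)) {
            dos.write(payload.getBytes("UTF-8"));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream iis = new InflaterInputStream(
                new ByteArrayInputStream(compressed.toByteArray()))) {
            byte[] buf = new byte[256];
            for (int n; (n = iis.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
        } // <-- deterministic Inflater.end(): no native-memory build-up
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hdfs configuration payload"));
    }
}
```

If the stream reached from Configuration.parse is closed promptly on every refresh cycle, the "Internal" malloc count should stop growing between refreshes.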