[ https://issues.apache.org/jira/browse/HDFS-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shashikant Banerjee resolved HDFS-14492.
----------------------------------------
    Fix Version/s: 3.1.4
       Resolution: Fixed

Thanks [~jojochuang] for the contribution. I have committed this change to trunk.

> Snapshot memory leak
> --------------------
>
>                 Key: HDFS-14492
>                 URL: https://issues.apache.org/jira/browse/HDFS-14492
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 2.6.0
>         Environment: CDH5.14.4
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>             Fix For: 3.1.4
>
>
> We recently examined the NameNode heap dump of a big, heavy snapshot user,
> trying to trim some fat, and sure enough we found a memory leak in it: when
> snapshots are removed, the corresponding data structures are not removed.
> This cluster has 586 million file system objects (286 million files, 287
> million blocks, 13 million directories), using around 132 GB of heap.
> While only 44.5 million files have snapshotted copies
> (INodeFileAttributes$SnapshotCopy), most inodes (nearly 212 million) have
> FileWithSnapshotFeature and FileDiffList. Those inodes had snapshotted copies
> at some point in the past, but after the snapshots were removed, those data
> structures are still kept in the heap.
> INode$Feature[] = 32.5 bytes on average, FileWithSnapshotFeature = 32 bytes,
> FileDiffList = 24 bytes. It may not sound like a lot, but they add up quickly
> in large clusters like this. In this cluster, a whopping 13.8 GB of memory
> could have been saved if not for this bug:
> (32.5 + 32 + 24) bytes * (211997769 - 44572380) =~ 13.8 GB.
> That is more than 10% of savings in heap size.
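The savings estimate quoted above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the class and method names are made up for this example, the per-object sizes and instance counts are the ones reported in the issue, and the result matches the quoted 13.8 figure when a gigabyte is taken as 2^30 bytes (an assumption about the units used in the description).

```java
public class SnapshotLeakEstimate {
    // Average per-object sizes in bytes, as reported in the issue description.
    static final double INODE_FEATURE_ARRAY_BYTES = 32.5;
    static final double FILE_WITH_SNAPSHOT_FEATURE_BYTES = 32.0;
    static final double FILE_DIFF_LIST_BYTES = 24.0;

    // Instance counts reported in the issue.
    static final long FEATURE_INSTANCES = 211_997_769L;      // FileWithSnapshotFeature / FileDiffList
    static final long SNAPSHOT_COPY_INSTANCES = 44_572_380L; // INodeFileAttributes$SnapshotCopy

    /** Inodes that still carry snapshot structures but no longer have a snapshotted copy. */
    static long leakedInodes() {
        return FEATURE_INSTANCES - SNAPSHOT_COPY_INSTANCES;
    }

    /** Estimated bytes held by the leftover structures. */
    static double wastedBytes() {
        double perInode = INODE_FEATURE_ARRAY_BYTES
                + FILE_WITH_SNAPSHOT_FEATURE_BYTES
                + FILE_DIFF_LIST_BYTES;
        return perInode * leakedInodes();
    }

    /** Same estimate in units of 2^30 bytes (assumed meaning of "gb" in the description). */
    static double wastedGiB() {
        return wastedBytes() / (1L << 30);
    }

    public static void main(String[] args) {
        System.out.printf("leaked inodes: %d%n", leakedInodes());
        System.out.printf("wasted bytes:  %.1f%n", wastedBytes());
        System.out.printf("wasted GiB:    %.1f%n", wastedGiB());
    }
}
```

Against the reported heap of around 132 GB, this estimate comes to roughly 10.5%, consistent with the "more than 10%" claim.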
> Heap histogram for reference:
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     286418254    27496152384  org.apache.hadoop.hdfs.server.namenode.INodeFile
>    2:     287322227    18388622528  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>    3:     227899550    17144816120  [B
>    4:     287324031    13769408616  [Lorg.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo;
>    5:      71352116    12353841568  [Ljava.lang.Object;
>    6:     286322650     9170335840  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>    7:     235632329     7658462416  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>    8:             4     7046430816  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>    9:     211997769     6783928608  org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature
>   10:     211997769     5087946456  org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList
>   11:      76586261     3780468856  [I
>   12:      44572380     3209211360  org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy
>   13:      58634517     2345380680  java.util.ArrayList
>   14:      44572380     2139474240  org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff
>   15:      76582416     1837977984  org.apache.hadoop.hdfs.server.namenode.AclFeature
>   16:      12907668     1135874784  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
> {noformat}
> [~szetszwo] [~arpaga] [~smeng] [~shashikant] any thoughts?
> I am thinking that inside AbstractINodeDiffList#deleteSnapshotDiff(), in
> addition to cleaning up file diffs, it should also remove
> FileWithSnapshotFeature. I am not familiar with the snapshot implementation,
> so any guidance is greatly appreciated.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
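The cleanup idea proposed in the description (drop the per-file snapshot feature once its last diff is gone) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the committed patch: `ToyFile` and `ToySnapshotFeature` are hypothetical stand-ins invented for this example and do not mirror the real INodeFile, FileWithSnapshotFeature, or AbstractINodeDiffList code, which tracks additional state that a real fix would have to respect.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotFeatureCleanupSketch {

    /** Hypothetical stand-in for a per-file snapshot feature that owns a diff list. */
    static final class ToySnapshotFeature {
        final List<Integer> diffSnapshotIds = new ArrayList<>();
    }

    /** Hypothetical stand-in for a file inode; the feature is null until a diff is recorded. */
    static final class ToyFile {
        ToySnapshotFeature snapshotFeature;

        /** Record that this file has a diff belonging to the given snapshot. */
        void recordDiff(int snapshotId) {
            if (snapshotFeature == null) {
                snapshotFeature = new ToySnapshotFeature();
            }
            snapshotFeature.diffSnapshotIds.add(snapshotId);
        }

        /** Delete the diff for a removed snapshot and release the feature if nothing is left. */
        void deleteSnapshotDiff(int snapshotId) {
            if (snapshotFeature == null) {
                return;
            }
            // Integer.valueOf forces remove(Object), not remove(int index).
            snapshotFeature.diffSnapshotIds.remove(Integer.valueOf(snapshotId));
            if (snapshotFeature.diffSnapshotIds.isEmpty()) {
                // Without this step the empty feature and its list stay on the heap,
                // which is the kind of leftover the issue describes.
                snapshotFeature = null;
            }
        }

        boolean hasSnapshotFeature() {
            return snapshotFeature != null;
        }
    }

    public static void main(String[] args) {
        ToyFile file = new ToyFile();
        file.recordDiff(1);
        file.recordDiff(2);
        file.deleteSnapshotDiff(1);
        System.out.println("after deleting snapshot 1: " + file.hasSnapshotFeature());
        file.deleteSnapshotDiff(2);
        System.out.println("after deleting snapshot 2: " + file.hasSnapshotFeature());
    }
}
```

In this toy model the feature survives while any diff remains and is released only when the last one is deleted.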