Manoj Govindassamy created HDFS-11881:
-----------------------------------------

             Summary: NameNode consumes a lot of memory for snapshot diff 
report generation
                 Key: HDFS-11881
                 URL: https://issues.apache.org/jira/browse/HDFS-11881
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs, snapshots
    Affects Versions: 3.0.0-alpha1
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy


Problem:
HDFS supports a snapshot diff tool which can generate a [detailed report | 
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html#Get_Snapshots_Difference_Report]
 of modified, created, deleted and renamed files between any 2 snapshots.
{noformat}
hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
{noformat}

However, if the diff list between 2 snapshots happens to be huge, on the order 
of millions of entries, then the NameNode can consume a lot of memory while 
generating the diff report. In a few cases, we have seen the NameNode go into a 
long GC pause lasting a few minutes to make room for this burst in memory demand 
during snapshot diff report generation.

Root cause:
* NameNode tries to generate the diff report with all diff entries at once, 
which puts undue pressure on its heap
* Each diff report entry has, at a minimum, the diff type (enum), a source path 
byte array, and a destination path byte array. Take the file deletion use case. 
For file deletions, there would be only a source or a destination path in the 
diff report entry. Let's assume these deleted files take 128 bytes on average 
for the path. 4 million file deletions captured in a diff report will thus need 
512MB of memory for the paths alone
* The snapshot diff report uses a simple Java ArrayList, which allocates a new, 
larger contiguous backing array (about 1.5x the old capacity in OpenJDK) and 
copies the old contents over every time the list fills up. So, a 512MB memory 
requirement might internally be asking for a much larger contiguous memory 
chunk
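
The arithmetic behind the estimate above can be sketched in a few lines of Java. 
This is only a back-of-the-envelope model: the class and method names are 
hypothetical, object headers and enum references are ignored, and the growth 
step is modeled on OpenJDK's ArrayList (initial capacity 10, grow by about 50% 
when full), which may differ on other JVMs.

```java
final class DiffReportMemoryEstimate {

    /** Raw path bytes for the given number of entries (object headers ignored). */
    static long pathBytes(long entries, long avgPathBytes) {
        return entries * avgPathBytes;
    }

    /**
     * Capacity of the backing array after adding `entries` elements one by one,
     * assuming OpenJDK-style growth: newCapacity = old + (old >> 1), starting at 10.
     */
    static long capacityAfter(long entries) {
        long capacity = 10;
        while (capacity < entries) {
            capacity = capacity + (capacity >> 1);
        }
        return capacity;
    }

    public static void main(String[] args) {
        long entries = 4_000_000L;   // deletions in the diff, from the example above
        long avgPathBytes = 128L;    // assumed average path length

        System.out.println("path bytes:       " + pathBytes(entries, avgPathBytes));
        // The backing array holds one reference per slot; the final capacity
        // overshoots the entry count because of the growth step.
        System.out.println("backing capacity: " + capacityAfter(entries));
    }
}
```

During each growth step the old and the new backing arrays are both live until 
the copy finishes, which is one reason the peak demand exceeds the steady-state 
size.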

Proposal:
* Make the NameNode snapshot diff report service follow the batch model (like 
the directory listing service). Clients (the hdfs snapshotDiff command) will 
then receive the diff report in small batches, and will need to iterate several 
times to get the full list.
* Additionally, the snapshot diff report service in the NameNode can use the 
ChunkedArrayList data structure instead of the current ArrayList, so as to 
avoid fragmentation and the need for one large contiguous memory chunk.
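
A minimal sketch of the batch model proposed above, in Java. Everything here is 
hypothetical and for illustration only: the class, the Batch type, the integer 
cursor, and the method names are not the actual HDFS API, and the in-memory 
list stands in for the NameNode's snapshot diff traversal.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchedDiffSketch {

    /** One batch of diff entries plus the cursor to resume from (-1 when done). */
    static final class Batch {
        final List<String> entries;
        final int nextIndex;

        Batch(List<String> entries, int nextIndex) {
            this.entries = entries;
            this.nextIndex = nextIndex;
        }
    }

    /** Hypothetical server side: return at most batchSize entries from startIndex. */
    static Batch getBatch(List<String> fullDiff, int startIndex, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int end = Math.min(startIndex + batchSize, fullDiff.size());
        List<String> page = new ArrayList<>(fullDiff.subList(startIndex, end));
        int next = end < fullDiff.size() ? end : -1;
        return new Batch(page, next);
    }

    /**
     * Hypothetical client side: keep asking for the next batch until the cursor
     * is exhausted. Entries are appended to `sink` here only so the result can
     * be inspected; a real client could print each batch and discard it.
     * Returns the number of calls made.
     */
    static int fetchAll(List<String> fullDiff, int batchSize, List<String> sink) {
        int calls = 0;
        int cursor = 0;
        while (cursor != -1) {
            Batch batch = getBatch(fullDiff, cursor, batchSize);
            sink.addAll(batch.entries);
            cursor = batch.nextIndex;
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> diff = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            diff.add("-\t./file" + i);
        }
        List<String> received = new ArrayList<>();
        int calls = fetchAll(diff, 4, received);
        System.out.println(calls + " calls, " + received.size() + " entries");
    }
}
```

With this shape, the memory held per call is bounded by the batch size rather 
than by the total number of diff entries, at the cost of extra round trips.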



