[ 
https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799525#comment-13799525
 ] 

Lin Xiao commented on HDFS-5389:
--------------------------------

One design that removes the space limit while maintaining similar performance is 
to cache only the working set, or hot metadata, in Namenode memory. This approach 
can be very effective because the subset of files that is frequently accessed 
is much smaller than the full set of files stored in HDFS. This pattern is becoming 
common because HDFS allows customers to cost-effectively store data for many 
years, even though only the latest data is accessed frequently. The goal is 
little or no performance degradation when the working set fits in memory. 
With effective cache replacement policies, we believe we can handle the 
cases where applications occasionally access cold data.
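
To make the working-set idea concrete, here is a minimal sketch of an LRU cache placed in front of a slower metadata store. It is an illustration only, not code from the prototype: the class name HotMetadataCache, the String-valued metadata, and the plain Map standing in for the persistent store are all assumptions, and LRU is just one possible replacement policy.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: LRU cache of hot metadata in front of a persistent store. */
public class HotMetadataCache {
    private final int capacity;
    // Assumption: a plain Map stands in for the on-disk metadata store.
    private final Map<String, String> store;
    // accessOrder=true makes iteration order least-recently-used first.
    private final LinkedHashMap<String, String> cache;
    private long hits;
    private long misses;

    public HotMetadataCache(int capacity, Map<String, String> store) {
        this.capacity = capacity;
        this.store = store;
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Evict the least recently used entry once over capacity.
                return size() > HotMetadataCache.this.capacity;
            }
        };
    }

    /** Returns metadata for a path, loading it from the store on a miss. */
    public String lookup(String path) {
        String meta = cache.get(path);
        if (meta != null) {
            hits++;
            return meta;
        }
        misses++;
        meta = store.get(path); // slow path: cold metadata
        if (meta != null) {
            cache.put(path, meta);
        }
        return meta;
    }

    /** Does not change recency; containsKey is not an access. */
    public boolean isCached(String path) {
        return cache.containsKey(path);
    }

    public long hits() { return hits; }
    public long misses() { return misses; }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("/logs/today", "inode-1");
        store.put("/logs/2010", "inode-2");
        HotMetadataCache cache = new HotMetadataCache(1, store);
        cache.lookup("/logs/today"); // miss, loaded into memory
        cache.lookup("/logs/today"); // hit, served from memory
        cache.lookup("/logs/2010");  // miss, evicts /logs/today
        System.out.println(cache.hits() + " hit(s), " + cache.misses() + " miss(es)");
    }
}
```

When the working set fits within the capacity, every lookup after the first is served from memory; an occasional cold lookup costs one store read and may evict one hot entry.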

The current implementation uses a log-structured merge-tree (LSM-tree) to store 
the Namenode's metadata persistently. A related project at CMU, TableFS, 
showed that LSM-trees were very effective for handling metadata in file 
systems. We have built a prototype of the Namenode using LevelDB, an 
open-source, fast key-value store library based on an LSM-tree.
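
For readers unfamiliar with LSM-trees, the following toy sketch shows the basic idea: writes go to a small sorted in-memory table, full tables are frozen into immutable sorted runs, and reads check the newest data first. It is not LevelDB's API or the prototype's code; the class TinyLsmTree and its methods are hypothetical, everything stays in memory, and deletes, write-ahead logging, and leveled compaction are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.TreeMap;

/** Hypothetical toy LSM-tree: in-memory only, no deletes, no write-ahead log. */
public class TinyLsmTree {
    private final int memtableLimit;
    // Mutable sorted buffer that absorbs all writes.
    private TreeMap<String, String> memtable = new TreeMap<>();
    // Immutable sorted runs, newest at the head (stand-in for on-disk files).
    private final Deque<TreeMap<String, String>> runs = new ArrayDeque<>();

    public TinyLsmTree(int memtableLimit) {
        this.memtableLimit = memtableLimit;
    }

    public void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= memtableLimit) {
            flush();
        }
    }

    /** Freezes the memtable as the newest run and starts a fresh one. */
    private void flush() {
        runs.addFirst(memtable);
        memtable = new TreeMap<>();
    }

    /** Reads check the memtable first, then runs from newest to oldest. */
    public String get(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (TreeMap<String, String> run : runs) {
            value = run.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Merges all runs into one; newer values overwrite older ones. */
    public void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        Iterator<TreeMap<String, String>> oldestFirst = runs.descendingIterator();
        while (oldestFirst.hasNext()) {
            merged.putAll(oldestFirst.next());
        }
        runs.clear();
        if (!merged.isEmpty()) {
            runs.addFirst(merged);
        }
    }

    public int runCount() {
        return runs.size();
    }
}
```

The point of the structure is that updates never modify an existing run in place, so writes stay sequential, while compaction bounds the number of runs a read has to check.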

> Remove INode limitations in Namenode
> ------------------------------------
>
>                 Key: HDFS-5389
>                 URL: https://issues.apache.org/jira/browse/HDFS-5389
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 0.23.1
>            Reporter: Lin Xiao
>            Priority: Minor
>
> The current HDFS Namenode stores all of its metadata in RAM. This has allowed 
> Hadoop clusters to scale to 100K concurrent tasks. However, memory limits 
> the total number of files that a single NN can store. While Federation allows 
> one to create multiple volumes with additional Namenodes, there is a need to 
> scale a single namespace and also to store multiple namespaces in a single 
> Namenode. When inodes are also stored on persistent storage, the system's 
> boot time can be significantly reduced because there is no need to replay 
> edit logs. It also provides the potential to support extended attributes once 
> memory size is no longer the bottleneck.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
