[jira] [Commented] (HDFS-8913) Documentation clarity regarding Secondary node, Checkpoint node & Backup node

Jeff Zhang (JIRA) Sun, 03 Jan 2016 23:46:43 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080782#comment-15080782
 ]


Jeff Zhang commented on HDFS-8913:
----------------------------------

+1, I think we do need to highlight the differences between these roles. 

> Documentation clarity regarding Secondary node, Checkpoint node & Backup node
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-8913
>                 URL: https://issues.apache.org/jira/browse/HDFS-8913
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 2.7.1
>         Environment: Content in documentation
>            Reporter: Ravindra Babu
>            Priority: Trivial
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I checked with many people, and almost all of them are confused about the
> responsibilities of the Secondary NameNode, Checkpoint node and Backup node.
> Link:
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
> Confusion:
> Secondary NameNode
> The NameNode stores modifications to the file system as a log appended to a 
> native file system file, edits. When a NameNode starts up, it reads HDFS 
> state from an image file, fsimage, and then applies edits from the edits log 
> file. It then writes new HDFS state to the fsimage and starts normal 
> operation with an empty edits file. Since NameNode merges fsimage and edits 
> files only during start up, the edits log file could get very large over time 
> on a busy cluster. Another side effect of a larger edits file is that next 
> restart of NameNode takes longer.
> The secondary NameNode merges the fsimage and the edits log files 
> periodically and keeps edits log size within a limit. It is usually run on a 
> different machine than the primary NameNode since its memory requirements are 
> on the same order as the primary NameNode.
> Checkpoint Node
> NameNode persists its namespace using two files: fsimage, which is the latest 
> checkpoint of the namespace and edits, a journal (log) of changes to the 
> namespace since the checkpoint. When a NameNode starts up, it merges the 
> fsimage and edits journal to provide an up-to-date view of the file system 
> metadata. The NameNode then overwrites fsimage with the new HDFS state and 
> begins a new edits journal.
> Backup Node
> The Backup node provides the same checkpointing functionality as the 
> Checkpoint node, as well as maintaining an in-memory, up-to-date copy of the 
> file system namespace that is always synchronized with the active NameNode 
> state. Along with accepting a journal stream of file system edits from the 
> NameNode and persisting this to disk, the Backup node also applies those 
> edits into its own copy of the namespace in memory, thus creating a backup of 
> the namespace.
> All three nodes therefore appear to have overlapping functionality. To add to
> the confusion,
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> states that the NameNode never initiates RPC calls to other nodes.
> The Communication Protocols
> All HDFS communication protocols are layered on top of the TCP/IP protocol. A 
> client establishes a connection to a configurable TCP port on the NameNode 
> machine. It talks the ClientProtocol with the NameNode. The DataNodes talk to 
> the NameNode using the DataNode Protocol. A Remote Procedure Call (RPC) 
> abstraction wraps both the Client Protocol and the DataNode Protocol. By 
> design, the NameNode never initiates any RPCs. Instead, it only responds to 
> RPC requests issued by DataNodes or clients.
> We need clarification on the following points. Please improve the
> documentation to avoid confusing readers.
> 1) Secondary NameNode, Checkpoint node & Backup node: clear separation of roles.
> 2) For High Availability, do we require only one of them, two of them, or
> all of them? If not all of them, which combinations are allowed?
> 3) If the NameNode never initiates RPCs to DataNodes, how do reads and writes
> happen?
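
For readers of this thread, the checkpoint step that the quoted text
describes (merge fsimage with the edits journal, write the new fsimage,
start with an empty journal) can be sketched roughly as below. This is a toy
model, not Hadoop code: the dict-based namespace, the tuple-shaped edit
records, and the names apply_edits and checkpoint are all hypothetical and
chosen only for illustration.

```python
# Toy model of an HDFS-style checkpoint. Not Hadoop code: the data layout and
# every name here are assumptions made for illustration.

def apply_edits(fsimage, edits):
    """Return a new namespace with each journaled edit applied in order."""
    namespace = dict(fsimage)  # copy so the previous image stays untouched
    for op, path, value in edits:
        if op == "create":
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return namespace


def checkpoint(fsimage, edits):
    """Merge fsimage and edits; return the new image and an empty journal."""
    return apply_edits(fsimage, edits), []


fsimage = {"/a": 1}
edits = [("create", "/b", 2), ("delete", "/a", None)]
new_image, new_edits = checkpoint(fsimage, edits)
print(new_image, new_edits)
```

In terms of this sketch, and going by the user guide text quoted above, the
Secondary NameNode and the Checkpoint node would both run something like
checkpoint() periodically on behalf of the NameNode, while the Backup node
would additionally keep the result of apply_edits in memory as edits stream
in from the NameNode.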



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
