[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703890#comment-15703890 ]
Mingliang Liu edited comment on HDFS-11094 at 11/29/16 2:18 AM:
----------------------------------------------------------------

Hi [~ebadger], thanks for updating the patch. Sorry for returning late from holiday.

The patch looks good to me overall. I have two thoughts for your consideration.
# I discussed with [~arpitagarwal] offline and he suggested that we use the same logic as in {{updateActorStatesFromHeartbeat}} to update the active NN {{bpServiceToActive}}, since that method already handles several cases carefully. Moreover, if we are updating {{bpServiceToActive}} we should likely also update {{lastActiveClaimTxId}}. To achieve this, I think we can pass {{NNHAStatusHeartbeatProto}} instead of {{HAServiceStateProto}} in {{NamespaceInfoProto}}.
# For the unit test, can we set a very large heartbeat interval in the configuration, and check that the active NN is not null after {{cluster.waitForActive()}}? Mocked tests are useful as well and can be kept. Another idea is to drop heartbeat requests against a spied HeartbeatManager.

was (Author: liuml07):
Hi [~ebadger], thanks for updating the patch. Sorry for returning late from holiday.

The patch looks good to me overall. I have two thoughts for your consideration.
# I discussed with [~arpitagarwal] and he suggested we consider using the same logic as in {{updateActorStatesFromHeartbeat}} to update the active NN {{bpServiceToActive}}, since that method already handles several cases carefully. If we are updating {{bpServiceToActive}} we should likely also update {{lastActiveClaimTxId}}.
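To make the first suggestion concrete, here is a minimal, self-contained sketch of the kind of claim handling the comment refers to. It is a simplified model, not the Hadoop code: the class {{ActiveClaimTracker}}, the enum {{HaState}}, the string actor names, and the accessor methods are all hypothetical. Only the field names {{bpServiceToActive}} and {{lastActiveClaimTxId}} come from the comment. The assumed rule is that a NameNode's claim to be active is accepted only if it carries a higher txid than the last accepted claim, which is why a bare HA state without a txid would not be enough.

```java
/**
 * Simplified model of tracking the active NameNode for a block pool.
 * Illustrative only: not the Hadoop implementation. All names other than
 * bpServiceToActive and lastActiveClaimTxId are hypothetical.
 */
class ActiveClaimTracker {
    enum HaState { ACTIVE, STANDBY }

    // Actor currently believed to be active, or null if none is known.
    private String bpServiceToActive = null;
    // Highest txid seen on an accepted active claim.
    private long lastActiveClaimTxId = -1L;

    /** Apply one HA status report (state plus txid) from the given actor. */
    synchronized void update(String actor, HaState state, long txid) {
        boolean claimsActive = state == HaState.ACTIVE;
        boolean thinksActive = actor.equals(bpServiceToActive);

        if (claimsActive && !thinksActive) {
            if (txid <= lastActiveClaimTxId) {
                // Assumed split-brain guard: an equally recent or newer
                // claim was already accepted, so this one is ignored.
                return;
            }
            bpServiceToActive = actor;
        } else if (!claimsActive && thinksActive) {
            // The actor we believed active has given up the role.
            bpServiceToActive = null;
        }

        if (actor.equals(bpServiceToActive)) {
            // Remember the newest txid reported by the accepted active actor.
            lastActiveClaimTxId = Math.max(lastActiveClaimTxId, txid);
        }
    }

    synchronized String getActive() {
        return bpServiceToActive;
    }

    synchronized long getLastActiveClaimTxId() {
        return lastActiveClaimTxId;
    }

    public static void main(String[] args) {
        ActiveClaimTracker tracker = new ActiveClaimTracker();
        tracker.update("nn1", HaState.ACTIVE, 10L);
        tracker.update("nn2", HaState.ACTIVE, 5L); // rejected: older txid
        System.out.println("active=" + tracker.getActive()
                + " lastActiveClaimTxId=" + tracker.getLastActiveClaimTxId());
    }
}
```

In this model the same {{update}} call could serve both a heartbeat response and a versionRequest response, provided both carry the state together with a txid. That is the motivation given above for passing {{NNHAStatusHeartbeatProto}} rather than only {{HAServiceStateProto}}.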
> Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
> -------------------------------------------------------------------------------------------
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, HDFS-11094.009.patch
>
>
> The datanode should know which NN is active when it is connecting/registering to the NN. Currently, it only figures this out during its first (and subsequent) heartbeat(s) and so there is a period of time where the datanode is alive and registered, but can't actually do anything because it doesn't know which NN is active. A byproduct of this is that the MiniDFSCluster will become active before it knows what NN is active, which can lead to NPEs when calling getActiveNN().