Jian Zhang created HDFS-17198:
---------------------------------

             Summary: RBF: fix bug of getRepresentativeQuorum
                 Key: HDFS-17198
                 URL: https://issues.apache.org/jira/browse/HDFS-17198
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Jian Zhang


h2. *Bug description*

In the original implementation, when the routers report the nn state at 
different times, the resulting nn state is the one reported by the majority of 
routers, for example:
router1 -> nn0:active dateModified:1

router2 -> nn0:active dateModified:2

router3 -> nn0:active dateModified:3

router0 -> nn0:standby dateModified:4

Then the state of nn0 is active, because the majority of routers report that 
nn0 is active.

If the majority of routers report the nn state at the same time, for example:
(record1) router1 -> nn0:active dateModified:1

(record2) router2 -> nn0:active dateModified:1

(record3) router3 -> nn0:active dateModified:1

(record4) router0 -> nn0:standby dateModified:2

Then the state of nn0 is standby, but we expect the state of nn0 to be active.

This bug occurs because the records above are put into a TreeSet in the method 
getRepresentativeQuorum. Since record1, record2 and record3 have the same 
dateModified, only one of them remains in the final TreeSet of this method. The 
majority is therefore no longer counted, and the method concludes that nn0 is 
standby, because record4 is newer.
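
The collapse described above can be reproduced outside Hadoop with a minimal sketch. It is not the code of getRepresentativeQuorum: the class QuorumSketch, the Report type, the BY_DATE comparator and the pick helper are all hypothetical names, and the sketch assumes the records are ordered only by dateModified and that ties between groups go to the newest record. Counting every report in a list is shown only as one possible alternative for comparison.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class QuorumSketch {

  /** Hypothetical stand-in for one router's report about a namenode. */
  public static final class Report {
    final String router;
    final String state;
    final long dateModified;

    public Report(String router, String state, long dateModified) {
      this.router = router;
      this.state = state;
      this.dateModified = dateModified;
    }
  }

  // Assumption: ordering looks at dateModified only, so two reports with the
  // same timestamp compare as equal and a TreeSet keeps just one of them.
  static final Comparator<Report> BY_DATE =
      Comparator.comparingLong((Report r) -> r.dateModified);

  /** Groups reports per state in a TreeSet: equal timestamps collapse. */
  public static String quorumWithTreeSet(List<Report> reports) {
    Map<String, Collection<Report>> groups = new HashMap<>();
    for (Report r : reports) {
      groups.computeIfAbsent(r.state, s -> new TreeSet<>(BY_DATE)).add(r);
    }
    return pick(groups);
  }

  /** Groups reports per state in a List: every report is counted. */
  public static String quorumWithList(List<Report> reports) {
    Map<String, Collection<Report>> groups = new HashMap<>();
    for (Report r : reports) {
      groups.computeIfAbsent(r.state, s -> new ArrayList<>()).add(r);
    }
    return pick(groups);
  }

  // Largest group wins; a tie goes to the group holding the newest report.
  private static String pick(Map<String, Collection<Report>> groups) {
    String best = null;
    int bestSize = -1;
    long bestDate = Long.MIN_VALUE;
    for (Map.Entry<String, Collection<Report>> e : groups.entrySet()) {
      int size = e.getValue().size();
      long newest = Long.MIN_VALUE;
      for (Report r : e.getValue()) {
        newest = Math.max(newest, r.dateModified);
      }
      if (size > bestSize || (size == bestSize && newest > bestDate)) {
        best = e.getKey();
        bestSize = size;
        bestDate = newest;
      }
    }
    return best;
  }

  /** The four records from the second example in the description. */
  public static List<Report> sameTimeReports() {
    List<Report> reports = new ArrayList<>();
    reports.add(new Report("router1", "active", 1));
    reports.add(new Report("router2", "active", 1));
    reports.add(new Report("router3", "active", 1));
    reports.add(new Report("router0", "standby", 2));
    return reports;
  }

  public static void main(String[] args) {
    List<Report> reports = sameTimeReports();
    System.out.println("TreeSet grouping: " + quorumWithTreeSet(reports));
    System.out.println("List grouping:    " + quorumWithList(reports));
  }
}
```

With the four records of the second example, the TreeSet grouping keeps one active record and one standby record, so the newer standby record wins; the list grouping counts three active records against one standby record and returns active.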

h2. *How to reproduce*

Run my unit test testRegistrationMajorityQuorumEqDateModified against the 
original code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
