[ 
https://issues.apache.org/jira/browse/HADOOP-12569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HADOOP-12569:
-----------------------------
    Description: 
We encountered the following HA scenario:
NN1 (active) and zkfc1 on node1;
NN2 (standby) and zkfc2 on node2.
1. Stop the network on node1. NN2 becomes active. On node1, zkfc1 kills itself
because it cannot connect to ZooKeeper, but it leaves NN1 running.
2. Several minutes later, the network on node1 recovers. NN1 is still running
but is no longer managed by any zkfc, so NN1 and NN2 both run as active NameNodes.
Perhaps zkfc should stop the NameNode before it quits in such circumstances.


  was:
We encountered the following HA scenario:
NN1 (active) and zkfc1 on node1;
NN2 (standby) and zkfc2 on node2.
1. Stop the network on node1. NN2 becomes active. On node2, zkfc2 kills itself
because it cannot connect to ZooKeeper, but it leaves NN1 running.
2. Several minutes later, the network on node1 recovers. NN1 is still running
but is no longer managed by any zkfc, so NN1 and NN2 both run as active NameNodes.
Perhaps zkfc should stop the NameNode before it quits in such circumstances.



> ZKFC should stop namenode before itself quit in some circumstances
> ------------------------------------------------------------------
>
>                 Key: HADOOP-12569
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12569
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.6.0
>            Reporter: Tao Jie
>
> We encountered the following HA scenario:
> NN1 (active) and zkfc1 on node1;
> NN2 (standby) and zkfc2 on node2.
> 1. Stop the network on node1. NN2 becomes active. On node1, zkfc1 kills itself
> because it cannot connect to ZooKeeper, but it leaves NN1 running.
> 2. Several minutes later, the network on node1 recovers. NN1 is still running
> but is no longer managed by any zkfc, so NN1 and NN2 both run as active NameNodes.
> Perhaps zkfc should stop the NameNode before it quits in such circumstances.
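
The scenario above can be replayed with a small toy model. This is only a sketch under stated assumptions: `LocalNameNode`, `FailoverController`, and the `stop_nn_before_quit` flag are hypothetical names invented for illustration and are not Hadoop classes or configuration keys. It shows why stopping the local NameNode before the controller exits would leave a single active NameNode once the network recovers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalNameNode:
    """Hypothetical stand-in for the NameNode co-located with a zkfc."""
    name: str
    state: str = "active"  # "active", "standby", or "stopped"

    def stop(self) -> None:
        self.state = "stopped"


@dataclass
class FailoverController:
    """Hypothetical stand-in for a zkfc; not the real Hadoop class."""
    namenode: LocalNameNode
    stop_nn_before_quit: bool  # assumed flag modelling the proposed behaviour
    running: bool = True

    def on_zookeeper_unreachable(self) -> None:
        # Proposed behaviour: stop the local NameNode first, so it cannot
        # keep acting as active after this controller has exited.
        if self.stop_nn_before_quit:
            self.namenode.stop()
        self.running = False


def active_namenodes_after_partition(stop_nn_before_quit: bool) -> List[str]:
    nn1 = LocalNameNode("NN1", "active")
    nn2 = LocalNameNode("NN2", "standby")
    zkfc1 = FailoverController(nn1, stop_nn_before_quit)
    # Step 1: node1 loses its network; zkfc1 quits and NN2 becomes active.
    zkfc1.on_zookeeper_unreachable()
    nn2.state = "active"
    # Step 2: the network recovers; no controller is left to demote NN1.
    return [nn.name for nn in (nn1, nn2) if nn.state == "active"]


if __name__ == "__main__":
    print(active_namenodes_after_partition(False))  # current behaviour
    print(active_namenodes_after_partition(True))   # proposed behaviour
```

In this model the first call reports both NN1 and NN2 as active, matching the reported problem, while the second reports only NN2.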



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
