[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-22 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439573#comment-13439573 ] Hadoop QA commented on HDFS-3561: - -1 overall. Here are the results of testing the latest

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-22 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439706#comment-13439706 ] Aaron T. Myers commented on HDFS-3561: -- +1, the latest patch looks good to me. Vinay,

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-22 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439750#comment-13439750 ] Vinay commented on HDFS-3561: - Yes Aaron, We tested the described scenario after setting number

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-22 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439755#comment-13439755 ] Aaron T. Myers commented on HDFS-3561: -- Great! Thanks for doing that. I'm going to

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-21 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439100#comment-13439100 ] Aaron T. Myers commented on HDFS-3561: -- Sounds good, Vinay. I'll be happy to

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-20 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438463#comment-13438463 ] Vinay commented on HDFS-3561: - Thanks Aaron, I agree with your preference. I will post a new

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-17 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437160#comment-13437160 ] Aaron T. Myers commented on HDFS-3561: -- That's a good point, Vinay, that the method

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-13 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432946#comment-13432946 ] Vinay commented on HDFS-3561: - Hi [~atm] any more comments you have on this..?

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-14 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414316#comment-13414316 ] Vinay commented on HDFS-3561: - That sounds good. But as of now, in ZKFC, tryGracefulFence() is

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-13 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413654#comment-13413654 ] Hadoop QA commented on HDFS-3561: - -1 overall. Here are the results of testing the latest

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-13 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414093#comment-13414093 ] Aaron T. Myers commented on HDFS-3561: -- Instead of creating a new Configuration object

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-10 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410218#comment-13410218 ] Vinay commented on HDFS-3561: - Thanks Aaron for the suggestion. I have one question here.

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-10 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410903#comment-13410903 ] Aaron T. Myers commented on HDFS-3561: -- I'd think that we'd only want the lower number

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-09 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409927#comment-13409927 ] Aaron T. Myers commented on HDFS-3561: -- Seems to me like these new configs should not

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-06 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408255#comment-13408255 ] Uma Maheswara Rao G commented on HDFS-3561: --- How about the configuration key name

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-06 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408385#comment-13408385 ] Hadoop QA commented on HDFS-3561: - -1 overall. Here are the results of testing the latest

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-02 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405151#comment-13405151 ] Aaron T. Myers commented on HDFS-3561: -- bq. How we can do shared storage fencing from

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-02 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405222#comment-13405222 ] Uma Maheswara Rao G commented on HDFS-3561: --- Hi Aaron, Thanks a lot. {code}

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-02 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405227#comment-13405227 ] Aaron T. Myers commented on HDFS-3561: -- Ah, yes. Both in the case of BKJM or the QJM,

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-02 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405238#comment-13405238 ] Uma Maheswara Rao G commented on HDFS-3561: --- Thanks Aaron :-) {quote}

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-29 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404285#comment-13404285 ] Aaron T. Myers commented on HDFS-3561: -- I think some wires are getting crossed here.

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-29 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404354#comment-13404354 ] Uma Maheswara Rao G commented on HDFS-3561: --- Hi Aaron, Thanks a lot for the

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-29 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404358#comment-13404358 ] Uma Maheswara Rao G commented on HDFS-3561: --- Hi Aaron, Thanks a lot for the

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-28 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402978#comment-13402978 ] Vinay commented on HDFS-3561: - {quote}This isn't acceptable. The point of fencing is to ensure

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-28 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402994#comment-13402994 ] Uma Maheswara Rao G commented on HDFS-3561: --- Yes, we have multiple level of

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-25 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400487#comment-13400487 ] Vinay commented on HDFS-3561: - During transition, fencing of old active will be done. Here

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-25 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400504#comment-13400504 ] Uma Maheswara Rao G commented on HDFS-3561: --- I think we can set retries to 1/2

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-25 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400833#comment-13400833 ] Aaron T. Myers commented on HDFS-3561: -- bq. Suggestion: If ZKFC is not able to reach

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-25 Thread Todd Lipcon (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400847#comment-13400847 ] Todd Lipcon commented on HDFS-3561: --- +1 for setting it to 0 or 1 for the graceful fence