[ https://issues.apache.org/jira/browse/ACCUMULO-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037441#comment-14037441 ]
Mike Drob commented on ACCUMULO-2868:
-------------------------------------

Todd outlines some more [advanced logic|https://issues.apache.org/jira/browse/HDFS-599?focusedCommentId=12756258&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12756258] that HDFS uses to decide when to mark a node as dead, rather than just X retries * Y seconds.

> Make master configurable in when it kills tablet servers
> --------------------------------------------------------
>
>                 Key: ACCUMULO-2868
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2868
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 1.6.0
>            Reporter: Bill Havanki
>              Labels: admin, configuration, master
>
> On a cluster with a flaky network, the master may be unable to contact a
> tserver for some moderate amount of time and then direct it to terminate,
> even though the tserver is still up. (See {{gatherTableInformation()}} and
> {{StatusThread}}.) It does not appear possible to configure the master to be
> more forgiving in these checks. Relevant constants:
> * {{DEFAULT_WAIT_FOR_WATCHER}} - interval between server checks
> * {{MAX_BAD_STATUS_COUNT}} - the maximum number of failed attempts allowed
> before killing the tserver
> Making one or both of those configurable, or some other pertinent parameter
> configurable, would allow cluster admins to cope with mild network maladies.
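
For illustration only, the "X retries" part of the check described in the issue could take its limit from configuration instead of a constant. The sketch below is not Accumulo code: the class {{BadStatusTracker}} and its methods are hypothetical, and it assumes that a tserver is halted once its consecutive failed status checks exceed the limit and that any successful check resets the count.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Accumulo): counts consecutive failed
 * status checks per tserver against a limit supplied by the caller,
 * for example one read from site configuration.
 */
final class BadStatusTracker {
    // Stands in for the hard-coded MAX_BAD_STATUS_COUNT constant.
    private final int maxBadStatusCount;
    private final Map<String, Integer> badCounts = new HashMap<>();

    BadStatusTracker(int maxBadStatusCount) {
        if (maxBadStatusCount < 1) {
            throw new IllegalArgumentException("maxBadStatusCount must be >= 1");
        }
        this.maxBadStatusCount = maxBadStatusCount;
    }

    /**
     * Records one failed status check for the given server.
     * Returns true once the consecutive failures exceed the limit
     * (assumed semantics), meaning the caller may halt the server.
     */
    boolean recordFailure(String server) {
        int count = badCounts.merge(server, 1, Integer::sum);
        return count > maxBadStatusCount;
    }

    /** A successful check clears the failure count for that server. */
    void recordSuccess(String server) {
        badCounts.remove(server);
    }
}
```

With a limit of 3, the first three failures for a server are tolerated and the fourth consecutive failure reports that the server may be halted. An admin on a flaky network could raise the limit, lengthen the check interval, or both.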