[ https://issues.apache.org/jira/browse/CASSANDRA-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143078#comment-13143078 ]
Yuki Morishita edited comment on CASSANDRA-3316 at 11/3/11 12:38 PM:
---------------------------------------------------------------------

First attempt. Added JMX interface (forceTerminateAllRepairSessions) to ss.
Patch attached for cassandra-1.0 branch.

I think it would be better if there were a nodetool command for this feature.
How about nodetool cleanuprepair?

      was (Author: yukim):
    First attempt. Added JMX interface (forceTerminateAllRepairSessions) to ss.

I think it would be better if there were a nodetool command for this feature.
How about nodetool cleanuprepair?

> Add a JMX call to force cleaning repair sessions (in case they are hang up)
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3316
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3316
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.6
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.2
>
>         Attachments: 3316-v1.txt
>
>
> A repair session contains many parts, most of which are not local to the node
> (implying the node waits on those operations). You request merkle trees, then
> you schedule streaming (and in 1.0.0, some of the streaming doesn't involve the
> local node itself). There are lots of places where something can go wrong, and
> if it does, the repair is left hanging and, as a consequence, a repairSession
> task is left sitting active on the 'AntiEntropy Session' executor.
> Obviously, we should improve repair's detection of the things that can
> go wrong. CASSANDRA-2433 started this and CASSANDRA-3112 is open to fill in as
> much of the remaining parts as possible, but my bet is that it will be hard to
> cover everything (and it may not be worth handling very improbable failure
> scenarios). Besides, CASSANDRA-3112 will involve a change in the wire protocol,
> so it may take some time to be committed.
> In the meantime, it would be nice to provide a JMX call to force terminating
> repairSessions so that you don't end up in the case where you have enough
> 'zombie' sessions on the executor that you can't submit new ones (you could
> restart the node but it's ugly).
> Anyway, it's not a big issue but it would be simple to add such a JMX call.
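
For illustration only, here is a minimal, self-contained sketch of the idea described above: hung sessions occupy every thread of an executor, and a JMX operation cancels them so that new work can run again. It is not the attached patch. Only the operation name forceTerminateAllRepairSessions comes from the comment; the RepairControlMBean interface, the RepairControl class, the ActiveSessionCount attribute, the object name example.sketch:type=RepairControl and the pool size of 2 are all hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

class RepairSessionJmxSketch {

    // Hypothetical management interface; only the operation name is from the ticket.
    public interface RepairControlMBean {
        int getActiveSessionCount();
        void forceTerminateAllRepairSessions();
    }

    static class RepairControl implements RepairControlMBean {
        // Small fixed pool (assumed size) so two hung sessions use every thread.
        final ExecutorService executor = Executors.newFixedThreadPool(2);
        private final List<Future<?>> sessions = new ArrayList<>();

        // Simulates a session that waits forever on a remote reply that never arrives.
        synchronized void submitHungSession(CountDownLatch started) {
            sessions.add(executor.submit(() -> {
                started.countDown();
                try {
                    new CountDownLatch(1).await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // cancelled: let the task end
                }
            }));
        }

        @Override
        public synchronized int getActiveSessionCount() {
            int active = 0;
            for (Future<?> f : sessions) {
                if (!f.isDone()) active++;
            }
            return active;
        }

        @Override
        public synchronized void forceTerminateAllRepairSessions() {
            for (Future<?> f : sessions) {
                f.cancel(true); // interrupts the worker thread running the session
            }
            sessions.clear();
        }
    }

    // Returns {active sessions before, active sessions after, 1 if a new task ran}.
    static int[] demo() {
        try {
            return runDemo();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static int[] runDemo() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example.sketch:type=RepairControl");
        RepairControl control = new RepairControl();
        server.registerMBean(new StandardMBean(control, RepairControlMBean.class), name);
        try {
            CountDownLatch started = new CountDownLatch(2);
            control.submitHungSession(started);
            control.submitHungSession(started);
            started.await(); // both pool threads are now blocked

            int before = (Integer) server.getAttribute(name, "ActiveSessionCount");
            server.invoke(name, "forceTerminateAllRepairSessions",
                          new Object[0], new String[0]);
            int after = (Integer) server.getAttribute(name, "ActiveSessionCount");

            // A new task only completes if the hung sessions released their threads.
            Future<?> fresh = control.executor.submit(() -> { });
            boolean freshRan;
            try {
                fresh.get(5, TimeUnit.SECONDS);
                freshRan = true;
            } catch (TimeoutException e) {
                freshRan = false;
            }
            return new int[] { before, after, freshRan ? 1 : 0 };
        } finally {
            server.unregisterMBean(name);
            control.executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("before=" + r[0] + " after=" + r[1] + " freshRan=" + (r[2] == 1));
    }
}
```

The sketch relies on Future.cancel(true) interrupting the blocked worker thread, so it only helps when a session is blocked in an interruptible wait. Whether that holds for every place a real repair session can hang is not something this example establishes.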