[jira] [Issue Comment Edited] (CASSANDRA-3316) Add a JMX call to force cleaning repair sessions (in case they are hang up)

Yuki Morishita (Issue Comment Edited) (JIRA) Thu, 03 Nov 2011 05:39:59 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143078#comment-13143078 ]


Yuki Morishita edited comment on CASSANDRA-3316 at 11/3/11 12:38 PM:
---------------------------------------------------------------------

First attempt. Added a JMX operation (forceTerminateAllRepairSessions) to StorageService (ss).
Patch attached for the cassandra-1.0 branch.

I think it would be better to also have a nodetool command for this feature. How about
nodetool cleanuprepair?
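For reference, here is a minimal sketch of invoking the new operation from a standalone JMX client, roughly what a nodetool command would wrap. It assumes the operation is exposed on the StorageService MBean (org.apache.cassandra.db:type=StorageService), that it takes no arguments, and that the node accepts unauthenticated JMX on the default port 7199. The class and helper names are hypothetical.

```java
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical client; assumes the MBean name, port and no-arg signature described above.
public class TerminateRepairSessions {
    static final int DEFAULT_JMX_PORT = 7199; // assumption: Cassandra's default JMX port
    static final String STORAGE_SERVICE_MBEAN = "org.apache.cassandra.db:type=StorageService";
    static final String OPERATION = "forceTerminateAllRepairSessions";

    // Builds the standard RMI-based JMX service URL for host:port.
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad JMX address " + host + ":" + port, e);
        }
    }

    // Connects to the node, invokes the no-argument operation, then closes the connection.
    static void terminateAll(String host, int port) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService = new ObjectName(STORAGE_SERVICE_MBEAN);
            mbs.invoke(storageService, OPERATION, new Object[0], new String[0]);
        }
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : DEFAULT_JMX_PORT;
        terminateAll(host, port);
        System.out.println("Invoked " + OPERATION + " on " + host + ":" + port);
    }
}
```

Usage would be something like `java TerminateRepairSessions 10.0.0.1 7199` against a node with hung sessions; a node with JMX authentication enabled would additionally need credentials passed in the connector environment map.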
                
      was (Author: yukim):
    First attempt. Added a JMX operation (forceTerminateAllRepairSessions) to StorageService (ss).

I think it would be better to also have a nodetool command for this feature. How about
nodetool cleanuprepair?
                  
> Add a JMX call to force cleaning repair sessions (in case they are hang up)
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3316
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3316
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.6
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.2
>
>         Attachments: 3316-v1.txt
>
>
> A repair session contains many parts, most of which are not local to the node
> (implying the node waits on those operations). You request Merkle trees, then
> you schedule streaming (and in 1.0.0, some of the streaming doesn't involve the
> local node itself). There are lots of places where something can go wrong, and
> when it does, it leaves the repair hanging and, as a consequence, leaves
> repair session tasks sitting active on the 'AntiEntropy Session' executor.
> Obviously, we should improve repair's detection of the things that can go
> wrong. CASSANDRA-2433 started this and CASSANDRA-3112 is open to fill in as
> much of the remaining parts as possible, but my bet is that it will be hard to
> cover everything (and it may not be worth handling very improbable failure
> scenarios). Besides, CASSANDRA-3112 will involve a change in the wire protocol,
> so it may take some time to be committed. In the meantime, it would be nice
> to provide a JMX call to force-terminate repairSessions so that you don't
> end up in the case where you have enough 'zombie' sessions on the executor
> that you can't submit new ones (you could restart the node, but it's ugly).
> Anyway, it's not a big issue, but it would be simple to add such a JMX call.
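The failure mode described above can be modelled in a few lines. This is a hypothetical simplification, not Cassandra's actual repair code: a fixed two-thread pool stands in for the 'AntiEntropy Session' executor, a CountDownLatch stands in for a remote reply (a Merkle tree or streaming completion) that may never arrive, and terminateAll() plays the role of forceTerminateAllRepairSessions by cancelling and interrupting every tracked session.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical model of zombie repair sessions filling a bounded executor.
public class ZombieSessionDemo {
    private final ExecutorService executor;
    private final List<Future<?>> sessions = new ArrayList<>();

    ZombieSessionDemo(int threads) {
        executor = Executors.newFixedThreadPool(threads);
    }

    // A "session" that waits for a remote reply; if the reply never comes, it hangs forever.
    synchronized Future<?> submitSession(CountDownLatch remoteReply) {
        Future<?> f = executor.submit(() -> {
            try {
                remoteReply.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // force-terminated by the operator
            }
        });
        sessions.add(f);
        return f;
    }

    // Analogue of the JMX call: cancel (and interrupt) every session not yet finished.
    synchronized int terminateAll() {
        int cancelled = 0;
        for (Future<?> f : sessions) {
            if (f.cancel(true)) {
                cancelled++;
            }
        }
        sessions.clear();
        return cancelled;
    }

    void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        ZombieSessionDemo demo = new ZombieSessionDemo(2);
        CountDownLatch neverReplies = new CountDownLatch(1); // the remote node never answers
        demo.submitSession(neverReplies);
        demo.submitSession(neverReplies);
        // A session whose reply already arrived still cannot start: both threads are taken.
        Future<?> queued = demo.submitSession(new CountDownLatch(0));
        Thread.sleep(200);
        System.out.println("queued session ran? " + queued.isDone()); // false
        System.out.println("terminated: " + demo.terminateAll());     // 3 (the queued one too)
        Future<?> fresh = demo.submitSession(new CountDownLatch(0));
        fresh.get(5, TimeUnit.SECONDS); // completes now that the threads are free
        System.out.println("fresh session ran? " + fresh.isDone());   // true
        demo.shutdown();
    }
}
```

Cancelling through Future.cancel(true) only frees the thread because the waiting task responds to interruption; a session blocked in a non-interruptible call would need some other way to be released.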

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
