[jira] [Resolved] (CASSANDRA-3316) Add a JMX call to force cleaning repair sessions (in case they are hung up)

Sylvain Lebresne (Resolved) (JIRA) Thu, 03 Nov 2011 07:15:56 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sylvain Lebresne resolved CASSANDRA-3316.
-----------------------------------------

    Resolution: Fixed
      Reviewer: slebresne

+1, committed.

I don't think it's worth adding a nodetool command (more precisely, I think 
it's a feature that this is not too easy to trigger), because hopefully we 
don't expect people to need it. It's more about having a solution available 
if it comes to that.
                
> Add a JMX call to force cleaning repair sessions (in case they are hung up)
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3316
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3316
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.6
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.2
>
>         Attachments: 3316-v1.txt
>
>
> A repair session has many parts, most of which are not local to the node 
> (so the node has to wait on those operations). You request Merkle trees, 
> then you schedule streaming (and in 1.0.0, some of the streams don't involve 
> the local node at all). That leaves lots of places where something can go 
> wrong, and when it does, the repair hangs and, as a consequence, a 
> repairSessions task is left sitting active on the 'AntiEntropy Session' 
> executor.
> Obviously, we should improve how repair detects the things that can go 
> wrong. CASSANDRA-2433 started on this, and CASSANDRA-3112 is open to cover 
> as much of the remaining parts as possible, but my bet is that it will be 
> hard to cover everything (and it may not be worth handling very improbable 
> failure scenarios). Besides, CASSANDRA-3112 will involve a change to the 
> wire protocol, so it may take some time to be committed. In the meantime, it 
> would be nice to provide a JMX call that forcibly terminates repairSessions, 
> so that you don't end up with enough 'zombie' sessions on the executor that 
> you can't submit new ones (you could restart the node, but that's ugly). 
> Anyway, it's not a big issue, but it would be simple to add such a JMX call.
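
For illustration only, the idea described above (sessions parked on a bounded 
executor, plus a JMX operation that releases them) could be sketched roughly 
as follows. This is not the attached patch and not Cassandra's code: the 
class names, the ObjectName "sketch.repair:type=RepairControl", the pool size 
of 2 and the operation and attribute names are all assumptions made up for 
this example, and a latch stands in for the remote work a real session waits 
on.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class RepairTerminationSketch {

    /** Hypothetical management interface exposed over JMX. */
    public interface RepairControlMBean {
        int getActiveSessionCount();
        void forceTerminateAllRepairSessions();
    }

    /** Stand-in for a repair session waiting on remote work that may never complete. */
    static final class Session implements Runnable {
        private final CountDownLatch done = new CountDownLatch(1);
        private final Set<Session> registry;

        Session(Set<Session> registry) { this.registry = registry; }

        @Override
        public void run() {
            try {
                done.await(); // a hung session would block here forever
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                registry.remove(this); // the session is no longer active
            }
        }

        void forceShutdown() { done.countDown(); }
    }

    /** Hypothetical owner of the session executor; implements the JMX operations. */
    public static final class RepairControl implements RepairControlMBean {
        private final Set<Session> sessions = ConcurrentHashMap.newKeySet();
        // Bounded pool: once both threads hold hung sessions, new ones only queue up.
        private final ExecutorService executor = Executors.newFixedThreadPool(2);

        void submit() {
            Session s = new Session(sessions);
            sessions.add(s);
            executor.execute(s);
        }

        @Override
        public int getActiveSessionCount() { return sessions.size(); }

        @Override
        public void forceTerminateAllRepairSessions() {
            for (Session s : sessions) s.forceShutdown();
        }

        void awaitIdle() throws InterruptedException {
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    /** Registers the MBean, hangs three sessions, then frees them through JMX. */
    public static int[] demo() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("sketch.repair:type=RepairControl");
        RepairControl control = new RepairControl();
        mbs.registerMBean(new StandardMBean(control, RepairControlMBean.class), name);
        try {
            for (int i = 0; i < 3; i++) control.submit(); // two running, one queued
            int before = (Integer) mbs.getAttribute(name, "ActiveSessionCount");
            mbs.invoke(name, "forceTerminateAllRepairSessions", new Object[0], new String[0]);
            control.awaitIdle();
            int after = (Integer) mbs.getAttribute(name, "ActiveSessionCount");
            return new int[] { before, after };
        } finally {
            mbs.unregisterMBean(name);
        }
    }

    public static void main(String[] args) throws Exception {
        int[] counts = demo();
        System.out.println("active before: " + counts[0] + ", after: " + counts[1]);
    }
}
```

Running main should report three active sessions before the call and none 
after it. Against a live node, the same operation would be invoked from a JMX 
client such as jconsole rather than in-process.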

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
