[jira] [Comment Edited] (CASSANDRA-11738) Re-think the use of Severity in the DynamicEndpointSnitch calculation

Jeremiah Jordan (JIRA) Thu, 30 Jun 2016 13:38:05 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357806#comment-15357806 ]


Jeremiah Jordan edited comment on CASSANDRA-11738 at 6/30/16 8:36 PM:
----------------------------------------------------------------------

bq. Won't using correctly calculated latencies tell a node enough to avoid a 
given peer?

Yes, but the times I have used this are things like repairing after removing a 
corrupt SSTable, where latency may not have been high but I didn't want the 
node to be picked for reads done at ONE unless the other nodes were down.


was (Author: jjordan):
bq. Won't using correctly calculated latencies tell a node enough to avoid a 
given peer?

Yes, but the times I have used this are things like repairing after removing a 
corrupt SSTable or something.  Where latency may not have been high, but I 
didn't want to node to be picked for reads done at ONE unless the other nodes 
were down.

> Re-think the use of Severity in the DynamicEndpointSnitch calculation
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-11738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11738
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeremiah Jordan
>             Fix For: 3.x
>
>
> CASSANDRA-11737 was opened to allow completely disabling the use of severity 
> in the DynamicEndpointSnitch calculation, but that is a pretty big hammer.  
> There is probably something we can do to make better use of the score.
> The issue seems to be that severity is given equal weight with latency in the 
> current code, and that severity is based only on disk IO.  If you have a 
> node that is CPU bound on something (say catching up on LCS compactions 
> because of bootstrap/repair/replace), the IO wait can be low while the latency 
> to the node is high.
> Some ideas I had are:
> 1. Allow a yaml parameter to tune how much impact the severity score has 
> in the calculation.
> 2. Take CPU load into account as well as IO wait (this would probably help 
> in the cases where I have seen things go sideways).
> 3. Move the -D from CASSANDRA-11737 to a yaml-level setting.
> 4. Go back to relying on latency alone and get rid of severity altogether.  
> Now that we have rapid read protection, maybe just using latency is enough, 
> as it can help where the predictive nature of IO wait would have been useful.
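
To make idea 1 concrete, here is a minimal sketch of how a tunable weight might change replica ordering. It is a simplified model, not the DynamicEndpointSnitch code: it assumes the score is latency normalized by the slowest peer plus severity, with lower scores preferred. The names `endpoint_score`, `rank_endpoints`, and `severity_weight` are hypothetical, and the node figures are invented for illustration.

```python
from typing import Dict, List, Tuple


def endpoint_score(latency_ms: float, max_latency_ms: float,
                   severity: float, severity_weight: float = 1.0) -> float:
    """Hypothetical score: normalized latency plus weighted severity.

    severity_weight=1.0 models the "equal weight" behaviour described in
    the ticket; 0.0 models ignoring severity entirely.
    """
    normalized = latency_ms / max_latency_ms if max_latency_ms > 0 else 0.0
    return normalized + severity_weight * severity


def rank_endpoints(nodes: Dict[str, Tuple[float, float]],
                   severity_weight: float = 1.0) -> List[str]:
    """Return node names ordered from most to least preferred (lowest score first).

    `nodes` maps a node name to (latency_ms, severity).
    """
    max_latency = max(latency for latency, _ in nodes.values())
    return sorted(
        nodes,
        key=lambda name: endpoint_score(nodes[name][0], max_latency,
                                        nodes[name][1], severity_weight),
    )


# Invented example: node_a is fast but reports high IO-based severity;
# node_b is CPU bound, so it is slow but reports no severity.
nodes = {
    "node_a": (10.0, 0.8),
    "node_b": (40.0, 0.0),
}

equal_weight = rank_endpoints(nodes, severity_weight=1.0)
reduced_weight = rank_endpoints(nodes, severity_weight=0.25)
```

With equal weighting the slow, CPU-bound `node_b` ranks ahead of the fast `node_a`; lowering the hypothetical weight to 0.25 reverses that order, which is the kind of tuning idea 1 would allow.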



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
