ZhaoYang created CASSANDRA-15774:
------------------------------------

             Summary: Improve range reads to query by endpoints instead of 
vnodes to reduce number of remote requests
                 Key: CASSANDRA-15774
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15774
             Project: Cassandra
          Issue Type: Improvement
          Components: Legacy/Coordination
            Reporter: ZhaoYang


Currently, range reads are queried in batches, see 
{{StorageProxy.RangeCommandIterator#sendNextRequests()}}. For each batch, the 
coordinator computes a list of merged vnode ranges, up to the concurrency factor, 
and queries each merged vnode range asynchronously. (Note: consecutive vnode 
ranges can be merged if they share enough replicas to satisfy the consistency 
level requirement.)
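To make the batching shape concrete, here is a minimal sketch, assuming a simplified model in which every merged range becomes one asynchronous request. The class {{BatchedRangeReads}}, the method {{sendNextBatch}} and the {{queryRange}} function are hypothetical names for illustration only; this is not Cassandra's {{RangeCommandIterator}}.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical, simplified model of batched range reads. It is NOT Cassandra's
// RangeCommandIterator; it only shows how in-flight requests scale with the
// concurrency factor when every merged range becomes its own request.
final class BatchedRangeReads<R, T> {
    private final Iterator<R> mergedRanges;                      // merged vnode ranges in token order
    private final Function<R, CompletableFuture<T>> queryRange;  // assumed async remote query

    BatchedRangeReads(Iterator<R> mergedRanges, Function<R, CompletableFuture<T>> queryRange) {
        this.mergedRanges = mergedRanges;
        this.queryRange = queryRange;
    }

    boolean hasNext() {
        return mergedRanges.hasNext();
    }

    // Issues one asynchronous query per merged range, up to concurrencyFactor of
    // them, and returns the futures in range order so results can be consumed in order.
    List<CompletableFuture<T>> sendNextBatch(int concurrencyFactor) {
        List<CompletableFuture<T>> inFlight = new ArrayList<>();
        while (inFlight.size() < concurrencyFactor && mergedRanges.hasNext()) {
            inFlight.add(queryRange.apply(mergedRanges.next()));
        }
        return inFlight;
    }
}
{code}

In this model the batch size grows linearly with the concurrency factor: at QUORUM with RF=3, each merged range is read from 2 replicas, so a batch of 100 merged ranges could fan out to as many as 200 replica requests (ignoring speculative retries).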

This works fine in general, but when the concurrency factor is high, because the 
returned row count is small compared to the query limit or because index 
filtering is used, the coordinator may send too many concurrent remote range 
requests in a single batch.

We can improve this by grouping remote range requests by endpoint, so that each 
endpoint returns a response covering multiple non-consecutive ranges. With 
endpoint grouping, the number of remote range requests should be greatly 
reduced: it is always capped by the number of nodes in the cluster, rather than 
by the number of ranges, which is in turn capped by the concurrency factor.
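The grouping step itself is an inversion of "which replicas to read for each range" into "which ranges to request from each endpoint". The following is a minimal sketch, assuming replicas have already been chosen per range; {{GroupByEndpoint}} and {{group}} are hypothetical names.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turn "which replicas to read for each range" into one
// request per endpoint that carries all of that endpoint's (possibly
// non-consecutive) ranges.
final class GroupByEndpoint {
    static <R, E> Map<E, List<R>> group(Map<R, List<E>> replicasPerRange) {
        // LinkedHashMap keeps endpoints in first-seen order and ranges in input order.
        Map<E, List<R>> perEndpoint = new LinkedHashMap<>();
        for (Map.Entry<R, List<E>> entry : replicasPerRange.entrySet()) {
            for (E endpoint : entry.getValue()) {
                perEndpoint.computeIfAbsent(endpoint, e -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // The number of remote requests is perEndpoint.size(), which can never
        // exceed the number of distinct endpoints, i.e. the cluster size.
        return perEndpoint;
    }
}
{code}

For example, reading range a from nodes [2, 3], range c from [4, 5] and range e from [2, 5] takes 6 per-range requests, but only 4 grouped requests (nodes 2, 3, 4 and 5), with node 2 serving a and e and node 5 serving c and e.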

Let's look at an example on a 5-node cluster with 10 ranges 
(a, b, c, d, e, f, g, h, i, j) and RF=3.

The following range-to-replica mapping uses round-robin placement, which works 
well with the consecutive range merger. (The consecutive range merger doesn't 
work well with a fully random replica mapping, because consecutive ranges are 
less likely to have overlapping replicas.)
{code:java}
   range-a replicas: 1, 2, 3
   range-b replicas: 2, 3, 4
   range-c replicas: 3, 4, 5
   range-d replicas: 1, 4, 5
   range-e replicas: 1, 2, 5
   range-f replicas: 1, 2, 3
   range-g replicas: 2, 3, 4
   range-h replicas: 3, 4, 5
   range-i replicas: 1, 4, 5
   range-j replicas: 1, 2, 5
{code}
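The table above follows a simple rule: with ranges numbered from 0 and nodes numbered from 1, range i is replicated on nodes i+1, i+2, ..., i+RF (wrapping around the node count). A minimal sketch that regenerates it, where {{RoundRobinPlacement}} is a hypothetical helper and not Cassandra's replication strategy code:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper reproducing the round-robin placement in the example:
// range i (0-based) is replicated on nodes (i + k) % nodeCount + 1 for k in [0, rf).
final class RoundRobinPlacement {
    static List<Set<Integer>> replicas(int rangeCount, int nodeCount, int rf) {
        List<Set<Integer>> mapping = new ArrayList<>();
        for (int i = 0; i < rangeCount; i++) {
            Set<Integer> nodes = new TreeSet<>(); // sorted, to match the table's layout
            for (int k = 0; k < rf; k++) {
                nodes.add((i + k) % nodeCount + 1);
            }
            mapping.add(nodes);
        }
        return mapping;
    }
}
{code}

Calling {{RoundRobinPlacement.replicas(10, 5, 3)}} yields the ten replica sets for ranges a to j in the table, e.g. {1, 4, 5} for range-d.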
With the default range read implementation and the consecutive range merger, we 
need 10 replica read requests (2 for each merged range) for quorum:
{code:java}
     range (a,b] on node [2, 3]
     range (c,d] on node [4, 5]
     range (e,f] on node [1, 2]
     range (g,h] on node [3, 4]
     range (i,j] on node [1, 5]
{code}
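The merged ranges above can be reproduced with a greedy merger that keeps extending a run of consecutive ranges while the replicas shared by the whole run still number at least blockFor (2 for quorum at RF=3). This is a minimal sketch under that assumption; {{ConsecutiveRangeMerger}}, {{merge}} and {{replicaRequests}} are hypothetical names, not Cassandra's merger.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical greedy consecutive range merger: neighbouring ranges are merged
// while the intersection of their replica sets keeps at least blockFor endpoints.
final class ConsecutiveRangeMerger {
    static final class Merged {
        final int first;               // index of the first range in the run
        final int last;                // index of the last range in the run (inclusive)
        final Set<Integer> replicas;   // replicas shared by every range in the run

        Merged(int first, int last, Set<Integer> replicas) {
            this.first = first;
            this.last = last;
            this.replicas = replicas;
        }
    }

    // Assumes every individual range has at least blockFor replicas.
    static List<Merged> merge(List<Set<Integer>> rangeReplicas, int blockFor) {
        List<Merged> result = new ArrayList<>();
        int i = 0;
        while (i < rangeReplicas.size()) {
            Set<Integer> shared = new TreeSet<>(rangeReplicas.get(i));
            int last = i;
            while (last + 1 < rangeReplicas.size()) {
                Set<Integer> candidate = new TreeSet<>(shared);
                candidate.retainAll(rangeReplicas.get(last + 1));
                if (candidate.size() < blockFor) {
                    break; // too few common replicas: start a new run
                }
                shared = candidate;
                last++;
            }
            result.add(new Merged(i, last, shared));
            i = last + 1;
        }
        return result;
    }

    // Each merged range is read from blockFor of its shared replicas.
    static int replicaRequests(List<Merged> merged, int blockFor) {
        return merged.size() * blockFor;
    }
}
{code}

On the example mapping with blockFor = 2, this produces the five runs (a,b], (c,d], (e,f], (g,h] and (i,j] with shared replicas [2, 3], [4, 5], [1, 2], [3, 4] and [1, 5], i.e. 10 replica requests.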
With queries grouped by endpoint, we only need 4 replica read requests for quorum:
{code:java}
    * node 1: a, d, e, f, i, j
    * node 2: a, b, e, f, g, j
    * node 3: b, c, g, h
    * node 4: c, d, h, i
{code}
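The issue does not say how the per-endpoint plan is chosen. One possible approach, shown here as a minimal sketch, is a greedy set-cover style heuristic: repeatedly pick the endpoint that can serve the most ranges still short of blockFor replicas, and assign those ranges to it. {{EndpointGrouping}} and {{group}} are hypothetical names; this is an illustration, not necessarily what CASSANDRA-15774 implements, and greedy set cover is not guaranteed to be optimal.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical greedy endpoint selection: each endpoint gets at most one request,
// and every range must end up read from blockFor distinct replicas.
final class EndpointGrouping {
    // Returns endpoint -> indexes of the ranges it should serve, in pick order.
    static Map<Integer, List<Integer>> group(List<Set<Integer>> rangeReplicas, int blockFor) {
        int[] missing = new int[rangeReplicas.size()]; // replicas still needed per range
        Arrays.fill(missing, blockFor);
        Set<Integer> allEndpoints = new TreeSet<>();   // sorted, so ties go to the lowest id
        rangeReplicas.forEach(allEndpoints::addAll);

        Map<Integer, List<Integer>> plan = new LinkedHashMap<>();
        while (true) {
            int bestEndpoint = -1;
            List<Integer> bestRanges = List.of();
            for (int endpoint : allEndpoints) {
                if (plan.containsKey(endpoint)) {
                    continue; // already has its single grouped request
                }
                List<Integer> ranges = new ArrayList<>();
                for (int r = 0; r < rangeReplicas.size(); r++) {
                    if (missing[r] > 0 && rangeReplicas.get(r).contains(endpoint)) {
                        ranges.add(r);
                    }
                }
                if (ranges.size() > bestRanges.size()) { // strict '>' keeps the lowest id on ties
                    bestEndpoint = endpoint;
                    bestRanges = ranges;
                }
            }
            if (bestRanges.isEmpty()) {
                break; // no unused endpoint can help any more
            }
            for (int r : bestRanges) {
                missing[r]--;
            }
            plan.put(bestEndpoint, bestRanges);
        }
        for (int r = 0; r < missing.length; r++) {
            if (missing[r] > 0) {
                throw new IllegalStateException("range " + r + " cannot reach blockFor replicas");
            }
        }
        return plan;
    }
}
{code}

On the example mapping with blockFor = 2, this heuristic also needs 4 requests (nodes 1, 2, 4 and 3, in pick order) and never contacts node 5. Its assignment differs slightly from the table above: node 3 serves only c and h, while node 4 serves b, c, d, g, h and i. Both plans read 20 range replicas in total.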
 
Note that short-read protection adds some complexity, because it needs to know 
whether a replica has more rows available for the current range.
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
