Benjamin Lerer created CASSANDRA-17183:
------------------------------------------

             Summary: Using the user specified page size for internal paging in 
GROUP BY queries can slow down the query and create high traffic between nodes
                 Key: CASSANDRA-17183
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17183
             Project: Cassandra
          Issue Type: Bug
            Reporter: Benjamin Lerer


When performing aggregation queries or GROUP BY queries, Cassandra computes the
aggregates on the coordinator node to ensure consistency and requests the data
from the replicas in pages (a page being a number of rows). Today, Cassandra
uses the page size requested by the user (the number of rows that should be
returned to the user) as its internal page size. As a consequence, if the page
size requested by the user is small, the coordinator has to send many more
requests to the replicas than necessary.

For 1,000,000 rows, a consistency level of LOCAL_QUORUM and a page size of
5,000, the coordinator will contact the replicas 200 times. For a page size of
100 (the default CQLSH page size), the coordinator will contact the replicas
10,000 times.
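The request counts above are the number of rows divided by the page size,
rounded up. The following is a minimal Java sketch of that arithmetic. It
assumes that each internal page corresponds to one round of coordinator to
replica requests and ignores the consistency level. The class and method names
are hypothetical.

```java
// Hypothetical back-of-the-envelope helper: counts how many internal page
// requests the coordinator issues to the replicas when it reads `totalRows`
// rows using `pageSize` rows per page (ceiling division).
public class InternalPagingRoundTrips {

    static long pageRequests(long totalRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // ceil(totalRows / pageSize) using integer arithmetic only
        return (totalRows + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        System.out.println(pageRequests(rows, 5_000)); // prints 200
        System.out.println(pageRequests(rows, 100));   // prints 10000
    }
}
```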

To avoid this problem, we should enforce a minimum page size for internal
paging and give operators a way to change its value.
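The following is a minimal sketch of what such a floor could look like. It
assumes the internal page size is simply the larger of the user page size and
an operator-configurable minimum. AggregationPagingSketch,
minInternalPageSize and internalPageSize are hypothetical names, not existing
Cassandra settings or APIs, and the value 5,000 is only illustrative.

```java
// Hypothetical sketch of the proposed fix: the coordinator never pages
// internally with fewer rows than an operator-configurable minimum.
// Names are illustrative only; they are not existing Cassandra settings or APIs.
public class AggregationPagingSketch {

    // Assumed operator-tunable floor for internal paging.
    private volatile int minInternalPageSize;

    AggregationPagingSketch(int minInternalPageSize) {
        setMinInternalPageSize(minInternalPageSize);
    }

    // Lets an operator change the floor at runtime.
    void setMinInternalPageSize(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException("minimum page size must be positive");
        }
        this.minInternalPageSize = value;
    }

    // Page size used between the coordinator and the replicas. This sketch
    // assumes the page size seen by the client is handled separately and
    // does not change.
    int internalPageSize(int userPageSize) {
        return Math.max(userPageSize, minInternalPageSize);
    }

    public static void main(String[] args) {
        AggregationPagingSketch sketch = new AggregationPagingSketch(5_000);
        System.out.println(sketch.internalPageSize(100));    // prints 5000
        System.out.println(sketch.internalPageSize(10_000)); // prints 10000
    }
}
```

With a floor of 5,000, a query issued from CQLSH with a page size of 100 over
1,000,000 rows would need 200 internal requests instead of 10,000. Under this
sketch's assumption, the page size seen by the client would stay unchanged.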


--
This message was sent by Atlassian Jira
(v8.20.1#820001)
