Just assume that all the rows you read in a page end up on the heap at the
same time.

If you’re reading 1000 rows of 100 bytes each, no big deal: that’s 100 KB per
read thread on the heap.

If you’re reading 100 rows of 1 MB each, now you’ve got 100 MB per read thread
on the heap.

Assuming an 8 GB heap with a 2 GB young gen, the first example is probably no
problem even with dozens of concurrent reads, but the second will trigger a
young GC every 10-15 reads (and could cause promotion into old gen, depending
on how many concurrent reads you’re doing).
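
To make the arithmetic above concrete, here is a rough back-of-the-envelope
sketch. The function names are made up for illustration, the units are decimal
(1 MB = 10^6 bytes), and it treats the whole young gen as available to page
data. In practice only eden is, and other allocations compete for it, so the
real number of reads per young GC will be lower than what this prints.

```python
KB = 1000
MB = 1000 * KB
GB = 1000 * MB


def page_heap_bytes(rows_per_page: int, avg_row_bytes: int) -> int:
    """Rough heap footprint of one in-flight page: every row held at once."""
    return rows_per_page * avg_row_bytes


def reads_per_young_gc(young_gen_bytes: int, page_bytes: int) -> int:
    """Optimistic upper bound on pages that fit in young gen before a GC.

    Ignores survivor spaces and every other allocation on the node.
    """
    return young_gen_bytes // page_bytes


if __name__ == "__main__":
    young_gen = 2 * GB
    small_page = page_heap_bytes(1000, 100)    # 1000 rows of 100 bytes
    large_page = page_heap_bytes(100, 1 * MB)  # 100 rows of 1 MB

    print(small_page // KB, "KB per read;",
          reads_per_young_gc(young_gen, small_page), "reads per young GC at best")
    print(large_page // MB, "MB per read;",
          reads_per_young_gc(young_gen, large_page), "reads per young GC at best")
```

The large-row case comes out at 20 reads at best, which lines up with the
10-15 estimate once you account for eden being smaller than the full young gen.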




-- 
Jeff Jirsa


> On Jun 19, 2018, at 1:53 AM, Vsevolod Filaretov <vsfilare...@gmail.com> wrote:
> 
> Kurt, thank you very much for your answer! Your remark on GC totally changed 
> my thoughts on cassandra resources usage.
> 
> So, more questions for the audience follow.
> 
> What is generally considered a
> 
> 1) "too large" page size,
> 2) "large" page size,
> 3) "normal conditions" page size?
> 
> How exactly does fetch size affect CPU? Can too large a page size provoke
> severe CPU usage from constant GC, thus hurting Cassandra performance on
> read requests (because the CPU is busy GCing instead of working on other
> tasks)?
> 
> Thank you all very much!
> 
> On Mon, Jun 18, 2018, 14:28 kurt greaves <k...@instaclustr.com> wrote:
>>> 1) Am I correct to assume that the larger the page size a user session
>>> sets, the larger the portion of cluster/coordinator node resources that
>>> session will hog?
>>> 2) Do I understand correctly that page size (imagine we have no timeout
>>> settings) is limited by the RAM and IOPS I am willing to hand to a single
>>> user session?
>> Yes for both of the above. More rows will be pulled into memory 
>> simultaneously with a larger page size, thus using more memory and IO. 
>> 
>>> 3) Am I correct to assume that the page size/read request timeout allowance
>>> I set directly represents the chance of locking a node to a single user's
>>> requests?
>> Concurrent reads can occur on a node, so it shouldn't "lock" the node to a
>> single user's request. However, you can overload the node, which may be
>> effectively the same thing. Don't set page sizes too high, otherwise the
>> coordinator of the query will end up doing a lot of GC.
>> 
>> 
