Re: Calculate number of nodes required based on data

Hefeng Yuan Wed, 07 Sep 2011 11:10:09 -0700

We didn't change MemtableThroughputInMB or the min/max CompactionThreshold; 
they're at 499/4/32.
As for why we're flushing at ~9m, I guess it has to do with this: 
http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
The only parameter I tried changing is compaction_throughput_mb_per_sec. I 
tried both halving and doubling it, but neither helped avoid the simultaneous 
compactions across nodes.
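
For reference, a rough sketch of how the 4/32 min/max compaction thresholds 
mentioned above behave. This is a simplified model, not Cassandra's actual 
code: the function name plan_compaction is hypothetical, and it assumes a 
single bucket of similar-sized SSTables.

```python
def plan_compaction(similar_sized_sstables, min_threshold=4, max_threshold=32):
    """Return how many SSTables one compaction pass would merge (0 = none).

    Simplified, assumed model: a compaction is triggered only once at least
    min_threshold similar-sized SSTables exist, and a single pass merges at
    most max_threshold of them.
    """
    if similar_sized_sstables < min_threshold:
        return 0
    return min(similar_sized_sstables, max_threshold)


if __name__ == "__main__":
    for count in (3, 4, 40):
        print(count, "SSTables ->", plan_compaction(count), "merged")
```

Under this model, raising min_threshold postpones the first compaction, and 
lowering max_threshold caps how much work a single pass takes on.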


I agree that we don't necessarily need to add nodes, as long as we have a way 
to avoid simultaneous compaction on 4+ nodes.
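
To spell out where the 4+ figure comes from, here is a small sketch of the 
quorum arithmetic for our setup (6 nodes, RF 5, CL QUORUM). The function names 
are made up for illustration, and it only checks the best case, i.e. whether 
any quorum could be served entirely by nodes that are not compacting.

```python
def quorum(replication_factor):
    """Replicas that must respond at CL QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def quorum_can_avoid_compaction(total_nodes, replication_factor, compacting_nodes):
    """Best case: can some quorum be formed from non-compacting nodes only?

    A key has at most min(RF, idle nodes) replicas on idle nodes, so if that
    is below the quorum size, every quorum request touches a compacting node.
    """
    idle_nodes = total_nodes - compacting_nodes
    best_case_idle_replicas = min(replication_factor, idle_nodes)
    return best_case_idle_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    for compacting in range(7):
        ok = quorum_can_avoid_compaction(6, 5, compacting)
        print(compacting, "compacting ->", "avoidable" if ok else "unavoidable")
```

With 4 nodes compacting only 2 idle nodes remain, which is below the quorum of 
3, so every QUORUM request has to involve a compacting node. With 3 or fewer 
compacting, at least some keys can still be served by idle replicas.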

Thanks,
Hefeng

On Sep 7, 2011, at 10:51 AM, Adi wrote:

> 
> On Wed, Sep 7, 2011 at 1:09 PM, Hefeng Yuan <hfy...@rhapsody.com> wrote:
> Adi,
> 
> The reason we're attempting to add more nodes is to solve the 
> long/simultaneous compactions, i.e. the performance issue, not a storage 
> issue yet.
> We have RF 5 and CL QUORUM for reads and writes, and currently 6 nodes. When 
> 4 nodes are compacting at the same time, we're screwed, especially on reads, 
> since any read will hit at least one of the compacting nodes anyway. 
> My assumption is that if we add more nodes, each node will have less load, 
> and therefore need less compaction, and will probably compact faster, so we 
> avoid having 4+ nodes compacting simultaneously.
> 
> Any suggestions on how to calculate how many more nodes to add? Or, generally, 
> how to plan for the number of nodes required, from a performance perspective?
> 
> Thanks,
> Hefeng
> 
> 
> 
> Adding nodes to delay and reduce compaction is an interesting performance use 
> case :-)  I am thinking you can find a smarter/cheaper way to manage that.
> Have you looked at 
> a) increasing memtable throughput
> What is the nature of your writes? Are they mostly inserts, or are there also 
> a lot of quick updates to recently inserted data? Increasing 
> memtable_throughput can delay and maybe reduce the compaction cost if you 
> have lots of updates to the same data. You will have to provide for memory 
> if you try this. 
> When you mentioned "with ~9m serialized bytes", is that the memtable 
> throughput? That is quite a low threshold, which will result in a large 
> number of SSTables needing to be compacted. I think the default is 256 MB, 
> and the lowest values I have seen are 64 MB or maybe 32 MB.
> 
> 
> b) tweaking min_compaction_threshold and max_compaction_threshold
> - increasing min_compaction_threshold will delay compactions
> - decreasing max_compaction_threshold will reduce the number of SSTables per 
> compaction cycle
> Are you using the defaults (4 and 32), or are you trying different values?
> 
> c) splitting column families
> Again, splitting column families can also help, because compactions occur 
> serially, one CF at a time, and that spreads out your compaction cost over 
> time and column families. It requires a change in app logic, though.
> 
> -Adi
>
