Re: compaction strategy

Sylvain Lebresne Mon, 09 May 2011 01:21:25 -0700

On Sat, May 7, 2011 at 7:20 PM, Terje Marthinussen
<tmarthinus...@gmail.com> wrote:
> This is an all ssd system. I have no problems with read/write performance
> due to I/O.
> I do have a potential problem with the crazy explosion you can get in terms
> of disk use if compaction cannot keep up.
>
> As things fall behind and you get many generations of data, yes, read
> performance becomes a problem due to the number of sstables.
>
> As things start falling behind, you have a bunch of minor compactions trying
> to merge 20MB (sstables cassandra generally dumps with current config when
> under pressure) into 40 MB into 80MB into....

Everyone may be well aware of this, but I'll still point out that a minor
compaction will try to merge "as many 20MB sstables as it can", up to the max
compaction threshold (which is configurable). So if you do accumulate a number
of newly created sstables at some point, the next minor compaction will take
all of them at once rather than creating a 40MB sstable, then an 80MB one,
etc. Sure, there will be more steps than with a major compaction, but let's
keep in mind that we don't merge sstables 2 by 2.
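
To put rough numbers on this, here is a minimal sketch (not Cassandra code)
that counts how many merge levels it takes to get from N equally sized
sstables down to one, given a maximum number of sstables merged per
compaction. The function name merge_levels and the value 32 for the threshold
are illustrative assumptions. The model also assumes that no new sstables are
flushed in the meantime, and it ignores bucketing by size. Each level rewrites
the data at most once, so the level count approximates how many times the
data gets rewritten.

```python
def merge_levels(num_sstables: int, max_threshold: int) -> int:
    """Count merge levels needed to reduce num_sstables to a single sstable.

    Simplified model (an assumption, not Cassandra's scheduler): every
    compaction merges up to max_threshold equally sized sstables, and no
    new sstables arrive while compaction is running.
    """
    if max_threshold < 2:
        raise ValueError("max_threshold must be at least 2")
    levels = 0
    remaining = num_sstables
    while remaining > 1:
        # Ceiling division: each group of up to max_threshold sstables
        # becomes one larger sstable at the next level.
        remaining = -(-remaining // max_threshold)
        levels += 1
    return levels


if __name__ == "__main__":
    backlog = 64  # hypothetical number of 20MB sstables waiting to be merged
    print("2 by 2:", merge_levels(backlog, 2), "passes over the data")
    print("up to 32 at a time:", merge_levels(backlog, 32), "passes over the data")
```

Under these assumptions, a backlog of 64 sstables needs 6 passes if merged
2 by 2, but only 2 passes when up to 32 are merged at a time.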

I'm also not much in favor of triggering major compactions, because they
mostly have a nasty effect (creating one huge sstable). Now maybe we could
expose the difference factor by which we consider sstables to be in the same
bucket (i.e., of similar size). As a side note, I think that
https://issues.apache.org/jira/browse/CASSANDRA-1610, if done correctly,
could help in such situations, in that one could try a strategy adapted to
one's workload.
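
To illustrate what exposing such a factor could look like, here is a
hypothetical sketch of grouping sstables into buckets of similar size. The
function name bucket_by_size, the low/high parameters, and their default
values of 0.5 and 1.5 are assumptions made for this example, not the actual
Cassandra code. An sstable joins a bucket when its size falls within
[low * average, high * average] of that bucket's current average size.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group sstable sizes into buckets of similar size.

    Hypothetical sketch: low and high are the tunable difference factors,
    expressed relative to the running average size of each bucket.
    """
    buckets = []  # each entry is [average_size, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            average = bucket[0]
            if low * average <= size <= high * average:
                bucket[1].append(size)
                # Keep the bucket average up to date as members are added.
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size, [size]])
    return [members for _, members in buckets]


if __name__ == "__main__":
    sizes_mb = [20, 22, 19, 80, 85, 400]  # made-up sstable sizes in MB
    print(bucket_by_size(sizes_mb))
    print(bucket_by_size(sizes_mb, low=0.1, high=10))
```

With the narrow default factors the made-up sizes fall into three buckets,
while the much wider factors put everything into a single bucket, so one
minor compaction could pick up sstables of quite different sizes.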

>
> Anyone wants to do the math on how many times you are rewriting the data
> going this route?
>
> There is just no way this can keep up. It will just fall more and more
> behind.
> The only way to recover, as far as I can see, would be to trigger a full
> compaction?
>
> It does not really make sense to me to go through all these minor merges
> when a full compaction will do a much faster and better job.
>
> Terje
>
> On Sat, May 7, 2011 at 9:54 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen
>> <tmarthinus...@gmail.com> wrote:
>> > 1. Would it make sense to make full compactions occur a bit more
>> > aggressively?
>>
>> I'd rather reduce the performance impact of being behind, than do more
>> full compactions: https://issues.apache.org/jira/browse/CASSANDRA-2498
>>
>> > 2. I
>> > would think the code should be smart enough to either trigger a full
>> > compaction and scrap the current queue, or at least merge some of those
>> > pending tasks into larger ones
>>
>> Not crazy but a queue-rewriter would be nontrivial. For now I'm okay
>> with saying "add capacity until compaction can mostly keep up." (Most
>> people's problem is making compaction LESS aggressive, hence
>> https://issues.apache.org/jira/browse/CASSANDRA-2156.)
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>
