I only sifted through the recent history of this thread (for time reasons), but:

> You have started a major compaction which is now competing with those
> near constant minor compactions for far too little I/O (3 SATA drives
> in RAID0, perhaps?).  Normally, this would result in a massive
> ballooning of your heap use as all sorts of activities (like memtable
> flushes) backed up, as well.

AFAIK memtable flushing is unrelated to compaction in the sense that
they occur concurrently and don't block each other (except to the
extent that they truly do compete for e.g. disk or CPU resources).
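
To make that concrete, here is a minimal, purely illustrative Java
sketch (this is not Cassandra's actual code): flushes and compactions
live on separate executors, so neither queue logically blocks on the
other, and the only interaction is contention for shared disk and CPU.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class FlushVsCompaction {
        // Hypothetical stand-ins for separate flush and compaction stages;
        // tasks on one executor never wait in the other's queue.
        private static final ExecutorService FLUSH = Executors.newSingleThreadExecutor();
        private static final ExecutorService COMPACTION = Executors.newSingleThreadExecutor();

        public static void main(String[] args) {
            COMPACTION.submit(() -> System.out.println("long major compaction running"));
            FLUSH.submit(() -> System.out.println("memtable flush proceeding concurrently"));
            // Both complete independently; underneath they only compete for disk/CPU.
            COMPACTION.shutdown();
            FLUSH.shutdown();
        }
    }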

While small memtables do indeed mean more compaction activity in
total, the cost of any given compaction should not be severely
affected.

As far as I can tell, the two primary effects of small memtable sizes are:

* An increase in the total amount of compaction work done for a given
database size.
* An increase in the number of sstables that may accumulate while
larger compactions are running.
** That in turn is particularly relevant because it can generate a lot
of seek-bound activity; consider for example range queries that end up
spanning 10 000 files on disk (rough arithmetic below).
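
For a rough sense of scale (the numbers below are assumptions, not
measurements): if such a range query has to touch each of those files
and each touch costs on the order of one seek on a SATA drive, the
seek time alone dominates.

    public class SeekCost {
        public static void main(String[] args) {
            int sstables = 10_000;   // files the range query ends up spanning
            double seekMs = 8.0;     // assumed per-seek cost on a SATA drive
            double totalSeconds = sstables * seekMs / 1000.0;
            // Prints roughly "~80 seconds of pure seek time" -- before reading any data.
            System.out.printf("~%.0f seconds of pure seek time%n", totalSeconds);
        }
    }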

If memtable flushes are not able to complete fast enough to cope with
write activity, even if that is the case only during concurrent
compaction (for whatever reason), that suggests to me that write
activity is too high. Increasing memtable sizes may help on average
due to decreased compaction work, but I don't see why it would
significantly affect performance once compactions *do* in fact run.
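
For concreteness, the knobs I mean are the memtable size settings; in
a 0.6-era storage-conf.xml they look roughly like the following
(example values only, not recommendations, and in 0.7 the equivalents
become per-column-family settings):

    <!-- storage-conf.xml (0.6-era); example values only -->
    <MemtableThroughputInMB>128</MemtableThroughputInMB>
    <MemtableOperationsInMillions>0.6</MemtableOperationsInMillions>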

With respect to timeouts on writes: I make no claims as to whether it
is expected, because I have not yet investigated, but I definitely see
sporadic slowness when benchmarking high-throughput writes on a
Cassandra trunk snapshot somewhere between 0.6 and 0.7. This occurs
even when writing to a machine where the commit log and data
directories are on separate battery-backed RAID volumes that should
have no trouble absorbing write bursts (and the data is such that one
is CPU bound rather than disk bound on average, so it only needs to
absorb bursts).

I've had to add retries to the benchmarking tool (or else raise the
timeout) because the default was not enough.
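
The retry logic is nothing more elaborate than something like this
hypothetical wrapper (illustrative only; the method names and the
exception caught are placeholders, not the actual tool's API):

    import java.util.concurrent.Callable;

    public class RetryingWrite {
        // Retry a single write a few times with a simple linear backoff when
        // the server reports a (presumed) timeout; rethrow once attempts run out.
        static <T> T withRetries(Callable<T> write, int maxAttempts, long backoffMs)
                throws Exception {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return write.call();               // e.g. a single Thrift insert
                } catch (Exception timedOut) {         // e.g. a server-side timeout
                    last = timedOut;
                    Thread.sleep(backoffMs * attempt); // back off a bit more each time
                }
            }
            throw last;
        }
    }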

I have not investigated exactly why this happens, but it is an
interesting effect that as far as I can tell should not be there.
Have other people done high-throughput writes (to the point of CPU
saturation) over extended periods of time while consistently seeing
low latencies (consistently meaning never exceeding hundreds of ms
over several days)?


-- 
/ Peter Schuller
