From the standpoint of testing whether Cassandra can take the load long term, I do not see it as different. Yes, bulk loading can be made faster using very different methods, but my purpose is to test Cassandra with a large volume of writes (and not to bulk load as efficiently as possible). I have scaled back to 5 writer threads per node and still see 8k writes/sec/node. With the larger memtable settings we shall see how it goes. To be frank, I have no idea how to change a JMX setting and prefer to use standard options. For us this is, after all, an evaluation of whether Cassandra can replace MySQL.
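(For reference, a JMX change needs no special tooling beyond the standard javax.management API. The sketch below is minimal and makes several assumptions: a 0.6-era node exposing JMX on port 8080, a ColumnFamilyStores MBean with a writable MemtableThroughputInMB attribute, and the Keyspace1/Standard1 names — all of which should be verified in jconsole first.)

    // Minimal sketch: raise a column family's memtable flush threshold
    // over JMX. The MBean name and attribute are assumptions for a
    // 0.6-era node; verify them in jconsole before relying on this.
    import javax.management.Attribute;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class SetMemtableThreshold {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Hypothetical keyspace/column family, for illustration only.
                ObjectName cfstore = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilyStores,"
                        + "keyspace=Keyspace1,columnfamily=Standard1");
                // Larger memtables -> fewer flushes -> fewer minor compactions.
                mbs.setAttribute(cfstore,
                    new Attribute("MemtableThroughputInMB", 256));
            } finally {
                connector.close();
            }
        }
    }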
I thank everyone for their help.

On Sun, Aug 22, 2010 at 10:37 PM, Benjamin Black <b...@b3k.us> wrote:
> Wayne,
>
> Bulk loading this much data is a very different prospect from needing
> to sustain that rate of updates indefinitely. As was suggested
> earlier, you likely need to tune things differently, including
> disabling minor compactions during the bulk load, to make this work
> efficiently.
>
>
> b
>
> On Sun, Aug 22, 2010 at 12:40 PM, Wayne <wav...@gmail.com> wrote:
> > Has anyone loaded 2+ terabytes of real data in one stretch into a
> > cluster without bulk loading and without any problems? How long did
> > it take? What kind of nodes were used? How many writes/sec/node can
> > be sustained for 24+ hours?
> >
> > On Sun, Aug 22, 2010 at 8:22 PM, Peter Schuller
> > <peter.schul...@infidyne.com> wrote:
> >> I only sifted through the recent history of this thread (for time
> >> reasons), but:
> >>
> >> > You have started a major compaction which is now competing with those
> >> > near-constant minor compactions for far too little I/O (3 SATA drives
> >> > in RAID0, perhaps?). Normally, this would result in a massive
> >> > ballooning of your heap use as all sorts of activities (like memtable
> >> > flushes) backed up, as well.
> >>
> >> AFAIK memtable flushing is unrelated to compaction in the sense that
> >> they occur concurrently and don't block each other (except to the
> >> extent that they truly do compete for e.g. disk or CPU resources).
> >>
> >> While small memtables do indeed mean more compaction activity in
> >> total, the cost of any given compaction should not be severely
> >> affected.
> >>
> >> As far as I can tell, the two primary effects of small memtable
> >> sizes are:
> >>
> >> * An increase in the total amount of compaction work done for a
> >> given database size.
> >> * An increase in the number of sstables that may accumulate while
> >> larger compactions are running.
> >> ** That in turn is particularly relevant because it can generate a lot
> >> of seek-bound activity; consider for example range queries that end up
> >> spanning 10 000 files on disk.
> >>
> >> If memtable flushes are not able to complete fast enough to cope with
> >> write activity, even if that is the case only during concurrent
> >> compaction (for whatever reason), that suggests to me that write
> >> activity is too high. Increasing memtable sizes may help on average
> >> due to decreased compaction work, but I don't see why it would
> >> significantly affect performance once compactions *do* in fact run.
> >>
> >> With respect to timeouts on writes: I make no claims as to whether it
> >> is expected, because I have not yet investigated, but I definitely see
> >> sporadic slowness when benchmarking high-throughput writes on a
> >> Cassandra trunk snapshot somewhere between 0.6 and 0.7. This occurs
> >> even when writing to a machine where the commit log and data
> >> directories are both on separate RAID volumes that are battery-backed
> >> and should have no trouble eating write bursts (and the data is such
> >> that one is CPU-bound rather than disk-bound on average, so it only
> >> needs to eat bursts).
> >>
> >> I've had to add re-try to the benchmarking tool (or else up the
> >> timeout) because the default was not enough.
> >>
> >> I have not investigated exactly why this happens, but it's an
> >> interesting effect that as far as I can tell should not be there.
> >> Have other people done high-throughput writes (to the point of CPU
> >> saturation) over extended periods of time while consistently seeing
> >> low latencies (consistently meaning never exceeding hundreds of ms
> >> over several days)?
> >>
> >>
> >> --
> >> / Peter Schuller
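(For reference, the re-try Peter describes adding to the benchmarking tool amounts to wrapping each write in a short loop that re-issues on timeout. Below is a minimal sketch; the Writer interface and the use of java.util.concurrent.TimeoutException are illustrative assumptions, not the actual tool's API.)

    // Minimal sketch of re-try-on-timeout around a write path. Backing
    // off a little longer on each attempt gives a flush or compaction
    // burst time to drain instead of hammering a node that is behind.
    import java.util.concurrent.TimeoutException;

    public final class RetryingWriter {
        // Hypothetical write abstraction standing in for the real client.
        public interface Writer {
            void write(byte[] key, byte[] value) throws TimeoutException;
        }

        private final Writer delegate;
        private final int maxAttempts;

        public RetryingWriter(Writer delegate, int maxAttempts) {
            this.delegate = delegate;
            this.maxAttempts = maxAttempts;
        }

        public void write(byte[] key, byte[] value)
                throws TimeoutException, InterruptedException {
            for (int attempt = 1; ; attempt++) {
                try {
                    delegate.write(key, value);
                    return; // success
                } catch (TimeoutException e) {
                    if (attempt >= maxAttempts) {
                        throw e; // the slowness persisted; give up
                    }
                    Thread.sleep(100L * attempt); // simple linear back-off
                }
            }
        }
    }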