Hi John,

On 10.09.2013, at 01:06, John Sanda <john.sa...@gmail.com> wrote:

> Check your file limits -
> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=#cassandra/troubleshooting/trblshootInsufficientResources_r.html

Did that already - without success. Meanwhile I upgraded the servers and I
am getting closer. By now I assume that heavy writes of rows of considerable
size (as in: more than a couple of numbers) require a certain amount of RAM
due to the C* architecture. IOW, my throughput limit is how fast I can get
the data to disk, but the minimal memory I need for that cannot be tuned
down; it depends on the size of the stuff written to C*, because C* buffers
writes in memtables in order to turn them into sequential IO. It is an
interesting trade-off (if I get it right by now :-)

Jan

> On Friday, September 6, 2013, Jan Algermissen wrote:
>
> > On 06.09.2013, at 13:12, Alex Major <al3...@gmail.com> wrote:
> >
> > > Have you changed the appropriate config settings so that Cassandra
> > > will run with only 2GB RAM? You shouldn't find the nodes go down.
> > >
> > > Check out this blog post
> > > http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
> > > , it outlines the configuration settings needed to run Cassandra on
> > > 64MB RAM and might give you some insights.
> >
> > Yes, I have my fingers on the knobs and have also seen the article you
> > mention - very helpful indeed. As are the replies so far. Thanks very
> > much.
> >
> > However, I still manage to kill 2 or 3 nodes of my 3-node cluster with
> > my data import :-(
> >
> > Now, while it would be easy to scale out and up a bit until the default
> > config of C* is sufficient, I would really like to dig deeper and
> > understand why the thing still goes down, IOW, which of my config
> > settings is so darn wrong that in most cases kill -9 remains the only
> > way to shut down the Java process in the end.
> >
> > The problem seems to be the heap size (set to MAX_HEAP_SIZE="640M" and
> > HEAP_NEWSIZE="120M") in combination with some Cassandra activity that
> > demands too much heap, right?
> >
> > So how do I find out what activity this is, and how do I sufficiently
> > reduce it?
> >
> > What bugs me in general is that AFAIU C* is so eager to deliver massive
> > write speed that it sort of forgets to protect itself from client
> > demand. I would very much like to understand why and how that happens.
> > I mean: no matter how many clients are flooding the database, it should
> > not die from out-of-memory situations, regardless of any configuration
> > specifics - should it?
> >
> > tl;dr
> >
> > Currently my client side (with java-driver) after a while reports more
> > and more timeouts and then the following exception:
> >
> > com.datastax.driver.core.exceptions.DriverInternalError: An unexpected
> > error occurred server side: java.lang.OutOfMemoryError: unable to
> > create new native thread
> >
> > On the server side, my cluster remains more or less in this condition:
> >
> > DN  xxxxx  71,33 MB   256  34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
> > UN  xxxxx  189,38 MB  256  32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  rack1
> > UN  xxxxx  198,49 MB  256  33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1
> >
> > The host that is down (it is the seed host, if that matters) still
> > shows the running java process, but I cannot shut down cassandra or
> > connect with nodetool, hence kill -9 to the rescue.
> >
> > On that host, I still see a load of around 1.
> >
> > jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
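
[Update: the 892 blocked threads plus the "unable to create new native
thread" OOM made me suspect that my loader simply keeps more writes in
flight than the cluster can absorb. I am now capping the number of
concurrent async writes on the client with a semaphore, roughly like the
sketch below. Caveat: the cap of 128 is an arbitrary first guess, and I am
quoting the java-driver/Guava API from memory, so treat the details as
illustrative rather than authoritative.]

    import java.util.concurrent.Semaphore;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;

    public class ThrottledWriter {
        // Cap on in-flight writes; 128 is an arbitrary guess, to be tuned.
        private final Semaphore inFlight = new Semaphore(128);
        private final Session session;

        public ThrottledWriter(Session session) {
            this.session = session;
        }

        void write(String insertCql) throws InterruptedException {
            inFlight.acquire(); // blocks the import loop at 128 outstanding writes
            ResultSetFuture f = session.executeAsync(insertCql);
            Futures.addCallback(f, new FutureCallback<ResultSet>() {
                public void onSuccess(ResultSet rs) {
                    inFlight.release();
                }
                public void onFailure(Throwable t) {
                    inFlight.release(); // free the slot; failure handled elsewhere
                }
            });
        }
    }

With this in place the import loop can never have more than the permitted
number of writes outstanding, so the cluster sees bounded demand instead of
an unbounded flood.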
> > The system.log, after a few seconds of import, shows the following
> > exception:
> >
> > java.lang.AssertionError: incorrect row data size 771030 written to
> > /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db;
> > correct is 771200
> >     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
> >     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
> >     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> >     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> >     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
> >     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
> >     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:724)
> >
> > And then, after about 2 minutes, there are out-of-memory errors:
> >
> > ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java
> > (line 192) Exception in thread Thread[CompactionExecutor:5,1,main]
> > java.lang.OutOfMemoryError: unable to create new native thread
> >     at java.lang.Thread.start0(Native Method)
> >     at java.lang.Thread.start(Thread.java:693)
> >     at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.<init>(ParallelCompactionIterable.java:296)
> >     at org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
> >     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
> >     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> >     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> >     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
> >     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
> >     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:724)
> > ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,685 CassandraDaemon.java
> > (line 192) Exception in thread Thread[CompactionExecutor:
> >
> > On the other hosts the log looks similar, but those keep running,
> > despite the OutOfMemoryErrors.
> >
> > Jan
> >
> > > On Wed, Sep 4, 2013 at 9:44 AM, Jan Algermissen
> > > <jan.algermis...@nordsc.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have set up C* in a very limited environment: 3 VMs at
> > > > digitalocean with 2GB RAM and 40GB SSDs, so my expectations about
> > > > overall performance are low.
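
[Update on the OOM trace above: the allocation that fails sits in
ParallelCompactionIterable, which AFAIU is only on the code path when
multithreaded_compaction is enabled in cassandra.yaml, so I am
double-checking that setting. To see whether a node is really drowning in
threads before the OOM hits, I now poll its live thread count over JMX; a
minimal sketch, assuming Cassandra's default JMX port 7199 and no JMX
authentication:]

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ThreadCountProbe {
        public static void main(String[] args) throws Exception {
            // Cassandra exposes the platform MXBeans via JMX (default port 7199).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + args[0] + ":7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName threading = new ObjectName("java.lang:type=Threading");
                while (true) {
                    // ThreadCount is the live thread count of the remote JVM.
                    System.out.println(mbs.getAttribute(threading, "ThreadCount"));
                    Thread.sleep(5000);
                }
            } finally {
                connector.close();
            }
        }
    }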
> > > > The keyspace uses a replication factor of 2.
> > > >
> > > > I am loading 1.5 million rows (each with 60 columns of a mix of
> > > > numbers and small texts; effectively 300,000 wide rows) in a quite
> > > > 'aggressive' way, using java-driver and async update statements.
> > > >
> > > > After a while of importing data, I start seeing timeouts reported
> > > > by the driver:
> > > >
> > > > com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
> > > > timeout during write query at consistency ONE (1 replica were
> > > > required but only 0 acknowledged the write)
> > > >
> > > > and then later, host-unavailability exceptions:
> > > >
> > > > com.datastax.driver.core.exceptions.UnavailableException: Not enough
> > > > replica available for query at consistency ONE (1 required but only
> > > > 0 alive).
> > > >
> > > > Looking at the 3 hosts, I see that two C*s went down - which
> > > > explains why I still see some writes succeeding (that must be the
> > > > one host left, satisfying the consistency level ONE).
> > > >
> > > > The logs tell me, AFAIU, that the servers shut down due to reaching
> > > > the heap size limit.
> > > >
> > > > I am irritated by the fact that the instances (it seems) shut
> > > > themselves down instead of limiting their amount of work. I
> > > > understand that I need to tweak the configuration and likely get
> > > > more RAM, but still, I would actually be satisfied with reduced
> > > > service (and likely more timeouts in the client). Right now it
> > > > looks as if I would have to slow down the client 'artificially' to
> > > > prevent the loss of hosts - does that make sense?
> > > >
> > > > Can anyone explain whether this is intended behavior, meaning I'll
> > > > just have to accept the self-shutdown of the hosts? Or
> > > > alternatively, what data should I collect to investigate the cause
> > > > further?
> > > >
> > > > Jan
>
> --
> - John
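
P.S.: Regarding "slowing down the client artificially": besides the
semaphore above, I now also treat WriteTimeoutException and
UnavailableException as backpressure signals and retry with exponential
backoff instead of hammering the cluster. A rough sketch - the retry cap
and delays are arbitrary numbers I made up, and execute(String) is the
synchronous java-driver call:

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.exceptions.UnavailableException;
    import com.datastax.driver.core.exceptions.WriteTimeoutException;

    public class BackoffExecutor {
        // Retries a write a few times, doubling the pause after each failure.
        static ResultSet executeWithBackoff(Session session, String cql)
                throws InterruptedException {
            long delayMs = 100;                  // initial pause, arbitrary
            for (int attempt = 0; ; attempt++) {
                try {
                    return session.execute(cql); // synchronous write
                } catch (WriteTimeoutException | UnavailableException e) {
                    if (attempt >= 5) throw e;   // give up after 5 retries
                    Thread.sleep(delayMs);       // back off before retrying
                    delayMs = Math.min(delayMs * 2, 5000);
                }
            }
        }
    }

Whether that only papers over the real problem (the nodes dying instead of
shedding load) is exactly the question I am trying to answer.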