Hi John,

On 10.09.2013, at 01:06, John Sanda <john.sa...@gmail.com> wrote:

> Check your file limits -
> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=#cassandra/troubleshooting/trblshootInsufficientResources_r.html

Did that already - without success. Meanwhile I upgraded the servers and I
am getting closer. By now I assume that heavy writes of rows of considerable
size (as in: more than a couple of numbers) require a certain amount of RAM
due to the C* architecture. IOW, my throughput limit is how fast I can get
the data to disk, but the minimal memory I need for that cannot be tuned
down; it depends on the size of the stuff written to C*, because C* buffers
writes in memtables in order to turn them into sequential IO. It is an
interesting trade-off (if I get it right by now :-)

Jan

> On Friday, September 6, 2013, Jan Algermissen wrote:
>
> > On 06.09.2013, at 13:12, Alex Major <al3...@gmail.com> wrote:
> >
> > > Have you changed the appropriate config settings so that Cassandra
> > > will run with only 2GB RAM? You shouldn't find the nodes go down.
> > >
> > > Check out this blog post
> > > http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
> > > , it outlines the configuration settings needed to run Cassandra on
> > > 64MB RAM and might give you some insights.
> >
> > Yes, I have my fingers on the knobs and have also seen the article you
> > mention - very helpful indeed. As are the replies so far. Thanks very
> > much.
> >
> > However, I still manage to kill 2 or 3 nodes of my 3-node cluster with
> > my data import :-(
> >
> > Now, while it would be easy to scale out and up a bit until the default
> > config of C* is sufficient, I would really like to dig deeper and
> > understand why the thing still goes down, IOW, which of my config
> > settings is so darn wrong that in most cases kill -9 remains the only
> > way to shut down the Java process in the end.
> >
> > The problem seems to be the heap size (set to MAX_HEAP_SIZE="640M" and
> > HEAP_NEWSIZE="120M") in combination with some Cassandra activity that
> > demands too much heap, right?
> >
> > So how do I find out what activity this is, and how do I sufficiently
> > reduce it?
> >
> > What bugs me in general is that AFAIU C* is so eager to deliver massive
> > write speed that it sort of forgets to protect itself from client
> > demand. I would very much like to understand why and how that happens.
> > I mean: no matter how many clients are flooding the database, it should
> > not die from out-of-memory situations, regardless of any configuration
> > specifics - should it?
> >
> > tl;dr
> >
> > Currently my client side (with java-driver) after a while reports more
> > and more timeouts and then the following exception:
> >
> > com.datastax.driver.core.exceptions.DriverInternalError: An unexpected
> > error occurred server side: java.lang.OutOfMemoryError: unable to
> > create new native thread
> >
> > On the server side, my cluster remains more or less in this condition:
> >
> > DN  xxxxx  71,33 MB   256  34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
> > UN  xxxxx  189,38 MB  256  32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  rack1
> > UN  xxxxx  198,49 MB  256  33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1
> >
> > The host that is down (it is the seed host, if that matters) still
> > shows the running java process, but I cannot shut down cassandra or
> > connect with nodetool, hence kill -9 to the rescue.
> >
> > On that host, I still see a load of around 1.
> >
> > jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
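
[Update: the 892 blocked threads plus the "unable to create new native
thread" OOM made me suspect that my loader simply keeps more writes in
flight than the cluster can absorb. I am now capping the number of
concurrent async writes on the client with a semaphore, roughly like the
sketch below. Caveat: the cap of 128 is an arbitrary first guess, and I am
quoting the java-driver/Guava API from memory, so treat the details as
illustrative rather than authoritative.]

    import java.util.concurrent.Semaphore;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;

    public class ThrottledWriter {
        // Cap on in-flight writes; 128 is an arbitrary guess, to be tuned.
        private final Semaphore inFlight = new Semaphore(128);
        private final Session session;

        public ThrottledWriter(Session session) {
            this.session = session;
        }

        void write(String insertCql) throws InterruptedException {
            inFlight.acquire(); // blocks the import loop at 128 outstanding writes
            ResultSetFuture f = session.executeAsync(insertCql);
            Futures.addCallback(f, new FutureCallback<ResultSet>() {
                public void onSuccess(ResultSet rs) {
                    inFlight.release();
                }
                public void onFailure(Throwable t) {
                    inFlight.release(); // free the slot; failure handled elsewhere
                }
            });
        }
    }

With this in place the import loop can never have more than the permitted
number of writes outstanding, so the cluster sees bounded demand instead of
an unbounded flood.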
> > The system.log, after a few seconds of import, shows the following
> > exception:
> >
> > java.lang.AssertionError: incorrect row data size 771030 written to
> > /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db;
> > correct is 771200
> >     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
> >     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
> >     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> >     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> >     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
> >     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
> >     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:724)
> >
> > And then, after about 2 minutes, there are out-of-memory errors:
> >
> > ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java
> > (line 192) Exception in thread Thread[CompactionExecutor:5,1,main]
> > java.lang.OutOfMemoryError: unable to create new native thread
> >     at java.lang.Thread.start0(Native Method)
> >     at java.lang.Thread.start(Thread.java:693)
> >     at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.<init>(ParallelCompactionIterable.java:296)
> >     at org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
> >     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
> >     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> >     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> >     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
> >     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
> >     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:724)
> > ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,685 CassandraDaemon.java
> > (line 192) Exception in thread Thread[CompactionExecutor:
> >
> > On the other hosts the log looks similar, but those keep running,
> > despite the OutOfMemoryErrors.
> >
> > Jan
> >
> > > On Wed, Sep 4, 2013 at 9:44 AM, Jan Algermissen
> > > <jan.algermis...@nordsc.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have set up C* in a very limited environment: 3 VMs at
> > > > digitalocean with 2GB RAM and 40GB SSDs, so my expectations about
> > > > overall performance are low.
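
[Update on the OOM trace above: the allocation that fails sits in
ParallelCompactionIterable, which AFAIU is only on the code path when
multithreaded_compaction is enabled in cassandra.yaml, so I am
double-checking that setting. To see whether a node is really drowning in
threads before the OOM hits, I now poll its live thread count over JMX; a
minimal sketch, assuming Cassandra's default JMX port 7199 and no JMX
authentication:]

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ThreadCountProbe {
        public static void main(String[] args) throws Exception {
            // Cassandra exposes the platform MXBeans via JMX (default port 7199).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + args[0] + ":7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName threading = new ObjectName("java.lang:type=Threading");
                while (true) {
                    // ThreadCount is the live thread count of the remote JVM.
                    System.out.println(mbs.getAttribute(threading, "ThreadCount"));
                    Thread.sleep(5000);
                }
            } finally {
                connector.close();
            }
        }
    }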
> > > > The keyspace uses a replication factor of 2.
> > > >
> > > > I am loading 1.5 million rows (each with 60 columns of a mix of
> > > > numbers and small texts; effectively 300,000 wide rows) in a quite
> > > > 'aggressive' way, using java-driver and async update statements.
> > > >
> > > > After a while of importing data, I start seeing timeouts reported
> > > > by the driver:
> > > >
> > > > com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
> > > > timeout during write query at consistency ONE (1 replica were
> > > > required but only 0 acknowledged the write)
> > > >
> > > > and then later, host-unavailability exceptions:
> > > >
> > > > com.datastax.driver.core.exceptions.UnavailableException: Not enough
> > > > replica available for query at consistency ONE (1 required but only
> > > > 0 alive).
> > > >
> > > > Looking at the 3 hosts, I see that two C*s went down - which
> > > > explains why I still see some writes succeeding (that must be the
> > > > one host left, satisfying the consistency level ONE).
> > > >
> > > > The logs tell me, AFAIU, that the servers shut down due to reaching
> > > > the heap size limit.
> > > >
> > > > I am irritated by the fact that the instances (it seems) shut
> > > > themselves down instead of limiting their amount of work. I
> > > > understand that I need to tweak the configuration and likely get
> > > > more RAM, but still, I would actually be satisfied with reduced
> > > > service (and likely more timeouts in the client). Right now it
> > > > looks as if I would have to slow down the client 'artificially' to
> > > > prevent the loss of hosts - does that make sense?
> > > >
> > > > Can anyone explain whether this is intended behavior, meaning I'll
> > > > just have to accept the self-shutdown of the hosts? Or
> > > > alternatively, what data should I collect to investigate the cause
> > > > further?
> > > >
> > > > Jan
>
> --
> - John
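
P.S.: Regarding "slowing down the client artificially": besides the
semaphore above, I now also treat WriteTimeoutException and
UnavailableException as backpressure signals and retry with exponential
backoff instead of hammering the cluster. A rough sketch - the retry cap
and delays are arbitrary numbers I made up, and execute(String) is the
synchronous java-driver call:

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.exceptions.UnavailableException;
    import com.datastax.driver.core.exceptions.WriteTimeoutException;

    public class BackoffExecutor {
        // Retries a write a few times, doubling the pause after each failure.
        static ResultSet executeWithBackoff(Session session, String cql)
                throws InterruptedException {
            long delayMs = 100;                  // initial pause, arbitrary
            for (int attempt = 0; ; attempt++) {
                try {
                    return session.execute(cql); // synchronous write
                } catch (WriteTimeoutException | UnavailableException e) {
                    if (attempt >= 5) throw e;   // give up after 5 retries
                    Thread.sleep(delayMs);       // back off before retrying
                    delayMs = Math.min(delayMs * 2, 5000);
                }
            }
        }
    }

Whether that only papers over the real problem (the nodes dying instead of
shedding load) is exactly the question I am trying to answer.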