The key JVM options for Cassandra are in cassandra.in.sh. What are your min and max heap sizes?
The default max heap size is 1 GB. How much RAM do your nodes have? You may want to increase this setting. You can also set the -Xmx and -Xms options to the same value to keep Java from having to manage heap growth. On a 32-bit machine, you can get a max of about 1.6 GB of heap; you can get a lot more on 64-bit. Try experimenting with some of the other settings in the cassandra.in.sh file.

You may not have DEBUG mode turned on for Cassandra, and therefore may not be getting the full details of what's going on when the server crashes. In the <cassandra-home>/conf/log4j-server.properties file, change this line from the default of INFO to DEBUG:

    log4j.rootLogger=INFO,stdout,R

Also, you haven't configured JNA on this server. Here's some info about it and how to configure it:

JNA provides Java programs easy access to native shared libraries without writing anything but Java code.

Note from the Cassandra developers on why JNA is needed:

"Linux aggressively swaps out infrequently used memory to make more room for its file system buffer cache. Unfortunately, modern generational garbage collectors like the JVM's leave parts of its heap un-touched for relatively large amounts of time, leading Linux to swap it out. When the JVM finally goes to use or GC that memory, swap hell ensues. Setting swappiness to zero can mitigate this behavior but does not eliminate it entirely. Turning off swap entirely is effective. But to avoid surprising people who don't know about this behavior, the best solution is to tell Linux not to swap out the JVM, and that is what we do now with mlockall via JNA. Because of licensing issues, we can't distribute JNA with Cassandra, so you must manually add it to the Cassandra lib/ directory or otherwise place it on the classpath. If the JNA jar is not present, Cassandra will continue as before."

Get JNA with:

    cd ~
    wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb

To install:

    techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
    (Reading database ... 44334 files and directories currently installed.)
    Preparing to replace libjna-java 3.2.4-2 (using libjna-java_3.2.7-0~nmu.2_amd64.deb) ...
    Unpacking replacement libjna-java ...
    Setting up libjna-java (3.2.7-0~nmu.2) ...

The deb package installs the JNA jar to /usr/share/java/jna.jar, but Cassandra only loads it if it's on the classpath. The easiest way to arrange that is to create a symlink in your Cassandra lib directory (note: replace /home/techlabs with your home directory):

    ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib

Research: http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/

- Sameer

On Thu, May 12, 2011 at 4:15 PM, James Cipar <jci...@cmu.edu> wrote:

> I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB
> unique data), to a cluster of 10 servers. I'm using batch_mutate, and
> breaking the data up into chunks of about 10k records. Each record is about
> 5KB, so a total of about 50MB per batch. When I upload a smaller 2 GB data
> set, everything works fine. When I upload the 20 GB data set, servers will
> occasionally crash. Currently I have my client code automatically detect
> this and restart the server, but that is less than ideal.
>
> I'm not sure what information to gather to determine what's going on here.
> Here is a sample of a log file from when a crash occurred. The crash was
> immediately after the log entry tagged "2011-05-12 19:02:19,377". Any idea
> what's going on here? Any other info I can gather to try to debug this?
>
> INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464
> INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 7774142464
> INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 50) Creating new commitlog segment /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
> INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
> INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) Writing Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
> INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) GC for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 7774142464
> INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line 128) GC for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 7774142464
> INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165) Completed flushing /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-74-Data.db (53646223 bytes)
> INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:16,745 CommitLog.java (line 440) Discarding obsolete commit log:CommitLogSegment(/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241306438.log)
> INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) GC for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 7774142464
> INFO [MutationStage:19] 2011-05-12 19:02:19,000 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
> INFO [FlushWriter:1] 2011-05-12 19:02:19,000 Memtable.java (line 158) Writing Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
> INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,310 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-51
> INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,324 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-55
> INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,339 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-58
> INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,357 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-67
> INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,377 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-61
> INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) Logging initialized
> INFO [main] 2011-05-12 19:02:21,040 AbstractCassandraDaemon.java (line 96) Heap size: 7634681856/7635730432
> INFO [main] 2011-05-12 19:02:21,042 CLibrary.java (line 61) JNA not found. Native methods will be disabled.
> INFO [main] 2011-05-12 19:02:21,052 DatabaseDescriptor.java (line 121) Loading settings from file:/h/jcipar/Projects/HP/OtherDBs/Cassandra/apache-cassandra-0.7.5/conf/cassandra.yaml
> INFO [main] 2011-05-12 19:02:21,178 DatabaseDescriptor.java (line 181) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
> INFO [main] 2011-05-12 19:02:21,310 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-1
> INFO [main] 2011-05-12 19:02:21,327 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-2
> INFO [main] 2011-05-12 19:02:21,336 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-1
> INFO [main] 2011-05-12 19:02:21,337 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-2
> INFO [main] 2011-05-12 19:02:21,342 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-2
> INFO [main] 2011-05-12 19:02:21,344 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-1
> INFO [main] 2011-05-12 19:02:21,379 DatabaseDescriptor.java (line 461) Loading schema version 9467ffe0-7cea-11e0-8ddc-f74ef74e382f
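
One way to gather more information from a log like the one quoted above is to pull out the GCInspector entries and track how full the heap is at each collection. The following is only a rough sketch: it assumes the unwrapped log lines as Cassandra writes them to its log file (not the email-wrapped form), and the names `GcEvent`, `parse_gc_events`, and `SAMPLE` are made up for illustration. The pattern matches only the GCInspector format seen in this particular log and may need adjusting for other versions.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches GCInspector entries in the format seen in the quoted log, e.g.
# "... GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464"
GC_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) GCInspector\.java "
    r"\(line \d+\) GC for (?P<collector>\w+): (?P<ms>\d+) ms, "
    r"(?P<reclaimed>-?\d+) reclaimed leaving (?P<used>\d+) used; "
    r"max is (?P<max>\d+)"
)


class GcEvent(NamedTuple):
    """One GCInspector entry (hypothetical helper type, not part of Cassandra)."""
    timestamp: str
    collector: str
    pause_ms: int
    reclaimed: int   # bytes; can be negative, as in the quoted log
    used: int        # bytes in use after the collection
    max_heap: int    # bytes

    @property
    def used_fraction(self) -> float:
        return self.used / self.max_heap


def parse_gc_events(lines: Iterable[str]) -> List[GcEvent]:
    """Return a GcEvent for every line that contains a GCInspector entry."""
    events = []
    for line in lines:
        m = GC_LINE.search(line)
        if m:
            events.append(GcEvent(
                timestamp=m.group("ts"),
                collector=m.group("collector"),
                pause_ms=int(m.group("ms")),
                reclaimed=int(m.group("reclaimed")),
                used=int(m.group("used")),
                max_heap=int(m.group("max")),
            ))
    return events


# Two entries copied from the quoted log, plus one non-GC line that is skipped.
SAMPLE = [
    "INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) "
    "GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464",
    "INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) "
    "GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 7774142464",
    "INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) Logging initialized",
]

if __name__ == "__main__":
    for e in parse_gc_events(SAMPLE):
        print(f"{e.timestamp} {e.collector}: {e.pause_ms} ms pause, "
              f"heap {e.used_fraction:.0%} full")
```

Pointing `parse_gc_events` at an open log file instead of `SAMPLE` would give a per-collection view of heap occupancy leading up to a crash, which may help show whether memory pressure is involved.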