I'm using Cassandra 0.7.5 and uploading about 200 GB of data in total (20 GB of unique data) to a cluster of 10 servers. I'm using batch_mutate and breaking the data into chunks of about 10k records. Each record is about 5 KB, so each batch is about 50 MB. When I upload a smaller 2 GB data set, everything works fine. When I upload the 20 GB data set, servers occasionally crash. Currently my client code automatically detects this and restarts the server, but that is less than ideal.
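The batch sizing described above can be sketched as follows. This is only an illustration of the arithmetic and the chunking step, not the actual client code: `chunked` and `batch_payload_bytes` are hypothetical helper names, the record size is taken as a flat 5,000 bytes, and no Cassandra call is made.

```python
from itertools import islice

# Figures from the description above; the exact byte count per record is an assumption.
RECORD_SIZE_BYTES = 5 * 1000   # about 5 KB per record
BATCH_RECORDS = 10_000         # about 10k records per batch


def chunked(records, batch_records):
    """Yield successive lists of at most `batch_records` items from `records`."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_records))
        if not batch:
            return
        yield batch


def batch_payload_bytes(batch_records, record_size_bytes=RECORD_SIZE_BYTES):
    """Approximate payload size of one batch, ignoring protocol overhead."""
    return batch_records * record_size_bytes


if __name__ == "__main__":
    # 10,000 records * 5,000 bytes = 50,000,000 bytes, i.e. about 50 MB per batch.
    print(batch_payload_bytes(BATCH_RECORDS))
    # Halving the record count per batch halves the payload the server must buffer.
    print(batch_payload_bytes(BATCH_RECORDS // 2))
```

Passing a smaller `batch_records` value to `chunked` is one way to test whether the crashes depend on batch size.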
I'm not sure what information to gather to determine what's going on here. Here is a sample of a log file from when a crash occurred. The crash was immediately after the log entry tagged "2011-05-12 19:02:19,377". Any idea what's going on here? Any other info I can gather to try to debug this?

INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464
INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 7774142464
INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 50) Creating new commitlog segment /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) Writing Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) GC for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 7774142464
INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line 128) GC for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 7774142464
INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165) Completed flushing /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-74-Data.db (53646223 bytes)
INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:16,745 CommitLog.java (line 440) Discarding obsolete commit log:CommitLogSegment(/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241306438.log)
INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) GC for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 7774142464
INFO [MutationStage:19] 2011-05-12 19:02:19,000 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
INFO [FlushWriter:1] 2011-05-12 19:02:19,000 Memtable.java (line 158) Writing Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,310 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-51
INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,324 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-55
INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,339 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-58
INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,357 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-67
INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,377 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-61
INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) Logging initialized
INFO [main] 2011-05-12 19:02:21,040 AbstractCassandraDaemon.java (line 96) Heap size: 7634681856/7635730432
INFO [main] 2011-05-12 19:02:21,042 CLibrary.java (line 61) JNA not found. Native methods will be disabled.
INFO [main] 2011-05-12 19:02:21,052 DatabaseDescriptor.java (line 121) Loading settings from file:/h/jcipar/Projects/HP/OtherDBs/Cassandra/apache-cassandra-0.7.5/conf/cassandra.yaml
INFO [main] 2011-05-12 19:02:21,178 DatabaseDescriptor.java (line 181) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO [main] 2011-05-12 19:02:21,310 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-1
INFO [main] 2011-05-12 19:02:21,327 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-2
INFO [main] 2011-05-12 19:02:21,336 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-1
INFO [main] 2011-05-12 19:02:21,337 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-2
INFO [main] 2011-05-12 19:02:21,342 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-2
INFO [main] 2011-05-12 19:02:21,344 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-1
INFO [main] 2011-05-12 19:02:21,379 DatabaseDescriptor.java (line 461) Loading schema version 9467ffe0-7cea-11e0-8ddc-f74ef74e382f
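One piece of information that can be pulled out of a log like this is the heap usage trend in the GCInspector entries leading up to the restart. The sketch below is a rough illustration only: the regular expression is written against the entry format shown in this log and may not match other Cassandra versions, and `parse_gc_events` is a hypothetical helper name, not part of any Cassandra tooling.

```python
import re

# Pattern matched against the GCInspector entries shown in the log above.
GC_ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"GCInspector\.java \(line \d+\) "
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, "
    r"(?P<reclaimed>-?\d+) reclaimed leaving (?P<used>\d+) used; "
    r"max is (?P<max>\d+)"
)


def parse_gc_events(log_text):
    """Return one dict per GCInspector entry found in `log_text`.

    Uses finditer over the whole text, so it works whether entries are
    one per line or run together on a single line.
    """
    events = []
    for m in GC_ENTRY.finditer(log_text):
        used = int(m.group("used"))
        heap_max = int(m.group("max"))
        events.append({
            "timestamp": m.group("ts"),
            "collector": m.group("collector"),
            "pause_ms": int(m.group("ms")),
            "reclaimed_bytes": int(m.group("reclaimed")),
            "used_bytes": used,
            "max_bytes": heap_max,
            "heap_fraction": used / heap_max,
        })
    return events


if __name__ == "__main__":
    # Two entries copied from the log above.
    sample = (
        "INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) "
        "GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464\n"
        "INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) "
        "GC for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 7774142464\n"
    )
    for event in parse_gc_events(sample):
        print(event["timestamp"], event["pause_ms"], round(event["heap_fraction"], 2))
```

Running this over the full log from each server that crashed would show whether heap usage was climbing toward the maximum before the restart or, as in the entries above, falling.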