Hi,

We are running a 4-node Cassandra cluster (1.1.4) with replication factor 2 (DC 1) and replication factor 1 (DC 2) in two different data centers with network topology. Our machines have 16 GB RAM and 8 cores, with two hard drives each.

# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
Address         DC   Rack  Status  State   Load      Effective-Ownership  Token
                                                                          169417178424467235000914166253263322299
10.0.0.3        DC1  RAC1  Up      Normal  91.93 GB  66.67%               0
10.0.0.4        DC1  RAC1  Up      Normal  84.88 GB  66.67%               56713727820156410577229101238628035242
10.0.0.15       DC1  RAC1  Up      Normal  82.51 GB  66.67%               113427455640312821154458202477256070484
10.40.1.103     DC2  RAC1  Up      Normal  303.2 MB  100.00%              169417178424467235000914166253263322299

# java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)

After some time (1 to 2 hours), Cassandra shuts down its services on one or two nodes with the following errors:

============================================================
 INFO 11:01:25,527 GC for ConcurrentMarkSweep: 1968 ms for 2 collections, 3817667464 used; max is 4093640704
 INFO 11:01:42,838 GC for ConcurrentMarkSweep: 1828 ms for 2 collections, 3850830504 used; max is 4093640704
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid27363.hprof ...
Heap dump file created [4664912349 bytes in 44.731 secs]
ERROR 11:02:41,156 Exception in thread Thread[CompactionExecutor:87,1,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.io.util.FastByteArrayOutputStream.expand(FastByteArrayOutputStream.java:104)
        at org.apache.cassandra.io.util.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:220)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.cassandra.io.util.DataOutputBuffer.write(DataOutputBuffer.java:61)
        at org.apache.cassandra.utils.ByteBufferUtil.write(ByteBufferUtil.java:328)
        at org.apache.cassandra.utils.ByteBufferUtil.writeWithLength(ByteBufferUtil.java:315)
        at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:62)
        at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:366)
        at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:339)
        at org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:89)
        at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:138)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:156)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
        at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
 INFO 11:02:41,373 Stop listening to thrift clients
 INFO 11:02:41,376 InetAddress /10.0.0.15 is now dead.
 INFO 11:02:41,376 InetAddress /10.0.0.3 is now dead.
 INFO 11:02:41,377 InetAddress /10.40.1.103 is now dead.
 INFO 11:02:41,397 InetAddress /10.0.0.3 is now UP
 INFO 11:02:41,397 InetAddress /10.0.0.15 is now UP
 INFO 11:02:41,398 InetAddress /10.40.1.103 is now UP
 INFO 11:02:41,398 Started hinted handoff for token: 0 with IP: /10.0.0.3
 INFO 11:02:41,450 Announcing shutdown
 INFO 11:02:48,184 GC for ConcurrentMarkSweep: 1887 ms for 2 collections, 2234362128 used; max is 4093640704
 INFO 11:02:48,206 Waiting for messaging service to quiesce
 INFO 11:02:48,207 MessagingService shutting down server thread.
============================================================
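
To put the GC lines above in perspective, here is a small Python sketch that pulls the "used; max is" figures out of each line and prints heap occupancy. The regular expression and the function name `heap_occupancy` are only illustrative and assume nothing beyond the log format shown above.

```python
import re

# Matches the "<used> used; max is <max>" tail of the GC log lines above.
GC_LINE = re.compile(r"(\d+) used; max is (\d+)")

def heap_occupancy(line):
    """Return used/max heap as a fraction, or None if the line has no GC figures."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    used, maximum = int(m.group(1)), int(m.group(2))
    return used / maximum

# The three GC lines copied from the log above.
lines = [
    "INFO 11:01:25,527 GC for ConcurrentMarkSweep: 1968 ms for 2 collections, 3817667464 used; max is 4093640704",
    "INFO 11:01:42,838 GC for ConcurrentMarkSweep: 1828 ms for 2 collections, 3850830504 used; max is 4093640704",
    "INFO 11:02:48,184 GC for ConcurrentMarkSweep: 1887 ms for 2 collections, 2234362128 used; max is 4093640704",
]

for line in lines:
    print("%.1f%% of heap in use" % (100 * heap_occupancy(line)))
```

By this calculation the heap was roughly 93% and 94% full after the two CMS collections just before the OutOfMemoryError, and about 55% full after the collection during shutdown.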

Our cassandra.yaml configuration is as follows:

============================================================
cluster_name: 'ABC Cluster'
initial_token: 0
hinted_handoff_enabled: true
max_hint_window_in_ms: 2147483647 # one hour
hinted_handoff_throttle_delay_in_ms: 0
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
authority: org.apache.cassandra.auth.AllowAllAuthority
partitioner: org.apache.cassandra.dht.RandomPartitioner

data_file_directories:
    - /u/cassandra/data

commitlog_directory: /var/log/cassandra/commitlog
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
row_cache_provider: SerializingCacheProvider
saved_caches_directory: /var/log/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32

seed_provider:
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "10.0.0.3,10.0.0.4"

flush_largest_memtables_at: 1.0
reduce_cache_sizes_at: 1.0
reduce_cache_capacity_to: 0.6
concurrent_reads: 8
concurrent_writes: 32
memtable_flush_queue_size: 4
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.0.0.3
rpc_address: 10.0.0.3
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
rpc_min_threads: 16
rpc_max_threads: 2147483647
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 256
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
rpc_timeout_in_ms: 15000
phi_convict_threshold: 8
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.0
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
============================================================
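
As a side check on the configuration above: the `# one hour` comment next to max_hint_window_in_ms does not match the value that is actually set. A quick Python sketch of the arithmetic (the helper name `ms_to_days` is just for illustration):

```python
def ms_to_days(ms):
    """Convert a duration in milliseconds to days."""
    return ms / (1000 * 60 * 60 * 24)

max_hint_window_in_ms = 2147483647   # value from the cassandra.yaml above
one_hour_in_ms = 60 * 60 * 1000      # what "one hour" would be in milliseconds

print("configured hint window: %.1f days" % ms_to_days(max_hint_window_in_ms))
print("one hour in ms: %d" % one_hour_in_ms)
```

So the configured hint window is close to 25 days, while one hour would be 3600000 ms.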

Please help me fix this issue permanently and keep the Cassandra nodes running smoothly.

Regards,

Adeel Akbar
