Hello,

We are using Cassandra 2.1.2 in a multi-DC cluster (30 servers in DC1 and
10 in DC2) with a keyspace replication factor of 1 in DC1 and 2 in DC2.

For some reason, when we increase the volume of write requests on DC1 (using
consistency level ONE or LOCAL_ONE), the Cassandra Java process on the DC2
nodes goes down randomly.

At the time the DC2 nodes start to go down, the load average is around 3-5
on the DC1 nodes and around 7-10 on the DC2 nodes, so nothing alarming.
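
The per-node write share implied by this topology can be estimated roughly.
The sketch below assumes that data is evenly distributed across the nodes of
each DC and that every write is replicated to both DCs according to the
keyspace settings; neither assumption has been verified on this cluster, and
the function name is purely illustrative.

```python
from fractions import Fraction


def per_node_write_share(nodes: int, replication_factor: int) -> Fraction:
    """Fraction of all cluster writes stored by each node of one DC.

    Assumes even data distribution inside the DC (hypothetical helper,
    not part of any Cassandra tooling).
    """
    return Fraction(replication_factor, nodes)


# Topology from this post: 30 nodes with RF 1 in DC1, 10 nodes with RF 2 in DC2.
dc1 = per_node_write_share(nodes=30, replication_factor=1)
dc2 = per_node_write_share(nodes=10, replication_factor=2)
ratio = dc2 / dc1

print(f"DC1 per-node share: {dc1}")   # 1/30
print(f"DC2 per-node share: {dc2}")   # 1/5
print(f"DC2/DC1 ratio: {ratio}")      # 6
```

Under those assumptions each DC2 node would absorb about six times the write
volume of a DC1 node, while having half the RAM.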

*Looking at Cassandra's system.log, we found some exceptions:*

ERROR [SharedPool-Worker-43] 2014-11-15 00:39:48,596
JVMStabilityInspector.java:94 - JVM state determined to be unstable.
Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
ERROR [CompactionExecutor:8] 2014-11-15 00:39:48,596
CassandraDaemon.java:153 - Exception in thread
Thread[CompactionExecutor:8,1,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [Thrift-Selector_2] 2014-11-15 00:39:48,596 Message.java:238 - Got an
IOException during write!
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
~[na:1.8.0_25]
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
~[na:1.8.0_25]
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
~[na:1.8.0_25]
        at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_25]
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470)
~[na:1.8.0_25]
        at
org.apache.thrift.transport.TNonblockingSocket.write(TNonblockingSocket.java:164)
~[libthrift-0.9.1.jar:0.9.1]
        at
com.thinkaurelius.thrift.util.mem.Buffer.writeTo(Buffer.java:104)
~[thrift-server-0.3.7.jar:na]
        at
com.thinkaurelius.thrift.util.mem.FastMemoryOutputTransport.streamTo(FastMemoryOutputTransport.java:112)
~[thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.Message.write(Message.java:222)
~[thrift-server-0.3.7.jar:na]
        at
com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.handleWrite(TDisruptorServer.java:598)
[thrift-server-0.3.7.jar:na]
        at
com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.processKey(TDisruptorServer.java:569)
[thrift-server-0.3.7.jar:na]
        at
com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.select(TDisruptorServer.java:423)
[thrift-server-0.3.7.jar:na]
        at
com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.run(TDisruptorServer.java:383)
[thrift-server-0.3.7.jar:na]
ERROR [Thread-94] 2014-11-15 00:39:48,597 CassandraDaemon.java:153 -
Exception in thread Thread[Thread-94,5,main]
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
~[na:1.8.0_25]
        at
org.apache.cassandra.db.composites.AbstractCType.sliceBytes(AbstractCType.java:369)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:101)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:397)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:381)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:110)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:168)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:150)
~[apache-cassandra-2.1.2.jar:2.1.2]
        at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:82)
~[apache-cassandra-2.1.2.jar:2.1.2]


*Memory:*
- DC1 servers have 32 GB of RAM and the heap is configured to 8 GB.
- DC2 servers have 16 GB of RAM and the heap is also configured to 8 GB.

Any hints would be appreciated.

Thanks in advance.

Gabriel.
