My development team has been trying to track down the cause of the read timeout exception below (30 seconds or more at times). We're running a deployment across 2 data centers with 3 nodes in each data center. Our tables are set up with replication factor = 2, and we have 16 GB dedicated to the heap, with G1GC for garbage collection. Our systems are AWS m4.2xlarge instances with 8 CPUs and 32 GB of RAM, and each node has 2 general purpose EBS volumes of 500 GB each. Once we start getting these timeouts the cluster doesn't recover, and we have to shut down all Cassandra nodes and restart them. If anyone has any tips on where to look or what commands to run to help us diagnose this issue, we'd be eternally grateful.

2017-01-02 04:33:35.161 [ERROR] [report-compute.ffbec924-ce44-11e6-9e21-0adb9d2dd624] [reportCompute] [ahlworkerslave2.bos.manhattan.aspect-cloud.net:31312] [WorktypeMetrics] Persistence failure when replaying events for persistenceId [/fsms/pens/worktypes/bmwbpy.314]. Last known sequence number [0]
java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at akka.persistence.cassandra.package$$anon$1$$anonfun$run$1.apply(package.scala:17)
    at scala.util.Try$.apply(Try.scala:192)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
    at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:115)
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:124)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:477)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:62)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:266)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:246)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)

Richard Ney
Technical Director, Research & Development
richard....@aspect.com