Hi all - I'm getting the following error on RC1:
WARN [Messaging-EventLoop-3-23] 2021-05-10 17:29:12,431
NoSpamLogger.java:95 -
/172.16.100.39:7000->/172.16.100.248:7000-URGENT_MESSAGES-e8d21588
dropping message of type FAILURE_RSP whose timeout expired before
reaching the network
ERROR [CounterMutationStage-62] 2021-05-10 17:29:12,431
AbstractLocalAwareExecutorService.java:166 - Uncaught exception on
thread Thread[CounterMutationStage-62,5,main]
java.lang.RuntimeException:
org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed
out - received only 0 responses.
       at
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2278)
       at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
       at
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
       at
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
       at
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
       at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
       at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.cassandra.exceptions.WriteTimeoutException:
Operation timed out - received only 0 responses.
       at
org.apache.cassandra.db.CounterMutation.grabCounterLocks(CounterMutation.java:162)
       at
org.apache.cassandra.db.CounterMutation.applyCounterMutation(CounterMutation.java:131)
       at
org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:1678)
       at
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2274)
       ... 6 common frames omitted
This happens under load.
I'm also seeing a lot of these messages:
WARN [GossipTasks:1] 2021-05-10 17:30:20,969 FailureDetector.java:319
- Not marking nodes down due to local pause of 5785753812ns > 5000000000ns
DEBUG [GossipTasks:1] 2021-05-10 17:30:20,969 FailureDetector.java:325 -
Still not marking nodes down due to local pause
DEBUG [GossipTasks:1] 2021-05-10 17:30:20,969 FailureDetector.java:325 -
Still not marking nodes down due to local pause
DEBUG [GossipTasks:1] 2021-05-10 17:30:20,969 FailureDetector.java:325 -
Still not marking nodes down due to local pause
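To put a number on these pauses, here is a minimal sketch that pulls the pause duration and threshold out of the FailureDetector warnings and converts them to seconds. The regex assumes the message format of the single warning shown above, and the function name parse_local_pauses is just illustrative.

```python
import re

# Pattern assumed from the one FailureDetector warning pasted above.
PAUSE_RE = re.compile(r"local pause of (\d+)ns > (\d+)ns")

def parse_local_pauses(lines):
    """Return (pause_seconds, threshold_seconds) for each matching log line."""
    pauses = []
    for line in lines:
        m = PAUSE_RE.search(line)
        if m:
            pauses.append((int(m.group(1)) / 1e9, int(m.group(2)) / 1e9))
    return pauses

sample = [
    "WARN [GossipTasks:1] 2021-05-10 17:30:20,969 FailureDetector.java:319 "
    "- Not marking nodes down due to local pause of 5785753812ns > 5000000000ns",
]

for pause, threshold in parse_local_pauses(sample):
    print(f"paused {pause:.2f}s (threshold {threshold:.2f}s)")
```

For the warning above this works out to a pause of about 5.79 seconds against a 5 second threshold.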
The other messages are slow queries like:
SELECT mediatype, origvalue FROM doc.origdoc WHERE uuid =
DS_5_2021-05-08T06-53-41.442Z_Hi0ywdNE LIMIT 1>, time 1370 msec - slow
timeout 500 msec
I've tried switching to the G1 garbage collector (Java 11), and that did
reduce these times (I was seeing over 5000 msec before). The above select
statement is on a table where uuid is the primary key.
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.100.208  9.16 GiB   30      9.3%              2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UN  172.16.100.249  60.69 GiB  200     62.9%             49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   61.16 GiB  200     62.9%             d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   61.07 GiB  200     63.0%             93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  1.24 GiB   4       1.3%              a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  60.35 GiB  200     62.9%             4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   37.18 GiB  120     37.7%             08a19658-40be-4e55-8709-812b3d4ac750  rack1
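As a quick sanity check on the ring, this small sketch totals the Tokens and Owns (effective) columns from the status output above and prints each node's share of the tokens. The values are copied by hand from that output. If the ownership total comes out near 300%, that would be consistent with a replication factor of 3, but that is only an inference from the numbers and not something I have confirmed here.

```python
# (address, tokens, effective ownership in percent), copied from the status output above.
nodes = [
    ("172.16.100.208", 30, 9.3),
    ("172.16.100.249", 200, 62.9),
    ("172.16.100.36", 200, 62.9),
    ("172.16.100.39", 200, 63.0),
    ("172.16.100.253", 4, 1.3),
    ("172.16.100.248", 200, 62.9),
    ("172.16.100.37", 120, 37.7),
]

total_tokens = sum(tokens for _, tokens, _ in nodes)
total_owns = sum(owns for _, _, owns in nodes)

print(f"total tokens: {total_tokens}")
print(f"total effective ownership: {total_owns:.1f}%")

# Per-node share of all tokens next to the reported effective ownership.
for addr, tokens, owns in nodes:
    share = tokens / total_tokens
    print(f"{addr:<15} tokens={tokens:>3}  token share={share:6.1%}  owns={owns:5.1f}%")
```

The token counts are very uneven (4, 30, 120 and 200 per node), which lines up with the uneven Load column.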
nodetool tablestats doc.origdoc
Total number of tables: 74
----------------
Keyspace : doc
       Read Count: 37511
       Read Latency: 33.929465116899046 ms
       Write Count: 4604965
       Write Latency: 0.20405303102195133 ms
       Pending Flushes: 0
               Table: origdoc
               SSTable count: 85
               Old SSTable count: 0
               Space used (live): 54635707180
               Space used (total): 54635707180
               Space used by snapshots (total): 0
               Off heap memory used (total): 258773554
               SSTable Compression Ratio: 0.33099344385825985
               Number of partitions (estimate): 114982637
               Memtable cell count: 0
               Memtable data size: 0
               Memtable off heap memory used: 0
               Memtable switch count: 0
               Local read count: 5749
               Local read latency: 240.422 ms
               Local write count: 0
               Local write latency: NaN ms
               Pending flushes: 0
               Percent repaired: 0.01
               Bloom filter false positives: 16
               Bloom filter false ratio: 0.00000
               Bloom filter space used: 141861208
               Bloom filter off heap memory used: 141860528
               Index summary off heap memory used: 44391250
               Compression metadata off heap memory used: 72521776
               Compacted partition minimum bytes: 259
               Compacted partition maximum bytes: 4768
               Compacted partition mean bytes: 1366
               Average live cells per slice (last five minutes): 1.0
               Maximum live cells per slice (last five minutes): 1
               Average tombstones per slice (last five minutes): 1.0
               Maximum tombstones per slice (last five minutes): 1
               Dropped Mutations: 0
Things to check? Things to try?
Thanks!
-Joe