Benoit Tellier created JAMES-3660:
-------------------------------------
Summary: Cassandra mailbox creation unstable under high concurrency
Key: JAMES-3660
URL: https://issues.apache.org/jira/browse/JAMES-3660
Project: James Server
Issue Type: Improvement
Reporter: Benoit Tellier
Running the test
org.apache.james.mailbox.cassandra.CassandraMailboxManagerTest$WithBatchSize.creatingConcurrentlyMailboxesWithSameParentShouldNotFail
is enough to trigger instability on the Apache CI:
https://ci-builds.apache.org/job/james/job/ApacheJames/job/PR-685/1/
{code:java}
Error Message
java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Stacktrace
java.util.concurrent.ExecutionException: java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Standard Output
11:29:54.751 [ERROR] o.a.j.u.c.ConcurrentTestRunner - Error caught during concurrent testing (iteration 0, threadNumber 1)
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:90)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:65)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:297)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:268)
    at com.datastax.shaded.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
    ... 25 common frames omitted
Wrapped by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
{code}
In short, the LWT usage is enough to create contention.
Looking closer at the issue, StoreMailboxManager performs numerous defensive
SERIAL reads (each doing an empty Paxos commit), which further degrades
performance and increases contention.
I believe removing these defensive reads would make our code more stable.
Removing them made the following test about twice as fast:
gConcurrentlyMailboxesWithSameParentShouldNotFail
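
To illustrate why the defensive reads look redundant, here is a minimal sketch.
It is not James code: an in-memory map stands in for Cassandra, and every name
in it (LwtSketch, serialExists, insertIfNotExists, ...) is hypothetical. It
assumes the defensive read is a SERIAL existence check issued right before a
conditional insert. Since the conditional insert already reports whether it
was applied, the extra read costs one more Paxos round per creation without
changing the outcome.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-in for a Cassandra table accessed through LWTs.
class LwtSketch {
    // Counts operations that would each need a Paxos round in Cassandra.
    final AtomicInteger serialOps = new AtomicInteger();
    private final ConcurrentHashMap<String, String> rows = new ConcurrentHashMap<>();

    // Stand-in for a SELECT at SERIAL consistency.
    boolean serialExists(String path) {
        serialOps.incrementAndGet();
        return rows.containsKey(path);
    }

    // Stand-in for INSERT ... IF NOT EXISTS; returns whether it was applied.
    boolean insertIfNotExists(String path, String id) {
        serialOps.incrementAndGet();
        return rows.putIfAbsent(path, id) == null;
    }

    // Defensive variant: SERIAL read first, then the conditional insert.
    boolean createWithDefensiveRead(String path, String id) {
        if (serialExists(path)) {
            return false;
        }
        return insertIfNotExists(path, id);
    }

    // Lean variant: rely on the conditional insert's own result.
    boolean createRelyingOnLwt(String path, String id) {
        return insertIfNotExists(path, id);
    }

    // Creates `threads` distinct children concurrently and returns how many
    // serial operations were issued in total.
    static int run(boolean defensive, int threads) {
        LwtSketch store = new LwtSketch();
        AtomicInteger created = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            String path = "parent.child" + i;
            pool.submit(() -> {
                boolean applied = defensive
                    ? store.createWithDefensiveRead(path, "id")
                    : store.createRelyingOnLwt(path, "id");
                if (applied) {
                    created.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (created.get() != threads) {
            throw new IllegalStateException("some creations were not applied");
        }
        return store.serialOps.get();
    }

    public static void main(String[] args) {
        System.out.println("with defensive reads: " + run(true, 10) + " serial operations");
        System.out.println("without: " + run(false, 10) + " serial operations");
    }
}
```

In this sketch both variants create the same mailboxes and reject the same
duplicates; the defensive variant simply issues twice as many serial
operations. Real Cassandra behaviour under contention (timeouts, retries) is
not modelled here.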