[ https://issues.apache.org/jira/browse/CASSANDRA-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121116#comment-17121116 ]
Gianluca Righetto edited comment on CASSANDRA-15792 at 6/1/20, 4:01 PM:
------------------------------------------------------------------------

[~jmckenzie] After investigating this for some time, I determined that it is mostly related to the low write timeout used in TestSpeculativeReadRepair. Depending on how long the original node takes to apply the repair mutation, that timeout may lead to a speculated write to a different node, but the test assertion expects no speculated writes.

In other words, this is mostly a problem with the test, not with the C* runtime, which is doing the right thing. To fix this, I made the original test accept speculated writes, and I also replicated the test method in a different test class with a longer write timeout to reduce the likelihood of speculated writes. Since this is all time-based, the new test may still fail on a system with high CPU contention, but at least for now I can't easily reproduce the failure anymore (whereas it was failing consistently for me before).

There are other tests that guarantee the speculated write will happen when needed; the tricky part is testing that it won't happen within a given time frame, since that depends on system performance.

Here's the pull request in my cassandra-dtest fork: [https://github.com/grighetto/cassandra-dtest/pull/1]

Regarding the fix version, I'm OK with moving this to beta. The fix is already available but still needs to go through review, and since this is not a runtime problem, I wouldn't say it is a blocker for alpha.


was (Author: gianluca):
[~jmckenzie] After investigating this for some time now, I determined this is mostly related to a low write timeout used in TestSpeculativeReadRepair. That may lead to a speculated write to a different node depending on how long the original node takes to apply the repair mutation, but the test assertion is expecting no speculated writes.
In other words, this is mostly a problem with the test, not with C* runtime, which is doing the right thing. In order to fix this, I made it accept speculated writes in the original test, but I also replicated the test method in a different test class with a longer write timeout to reduce the likelihood of speculated writes. Of course, since this is all time based, the new test may still fail under a system with high CPU contention, but at least for now I can't easily reproduce the failure anymore (whereas it was failing consistently for me before).

Here's the pull request in my cassandra-dtest fork: [https://github.com/grighetto/cassandra-dtest/pull/1]

Regarding the fixver, I'm ok with moving this to beta, even though the fix is already available, it still needs to go through review, but since this is not a runtime problem, I wouldn't say this is a blocker for alpha.

> test_speculative_data_request - read_repair_test.TestSpeculativeReadRepair
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15792
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15792
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest
>            Reporter: Ekaterina Dimitrova
>            Assignee: Gianluca Righetto
>            Priority: Normal
>             Fix For: 4.0-alpha
>
>
> Failing on the latest trunk here:
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/127/workflows/dfba669d-4a5c-4553-b6a2-85647d0d8d2b/jobs/668/tests
> Failing once in 30 times as per Jenkins:
> https://jenkins-cm4.apache.org/job/Cassandra-trunk-dtest/69/testReport/dtest.read_repair_test/TestSpeculativeReadRepair/test_speculative_data_request/
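
The timing dependence described in the comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not code from cassandra-dtest or from Cassandra itself: it assumes a speculated write is sent whenever the original node takes longer than some timeout to apply the repair mutation. The function names and the latency values are hypothetical and chosen purely for illustration.

```python
def speculates(apply_latency_ms, timeout_ms):
    """Toy rule (an assumption): a speculated write is sent to another node
    when the original node has not applied the mutation within the timeout."""
    return apply_latency_ms > timeout_ms


def strict_failures(latencies_ms, timeout_ms):
    """Count runs in which a strict 'no speculated writes' assertion would fail."""
    return sum(1 for latency in latencies_ms if speculates(latency, timeout_ms))


# Hypothetical apply latencies (ms) of the repair mutation across seven test runs.
latencies = [5, 8, 12, 30, 45, 9, 150]

# A low timeout makes the strict assertion fail in many runs.
low_timeout_failures = strict_failures(latencies, timeout_ms=10)

# A longer timeout reduces the failures, but a slow enough run still trips it.
long_timeout_failures = strict_failures(latencies, timeout_ms=100)

print(low_timeout_failures, long_timeout_failures)
```

In this model the longer timeout lowers the number of failing runs without eliminating them, which matches the caveat that the new test may still fail under high CPU contention. A test that accepts speculated writes, as the original test now does, passes regardless of these latencies.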