Thanks Ben,

I tried 2.2.8 and could reproduce the problem.
So I'm investigating bug fixes to repair and the commit log between 2.2.8 and 3.0.9.
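To survey candidates, I've been listing the commits between the two release
tags that touch the commit log code. A rough sketch, assuming a checkout of
the Apache Cassandra git repo (the tag names and source path are from my
checkout):

    # commits between the two releases that touch the commit log code
    git log --oneline cassandra-2.2.8..cassandra-3.0.9 \
        -- src/java/org/apache/cassandra/db/commitlog/

    # scan CHANGES.txt for repair / commit log entries added in that range
    git diff cassandra-2.2.8 cassandra-3.0.9 -- CHANGES.txt \
        | grep -iE 'commit ?log|repair'

So far these look relevant: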
- CASSANDRA-12508: "nodetool repair returns status code 0 for some errors"
- CASSANDRA-12436: "Under some races commit log may incorrectly think it
  has unflushed data" - related to CASSANDRA-9669 and CASSANDRA-11828
  (is the fix for 2.2 different from that for 3.0?)

Do you know of other bug fixes related to the commit log?

Regards,
yuji

On Wed, Nov 9, 2016 at 11:34 AM, Ben Slater <ben.sla...@instaclustr.com> wrote:

> There have been a few commit log bugs around in the last couple of months,
> so perhaps you’ve hit something that was fixed recently. It would be
> interesting to know whether the problem is still occurring in 2.2.8.
>
> I suspect what is happening is that when you do your initial read (without
> flush) to check the number of rows, the data is in memtables and
> theoretically the commit logs, but not sstables. With the forced stop the
> memtables are lost, and Cassandra should read the commit log from disk at
> startup to reconstruct the memtables. However, it looks like that didn’t
> happen for some (bad) reason.
>
> Good news that 3.0.9 fixes the problem, so it’s up to you if you want to
> investigate further and see if you can narrow it down to file a JIRA
> (although the first step of that would be trying 2.2.9 to make sure it’s
> not already fixed there).
>
> Cheers
> Ben
>
> On Wed, 9 Nov 2016 at 12:56 Yuji Ito <y...@imagine-orb.com> wrote:
>
>> I tried C* 3.0.9 instead of 2.2.
>> The data loss problem hasn't happened so far (without `nodetool flush`).
>>
>> Thanks
>>
>> On Fri, Nov 4, 2016 at 3:50 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Thanks Ben,
>>
>> When I added `nodetool flush` on all nodes after step 2, the problem
>> didn't happen.
>> Could replay from old commit logs have deleted rows?
>>
>> Perhaps the flush operation just detected that some nodes were down in
>> step 2 (just after truncating tables).
>> (Insertion and the check in step 2 would succeed even if one node was
>> down, because the consistency level was SERIAL.
>> If the flush failed on more than one node, the test would retry step 2.)
>> However, if so, the problem should also happen without deleting
>> Cassandra data.
>>
>> Regards,
>> yuji
>>
>> On Mon, Oct 24, 2016 at 8:37 AM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> Definitely sounds to me like something is not working as expected, but I
>> don’t really have any idea what would cause that (other than the fairly
>> extreme failure scenario). A couple of things I can think of to try to
>> narrow it down:
>> 1) Run nodetool flush on all nodes after step 2 - that will make sure all
>> data is written to sstables rather than relying on commit logs
>> 2) Run the test with consistency level quorum rather than serial
>> (shouldn’t be any different, but quorum is more widely used so maybe
>> there is a bug that’s specific to serial)
>>
>> Cheers
>> Ben
>>
>> On Mon, 24 Oct 2016 at 10:29 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Hi Ben,
>>
>> The test without killing nodes has been working well without data loss.
>> I've repeated my test about 200 times after removing data and running a
>> rebuild/repair.
>>
>> Regards,
>>
>> On Fri, Oct 21, 2016 at 3:14 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> > Just to confirm, are you saying:
>> > a) after operation 2, you select all and get 1000 rows
>> > b) after operation 3 (which only does updates and read) you select and
>> > only get 953 rows?
>>
>> That's right!
>>
>> I've started the test without killing nodes.
>> I'll report the result to you next Monday.
>>
>> Thanks
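[For reference, the flush I added after step 2 runs on every node, roughly
like this - a sketch; the node names and keyspace are placeholders for our
three-node cluster:]

    # flush memtables to sstables on all nodes after the initial insert
    for node in node1 node2 node3; do
        nodetool -h "$node" flush testkeyspace
    done

[If the flush fails on any node (e.g. because it is down), the test retries
step 2, as described above.]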
>>
>> On Fri, Oct 21, 2016 at 3:05 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> Just to confirm, are you saying:
>> a) after operation 2, you select all and get 1000 rows
>> b) after operation 3 (which only does updates and reads) you select and
>> only get 953 rows?
>>
>> If so, that would be very unexpected. If you run your tests without
>> killing nodes, do you get the expected (1,000) rows?
>>
>> Cheers
>> Ben
>>
>> On Fri, 21 Oct 2016 at 17:00 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> > Are you certain your tests don’t generate any overlapping inserts (by
>> > PK)?
>>
>> Yes. Operation 2) also checks the number of rows just after all
>> insertions.
>>
>> On Fri, Oct 21, 2016 at 2:51 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> OK. Are you certain your tests don’t generate any overlapping inserts (by
>> PK)? Cassandra basically treats any insert with the same primary key as
>> an update (so 1000 insert operations may not necessarily result in 1000
>> rows in the DB).
>>
>> On Fri, 21 Oct 2016 at 16:30 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Thanks Ben,
>>
>> > 1) At what stage did you have (or expect to have) 1000 rows (and have
>> > the mismatch between actual and expected) - at the end of operation (2)
>> > or after operation (3)?
>>
>> After operation 3), at operation 4), which reads all rows by cqlsh with
>> CL.SERIAL.
>>
>> > 2) What replication factor and replication strategy is used by the test
>> > keyspace? What consistency level is used by your operations?
>>
>> - create keyspace testkeyspace WITH REPLICATION =
>>   {'class':'SimpleStrategy','replication_factor':3};
>> - the consistency level is SERIAL
>>
>> On Fri, Oct 21, 2016 at 12:04 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> A couple of questions:
>> 1) At what stage did you have (or expect to have) 1000 rows (and have the
>> mismatch between actual and expected) - at the end of operation (2) or
>> after operation (3)?
>> 2) What replication factor and replication strategy is used by the test
>> keyspace? What consistency level is used by your operations?
>>
>> Cheers
>> Ben
>>
>> On Fri, 21 Oct 2016 at 13:57 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Thanks Ben,
>>
>> I tried to run a rebuild and repair after the failed node rejoined the
>> cluster as a "new" node with -Dcassandra.replace_address_first_boot.
>> The failed node rejoined and I could read all rows successfully.
>> (Sometimes a repair failed because the node couldn't access another
>> node. If it failed, I retried the repair.)
>>
>> But some rows were lost after my destructive test had been repeated for
>> about 5-6 hours.
>> After the test inserted 1000 rows, there were only 953 rows at the end
>> of the test.
>>
>> My destructive test:
>> - each C* node is killed & restarted at random intervals (within about
>>   5 min) throughout this test
>> 1) truncate all tables
>> 2) insert initial rows (check that all rows are inserted successfully)
>> 3) request a lot of reads/writes to random rows for about 30 min
>> 4) check all rows
>> If operation 1), 2) or 4) fails due to C* failure, the test retries the
>> operation.
>>
>> Does anyone have a similar problem?
>> What causes the data loss?
>> Does the test need any operation when a C* node is restarted?
>> (Currently, I just restart the C* process.)
>>
>> Regards,
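[The shape of that test, as a minimal bash sketch - not the actual harness;
the node names, keyspace/table, and kill mechanism are placeholders:]

    #!/bin/bash
    KS=testkeyspace TABLE=testtable

    # background killer: kill & restart a random node at random intervals
    while true; do
        node=$(shuf -n 1 -e node1 node2 node3)
        ssh "$node" 'kill -9 $(pgrep -f CassandraDaemon); sleep 10; sudo service cassandra start'
        sleep $((RANDOM % 300))
    done &

    # 1) truncate all tables, 2) insert 1000 rows and check the count
    cqlsh node1 -e "TRUNCATE $KS.$TABLE;"
    # ... insert 1000 rows here, retrying the step on C* failure ...
    cqlsh node1 -e "CONSISTENCY SERIAL; SELECT COUNT(*) FROM $KS.$TABLE;"

    # 3) ~30 min of reads/writes to random rows, then
    # 4) repeat the SERIAL count check and compare against 1000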
>>
>> On Tue, Oct 18, 2016 at 2:18 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> OK, that’s a bit more unexpected (to me at least), but I think the
>> solution of running a rebuild or repair still applies.
>>
>> On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Thanks Ben, Jeff
>>
>> Sorry that my explanation confused you.
>>
>> Only node1 is the seed node.
>> Node2, whose C* data is deleted, is NOT a seed.
>>
>> I restarted the failed node (node2) after restarting the seed node
>> (node1).
>> Restarting node2 succeeded without the exception.
>> (I couldn't restart node2 before restarting node1, as expected.)
>>
>> Regards,
>>
>> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>>
>> ...

[Message clipped]
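[For anyone following the thread: the rejoin procedure mentioned above is
roughly the following - a sketch; the env file path, service command, and
IP are placeholders for your install:]

    # on the node being re-added (node2), after its data was wiped:
    # add the replace flag, then start C* (remove the flag after rejoining)
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<node2_ip>"' \
        >> /etc/cassandra/cassandra-env.sh
    sudo service cassandra start

    # once the node is up, repair it (retrying the repair if it fails)
    nodetool repair testkeyspace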