I tried C* 3.0.9 instead of 2.2. The data loss problem hasn't happened so far (without `nodetool flush`).

Thanks
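The quoted thread below discusses adding `nodetool flush` on all nodes after step 2 of the test. A minimal shell sketch of that step, assuming three nodes and the testkeyspace keyspace defined later in the thread (host names are illustrative):

    # Flush memtables to SSTables on every node after the initial inserts
    # (step 2 of the test), so the data no longer relies on commit logs.
    for host in node1 node2 node3; do
        nodetool -h "$host" flush testkeyspace
    done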
On Fri, Nov 4, 2016 at 3:50 PM, Yuji Ito <y...@imagine-orb.com> wrote:

> Thanks Ben,
>
> When I added `nodetool flush` on all nodes after step 2, the problem
> didn't happen.
> Did replay from old commit logs delete rows?
>
> Perhaps the flush operation just detected that some nodes were down in
> step 2 (just after truncating tables).
> (Insertion and check in step 2 would succeed if one node was down because
> the consistency level was serial.
> If the flush failed on more than one node, the test would retry step 2.)
> However, if so, the problem would happen without deleting Cassandra data.
>
> Regards,
> yuji
>
>
> On Mon, Oct 24, 2016 at 8:37 AM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
>> Definitely sounds to me like something is not working as expected, but I
>> don’t really have any idea what would cause that (other than the fairly
>> extreme failure scenario). A couple of things I can think of to try to
>> narrow it down:
>> 1) Run nodetool flush on all nodes after step 2 - that will make sure all
>> data is written to sstables rather than relying on commit logs
>> 2) Run the test with consistency level quorum rather than serial
>> (shouldn’t be any different but quorum is more widely used so maybe there
>> is a bug that’s specific to serial)
>>
>> Cheers
>> Ben
>>
>> On Mon, 24 Oct 2016 at 10:29 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>>> Hi Ben,
>>>
>>> The test without killing nodes has been working well without data loss.
>>> I've repeated my test about 200 times after removing data and
>>> rebuild/repair.
>>>
>>> Regards,
>>>
>>>
>>> On Fri, Oct 21, 2016 at 3:14 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>> > Just to confirm, are you saying:
>>> > a) after operation 2, you select all and get 1000 rows
>>> > b) after operation 3 (which only does updates and reads) you select
>>> > and only get 953 rows?
>>>
>>> That's right!
>>>
>>> I've started the test without killing nodes.
>>> I'll report the result to you next Monday.
>>>
>>> Thanks
>>>
>>>
>>> On Fri, Oct 21, 2016 at 3:05 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>
>>> Just to confirm, are you saying:
>>> a) after operation 2, you select all and get 1000 rows
>>> b) after operation 3 (which only does updates and reads) you select and
>>> only get 953 rows?
>>>
>>> If so, that would be very unexpected. If you run your tests without
>>> killing nodes do you get the expected (1,000) rows?
>>>
>>> Cheers
>>> Ben
>>>
>>> On Fri, 21 Oct 2016 at 17:00 Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>> > Are you certain your tests don’t generate any overlapping inserts (by PK)?
>>>
>>> Yes. Operation 2) also checks the number of rows just after all
>>> insertions.
>>>
>>>
>>> On Fri, Oct 21, 2016 at 2:51 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>
>>> OK. Are you certain your tests don’t generate any overlapping inserts
>>> (by PK)? Cassandra basically treats any inserts with the same primary key
>>> as updates (so 1000 insert operations may not necessarily result in 1000
>>> rows in the DB).
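As a small cqlsh illustration of the upsert behaviour described above, with a made-up table (the real test schema isn't shown in this thread):

    # Two INSERTs with the same primary key (id = 1) leave a single row behind;
    # the second insert simply overwrites the first. Table name and schema are
    # illustrative only.
    cqlsh node1 -e "
      CREATE TABLE IF NOT EXISTS testkeyspace.items (id int PRIMARY KEY, val int);
      INSERT INTO testkeyspace.items (id, val) VALUES (1, 10);
      INSERT INTO testkeyspace.items (id, val) VALUES (1, 20);
      SELECT count(*) FROM testkeyspace.items;"
    # -> count = 1, not 2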
>>> On Fri, 21 Oct 2016 at 16:30 Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>> thanks Ben,
>>>
>>> > 1) At what stage did you have (or expect to have) 1000 rows (and have
>>> > the mismatch between actual and expected) - at the end of operation (2)
>>> > or after operation (3)?
>>>
>>> after operation 3), at operation 4) which reads all rows by cqlsh with
>>> CL.SERIAL
>>>
>>> > 2) What replication factor and replication strategy is used by the
>>> > test keyspace? What consistency level is used by your operations?
>>>
>>> - create keyspace testkeyspace WITH REPLICATION =
>>>   {'class':'SimpleStrategy','replication_factor':3};
>>> - consistency level is SERIAL
>>>
>>>
>>> On Fri, Oct 21, 2016 at 12:04 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>
>>> A couple of questions:
>>> 1) At what stage did you have (or expect to have) 1000 rows (and have
>>> the mismatch between actual and expected) - at the end of operation (2) or
>>> after operation (3)?
>>> 2) What replication factor and replication strategy is used by the test
>>> keyspace? What consistency level is used by your operations?
>>>
>>> Cheers
>>> Ben
>>>
>>> On Fri, 21 Oct 2016 at 13:57 Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>> Thanks Ben,
>>>
>>> I tried to run a rebuild and repair after the failure node rejoined the
>>> cluster as a "new" node with -Dcassandra.replace_address_first_boot.
>>> The failure node could rejoin and I could read all rows successfully.
>>> (Sometimes a repair failed because the node could not access another node.
>>> If it failed, I retried the repair.)
>>>
>>> But some rows were lost after my destructive test was repeated (after
>>> about 5-6 hours).
>>> After the test inserted 1000 rows, there were only 953 rows at the end
>>> of the test.
>>>
>>> My destructive test:
>>> - each C* node is killed & restarted at a random interval (within about
>>>   5 min) throughout this test
>>> 1) truncate all tables
>>> 2) insert initial rows (check if all rows are inserted successfully)
>>> 3) request a lot of reads/writes to random rows for about 30 min
>>> 4) check all rows
>>> If operation 1), 2) or 4) fails due to a C* failure, the test retries
>>> the operation (a rough sketch of this loop is included below).
>>>
>>> Does anyone have a similar problem?
>>> What causes the data loss?
>>> Does the test need any operation when a C* node is restarted?
>>> (Currently, I just restart the C* process.)
>>>
>>> Regards,
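A rough shell sketch of the four test steps described in the message above, using the testkeyspace keyspace mentioned earlier and a hypothetical items table; the retry-on-failure handling and the random node kill/restart loop are omitted:

    # Outline only: a real test would use a driver rather than spawning cqlsh
    # per statement. Host, table and row count are illustrative.
    HOST=node1
    cqlsh "$HOST" -e "TRUNCATE testkeyspace.items;"                  # 1) truncate all tables
    for i in $(seq 1 1000); do                                       # 2) insert initial rows
        cqlsh "$HOST" -e "INSERT INTO testkeyspace.items (id, val) VALUES ($i, 0) IF NOT EXISTS;"
    done
    cqlsh "$HOST" -e "CONSISTENCY SERIAL; SELECT count(*) FROM testkeyspace.items;"  # expect 1000
    # 3) ~30 minutes of reads/writes against random ids (not shown here)
    cqlsh "$HOST" -e "CONSISTENCY SERIAL; SELECT count(*) FROM testkeyspace.items;"  # 4) check all rows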
>>> On Tue, Oct 18, 2016 at 2:18 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>
>>> OK, that’s a bit more unexpected (to me at least) but I think the
>>> solution of running a rebuild or repair still applies.
>>>
>>> On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>> Thanks Ben, Jeff
>>>
>>> Sorry that my explanation confused you.
>>>
>>> Only node1 is the seed node.
>>> Node2, whose C* data was deleted, is NOT a seed.
>>>
>>> I restarted the failure node (node2) after restarting the seed node
>>> (node1).
>>> Restarting node2 succeeded without the exception.
>>> (I couldn't restart node2 before restarting node1, as expected.)
>>>
>>> Regards,
>>>
>>> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>>>
>>> The unstated "problem" here is that node1 is a seed, which implies
>>> auto_bootstrap=false (you can't bootstrap a seed, so it was almost
>>> certainly set up to start without bootstrapping).
>>>
>>> That means once the data dir is wiped, it's going to start again without
>>> a bootstrap, and make a single-node cluster or join an existing cluster
>>> if the seed list is valid.
>>>
>>> --
>>> Jeff Jirsa
>>>
>>> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>
>>> OK, sorry - I think I understand what you are asking now.
>>>
>>> However, I’m still a little confused by your description. I think your
>>> scenario is:
>>> 1) Stop C* on all nodes in a cluster (Nodes A,B,C)
>>> 2) Delete all data from Node A
>>> 3) Restart Node A
>>> 4) Restart Nodes B,C
>>>
>>> Is this correct?
>>>
>>> If so, this isn’t a scenario I’ve tested/seen but I’m not surprised Node
>>> A starts successfully as there are no running nodes to tell it via gossip
>>> that it shouldn’t start up without the “replaces” flag.
>>>
>>> I think the right way to recover in this scenario is to run a nodetool
>>> rebuild on Node A after the other two nodes are running. You could
>>> theoretically also run a repair (which would be good practice after a
>>> weird failure scenario like this) but rebuild will probably be quicker
>>> given you know all the data needs to be re-streamed.
>>>
>>> ...
>
> [Message clipped]
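For completeness, a hedged sketch of the recovery path discussed above: restart the wiped node as a replacement for its own old address, then re-stream or repair once the rest of the cluster is up. The IP address, service command, and config file path are assumptions that depend on the installation:

    # Illustrative only: 10.0.0.2 stands for the wiped node's own address, and
    # /etc/cassandra/cassandra-env.sh is the typical path on packaged installs.
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.2"' \
        >> /etc/cassandra/cassandra-env.sh
    sudo service cassandra start
    # after the node has rejoined and the other nodes are running:
    nodetool rebuild        # or: nodetool repair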