I tried C* 3.0.9 instead of 2.2. The data loss problem hasn't happened so far (without `nodetool flush`).

Thanks
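The quoted thread below discusses adding `nodetool flush` on all nodes after step 2 of the test. A minimal shell sketch of that step, assuming three nodes and the testkeyspace keyspace defined later in the thread (host names are illustrative):

    # Flush memtables to SSTables on every node after the initial inserts
    # (step 2 of the test), so the data no longer relies on commit logs.
    for host in node1 node2 node3; do
        nodetool -h "$host" flush testkeyspace
    done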
On Fri, Nov 4, 2016 at 3:50 PM, Yuji Ito <y...@imagine-orb.com> wrote:

> Thanks Ben,
>
> When I added `nodetool flush` on all nodes after step 2, the problem
> didn't happen.
> Did replay from old commit logs delete rows?
>
> Perhaps the flush operation just detected that some nodes were down in
> step 2 (just after truncating tables).
> (Insertion and check in step 2 would succeed if one node was down because
> the consistency level was serial.
> If the flush failed on more than one node, the test would retry step 2.)
> However, if so, the problem would happen without deleting Cassandra data.
>
> Regards,
> yuji
>
>
> On Mon, Oct 24, 2016 at 8:37 AM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
>> Definitely sounds to me like something is not working as expected, but I
>> don’t really have any idea what would cause that (other than the fairly
>> extreme failure scenario). A couple of things I can think of to try to
>> narrow it down:
>> 1) Run nodetool flush on all nodes after step 2 - that will make sure all
>> data is written to sstables rather than relying on commit logs
>> 2) Run the test with consistency level quorum rather than serial
>> (shouldn’t be any different but quorum is more widely used so maybe there
>> is a bug that’s specific to serial)
>>
>> Cheers
>> Ben
>>
>> On Mon, 24 Oct 2016 at 10:29 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>>> Hi Ben,
>>>
>>> The test without killing nodes has been working well without data loss.
>>> I've repeated my test about 200 times after removing data and
>>> rebuild/repair.
>>>
>>> Regards,
>>>
>>>
>>> On Fri, Oct 21, 2016 at 3:14 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>> > Just to confirm, are you saying:
>>> > a) after operation 2, you select all and get 1000 rows
>>> > b) after operation 3 (which only does updates and reads) you select
>>> > and only get 953 rows?
>>>
>>> That's right!
>>>
>>> I've started the test without killing nodes.
>>> I'll report the result to you next Monday.
>>>
>>> Thanks
>>>
>>>
>>> On Fri, Oct 21, 2016 at 3:05 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>
>>> Just to confirm, are you saying:
>>> a) after operation 2, you select all and get 1000 rows
>>> b) after operation 3 (which only does updates and reads) you select and
>>> only get 953 rows?
>>>
>>> If so, that would be very unexpected. If you run your tests without
>>> killing nodes do you get the expected (1,000) rows?
>>>
>>> Cheers
>>> Ben
>>>
>>> On Fri, 21 Oct 2016 at 17:00 Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>> > Are you certain your tests don’t generate any overlapping inserts (by PK)?
>>>
>>> Yes. Operation 2) also checks the number of rows just after all
>>> insertions.
>>>
>>>
>>> On Fri, Oct 21, 2016 at 2:51 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>
>>> OK. Are you certain your tests don’t generate any overlapping inserts
>>> (by PK)? Cassandra basically treats any inserts with the same primary key
>>> as updates (so 1000 insert operations may not necessarily result in 1000
>>> rows in the DB).
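As a small cqlsh illustration of the upsert behaviour described above, with a made-up table (the real test schema isn't shown in this thread):

    # Two INSERTs with the same primary key (id = 1) leave a single row behind;
    # the second insert simply overwrites the first. Table name and schema are
    # illustrative only.
    cqlsh node1 -e "
      CREATE TABLE IF NOT EXISTS testkeyspace.items (id int PRIMARY KEY, val int);
      INSERT INTO testkeyspace.items (id, val) VALUES (1, 10);
      INSERT INTO testkeyspace.items (id, val) VALUES (1, 20);
      SELECT count(*) FROM testkeyspace.items;"
    # -> count = 1, not 2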
>>> On Fri, 21 Oct 2016 at 16:30 Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>> thanks Ben,
>>>
>>> > 1) At what stage did you have (or expect to have) 1000 rows (and have
>>> > the mismatch between actual and expected) - at the end of operation (2)
>>> > or after operation (3)?
>>>
>>> after operation 3), at operation 4) which reads all rows by cqlsh with
>>> CL.SERIAL
>>>
>>> > 2) What replication factor and replication strategy is used by the
>>> > test keyspace? What consistency level is used by your operations?
>>>
>>> - create keyspace testkeyspace WITH REPLICATION =
>>>   {'class':'SimpleStrategy','replication_factor':3};
>>> - consistency level is SERIAL
>>>
>>>
>>> On Fri, Oct 21, 2016 at 12:04 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>
>>> A couple of questions:
>>> 1) At what stage did you have (or expect to have) 1000 rows (and have
>>> the mismatch between actual and expected) - at the end of operation (2) or
>>> after operation (3)?
>>> 2) What replication factor and replication strategy is used by the test
>>> keyspace? What consistency level is used by your operations?
>>>
>>> Cheers
>>> Ben
>>>
>>> On Fri, 21 Oct 2016 at 13:57 Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>> Thanks Ben,
>>>
>>> I tried to run a rebuild and repair after the failure node rejoined the
>>> cluster as a "new" node with -Dcassandra.replace_address_first_boot.
>>> The failure node could rejoin and I could read all rows successfully.
>>> (Sometimes a repair failed because the node could not access another node.
>>> If it failed, I retried the repair.)
>>>
>>> But some rows were lost after my destructive test was repeated (after
>>> about 5-6 hours).
>>> After the test inserted 1000 rows, there were only 953 rows at the end
>>> of the test.
>>>
>>> My destructive test:
>>> - each C* node is killed & restarted at a random interval (within about
>>>   5 min) throughout this test
>>> 1) truncate all tables
>>> 2) insert initial rows (check if all rows are inserted successfully)
>>> 3) request a lot of reads/writes to random rows for about 30 min
>>> 4) check all rows
>>> If operation 1), 2) or 4) fails due to a C* failure, the test retries
>>> the operation (a rough sketch of this loop is included below).
>>>
>>> Does anyone have a similar problem?
>>> What causes the data loss?
>>> Does the test need any operation when a C* node is restarted?
>>> (Currently, I just restart the C* process.)
>>>
>>> Regards,
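A rough shell sketch of the four test steps described in the message above, using the testkeyspace keyspace mentioned earlier and a hypothetical items table; the retry-on-failure handling and the random node kill/restart loop are omitted:

    # Outline only: a real test would use a driver rather than spawning cqlsh
    # per statement. Host, table and row count are illustrative.
    HOST=node1
    cqlsh "$HOST" -e "TRUNCATE testkeyspace.items;"                  # 1) truncate all tables
    for i in $(seq 1 1000); do                                       # 2) insert initial rows
        cqlsh "$HOST" -e "INSERT INTO testkeyspace.items (id, val) VALUES ($i, 0) IF NOT EXISTS;"
    done
    cqlsh "$HOST" -e "CONSISTENCY SERIAL; SELECT count(*) FROM testkeyspace.items;"  # expect 1000
    # 3) ~30 minutes of reads/writes against random ids (not shown here)
    cqlsh "$HOST" -e "CONSISTENCY SERIAL; SELECT count(*) FROM testkeyspace.items;"  # 4) check all rows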
>>> On Tue, Oct 18, 2016 at 2:18 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>
>>> OK, that’s a bit more unexpected (to me at least) but I think the
>>> solution of running a rebuild or repair still applies.
>>>
>>> On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>> Thanks Ben, Jeff
>>>
>>> Sorry that my explanation confused you.
>>>
>>> Only node1 is the seed node.
>>> Node2, whose C* data was deleted, is NOT a seed.
>>>
>>> I restarted the failure node (node2) after restarting the seed node
>>> (node1).
>>> Restarting node2 succeeded without the exception.
>>> (I couldn't restart node2 before restarting node1, as expected.)
>>>
>>> Regards,
>>>
>>> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>>>
>>> The unstated "problem" here is that node1 is a seed, which implies
>>> auto_bootstrap=false (you can't bootstrap a seed, so it was almost
>>> certainly set up to start without bootstrapping).
>>>
>>> That means once the data dir is wiped, it's going to start again without
>>> a bootstrap, and make a single-node cluster or join an existing cluster
>>> if the seed list is valid.
>>>
>>> --
>>> Jeff Jirsa
>>>
>>> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>
>>> OK, sorry - I think I understand what you are asking now.
>>>
>>> However, I’m still a little confused by your description. I think your
>>> scenario is:
>>> 1) Stop C* on all nodes in a cluster (Nodes A,B,C)
>>> 2) Delete all data from Node A
>>> 3) Restart Node A
>>> 4) Restart Nodes B,C
>>>
>>> Is this correct?
>>>
>>> If so, this isn’t a scenario I’ve tested/seen but I’m not surprised Node
>>> A starts successfully as there are no running nodes to tell it via gossip
>>> that it shouldn’t start up without the “replaces” flag.
>>>
>>> I think the right way to recover in this scenario is to run a nodetool
>>> rebuild on Node A after the other two nodes are running. You could
>>> theoretically also run a repair (which would be good practice after a
>>> weird failure scenario like this) but rebuild will probably be quicker
>>> given you know all the data needs to be re-streamed.
>>>
>>> ...
>
> [Message clipped]
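For completeness, a hedged sketch of the recovery path discussed above: restart the wiped node as a replacement for its own old address, then re-stream or repair once the rest of the cluster is up. The IP address, service command, and config file path are assumptions that depend on the installation:

    # Illustrative only: 10.0.0.2 stands for the wiped node's own address, and
    # /etc/cassandra/cassandra-env.sh is the typical path on packaged installs.
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.2"' \
        >> /etc/cassandra/cassandra-env.sh
    sudo service cassandra start
    # after the node has rejoined and the other nodes are running:
    nodetool rebuild        # or: nodetool repair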