There have been a few commit log bugs around in the last couple of months
so perhaps you’ve hit something that was fixed recently. It would be
interesting to know whether the problem is still occurring in 2.2.8.

I suspect what is happening is that when you do your initial read (without
a flush) to check the number of rows, the data is in memtables (and, in
theory, the commitlog) but not in sstables. With the forced stop the
memtables are lost, and Cassandra should replay the commitlog from disk at
startup to reconstruct them. However, it looks like that didn’t happen,
for some (bad) reason.
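
If you want to confirm whether the replay ran at all, grepping the
restarted node’s logs is a quick check (the log path below assumes a
package install, so adjust for your setup):

    grep -i replay /var/log/cassandra/system.log

Running nodetool flush (or nodetool drain) on each node before the forced
stop would also take the commitlog out of the picture entirely.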

Good news that 3.0.9 fixes the problem, so it’s up to you whether you want
to investigate further and narrow it down enough to file a JIRA (the first
step of which would be trying 2.2.9 to make sure it’s not already fixed
there).

Cheers
Ben

On Wed, 9 Nov 2016 at 12:56 Yuji Ito <y...@imagine-orb.com> wrote:

> I tried C* 3.0.9 instead of 2.2.
> The data loss problem hasn't happened so far (without `nodetool flush`).
>
> Thanks
>
> On Fri, Nov 4, 2016 at 3:50 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> When I added `nodetool flush` on all nodes after step 2, the problem
> didn't happen.
> Could replay from old commit logs have deleted rows?
>
> Perhaps the flush operation just detected that some nodes were down in
> step 2 (just after truncating tables).
> (Insertion and the check in step 2 would succeed even if one node was
> down, because the consistency level was SERIAL.
> If the flush failed on more than one node, the test would retry step 2.)
> However, if that were the case, the problem would also happen without
> deleting Cassandra data.
>
> Regards,
> yuji
>
>
> On Mon, Oct 24, 2016 at 8:37 AM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> Definitely sounds to me like something is not working as expected, but I
> don’t really have any idea what would cause that (other than the fairly
> extreme failure scenario). A couple of things I can think of to try to
> narrow it down (sketched below):
> 1) Run nodetool flush on all nodes after step 2 - that will make sure all
> data is written to sstables rather than relying on commit logs
> 2) Run the test with consistency level QUORUM rather than SERIAL
> (shouldn’t be any different, but QUORUM is more widely used so maybe
> there is a bug that’s specific to SERIAL)
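>
> Roughly, something like this (the host names and the table name are just
> placeholders for whatever your test uses):
>
>     # 1) flush memtables to sstables on every node
>     for h in node1 node2 node3; do ssh "$h" nodetool flush; done
>
>     # 2) re-run the row count at QUORUM instead of SERIAL
>     cqlsh -e "CONSISTENCY QUORUM; SELECT count(*) FROM testkeyspace.testtable;"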
>
> Cheers
> Ben
>
> On Mon, 24 Oct 2016 at 10:29 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Hi Ben,
>
> The test without killing nodes has been working well, without data loss.
> I've repeated my test about 200 times after removing the data and
> rebuilding/repairing.
>
> Regards,
>
>
> On Fri, Oct 21, 2016 at 3:14 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
> > Just to confirm, are you saying:
> > a) after operation 2, you select all and get 1000 rows
> > b) after operation 3 (which only does updates and reads) you select and
> > only get 953 rows?
>
> That's right!
>
> I've started the test without killing nodes.
> I'll report the result to you next Monday.
>
> Thanks
>
>
> On Fri, Oct 21, 2016 at 3:05 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> Just to confirm, are you saying:
> a) after operation 2, you select all and get 1000 rows
> b) after operation 3 (which only does updates and reads) you select and
> only get 953 rows?
>
> If so, that would be very unexpected. If you run your tests without
> killing nodes, do you get the expected (1,000) rows?
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 17:00 Yuji Ito <y...@imagine-orb.com> wrote:
>
> > Are you certain your tests don’t generate any overlapping inserts (by
> > PK)?
>
> Yes. Operation 2) also checks the number of rows just after all the
> insertions.
>
>
> On Fri, Oct 21, 2016 at 2:51 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK. Are you certain your tests don’t generate any overlapping inserts (by
> PK)? Cassandra basically treats any inserts with the same primary key as
> updates (so 1000 insert operations may not necessarily result in 1000 rows
> in the DB).
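>
> For example (the table here is made up, but the behaviour is standard
> CQL):
>
>     cqlsh -e "
>       CREATE TABLE IF NOT EXISTS testkeyspace.t (pk int PRIMARY KEY, v text);
>       INSERT INTO testkeyspace.t (pk, v) VALUES (1, 'first');
>       INSERT INTO testkeyspace.t (pk, v) VALUES (1, 'second');
>       SELECT * FROM testkeyspace.t;"
>     # -> a single row (pk=1, v='second'); the second insert silently
>     #    overwrote the first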
>
> On Fri, 21 Oct 2016 at 16:30 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> > 1) At what stage did you have (or expect to have) 1000 rows (and have
> > the mismatch between actual and expected) - at the end of operation (2)
> > or after operation (3)?
>
> After operation 3), at operation 4), which reads all rows via cqlsh with
> CL.SERIAL.
>
> > 2) What replication factor and replication strategy is used by the test
> > keyspace? What consistency level is used by your operations?
>
> - CREATE KEYSPACE testkeyspace WITH REPLICATION =
>   {'class': 'SimpleStrategy', 'replication_factor': 3};
> - consistency level is SERIAL
>
>
> On Fri, Oct 21, 2016 at 12:04 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
>
> A couple of questions:
> 1) At what stage did you have (or expect to have) 1000 rows (and have the
> mismatch between actual and expected) - at the end of operation (2) or
> after operation (3)?
> 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?
>
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 13:57 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> I tried to run a rebuild and repair after the failed node rejoined the
> cluster as a "new" node with -Dcassandra.replace_address_first_boot.
> The failed node could rejoin and I could read all rows successfully.
> (Sometimes a repair failed because the node could not access another
> node. If it failed, I retried the repair.)
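>
> (For reference, I set the flag in conf/cassandra-env.sh before starting
> the node; the IP address here is just an example of the dead node's
> former address. Since it is the _first_boot variant, it is ignored once
> the node has bootstrapped, so the line can be left in place.)
>
>     JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.1.2"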
>
> But some rows were lost after the destructive test had been repeated for
> about 5-6 hours.
> The test inserted 1000 rows, but there were only 953 rows at the end of
> the test.
>
> My destructive test (sketched below):
> - each C* node is killed & restarted at a random interval (within about
> 5 min) throughout this test
> 1) truncate all tables
> 2) insert initial rows (and check that all rows were inserted
> successfully)
> 3) request a lot of reads/writes to random rows for about 30 min
> 4) check all rows
> If operation 1), 2) or 4) fails due to a C* failure, the test retries the
> operation.
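>
> A rough sketch of the test (the helper scripts are placeholders for my
> actual test code):
>
>     # killer, running on every node in parallel throughout the test
>     while true; do
>         sleep $((RANDOM % 300))                   # random interval < 5 min
>         pkill -9 -f CassandraDaemon               # forced stop
>         sleep 10 && sudo service cassandra start  # restart
>     done
>
>     # main sequence
>     ./truncate_all_tables        # 1) retried on C* failure
>     ./insert_initial_rows 1000   # 2) retried on C* failure; verifies count
>     ./random_read_write 30m      # 3)
>     ./check_all_rows 1000        # 4) retried on C* failure; sometimes 953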
>
> Does anyone have a similar problem?
> What causes the data loss?
> Does the test need any extra operation when a C* node is restarted?
> (Currently, I just restart the C* process.)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 2:18 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK, that’s a bit more unexpected (to me at least) but I think the solution
> of running a rebuild or repair still applies.
>
> On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben, Jeff
>
> Sorry that my explanation confused you.
>
> Only node1 is the seed node.
> Node2, whose C* data was deleted, is NOT a seed.
>
> I restarted the failed node (node2) after restarting the seed node
> (node1).
> Restarting node2 succeeded without the exception.
> (I couldn't restart node2 before restarting node1, as expected.)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
> The unstated "problem" here is that node1 is a seed, which implies
> auto_bootstrap=false (you can't bootstrap a seed, so it was almost
> certainly set up to start without bootstrapping).
>
> That means once the data dir is wiped, it's going to start again without
> a bootstrap, and form a single-node cluster or join an existing cluster
> if the seed list is valid.
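>
> i.e. with a seed_provider like this in cassandra.yaml (the address is an
> example), a node that appears in its own seeds list will never bootstrap:
>
>     seed_provider:
>         - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>           parameters:
>               - seeds: "10.0.1.1"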
>
>
>
> --
> Jeff Jirsa
>
>
> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com>
>
>
