OK, that’s a bit more unexpected (to me at least), but I think the solution of running a rebuild or repair still applies.
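For concreteness, a minimal sketch of both options as plain nodetool
invocations, run on the wiped node once the other nodes are back up
(keyspace arguments and source-DC options are left out; adjust to your
topology):

    # Option 1: re-stream this node's data from the remaining replicas.
    # Usually quicker here, since we know all the local data is gone.
    nodetool rebuild

    # Option 2: a full repair, which also reconciles divergent replicas;
    # good practice after a messy failure, but typically slower.
    nodetool repair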
On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:

> Thanks Ben, Jeff
>
> Sorry that my explanation confused you.
>
> Only node1 is the seed node.
> Node2, whose C* data was deleted, is NOT a seed.
>
> I restarted the failed node (node2) after restarting the seed node (node1).
> Restarting node2 succeeded without the exception.
> (I couldn't restart node2 before restarting node1, as expected.)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>
> The unstated "problem" here is that node1 is a seed, which implies
> auto_bootstrap=false (you can't bootstrap a seed, so it was almost
> certainly set up to start without bootstrapping).
>
> That means once the data dir is wiped, it's going to start again without
> a bootstrap, and make a single-node cluster or join an existing cluster
> if the seed list is valid.
>
> --
> Jeff Jirsa
>
>
> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> OK, sorry - I think I understand what you are asking now.
>
> However, I’m still a little confused by your description. I think your
> scenario is:
> 1) Stop C* on all nodes in a cluster (Nodes A, B, C)
> 2) Delete all data from Node A
> 3) Restart Node A
> 4) Restart Nodes B, C
>
> Is this correct?
>
> If so, this isn’t a scenario I’ve tested/seen, but I’m not surprised
> Node A starts successfully, as there are no running nodes to tell it via
> gossip that it shouldn’t start up without the “replaces” flag.
>
> I think the right way to recover in this scenario is to run a nodetool
> rebuild on Node A after the other two nodes are running. You could
> theoretically also run a repair (which would be good practice after a
> weird failure scenario like this), but rebuild will probably be quicker
> given you know all the data needs to be re-streamed.
>
> Cheers
> Ben
>
> On Tue, 18 Oct 2016 at 14:03 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thank you Ben, Yabin
>
> I understood the rejoin was illegal.
> I expected this rejoin would fail with the exception.
> But I could add the failed node to the cluster without the exception
> after 2) and 3).
> I want to know why the rejoin succeeds. Should the exception happen?
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:51 AM, Yabin Meng <yabinm...@gmail.com> wrote:
>
> The exception you run into is expected behavior. This is because, as Ben
> pointed out, when you delete everything (including system schemas), the
> C* cluster thinks you're bootstrapping a new node. However, node2's IP is
> still in gossip, and this is why you see the exception.
>
> I'm not clear on the reasoning why you need to delete the C* data
> directory. That is a dangerous action, especially considering that you
> delete the system schemas. If in any case the failed node is gone for a
> while, what you need to do is remove the node first before doing the
> "rejoin".
>
> Cheers,
>
> Yabin
>
> On Mon, Oct 17, 2016 at 1:48 AM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> To Cassandra, the node where you deleted the files looks like a brand
> new machine. It doesn’t automatically rebuild such machines, to prevent
> accidental replacement. You need to tell it to build the “new” machine
> as a replacement for the “old” machine with that IP by setting
> -Dcassandra.replace_address_first_boot=<dead_node_ip>.
> See http://cassandra.apache.org/doc/latest/operating/topo_changes.html.
>
> Cheers
> Ben
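For reference, a minimal sketch of how that flag is typically applied,
assuming a package install where JVM options are appended in
/etc/cassandra/cassandra-env.sh (the path and service name vary by
install):

    # On the replacement node, before its first start:
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<dead_node_ip>"' \
        | sudo tee -a /etc/cassandra/cassandra-env.sh
    sudo service cassandra start

    # Unlike plain -Dcassandra.replace_address, the _first_boot variant is
    # ignored once the node has bootstrapped, so the line can stay in place.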
> On Mon, 17 Oct 2016 at 16:41 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Hi all,
>
> A failed node can rejoin a cluster.
> On the node, all data in /var/lib/cassandra was deleted.
> Is this normal?
>
> I can reproduce it as below.
>
> cluster:
> - C* 2.2.7
> - a cluster has node1, node2, node3
> - node1 is a seed
> - replication_factor: 3
>
> how to:
> 1) stop the C* process and delete all data in /var/lib/cassandra on node2
>    ($ sudo rm -rf /var/lib/cassandra/*)
> 2) stop the C* process on node1 and node3
> 3) restart C* on node1
> 4) restart C* on node2
>
> nodetool status after 4):
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
> DN  [node3 IP]  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
> UN  [node2 IP]  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
> UN  [node1 IP]  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
>
> If I restart C* on node2 while node1 and node3 are running (i.e. skipping
> steps 2) and 3)), a runtime exception happens:
> RuntimeException: "A node with address [node2 IP] already exists,
> cancelling join..."
>
> I'm not sure whether this causes data loss. All data can be read properly
> just after this rejoin.
> But some rows are lost when I kill & restart C* for destructive tests
> after this rejoin.
>
> Thanks.
>
> --
> ————————
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798

--
————————
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798
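As a footnote to Yabin’s point about removing the node before rejoining,
a minimal sketch of that recovery path, using node2’s host ID from the
nodetool status output above (removenode must run from a live node):

    # From node1 or node3: drop node2's old identity from the ring.
    nodetool removenode 05bdb1d4-c39b-48f1-8248-911d61935925

    # On node2: make sure the data directory is empty, then start C*.
    # As a non-seed it will bootstrap its ranges from the live replicas.
    sudo rm -rf /var/lib/cassandra/*
    sudo service cassandra start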