OK, that’s a bit more unexpected (to me at least), but I think the solution of running a rebuild or repair still applies.
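For concreteness, a minimal sketch of both options as plain nodetool
invocations, run on the wiped node once the other nodes are back up
(keyspace arguments and source-DC options are left out; adjust to your
topology):

    # Option 1: re-stream this node's data from the remaining replicas.
    # Usually quicker here, since we know all the local data is gone.
    nodetool rebuild

    # Option 2: a full repair, which also reconciles divergent replicas;
    # good practice after a messy failure, but typically slower.
    nodetool repair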
On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:

> Thanks Ben, Jeff
>
> Sorry that my explanation confused you.
>
> Only node1 is the seed node.
> Node2, whose C* data was deleted, is NOT a seed.
>
> I restarted the failed node (node2) after restarting the seed node (node1).
> Restarting node2 succeeded without the exception.
> (I couldn't restart node2 before restarting node1, as expected.)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>
> The unstated "problem" here is that node1 is a seed, which implies
> auto_bootstrap=false (you can't bootstrap a seed, so it was almost
> certainly set up to start without bootstrapping).
>
> That means once the data dir is wiped, it's going to start again without
> a bootstrap, and make a single-node cluster or join an existing cluster
> if the seed list is valid.
>
> --
> Jeff Jirsa
>
>
> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> OK, sorry - I think I understand what you are asking now.
>
> However, I’m still a little confused by your description. I think your
> scenario is:
> 1) Stop C* on all nodes in a cluster (Nodes A, B, C)
> 2) Delete all data from Node A
> 3) Restart Node A
> 4) Restart Nodes B, C
>
> Is this correct?
>
> If so, this isn’t a scenario I’ve tested/seen, but I’m not surprised
> Node A starts successfully, as there are no running nodes to tell it via
> gossip that it shouldn’t start up without the “replaces” flag.
>
> I think the right way to recover in this scenario is to run a nodetool
> rebuild on Node A after the other two nodes are running. You could
> theoretically also run a repair (which would be good practice after a
> weird failure scenario like this), but rebuild will probably be quicker
> given you know all the data needs to be re-streamed.
>
> Cheers
> Ben
>
> On Tue, 18 Oct 2016 at 14:03 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thank you Ben, Yabin
>
> I understood the rejoin was illegal.
> I expected this rejoin would fail with the exception.
> But I could add the failed node to the cluster without the exception
> after 2) and 3).
> I want to know why the rejoin succeeds. Should the exception happen?
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:51 AM, Yabin Meng <yabinm...@gmail.com> wrote:
>
> The exception you run into is expected behavior. This is because, as Ben
> pointed out, when you delete everything (including system schemas), the
> C* cluster thinks you're bootstrapping a new node. However, node2's IP is
> still in gossip, and this is why you see the exception.
>
> I'm not clear on the reasoning why you need to delete the C* data
> directory. That is a dangerous action, especially considering that you
> delete the system schemas. If in any case the failed node is gone for a
> while, what you need to do is remove the node first before doing the
> "rejoin".
>
> Cheers,
>
> Yabin
>
> On Mon, Oct 17, 2016 at 1:48 AM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> To Cassandra, the node where you deleted the files looks like a brand
> new machine. It doesn’t automatically rebuild such machines, to prevent
> accidental replacement. You need to tell it to build the “new” machine
> as a replacement for the “old” machine with that IP by setting
> -Dcassandra.replace_address_first_boot=<dead_node_ip>.
> See http://cassandra.apache.org/doc/latest/operating/topo_changes.html.
>
> Cheers
> Ben
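For reference, a minimal sketch of how that flag is typically applied,
assuming a package install where JVM options are appended in
/etc/cassandra/cassandra-env.sh (the path and service name vary by
install):

    # On the replacement node, before its first start:
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<dead_node_ip>"' \
        | sudo tee -a /etc/cassandra/cassandra-env.sh
    sudo service cassandra start

    # Unlike plain -Dcassandra.replace_address, the _first_boot variant is
    # ignored once the node has bootstrapped, so the line can stay in place.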
> On Mon, 17 Oct 2016 at 16:41 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Hi all,
>
> A failed node can rejoin a cluster.
> On the node, all data in /var/lib/cassandra was deleted.
> Is this normal?
>
> I can reproduce it as below.
>
> cluster:
> - C* 2.2.7
> - a cluster has node1, node2, node3
> - node1 is a seed
> - replication_factor: 3
>
> how to:
> 1) stop the C* process and delete all data in /var/lib/cassandra on node2
>    ($ sudo rm -rf /var/lib/cassandra/*)
> 2) stop the C* process on node1 and node3
> 3) restart C* on node1
> 4) restart C* on node2
>
> nodetool status after 4):
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
> DN  [node3 IP]  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
> UN  [node2 IP]  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
> UN  [node1 IP]  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
>
> If I restart C* on node2 while node1 and node3 are running (i.e. skipping
> steps 2) and 3)), a runtime exception happens:
> RuntimeException: "A node with address [node2 IP] already exists,
> cancelling join..."
>
> I'm not sure whether this causes data loss. All data can be read properly
> just after this rejoin.
> But some rows are lost when I kill & restart C* for destructive tests
> after this rejoin.
>
> Thanks.
>
> --
> ————————
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798

--
————————
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798
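As a footnote to Yabin’s point about removing the node before rejoining,
a minimal sketch of that recovery path, using node2’s host ID from the
nodetool status output above (removenode must run from a live node):

    # From node1 or node3: drop node2's old identity from the ring.
    nodetool removenode 05bdb1d4-c39b-48f1-8248-911d61935925

    # On node2: make sure the data directory is empty, then start C*.
    # As a non-seed it will bootstrap its ranges from the live replicas.
    sudo rm -rf /var/lib/cassandra/*
    sudo service cassandra start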