Hello,

> Also I could delete system_traces, which is empty anyway, but there are system_auth and system_distributed keyspaces too and they are not empty. Could I delete them safely too?
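As an aside, the current replication of the keyspaces in question can be checked before touching anything; a minimal sketch, assuming cqlsh access on a live node and the Cassandra 3.x schema tables:

```cql
-- Show how each system keyspace is currently replicated.
SELECT keyspace_name, replication
FROM system_schema.keyspaces
WHERE keyspace_name IN ('system_traces', 'system_auth', 'system_distributed');
```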
I would say no, not safely, as I am not sure about some of them, but maybe this would work. Here is what I know: 'system_traces' can be deleted. 'system_auth' should be OK to delete if you do not use auth (but I am not 100% sure), but 'system_distributed' looks like an important keyspace; I am really uncertain about this one. It was added in a 'recent' Cassandra version and I am not too sure what's in there, to be honest.

It seems quite a complex situation. A possible way out would be to do the 3-node replacement without streaming and then repair (making sure no client talks to or gets information from these nodes meanwhile), but I think 'replace' and 'auto_bootstrap: false' do not work well together. Another idea is that 'nodetool removenode' on the 3 nodes, 1 by 1, should also ensure consistency is preserved (if not forced). Then you could recreate the 3 nodes of this rack. I am afraid this operation might fail too, though, for the same reason of having ranges unavailable.

I am still thinking about it, but before going deeper, is this still an issue for you at the moment?

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Mon, 10 Sep 2018 at 13:43, onmstester onmstester <onmstes...@zoho.com> wrote:

> Thanks Alain,
> First, here is more detail about my cluster:
>
> - 10 racks + 3 nodes on each rack
> - nodetool status: shows 27 nodes UN and 3 nodes, all on a single rack, as DN
> - version 3.11.2
>
> *Option 1: (Change schema and) use replace method (preferred method)*
> * Did you try to have the replace going, without any former repairs, ignoring the fact 'system_traces' might be inconsistent? You probably don't care about this table, so if Cassandra allows it with some of the nodes down, going this way is probably relatively safe. I really do not see what you could lose that matters in this table.
> * Another option, if the first schema change was accepted, is to make the second one, to drop this table. You can always rebuild it in case you need it, I assume.
>
> I would really love to let the replace go on, but it stops with the error:
>
> java.lang.IllegalStateException: unable to find sufficient sources for streaming range in keyspace system_traces
>
> Also I could delete system_traces, which is empty anyway, but there are system_auth and system_distributed keyspaces too and they are not empty. Could I delete them safely too?
> If I could just somehow skip streaming the system keyspaces in the node-replace phase, option 1 would be great.
>
> P.S.: It's clear to me that I should use at least RF=3 in production, but I could not manage to acquire enough resources yet (I hope this will be fixed in the near future).
>
> Again, thank you for your time
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
> ---- On Mon, 10 Sep 2018 16:20:10 +0430 *Alain RODRIGUEZ <arodr...@gmail.com>* wrote ----
>
> Hello,
>
> I am sorry it took us (the community) more than a day to answer such a critical situation. That being said, my recommendation at this point would be for you to make sure about the impacts of whatever you try. Working on a broken cluster as an emergency might lead you to a second mistake, possibly more destructive than the first one. It has happened to me and to people around me, on many clusters. As general advice, move forward even more carefully in these situations.
>
> > Suddenly I lost all disks of cassandra-data on one of my racks
>
> With RF=2, I guess operations use LOCAL_ONE consistency, thus you should have all the data in the safe rack(s) with your configuration. You probably did not lose anything yet, and the service is only using the nodes that are up, which have the right data.
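The consistency-level arithmetic behind the paragraph above can be sketched quickly. Quorum is floor(RF/2) + 1, so with RF=2 a QUORUM operation needs both replicas and tolerates no replica down, while LOCAL_ONE keeps working as long as one replica per range survives; this is a generic sketch, not part of the original thread:

```shell
# Quorum size for a given replication factor: floor(RF/2) + 1.
# tolerates_down = how many replicas can be lost while quorum still succeeds.
for rf in 1 2 3 5; do
  echo "RF=$rf quorum=$(( rf / 2 + 1 )) tolerates_down=$(( rf - (rf / 2 + 1) ))"
done
```

Note how RF=2 gives quorum=2 with zero fault tolerance, while RF=3 gives the same quorum size but survives one replica down — the core of the RF=3 recommendation later in this thread.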
> > tried to replace the nodes with the same IP using this:
> > https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html
>
> As a side note, I would recommend you use 'replace_address_first_boot' instead of 'replace_address'. This does basically the same thing but will be ignored after the first bootstrap. A detail, but hey, it's there and somewhat safer; I would use this one.
>
> > java.lang.IllegalStateException: unable to find sufficient sources for streaming range in keyspace system_traces
>
> By default, non-user keyspaces use 'SimpleStrategy' and a small RF. Ideally, this should be changed in a production cluster, and you're seeing an example of why.
>
> > Now when I altered the system_traces keyspace strategy to NetworkTopologyStrategy and RF=2, running nodetool repair failed: Endpoint not alive /IP of dead node that I'm trying to replace.
>
> By changing the replication strategy you made the dead rack owner of part of the token ranges, thus repairs just can't work, as there will always be one of the nodes involved down while the whole rack is down. Repair won't work, but you probably do not need it! 'system_traces' is a temporary / debug keyspace. It's probably empty or holding irrelevant data.
>
> Here are some thoughts:
>
> * It would be awesome at this point for us (and for you, if you have not already) to see the status of the cluster:
> ** 'nodetool status'
> ** 'nodetool describecluster' --> This one will tell if the nodes (those that are up) agree on the schema. I have seen schema changes with nodes down inducing some issues.
> ** Cassandra version
> ** Number of racks (I assume #racks >= 2 in this email)
>
> *Option 1: (Change schema and) use replace method (preferred method)*
> * Did you try to have the replace going, without any former repairs, ignoring the fact 'system_traces' might be inconsistent?
> You probably don't care about this table, so if Cassandra allows it with some of the nodes down, going this way is probably relatively safe. I really do not see what you could lose that matters in this table.
> * Another option, if the first schema change was accepted, is to make the second one, to drop this table. You can always rebuild it in case you need it, I assume.
>
> *Option 2: Remove all the dead nodes* (try to avoid this option 2; if option 1 works, it is better).
>
> Please do not take and apply this as-is. It's a thought on how you could get rid of the issue, yet it's rather brutal and risky; I did not consider it deeply and have no clue about your architecture and context. Consider it carefully on your side.
>
> * You can also 'nodetool removenode' on each of the dead nodes. This will have nodes streaming around, and the rack isolation guarantee will no longer be valid. It's hard to reason about what would happen to the data and in terms of streaming.
> * Alternatively, if you don't have enough space, you can even '*force*' the 'nodetool removenode'. See the documentation. Forcing it will prevent streaming and remove the node (token range handover, but not the data). If that does not work, you can use the 'nodetool assassinate' command as well.
>
> When adding nodes back to the broken DC, the first nodes will probably take 100% of the ownership, which is often too much. You can consider adding back all the nodes with 'auto_bootstrap: false' before repairing them once they have their final token ownership, the same way we do when building a new data center.
>
> This option is not really clean, and it has some caveats that *you need to consider before starting*, as there are token range movements and nodes available that do not have the data. Yet this should work.
> I imagine it would work nicely with RF=3 and QUORUM; with RF=2 (if you have 2+ racks), I guess it should work as well, but you will have to pick one of availability or consistency while repairing the data.
>
> *Be aware that read requests hitting these nodes will not find data!* Plus, you are using *RF=2*. Thus, using a consistency of 2+ (TWO, QUORUM, ALL) for at least one of reads or writes is needed to preserve consistency while re-adding the nodes in this case. Otherwise, reads will not detect the mismatch with certainty and might show inconsistent data until the nodes are repaired.
>
> I must say that I really prefer odd values for the RF, starting with RF=3. Using RF=2 you will have to pick: consistency or availability. With a consistency of ONE everywhere, the service is available, with no single point of failure. Using anything bigger than this, for writes or reads, brings consistency but creates single points of failure (actually, any node becomes a point of failure). RF=3 with QUORUM for both writes and reads takes the best of the two worlds, somehow. The tradeoff with RF=3 and quorum reads is the latency increase and the resource usage.
>
> Maybe there is a better approach, I am not too sure, but I think I would try option 1 first in any case. It's less destructive and less risky, with no token range movements and no empty nodes available. I am not sure about the limitations you might face, though, and that's why I suggest a second option for you to consider if the first is not actionable.
>
> Let us know how it goes,
> C*heers,
> -----------------------
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Mon, 10 Sep 2018 at 09:09, onmstester onmstester <onmstes...@zoho.com> wrote:
>
> > Any idea?
> > ---- On Sun, 09 Sep 2018 11:23:17 +0430 *onmstester onmstester <onmstes...@zoho.com>* wrote ----
> >
> > Hi,
> >
> > Cluster spec:
> > 30 nodes
> > RF = 2
> > NetworkTopologyStrategy
> > GossipingPropertyFileSnitch + rack aware
> >
> > Suddenly I lost all disks of cassandra-data on one of my racks. After replacing the disks, I tried to replace the nodes with the same IPs using this:
> > https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html
> >
> > Starting the to-be-replaced node fails with:
> > java.lang.IllegalStateException: unable to find sufficient sources for streaming range in keyspace system_traces
> >
> > The problem is that I did not change the default replication config for the system keyspaces. Now I have altered the system_traces keyspace strategy to NetworkTopologyStrategy and RF=2, but running nodetool repair failed: Endpoint not alive /IP of dead node that I'm trying to replace.
> >
> > What should I do now?
> > Can I just remove the previous nodes, change the dead nodes' IPs, and re-join them to the cluster?
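For reference, the schema change discussed throughout this thread — moving the system keyspaces to NetworkTopologyStrategy so their replicas are placed rack-aware — looks like the sketch below. The data-centre name 'dc1' and the replication factors are assumptions; adapt them to your topology (and note that a repair of these keyspaces is normally required afterwards, which is exactly the step that fails while a whole rack is down):

```cql
-- Sketch only: run against a live node, adapting 'dc1' and the RFs.
ALTER KEYSPACE system_traces
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2};
ALTER KEYSPACE system_distributed
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
ALTER KEYSPACE system_auth
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```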