Hi,
> You'll need to disable the native transport
Well, this is what I did already, and it seems repair is running.
I'm not sure whether repair will finish within 3 hours, but I can run it again
(as it's incremental repair by default, right?)
I'm not sure about RF=3 and QUORUM reads because of the load/disk space
constraints we have, but we'll definitely consider this.
Thanks to all for help!
On Wednesday, August 29, 2018 4:13 PM, Alexander Dejanovski
<[email protected]> wrote:
Kurt is right.
So here are the options I can think of:
- Use the join_ring=false technique and rely on hints. You'll need to disable
  the native transport on the node as well, to prevent clients from connecting
  directly to it. Hopefully, you can run repair in less than 3 hours, which is
  the hint window (hints will be collected while the node hasn't joined the
  ring). Otherwise you'll have more consistency issues after the node joins the
  ring again. Maybe incremental repair could help fix this quickly afterwards
  if you've been running full repairs that involved anticompaction (if you're
  running at least Cassandra 2.2).
- Fully re-bootstrap the node by having it replace itself, using the
  replace_address_first_boot technique (but since you have RF=2, that would
  most probably mean some data loss since you read/write at ONE).
- Try to trick the dynamic snitch into taking the node out of reads. You would
  then have the node join the ring normally, disable native transport and raise
  Severity (in org.apache.cassandra.db:type=DynamicEndpointSnitch) to something
  like 50 so the node won't be selected by the dynamic snitch. I guess the
  value will reset itself over time, so you may need to set it to 50 on a
  regular basis while repair is happening.
I would then strongly consider moving to RF=3, because RF=2 will lead you to
this type of situation again in the future and does not allow quorum reads with
fault tolerance.
Good luck,
On Wed, Aug 29, 2018 at 1:56 PM Vlad <[email protected]> wrote:
I restarted with cassandra.join_ring=false.
nodetool status on other nodes shows this node as DN, while it sees itself as
UN.
> I'd say best to just query at QUORUM until you can finish repairs.
We have RF 2, so I guess QUORUM queries will fail. Also, different applications
would have to be changed for this.
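For reference, the quorum arithmetic behind this can be sketched in a few lines of Python. This is only an illustration of the usual formula (quorum = floor(RF / 2) + 1); the function names are made up for the example and are not part of any Cassandra API.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for a QUORUM request: floor(RF / 2) + 1."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while a QUORUM request can still succeed."""
    return rf - quorum(rf)


for rf in (2, 3):
    print(f"RF={rf}: QUORUM needs {quorum(rf)} replicas, "
          f"tolerates {tolerated_failures(rf)} down")
```

With RF=2, QUORUM needs both replicas, so it fails as soon as one of them is down or not in the ring. With RF=3, QUORUM still needs two replicas, which leaves room for one to be unavailable.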
On Wednesday, August 29, 2018 2:41 PM, kurt greaves <[email protected]>
wrote:
Note that you'll miss incoming writes if you do that, so you'll be
inconsistent even after the repair. I'd say best to just query at QUORUM until
you can finish repairs.
On 29 August 2018 at 21:22, Alexander Dejanovski <[email protected]> wrote:
Hi Vlad,
you must restart the node but first disable joining the cluster, as described
in the second part of this blog post:
http://thelastpickle.com/blog/2018/08/02/Re-Bootstrapping-Without-Bootstrapping.html
Once repaired, you'll have to run "nodetool join" to start serving reads.
On Wed, Aug 29, 2018 at 12:40 PM Vlad <[email protected]> wrote:
Will it help to set read_repair_chance to 1 (compaction is
SizeTieredCompactionStrategy)?
On Wednesday, August 29, 2018 1:34 PM, Vlad <[email protected]>
wrote:
Hi,
quite urgent questions: due to disk and C* start problems we were forced to
delete the commit logs from one of the nodes.
Now repair is running, but meanwhile some reads return no data (RF=2).
Can this node be excluded from read queries, so that all reads are redirected
to the other nodes in the ring?
Thanks to All for help.
--
-----------------
Alexander Dejanovski
France
@alexanderdeja
Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com