Thanks for all your suggestions!

I'm looking into it, and so far it seems to be mainly a disk I/O problem: the host runs on spinning disks, and being the DR for an entire cluster means it has a lot of changes to keep up with.

First (easy) try will be to add an SSD as ZFS log and cache devices (ZIL/SLOG + L2ARC).
Should make a huge difference already.
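
For reference, a minimal sketch of what I have in mind (the pool name "tank" and the device paths are placeholders for my actual layout):

    # add SSD partitions as a SLOG (ZIL) device and an L2ARC cache device
    zpool add tank log /dev/disk/by-id/ssd-part1
    zpool add tank cache /dev/disk/by-id/ssd-part2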

Later on I will also look into Medusa/tablesnap, thanks.

cheers,
Lapo

On 2021-03-29 12:32, Kane Wilson wrote:
Check what your compaction throughput is set to (compaction_throughput_mb_per_sec in cassandra.yaml), as it will impact the validation compactions. Also, what kind of disks does the DR node have? The validation compaction sizes are likely fine; I'm not sure of the exact details, but it's normal to expect very large validations.
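
For instance, you can check and adjust it at runtime with nodetool (the 64 MB/s value below is purely illustrative, not a recommendation):

    nodetool getcompactionthroughput
    nodetool setcompactionthroughput 64    # MB/s; 0 disables throttling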

Rebuilding would not be an ideal mechanism for repairing; it would likely be slower and chew up a lot of disk space. It's also not guaranteed to give you data consistent with the other DC, as replicas will only be streamed from one node.

I think you're better off looking at setting up regular backups and, if you really need them, commit log backups. The storage would be cheaper and more reliable, and it would be less impactful on your production DC. Restoring will also be a lot easier and faster, as restoring from a single-node DC will be network bottlenecked. There are various tools around that do this for you, such as Medusa or tablesnap.
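
As a rough sketch, a scheduled Medusa backup can be as simple as (the backup name here is a placeholder):

    medusa backup --backup-name=daily-$(date +%F)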
