Thanks for all your suggestions!

I'm looking into it, and so far it seems to be mainly a disk I/O problem: the host runs on spinning disks, and being the DR for an entire cluster means it has a lot of changes to keep up with.

First (easy) try will be to add an SSD as ZFS log and cache devices (ZIL/SLOG + L2ARC).
Should make a huge difference already.
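
For reference, a minimal sketch of what I have in mind (the pool name "tank" and the device paths are placeholders for my actual layout):

    # add SSD partitions as a SLOG (ZIL) device and an L2ARC cache device
    zpool add tank log /dev/disk/by-id/ssd-part1
    zpool add tank cache /dev/disk/by-id/ssd-part2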

Later on I will also look into Medusa/tablesnap, thanks.

cheers,
Lapo

On 2021-03-29 12:32, Kane Wilson wrote:
Check what your compaction throughput is set to (compaction_throughput_mb_per_sec in cassandra.yaml), as it will impact the validation compactions. Also, what kind of disks does the DR node have? The validation compaction sizes are likely fine; I'm not sure of the exact details, but it's normal to expect very large validations.
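
For instance, you can check and adjust it at runtime with nodetool (the 64 MB/s value below is purely illustrative, not a recommendation):

    nodetool getcompactionthroughput
    nodetool setcompactionthroughput 64    # MB/s; 0 disables throttling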

Rebuilding would not be an ideal mechanism for repairing; it would likely be slower and chew up a lot of disk space. It's also not guaranteed to give you data consistent with the other DC, as replicas will only be streamed from one node.

I think you're better off looking at setting up regular backups and, if you really need them, commit log backups. The storage would be cheaper and more reliable, and it would be less impactful on your production DC. Restoring will also be a lot easier and faster, as restoring from a single-node DC will be network bottlenecked. There are various tools around that do this for you, such as Medusa or tablesnap.
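
As a rough sketch, a scheduled Medusa backup can be as simple as (the backup name here is a placeholder):

    medusa backup --backup-name=daily-$(date +%F)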
