Hi Christopher,

It would depend on which bottleneck is causing the slowness: it could be limited by read speed at the source, the replication job itself, or write throughput at the target. Some of the settings you can play with are:

* Increase the batch size (worker_batch_size) if the documents are not too large:
  https://docs.couchdb.org/en/stable/config/replicator.html#replicator/worker_batch_size

* Increase the number of worker processes from 4 to something larger, 10 or 20 perhaps:
  https://docs.couchdb.org/en/stable/config/replicator.html#replicator/worker_processes

* If you increase worker_processes, also increase http_connections a bit, as the workers might otherwise be limited by the number of HTTP connections anyway; you could make that 40 or 100 or so. A rough sketch of all three settings is below.
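Purely as an illustration (the exact numbers are just starting points I made up for this example; the right values depend on your document sizes and hardware), the relevant section of local.ini could look something like:

  [replicator]
  ; larger batches help when individual documents are small
  worker_batch_size = 2000
  ; more worker processes per replication job (default is 4)
  worker_processes = 20
  ; enough HTTP connections so the extra workers aren't starved
  http_connections = 100

These can also be set through the _config HTTP API, but either way I believe a running replication job only picks up the new values once it restarts, so don't be surprised if nothing changes immediately.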
If the bottleneck is the source's disk I/O, there might not be much you can do. In general, depending on where the bottleneck is, adjusting some of these settings might not have any effect at all. A few more notes:

* Keep an eye on the logs and the replication job stats (e.g. in _scheduler/jobs) to see if there are any timeouts or errors, basically to see whether the replication job keeps crashing and restarting.

* Q=256 does seem a bit high; that might affect any change feeds or view queries. But if you have enough disk I/O throughput (parallelism) it could work.

* There are perhaps a few tweaks you can make to vm.args (the Erlang VM arguments) to help with disk I/O. One is setting +SDio to something higher than the default 16 (https://github.com/apache/couchdb/blob/main/rel/overlay/etc/vm.args#L62). We use +SDio 80 in production, but I've seen others use +SDio 128 and such; a sketch of that line is below.
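In vm.args that is just a one-line change, something like the following (80 is only what we happen to use, not a recommendation; note that vm.args is only read at VM start, so the node has to be restarted for the change to take effect):

  # dirty I/O scheduler threads; the shipped default is 16
  +SDio 80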
Cheers,
-Nick

On Fri, Mar 15, 2024 at 3:39 AM Chris Bayliss <[email protected]> wrote:
>
> Hi all,
>
> I inherited a single-node CouchDB database that backs a medical research
> project. We’ve been using CouchDB for 10+ years, so that’s not a concern. Then I
> spotted it uses a single database to store billions, 10^9 if we’re being
> pedantic, of documents (2B at the time, just over a TB of data) across the
> default 2 shards. Not ideal, but technically not a problem. Then I spotted it’s
> ingesting ~30M documents a day and was continuously compressing and
> reindexing everything associated with this database.
>
> Skipping over months of trial and error: I’m currently replicating it to a 4
> node NVMe-backed cluster with n=3 q=256. Everything is running 3.3.3 (the Erlang
> 24.3 version). I’ve read [1] and [2], and right now it’s replicating at 2.25k
> documents a second +/- 0.5k. This is acceptable, it will catch up with the
> initial node eventually, but at the rate it’s going it’ll be ~60 days.
>
> How can I speed this process up, if at all?
>
> I’d add that the code that accesses this database isn’t mine either, so splitting
> the database out into logical subsets isn’t an option at this time.
>
> Thanks
>
> Chris
>
> 1 -
> https://blog.cloudant.com/2023/02/08/Replication-efficiency-improvements.html
> 2 - https://github.com/apache/couchdb/issues/4308
>
>
> --
> Christopher Bayliss
> Senior Software Engineer, Melbourne eResearch Group
>
> School of Computing and Information Systems
> Level 5, Melbourne Connect (Building 290)
> University of Melbourne, VIC, 3010, Australia
>
> Email: [email protected]
>