My configuration is different (a lot of small DBs), but I also had disk I/O
performance issues when upgrading from CouchDB 2 to CouchDB 3.
Maybe it's related, maybe it's not.
I run on AWS, and the solution for me was to increase the provisioned disk IOPS.
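
A change like that can be scripted, for example with boto3 (the volume ID,
region and the gp3 numbers below are placeholders, not values from my setup):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

    # Raise the provisioned IOPS (and throughput) on an EBS gp3 volume.
    ec2.modify_volume(
        VolumeId="vol-0123456789abcdef0",  # placeholder volume ID
        VolumeType="gp3",
        Iops=6000,        # new IOPS target, sized from the observed I/O wait
        Throughput=250,   # MiB/s, gp3 only
    )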

See the full discussion here:
https://github.com/apache/couchdb/discussions/3217


On Mon, Apr 4, 2022 at 18:22, Roberto Iglesias <[email protected]>
wrote:

> Hello.
>
> About 1 year ago, we had two CouchDB 2.3.1 instances running inside Docker
> containers and pull-replicating from each other. This way, we could read
> from and write to either of these servers, although we generally chose one
> as the "active" server and wrote to it. The second server would act as a
> spare or backup.
>
> At this point (1 year ago) we decided to migrate from CouchDB version 2.3.1
> to 3.1.1. Instead of upgrading our existing databases, we added two extra
> instances and configured pull replications in all of them until we reached
> the following scenario:
>
> 2.3.1-A <===> 2.3.1-B <===> 3.1.1-A <===> 3.1.1-B
>
> where <===> represents two pull replications, one configured on each side.
> E.g. 2.3.1-A pulls from 2.3.1-B and vice versa.
>
> If a write is made at 2.3.1-A, it has to make it through all servers until
> it reaches 3.1.1-B.
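>
> For reference, one way to configure such a pull replication is to write a
> document into the pulling node's _replicator database; a minimal sketch in
> Python (host names, credentials and the database name are placeholders):
>
>     import requests
>
>     PULLING_NODE = "http://admin:secret@couch-231-a:5984"  # node that pulls
>
>     doc = {
>         "_id": "pull-from-231-b",
>         "source": "http://admin:secret@couch-231-b:5984/mydb",  # remote source
>         "target": "mydb",          # local database on the pulling node
>         "continuous": True,
>     }
>
>     # Creating the document starts (and persists) the continuous pull.
>     r = requests.put(f"{PULLING_NODE}/_replicator/{doc['_id']}", json=doc)
>     r.raise_for_status()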
>
> All of them have an exclusive HDD which is not shared with any other
> service.
>
> We did not have a single problem with 2.3.1.
>
> After pointing our services to 3.1.1-*A*, its read I/O wait times gradually
> increased over the following weeks until they reached peaks of 600ms
> (totally unworkable). So we stopped making write requests (HTTP POST) to it
> and pointed all applications to 3.1.1-*B*. 3.1.1-*A* was still receiving
> writes, but only via the replication protocol, as explained before.
>
> At the 3.1.1-*A* server, disk stats decreased to acceptable values, so a few
> weeks later we pointed applications back to it in order to confirm whether
> the problem was related to write requests sent from our application or not.
> Read I/O times did not increase this time. Instead, 3.1.1-B (which had
> handled application traffic for a few weeks) started to show the same
> behaviour, even though it was no longer handling requests from applications.
>
> It feels like some fragmentation is occurring, but the filesystem (ext4)
> shows none.
>
> Some changes we've made since the problem started:
>
>    - Upgraded kernel from 4.15.0-55-generic to 5.4.0-88-generic
>    - Upgraded ubuntu from 18.04 to 20.04
>    - Deleted _global_changes database from couchdb3.1.1-A
>
>
> More info:
>
>    - CouchDB is using Docker local-persist (
>    https://github.com/MatchbookLab/local-persist) volumes.
>    - Disks are WD Purple for the 2.3.1 CouchDBs and WD Black for the 3.1.1
>    CouchDBs.
>    - We have only one database of 88GiB and 2 views: one of 22GB and a
>    small one of 30MB (updated very frequently)
>    - docker stats shows that CouchDB 3.1.1 uses a lot more memory than
>    2.3.1:
>    - 2.5GiB for couchdb3.1.1-A (not receiving direct write requests)
>    - 5.0GiB for couchdb3.1.1-B (receiving both read and write requests)
>    - 900MiB for 2.3.1-A
>    - 800MiB for 2.3.1-B
>    - Database compaction is run at night (see the compaction sketch after
>    this list). The problem only occurs during the day, when most of the
>    writes are made.
>    - Most of the config is default.
>    - Latency graph from Munin monitoring is attached (at the peak, there
>    was a server outage caused by a kernel upgrade that went wrong)
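>
> For context, database and view compaction can be triggered through
> CouchDB's standard HTTP endpoints; a minimal sketch of such a trigger
> (credentials, database and design document names are placeholders):
>
>     import requests
>
>     BASE = "http://admin:secret@localhost:5984"      # placeholder node/creds
>     HEADERS = {"Content-Type": "application/json"}   # required by _compact
>
>     # Compact the database file itself.
>     requests.post(f"{BASE}/mydb/_compact", headers=HEADERS).raise_for_status()
>
>     # Compact the view index built by a design document.
>     requests.post(f"{BASE}/mydb/_compact/mydesign",
>                   headers=HEADERS).raise_for_status()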
>
>
> Any help is appreciated.
>
> --
> --
>
> *Roberto E. Iglesias*
>
