Hi Jan, thanks for taking the time to read and answer my question.
> Did you account for the completely rewritten compaction daemon (smoosh)
> that has a different configuration from the one in 2.x?
Yes, I've taken this into account and adjusted my config to match what we
expect. This is the relevant part of the smoosh config:
{
    "from": "00:00",
    "min_priority": "5368709120",
    "strict_window": "true",
    "to": "08:00"
},
{
    "from": "00:00",
    "min_priority": "3758096384",
    "strict_window": "true",
    "to": "08:00"
},
{
    "from": "00:00",
    "min_priority": "5.0",
    "strict_window": "true",
    "to": "08:00"
},
{
    "from": "00:00",
    "min_priority": "5.0",
    "strict_window": "true",
    "to": "08:00"
}
And it seems to be working as desired. Early on we had problems with
compaction never finishing, which is how we ended up with this config.
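In local.ini terms, the relevant sections look roughly like this (I'm
reproducing the channel names from memory, so treat them as my best guess
rather than a verbatim copy of our config):

; channel names are approximate; values match the JSON output above
[smoosh.slack_dbs]
from = 00:00
to = 08:00
strict_window = true
; 5368709120 bytes = 5 GiB
min_priority = 5368709120

[smoosh.slack_views]
from = 00:00
to = 08:00
strict_window = true
; 3758096384 bytes = 3.5 GiB
min_priority = 3758096384

[smoosh.ratio_dbs]
from = 00:00
to = 08:00
strict_window = true
min_priority = 5.0

[smoosh.ratio_views]
from = 00:00
to = 08:00
strict_window = true
min_priority = 5.0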
> And finally: which Erlang version are you running? There are a few odd ones
> out there that might affect what you’re doing.
We are using the couchdb:3.1.1 Docker Hub image, but if it helps, this is the
name of the Erlang runtime (erts) directory inside my CouchDB instance:
root@mycouchdb:/opt/couchdb# ls -ld erts*
drwxr-xr-x 1 couchdb couchdb 4096 Mar 12 2021 erts-9.3.3.14
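If I read the ERTS numbering right, erts-9.3.3.14 should correspond to
Erlang/OTP 20.3.8.x. If I remember correctly, 3.x can also report the runtime
versions over HTTP, so something like this should confirm it (credentials are
placeholders):

curl -s http://admin:password@localhost:5984/_node/_local/_versions

The response should include an "erlang" section with the exact version string.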
Hope it helps.
Thanks.
On Mon, Apr 18, 2022 at 7:11 AM Jan Lehnardt <[email protected]> wrote:
>
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
>
> 24/7 Observation for your CouchDB Instances:
> https://opservatory.app
>
> > On 4. Apr 2022, at 18:21, Roberto Iglesias <[email protected]>
> wrote:
> >
> > Hello.
> >
> > About 1 year ago, we had two CouchDB 2.3.1 instances running inside
> Docker containers, pull-replicating from each other. This way, we could
> read from and write to either of these servers, although we generally chose
> one as the "active" server and wrote to it. The second server would act as
> a spare or backup.
> >
> > At this point (1y ago) we decided to migrate from CouchDB version 2.3.1
> to 3.1.1. Instead of upgrading our existing databases, we added two extra
> instances and configured pull replications on all of them until we reached
> the following scenario:
> >
> > 2.3.1-A <===> 2.3.1-B <===> 3.1.1-A <===> 3.1.1-B
> >
> > where <===> represents two pull replications, one configured on each
> side, i.e. 2.3.1-A pulls from 2.3.1-B and vice versa.
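> >
> > For concreteness, each pull replication is just a continuous replication
> defined on the pulling side, e.g. a _replicator document along these lines
> (the hostname and database name here are placeholders, not our real ones):
> >
> > {
> >     "_id": "pull-from-b",
> >     "source": "http://couchdb-b.example.com:5984/mydb",
> >     "target": "http://localhost:5984/mydb",
> >     "continuous": true
> > }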
> >
> > If a write is made at 2.3.1-A, it has to make it through all servers
> until it reaches 3.1.1-B.
> >
> > Each of them has a dedicated HDD which is not shared with any other
> service.
> >
> > We never had a single problem with 2.3.1.
> >
> > After we pointed our services to 3.1.1-A, its read I/O wait times
> gradually increased over the following weeks until they reached peaks of
> 600 ms (totally unworkable). So we stopped making write requests (HTTP
> POST) to it and pointed all applications to 3.1.1-B. 3.1.1-A was still
> receiving writes, but only via the replication protocol, as explained above.
> >
> > On 3.1.1-A, disk stats decreased to acceptable values, so a few weeks
> later we pointed the applications back to it in order to confirm whether
> the problem was related to the write requests sent by our application.
> Read I/O times did not increase this time. Instead, 3.1.1-B (which had
> handled application traffic for those weeks) started to show the same
> behaviour, even though it was no longer handling requests from applications.
> >
> > It feels like some kind of fragmentation is occurring, but the filesystem
> (ext4) shows none.
> >
> > Some changes we've made since the problem started:
> > • Upgraded the kernel from 4.15.0-55-generic to 5.4.0-88-generic
> > • Upgraded Ubuntu from 18.04 to 20.04
> > • Deleted the _global_changes database from couchdb3.1.1-A
> >
> > More info:
> > • CouchDB is using Docker local-persist (
> https://github.com/MatchbookLab/local-persist) volumes.
> > • Disks are WD Purple for 2.3.1 couchdbs and WD Black for 3.1.1
> couchdbs.
> > • We have only one database of 88 GiB and two views: one of 22 GB and
> a small one of 30 MB (which is updated very frequently)
> > • docker stats shows that CouchDB 3.1.1 uses a lot of memory compared
> to 2.3.1:
> > • 2.5 GiB for couchdb3.1.1-A (not receiving direct write requests)
> > • 5.0 GiB for couchdb3.1.1-B (receiving both read and write
> requests)
> > • 900 MiB for 2.3.1-A
> > • 800 MiB for 2.3.1-B
> > • Database compaction is run at night. The problem only occurs during
> the day, when most of the writes are made.
>
> Did you account for the completely rewritten compaction daemon (smoosh)
> that has a different configuration from the one in 2.x?
>
> https://docs.couchdb.org/en/stable/maintenance/compaction.html#compact-auto
>
> Otherwise you might see compaction going on at all times (which is what we
> usually recommend), rather than what you expect: just at night.
>
> And in general, at this point, we strongly recommend running on SSDs for
> the obvious speed benefits :)
>
> And finally: which Erlang version are you running? There are a few odd
> ones out there that might affect what you’re doing.
>
> Best
> Jan
> —
> > • Most of the config is default.
> > • A latency graph from Munin monitoring is attached (the peak
> corresponds to a server outage caused by a kernel upgrade that went wrong)
> >
> > Any help is appreciated.
> >
> > --
> >
> > Roberto E. Iglesias
>
>