Ok, so I'll set up a cron job to log (every minute?) the output from
"/_node/_local/_system" and wait for the next OOM kill.
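
Something like this is what I have in mind for the logger (a sketch; I'm
assuming the node answers on 127.0.0.1:5984 without auth, and
/var/log/couchdb-system as the output directory):

```python
# Minute-by-minute logger for the _system endpoint; run it from cron, e.g.:
#   * * * * * /usr/bin/python3 /usr/local/bin/log_couch_system.py
import datetime
import pathlib
import urllib.request

URL = "http://127.0.0.1:5984/_node/_local/_system"  # assumption: local node, no auth
OUTDIR = pathlib.Path("/var/log/couchdb-system")    # assumption: log directory

def snapshot_path(now=None):
    """File name for one snapshot, keyed by UTC timestamp."""
    now = now or datetime.datetime.utcnow()
    return OUTDIR / (now.strftime("%Y%m%dT%H%M%SZ") + ".json")

def main():
    OUTDIR.mkdir(parents=True, exist_ok=True)
    # Save the raw JSON body; one file per minute makes it easy to diff
    # snapshots around the time of the next OOM kill.
    with urllib.request.urlopen(URL) as resp:
        snapshot_path().write_bytes(resp.read())

if __name__ == "__main__":
    main()
```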

Any properties from "_system" I should watch in particular?
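
In the meantime, my first guess would be to track the "memory" section of
each snapshot (an assumption on my part, based on the docs, where the
sub-keys mirror Erlang's erlang:memory/0 breakdown):

```python
# Pull the leak-relevant memory counters out of one saved _system snapshot.
import json

# Assumption: _system exposes a "memory" object whose sub-keys mirror
# Erlang's erlang:memory/0 breakdown (processes, binary, ets, ...).
FIELDS = ("processes", "binary", "ets", "atom", "code", "other")

def memory_summary(snapshot):
    """Map each interesting allocator to its byte count (None if absent)."""
    mem = snapshot["memory"]
    return {k: mem.get(k) for k in FIELDS}

# Usage with one logged file:
# with open("20190614T134800Z.json") as f:
#     print(memory_summary(json.load(f)))
```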

Here is a link to the memory usage graph:
https://framapic.org/IzcD4Y404hlr/06rm0Ji4TpKu.png

The memory usage varies, but the general trend is upward, fairly regularly
over the course of a week, until we reach OOM. When "beam.smp" is killed,
it is reported as consuming 15 GB (as seen in the kernel's OOM trace in
syslog).

Thanks,
Jérôme

On Fri, Jun 14, 2019 at 13:48, Adam Kocoloski <[email protected]> wrote:

> Hi Jérôme,
>
> Thanks for a well-written and detailed report (though the mailing list
> strips attachments). The _system endpoint provides a lot of useful data for
> debugging these kinds of situations; do you have a snapshot of the output
> when the system was consuming a lot of memory?
>
>
> http://docs.couchdb.org/en/stable/api/server/common.html#node-node-name-system
>
> Adam
>
> > On Jun 14, 2019, at 5:44 AM, Jérôme Augé <[email protected]>
> wrote:
> >
> > Hi,
> >
> > I'm having a hard time figuring out the high memory usage of a CouchDB
> server.
> >
> > What I'm observing is that the memory consumption of the "beam.smp"
> process gradually rises until it triggers the kernel's OOM (Out-Of-Memory)
> killer, which kills the "beam.smp" process.
> >
> > It also seems that many databases are not compacted: I've made a script
> to iterate over the databases and compute the fragmentation factor, and it
> seems I have around 2100 databases with a frag > 70%.
> >
> > We have a single CouchDB v2.1.1 server (configured with q=8 n=1) and
> around 2770 databases.
> >
> > The server initially had 4 GB of RAM, and we are now at 16 GB with 8
> vCPUs, yet it still regularly hits OOM. From the monitoring I see that
> with 16 GB the OOM is triggered almost once a week (c.f. attached graph).
> >
> > The memory usage seems to increase gradually until it reaches OOM.
> >
> > The Couch server is mostly used by web clients with the PouchDB JS API.
> >
> > We have ~1300 distinct users, and by monitoring established TCP
> connections with netstat I estimate a maximum of around 100 concurrent
> users at any given time. From my understanding of the application's logic,
> each user accesses 2 private databases (read/write) + 1 common database
> (read-only).
> >
> > On-disk usage of CouchDB's data directory is around 40 GB.
> >
> > Any ideas on what could cause such behavior (memory usage increasing
> over the course of a week)? Or how to find out what is happening behind
> the scenes?
> >
> > Regards,
> > Jérôme
>
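
For reference, the fragmentation check mentioned above is along these lines
(a sketch; it assumes unauthenticated access to the node on 127.0.0.1:5984
and the sizes.active / sizes.file fields of the database info document):

```python
# Compute per-database fragmentation from CouchDB's database info document:
#   frag = 1 - active data size / file size
import json
import urllib.parse
import urllib.request

BASE = "http://127.0.0.1:5984"  # assumption: local node, no auth

def frag(info):
    """Fragmentation factor (0..1) from a database info document."""
    sizes = info["sizes"]
    return 1 - sizes["active"] / sizes["file"]

def report(threshold=0.7):
    """Print every database whose fragmentation exceeds the threshold."""
    with urllib.request.urlopen(f"{BASE}/_all_dbs") as resp:
        dbs = json.load(resp)
    for db in dbs:
        # Quote the name: db names may contain characters such as "/".
        with urllib.request.urlopen(f"{BASE}/{urllib.parse.quote(db, safe='')}") as resp:
            info = json.load(resp)
        if frag(info) > threshold:
            print(f"{db}\t{frag(info):.0%}")
```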
