Re: High memory consumption of a single node CouchDB server

Jérôme Augé Fri, 14 Jun 2019 07:53:10 -0700

I tried the following, but it seems to fail on the first command:

--8<--
# /opt/couchdb/bin/remsh
Erlang/OTP 18 [erts-7.3] [source-d2a6d81] [64-bit] [smp:8:8]
[async-threads:10] [hipe] [kernel-poll:false]


Eshell V7.3  (abort with ^G)
(couchdb@127.0.0.1)1> MQSizes2 = lists:map(fun(A) -> {_,B} =
process_info(A,message_queue_len), {B,A} end, processes()).
** exception error: no match of right hand side value undefined
-->8--


On Fri, 14 Jun 2019 at 16:08, Vladimir Ralev <vladimir.ra...@gmail.com>
wrote:

> Hey guys. I bet it's a mailbox leaking memory. I am very interested in
> debugging issues like this too.
>
> I suggest opening an Erlang shell and running these commands to see the top
> memory-consuming processes:
> https://www.mail-archive.com/user@couchdb.apache.org/msg29365.html
>
> One issue I will be reporting soon: if one of your nodes is down for some
> time, all databases seem to independently try, and retry, to query the
> missing node and fail, printing a lot of log messages for each db, which
> can overwhelm the logger process. Having a lot of DBs makes the problem
> worse, but it doesn't happen right away for some reason.
>
> On Fri, Jun 14, 2019 at 4:25 PM Adrien Vergé <adrien.ve...@tolteck.com>
> wrote:
>
> > Hi Jérôme and Adam,
> >
> > That's funny, because I'm investigating the exact same problem these
> > days. We have two CouchDB setups:
> > - a one-node server (q=2 n=1) with 5000 databases
> > - a 3-node cluster (q=2 n=3) with 50000 databases
> >
> > ... and we are experiencing the problem on both setups. We've been having
> > this problem for at least 3-4 months.
> >
> > We've monitored:
> >
> > - The number of open files: it's relatively low (both the system's total
> >   and the fds opened by beam.smp).
> >   https://framapic.org/wQUf4fLhNIm7/oa2VHZyyoPp9.png
> >
> > - The usage of RAM, total used and used by beam.smp
> >   https://framapic.org/DBWIhX8ZS8FU/MxbS3BmO0WpX.png
> >   It continuously grows, with regular spikes, until CouchDB is killed by
> >   the OOM killer. After restart, the RAM usage is nice and low, with no
> >   spikes.
> >
> > - /_node/_local/_system metrics, before and after restart. Values that
> > significantly differ (before / after restart) are listed here:
> >   - uptime (obviously ;-))
> >   - memory.processes : + 3732 %
> >   - memory.processes_used : + 3735 %
> >   - memory.binary : + 17700 %
> >   - context_switches : + 17376 %
> >   - reductions : + 867832 %
> >   - garbage_collection_count : + 448248 %
> >   - words_reclaimed : + 112755 %
> >   - io_input : + 44226 %
> >   - io_output : + 157951 %
> >
> > Before CouchDB restart:
> > {
> >   "uptime":2712973,
> >   "memory":{
> >     "other":7250289,
> >     "atom":512625,
> >     "atom_used":510002,
> >     "processes":1877591424,
> >     "processes_used":1877504920,
> >     "binary":177468848,
> >     "code":9653286,
> >     "ets":16012736
> >   },
> >   "run_queue":0,
> >   "ets_table_count":102,
> >   "context_switches":1621495509,
> >   "reductions":968705947589,
> >   "garbage_collection_count":331826928,
> >   "words_reclaimed":269964293572,
> >   "io_input":8812455,
> >   "io_output":20733066,
> >   ...
> >
> > After CouchDB restart:
> > {
> >   "uptime":206,
> >   "memory":{
> >     "other":6907493,
> >     "atom":512625,
> >     "atom_used":497769,
> >     "processes":49001944,
> >     "processes_used":48963168,
> >     "binary":997032,
> >     "code":9233842,
> >     "ets":4779576
> >   },
> >   "run_queue":0,
> >   "ets_table_count":102,
> >   "context_switches":1015486,
> >   "reductions":111610788,
> >   "garbage_collection_count":74011,
> >   "words_reclaimed":239214127,
> >   "io_input":19881,
> >   "io_output":13118,
> >   ...
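
The percentages listed above can be reproduced from the two snapshots. Below
is a minimal Python sketch that does so for the memory fields only; the
function and variable names are made up for illustration. One caveat when
reading the list: context_switches, reductions, garbage_collection_count,
words_reclaimed, io_input and io_output are cumulative counters since VM
start, so a large difference between an uptime of 2712973 s and one of 206 s
is expected for them. The memory.* values are point-in-time sizes in bytes,
which makes them the more telling figures here.

```python
# Memory values (bytes) copied from the two _system snapshots quoted above.
before_restart = {"processes": 1877591424, "processes_used": 1877504920,
                  "binary": 177468848, "ets": 16012736}
after_restart = {"processes": 49001944, "processes_used": 48963168,
                 "binary": 997032, "ets": 4779576}


def percent_increase(baseline, value):
    """Growth of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


def memory_growth(bloated, fresh):
    """For each memory key, how much larger the bloated VM is than the fresh one."""
    return {key: percent_increase(fresh[key], bloated[key]) for key in fresh}


for key, pct in sorted(memory_growth(before_restart, after_restart).items()):
    print(f"memory.{key}: +{pct:.0f} %")
```

The same comparison could be run on any two snapshots taken a few hours apart
on a running server, which avoids waiting for the next restart.
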
> >
> > Adrien
> >
> > On Fri, 14 Jun 2019 at 15:11, Jérôme Augé <jerome.a...@anakeen.com>
> > wrote:
> >
> > > OK, so I'll set up a cron job to log (every minute?) the output from
> > > "/_node/_local/_system" and wait for the next OOM kill.
> > >
> > > Any property from "_system" to look for in particular?
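
A minimal sketch of such a journaling job in Python, under stated
assumptions: CouchDB is reachable on its default port 5984 on localhost, the
endpoint answers without credentials (if it requires an admin login, an
Authorization header would have to be added), and the function names and the
selection of fields are only illustrative. Each call appends one JSON line,
so the history can be grepped or plotted after the next OOM kill.

```python
import json
import time
import urllib.request

# Assumption: default CouchDB port on localhost; adjust to the real setup.
SYSTEM_URL = "http://127.0.0.1:5984/_node/_local/_system"


def fetch_system(url=SYSTEM_URL, timeout=10):
    """Fetch and decode the _system JSON document."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def make_record(stats, now=None):
    """Keep uptime, run queue and the memory breakdown, with a wall-clock stamp."""
    return {
        "time": int(time.time() if now is None else now),
        "uptime": stats.get("uptime"),
        "run_queue": stats.get("run_queue"),
        "memory": stats.get("memory", {}),
    }


def append_snapshot(path, url=SYSTEM_URL):
    """Append one snapshot as a single JSON line to the file at `path`."""
    record = make_record(fetch_system(url))
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

A cron entry running every minute would then only need to call
append_snapshot() with a log file path of your choice.
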
> > >
> > > Here is a link to the memory usage graph:
> > > https://framapic.org/IzcD4Y404hlr/06rm0Ji4TpKu.png
> > >
> > > The memory usage varies, but the general trend is to go up with some
> > > regularity over a week until we reach OOM. When "beam.smp" is killed,
> > > it's reported as consuming 15 GB (as seen in the kernel's OOM trace in
> > > syslog).
> > >
> > > Thanks,
> > > Jérôme
> > >
> > > On Fri, 14 Jun 2019 at 13:48, Adam Kocoloski <kocol...@apache.org>
> > > wrote:
> > >
> > > > Hi Jérôme,
> > > >
> > > > Thanks for a well-written and detailed report (though the mailing
> > > > list strips attachments). The _system endpoint provides a lot of
> > > > useful data for debugging these kinds of situations; do you have a
> > > > snapshot of the output when the system was consuming a lot of memory?
> > > >
> > > >
> > > > http://docs.couchdb.org/en/stable/api/server/common.html#node-node-name-system
> > > >
> > > > Adam
> > > >
> > > > > On Jun 14, 2019, at 5:44 AM, Jérôme Augé <jerome.a...@anakeen.com>
> > > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I'm having a hard time figuring out the high memory usage of a
> > > > > CouchDB server.
> > > > >
> > > > > What I'm observing is that the memory consumption of the "beam.smp"
> > > > > process gradually rises until it triggers the kernel's OOM
> > > > > (Out-Of-Memory) killer, which kills the "beam.smp" process.
> > > > >
> > > > > It also seems that many databases are not compacted: I've written a
> > > > > script that iterates over the databases to compute the fragmentation
> > > > > factor, and it seems I have around 2100 databases with a
> > > > > fragmentation above 70%.
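
The script itself was not posted, so the following is only a sketch of one
common way to compute such a factor in Python. It assumes the database info
document (GET /{db}) exposes "sizes.file" and "sizes.active", and it defines
fragmentation as the share of the file not occupied by live data. The
function names, the example document and its numbers are hypothetical.

```python
def fragmentation_percent(db_info):
    """Share of the database file not occupied by live data, in percent."""
    sizes = db_info["sizes"]
    file_size, active = sizes["file"], sizes["active"]
    if file_size == 0:
        # An empty file has nothing to reclaim.
        return 0.0
    return (file_size - active) / file_size * 100.0


def needs_compaction(db_info, threshold=70.0):
    """True when fragmentation exceeds `threshold` percent."""
    return fragmentation_percent(db_info) > threshold


# Hypothetical database info document; the sizes are made up for illustration.
example = {"db_name": "userdb-example",
           "sizes": {"file": 1000000, "active": 250000}}
print(fragmentation_percent(example), needs_compaction(example))
```

With a threshold of 70, the check matches the "frag > 70%" criterion used in
the count above.
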
> > > > >
> > > > > We have a single CouchDB v2.1.1 server (configured with q=8 n=1) and
> > > > > around 2770 databases.
> > > > >
> > > > > The server initially had 4 GB of RAM, and it now has 16 GB with
> > > > > 8 vCPUs, yet it still regularly reaches OOM. From the monitoring I
> > > > > see that with 16 GB the OOM is triggered almost once per week (cf.
> > > > > attached graph).
> > > > >
> > > > > The memory usage seems to increase gradually until it reaches OOM.
> > > > >
> > > > > The Couch server is mostly used by web clients with the PouchDB JS
> > > > > API.
> > > > >
> > > > > We have ~1300 distinct users, and from monitoring the netstat/TCP
> > > > > established connections I estimate we have at most around 100 users
> > > > > at any given time. From what I understand of the application's
> > > > > logic, each user accesses 2 private databases (read/write) + 1
> > > > > common database (read-only).
> > > > >
> > > > > On-disk usage of CouchDB's data directory is around 40 GB.
> > > > >
> > > > > Any ideas on what could cause such behavior (increasing memory
> > > > > usage over the course of a week)? Or how to find out what is
> > > > > happening behind the scenes?
> > > > >
> > > > > Regards,
> > > > > Jérôme
> > > >
> > >
> >
>
