Hello.

Recently, for the first time, we experienced an apparent memory issue in our 
production environment. Here's an example of the relevant log messages:

Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR: <core> 
[core/mem/q_malloc.c:286]: qm_find_free(): qm_find_free(0x7fd8bbcfc010, 
134648); Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR: <core> 
[core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7fd8bbcfc010, 134648) 
called from xcap_server: xcap_server.c: ki_xcaps_put(549), module: xcap_server; 
Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR: xcap_server 
[xcap_server.c:552]: ki_xcaps_put(): no more pkg
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2252]: ERROR: app_perl 
[kamailioxs.xs:487]: XS_Kamailio_log(): 500 Server error

The failed operation was the XCAP server module trying to generate a very large 
RLS contact list for a user. The issue only affected users with very large 
lists, as though a large contiguous block of memory could not be found, while 
smaller allocations continued to work fine. According to the log, the requested 
allocation was 134648 bytes (about 131 KB).

The server had been up for 14 days. We were able to work around the issue 
temporarily by restarting the Kamailio service. This is unusual, because our 
production server is often up for months, and we've never seen this issue 
before. The load on production is increasing slowly as the concurrent user 
count grows, so that might be related.

Before restarting the service on production, we captured the output of the 
following commands:

kamctl stats shmem
kamcmd mod.stats all shm
kamcmd pkg.stats
kamcmd mod.stats all pkg

Here's the shared mem output:

{
  "jsonrpc":  "2.0",
  "result": [
    "shmem:fragments = 27240",
    "shmem:free_size = 447203296",
    "shmem:max_used_size = 116175576",
    "shmem:real_used_size = 89667616",
    "shmem:total_size = 536870912",
    "shmem:used_size = 68824240"
  ],
  "id": 6934
}
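
For anyone who wants to check these figures quickly, here's a minimal Python 
sketch that parses the reply above and works out how much of the shared memory 
pool is in use. The helper name parse_shmem is ours, and it assumes the 
"shmem:name = value" string format shown in this reply; other versions may 
format it differently.

```python
import json

# The kamctl reply quoted above, pasted verbatim.
REPLY = """
{
  "jsonrpc":  "2.0",
  "result": [
    "shmem:fragments = 27240",
    "shmem:free_size = 447203296",
    "shmem:max_used_size = 116175576",
    "shmem:real_used_size = 89667616",
    "shmem:total_size = 536870912",
    "shmem:used_size = 68824240"
  ],
  "id": 6934
}
"""


def parse_shmem(reply_text):
    """Turn the "shmem:name = value" strings into a dict of ints."""
    stats = {}
    for item in json.loads(reply_text)["result"]:
        name, value = item.split("=")
        # Drop the "shmem:" group prefix and surrounding whitespace.
        stats[name.strip().split(":", 1)[1]] = int(value)
    return stats


stats = parse_shmem(REPLY)
used_pct = 100.0 * stats["real_used_size"] / stats["total_size"]
print(f"shm in use: {used_pct:.1f}% of {stats['total_size']} bytes")
```

On these numbers the pool is mostly free, and real_used_size plus free_size 
adds up to total_size, so shared memory doesn't look like the problem.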

Here's the pkg output for the particular PID which was throwing the errors:

{
                entry: 34
                pid: 2302
                rank: 14
                used: 2415864
                free: 4949688
                real_used: 3438920
                total_size: 8388608
                total_frags: 1951
}
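
To put the pkg figures next to the failed request, here's a small Python 
sketch. It assumes the "key: number" layout shown in the entry above, and the 
names parse_pkg and REQUESTED are ours; the request size is the 134648 taken 
from the qm_malloc() log line.

```python
import re

# pkg stats for PID 2302, pasted from the kamcmd output above.
PKG_TEXT = """
{
    entry: 34
    pid: 2302
    rank: 14
    used: 2415864
    free: 4949688
    real_used: 3438920
    total_size: 8388608
    total_frags: 1951
}
"""

REQUESTED = 134648  # bytes, from the qm_malloc() log line


def parse_pkg(text):
    """Collect "key: number" pairs from one pkg.stats entry."""
    return {key: int(value) for key, value in re.findall(r"(\w+):\s*(\d+)", text)}


pkg = parse_pkg(PKG_TEXT)
overhead = pkg["real_used"] - pkg["used"]  # allocator bookkeeping, in bytes
ratio = pkg["free"] / REQUESTED            # total free pkg vs. the failed request

print(f"free pkg: {pkg['free']} bytes, {ratio:.1f}x the failed request")
print(f"overhead: {overhead} bytes across {pkg['total_frags']} fragments")
```

Total free pkg memory is many times larger than the failed request, which 
would be consistent with fragmentation rather than exhaustion, although these 
counters alone don't show the size of the largest free fragment.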

We didn't see anything obvious in the stats output which explains the issue.

We've been trying to reproduce the issue in a dev environment, running a 
simulated higher-than-production load for many days, but so far we've had no 
luck. We've been monitoring memory stats over time, but we don't see any 
obvious leaks or other issues.

We've searched various past threads, but didn't find any obvious answers. Here 
are some of the documents and the threads we've been reading:

https://www.kamailio.org/wiki/tutorials/troubleshooting/memory
https://www.kamailio.org/wiki/cookbooks/3.3.x/core#mem_join
https://sr-users.sip-router.narkive.com/3TEDs3ga/tcp-free-fragment-not-found
https://lists.kamailio.org/pipermail/sr-users/2012-June/073552.html
https://lists.kamailio.org/pipermail/sr-users/2017-February/096132.html
https://lists.kamailio.org/pipermail/sr-users/2017-September/098607.html
https://github.com/kamailio/kamailio/issues/1001
https://lists.kamailio.org/pipermail/sr-users/2016-April/092592.html
https://lists.kamailio.org/pipermail/sr-users/2010-July/064832.html

Regarding our Kamailio version and build options, here's the output of 
'kamailio -v':

----
version: kamailio 5.1.0 (x86_64/linux)
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, DISABLE_NAGLE, 
USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, 
TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, 
USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, 
MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled on 20:07:31 Jan  4 2018 with gcc 5.4.0
----

We are running on a 64-bit Ubuntu Server virtual machine.

Any help would be greatly appreciated.

Thanks very much.