We are seeing some interesting behavior at high memory usage. Apologies in advance for a long and detailed email.
Our ATS caches have been running for many months and have reached a point where ATS has allocated a huge amount of memory to its freelist pools (we can confirm this by dumping the mempools). I understand that this is a known ATS behavior/limitation/issue, where freelist mempools, once allocated, are never reclaimed (and that there may be a patch<https://cwiki.apache.org/confluence/display/TS/How+to+use+reclaimable+freelist> adding reclamation support). My question, however, is about an interesting side issue we observe once a cache reaches this stage.

We see these caches allocating a large amount of memory to the slab cache, primarily the “dentry” cache. This is probably okay on its own: even though the kernel allocates greedily for its internal caches (page cache, dentry cache, inode cache, etc.), all of that memory is reclaimable under memory pressure. The more interesting behavior is that in this low-memory state, only one of the NUMA nodes is exhausting its free pages, and this particular ATS cache has been in that state for several days. Here is snipped output from /proc/zoneinfo:

<snip>
Node 0, zone Normal
  pages free     6320539   <<< roughly 25GB free (note: the system has a total of 512GB)
        min      8129
        low      10161
        high     12193
        scanned  0
        spanned  66584576
        present  65674240
    nr_free_pages 6320539
    nr_inactive_anon 71
    nr_active_anon 79274
    nr_inactive_file 1720428
    nr_active_file 4580107
    nr_unevictable 39168773
    nr_mlock 39168773
    nr_anon_pages 39239109
    nr_mapped 13298
    nr_file_pages 6309563
    nr_dirty 91
    nr_writeback 0
    nr_slab_reclaimable 4581560   <<< ~18G
    nr_slab_unreclaimable 16047
<snip>
Node 1, zone Normal
  pages free     10224     <<<< check this, it is below the low watermark!
        min      8193
        low      10241
        high     12289
        scanned  0
        spanned  67108864
        present  66191360
    nr_free_pages 10224
    nr_inactive_anon 64
    nr_active_anon 20886
    nr_inactive_file 42840
    nr_active_file 330486
    nr_unevictable 45630255
    nr_mlock 45630255
    nr_anon_pages 45649954
    nr_mapped 2151
    nr_file_pages 374576
    nr_dirty 9
    nr_writeback 0
    nr_slab_reclaimable 11939312  <<< ~48G
    nr_slab_unreclaimable 17135
<snip>

It would appear that page allocations for slab (per slabtop, it is pretty much all dentry) are disproportionately hitting NUMA node 1. Under these conditions, my guess is that node 1 will be under constant memory pressure, causing page scan/reclaim to run continuously. Without knowing much about the Linux kernel MM, I am guessing this is suboptimal?

Please correct my (wild) assumptions about why we may be observing this:

- My guess is that a dentry is created for each newly accepted connection socket.
- There is only one ACCEPT thread handling port 80 requests in our cache configuration. The ACCEPT thread is responsible for opening the FDs for the accepted socket connections.
- The ACCEPT thread is confined to run on a cpuset belonging to one NUMA node only... (I am connecting a lot of dots here.)

Any insight will be appreciated.

thanks
Kapil
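
P.S. For cross-checking the per-node numbers, the same counters are also exposed in /sys/devices/system/node/nodeN/meminfo. Here is a minimal sketch (Python) that prints a few of them per node; it assumes the standard sysfs layout, and the field list is just the handful relevant to this thread:

import glob
import re

# Per-node counters of interest; these field names come straight from
# /sys/devices/system/node/nodeN/meminfo.
FIELDS = ("MemFree", "FilePages", "SReclaimable", "SUnreclaim")

for path in sorted(glob.glob("/sys/devices/system/node/node[0-9]*/meminfo")):
    node = re.search(r"node(\d+)", path).group(1)
    counters = {}
    with open(path) as f:
        for line in f:
            # Lines look like: "Node 1 SReclaimable:  47757248 kB"
            m = re.search(r"Node\s+\d+\s+(\w+):\s+(\d+)\s*kB", line)
            if m and m.group(1) in FIELDS:
                counters[m.group(1)] = int(m.group(2))
    # Values are reported in kB; convert to GiB for easier comparison.
    summary = "  ".join("%s=%.1fG" % (k, counters.get(k, 0) / (1024.0 * 1024.0))
                        for k in FIELDS)
    print("node%s: %s" % (node, summary))

The SReclaimable values should line up with the nr_slab_reclaimable counters in the zoneinfo output above (reported in kB rather than pages).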

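P.P.S. On the accept-thread theory: per-thread CPU affinity is visible as Cpus_allowed_list in /proc/<pid>/task/<tid>/status, so one way to test the last bullet is to dump the affinity of every traffic_server thread and see whether the thread doing the accepts is limited to one node's CPUs. A rough sketch (Python), assuming a single traffic_server process; the thread names reported depend on the ATS version, so identifying the accept thread from the output is left to eyeballing:

import glob

def find_pid(comm="traffic_server"):
    # Walk /proc looking for the first process whose Name matches.
    for status in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(status) as f:
                if f.readline().split()[1] == comm:
                    return status.split("/")[2]
        except (IOError, OSError, IndexError):
            continue
    return None

pid = find_pid()
if pid is None:
    raise SystemExit("traffic_server process not found")

# Print tid, allowed CPUs and thread name for every thread of the process.
for task in sorted(glob.glob("/proc/%s/task/[0-9]*/status" % pid),
                   key=lambda p: int(p.split("/")[4])):
    fields = {}
    try:
        with open(task) as f:
            for line in f:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
    except (IOError, OSError):
        continue  # thread exited between listing and reading
    print("tid=%-8s cpus=%-14s %s" % (task.split("/")[4],
                                      fields.get("Cpus_allowed_list", "?"),
                                      fields.get("Name", "?")))

If the thread handling the port 80 accepts shows a Cpus_allowed_list confined to node 1's CPUs while node 1 is the one accumulating the dentry slab, that would at least be consistent with the guesses above.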