We are seeing interesting behavior at high memory usage. Apologies in advance
for a long and detailed email.

Our ATS caches have been running for many months and have reached a point where
ATS has allocated a huge amount of memory to its freelist pools (we can confirm
this by dumping the mempools; see the sketch below). I understand that this is
a known ATS behavior/limitation/issue, where freelist mempools, once allocated,
are never reclaimed (and that there may be a patch for adding reclamation
support: https://cwiki.apache.org/confluence/display/TS/How+to+use+reclaimable+freelist).
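In case it is useful, this is roughly how we dump the mempools (a rough sketch;
in our ATS version traffic_server writes a memory/freelist dump to traffic.out
on SIGUSR1, and the log path below is specific to our install, so adjust as
needed):

  # Ask traffic_server to dump its memory/freelist statistics
  # (in our setup the dump lands in traffic.out).
  kill -USR1 "$(pidof traffic_server)"

  # Look at the allocated vs. in-use numbers for the larger pools,
  # e.g. the ioBufAllocator freelists.
  grep -i -E 'allocat|freelist|iobuf' /var/log/trafficserver/traffic.out | tail -n 60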

But my question is about an interesting side issue we observe once the ATS
cache reaches this stage. We see our ATS caches allocating a large amount of
memory for slab caches - primarily the “dentry” cache. This is probably okay:
even though the kernel allocates greedily for its internal caches (page cache,
dentry cache, inode cache, etc.), all of that memory is reclaimable under
memory pressure. The more interesting behavior is that, in this low-memory
state, only one of the NUMA zones is running out of free pages, and this
particular ATS cache has been in that state for several days.
Here is snipped output from /proc/zoneinfo:

<snip>
Node 0, zone   Normal
  pages free     6320539   <<< Roughly 25GB free (note: the system has 512GB total)
        min      8129
        low      10161
        high     12193
        scanned  0
        spanned  66584576
        present  65674240
    nr_free_pages 6320539
    nr_inactive_anon 71
    nr_active_anon 79274
    nr_inactive_file 1720428
    nr_active_file 4580107
    nr_unevictable 39168773
    nr_mlock     39168773
    nr_anon_pages 39239109
    nr_mapped    13298
    nr_file_pages 6309563
    nr_dirty     91
    nr_writeback 0
    nr_slab_reclaimable 4581560  <<< ~18G.
    nr_slab_unreclaimable 16047
 <snip>
Node 1, zone   Normal
  pages free     10224        <<<< Check this. It is below the low watermark!!!
        min      8193
        low      10241
        high     12289
        scanned  0
        spanned  67108864
        present  66191360
    nr_free_pages 10224
    nr_inactive_anon 64
    nr_active_anon 20886
    nr_inactive_file 42840
    nr_active_file 330486
    nr_unevictable 45630255
    nr_mlock     45630255
    nr_anon_pages 45649954
    nr_mapped    2151
    nr_file_pages 374576
    nr_dirty     9
    nr_writeback 0
    nr_slab_reclaimable 11939312   <<< ~48G
    nr_slab_unreclaimable 17135
<snip>

It would appear that page allocations for slab (from slabtop it is pretty much
all dentry) are disproportionately hitting NUMA zone/node 1. Under these
conditions, my guess is that node 1 will be constantly under memory pressure,
causing page scan/reclaim to run constantly. Without knowing much about the
Linux kernel MM, I am guessing this may be suboptimal?
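For what it is worth, here is the kind of quick check we have been using to see
the per-node slab split and to convince ourselves the dentry slab really is
reclaimable (a rough sketch; numastat comes from the numactl package, and the
per-node "N0=/N1=" suffixes in the sysfs files assume the kernel is using SLUB):

  # Per-node view of free memory and reclaimable vs. unreclaimable slab
  # (same data as /proc/zoneinfo, but already grouped per NUMA node, in MB).
  numastat -m | grep -E 'MemFree|SReclaimable|SUnreclaim'

  # The same numbers straight from sysfs.
  grep -E 'MemFree|SReclaimable' /sys/devices/system/node/node*/meminfo

  # Per-node object/slab counts for the dentry cache itself
  # (the trailing "N0=... N1=..." fields are the per-node breakdown).
  cat /sys/kernel/slab/dentry/objects /sys/kernel/slab/dentry/slabs

  # Off-peak sanity check that the dentry slab is actually reclaimable:
  # drop dentries/inodes and watch nr_slab_reclaimable fall (needs root).
  sync; echo 2 > /proc/sys/vm/drop_caches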

Please correct my (wild) assumptions about why we may be observing this:
- My guess is that a dentry is being created for each new “accepted” connection
socket.
- There is only one ACCEPT thread handling port 80 requests in our cache
configuration. The ACCEPT thread is responsible for opening FDs for accepted
socket connections.
- The ACCEPT thread is confined to run on a cpuset belonging to one NUMA node
only.
(I am connecting a lot of dots here; a quick affinity check is sketched below.)
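To check that last point before I go too far down this path, this is roughly
what I have been running to see where the accept thread is pinned (a sketch;
the grep on "accept" is an assumption about how the accept threads are named
in our build):

  # List every traffic_server thread with its name and allowed CPU list,
  # then compare the accept thread's CPUs against each node's cpulist.
  for t in /proc/"$(pidof traffic_server)"/task/*; do
      printf '%s  %s  %s\n' "$(basename "$t")" "$(cat "$t"/comm)" \
          "$(awk '/Cpus_allowed_list/ {print $2}' "$t"/status)"
  done | grep -i accept

  cat /sys/devices/system/node/node*/cpulist

If the accept thread's Cpus_allowed_list falls entirely inside node 1's
cpulist, that would at least be consistent with the skew above.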

Any insight will be appreciated.

thanks
Kapil
