On Thu, 17 Jun 2021 08:53:55 GMT, Thomas Stuefe <stu...@openjdk.org> wrote:

>> Proposal to add a Linux+glibc-only jcmd to manually induce malloc_trim(3) in 
>> the VM process.
>> 
>> 
>> The glibc is somewhat notorious for retaining released C Heap memory: 
>> calling free(3) returns memory to the glibc, and most libc variants will 
>> return at least a portion of it back to the Operating System, but the glibc 
>> often does not.
>> 
>> This depends on the granularity of the allocations and a number of other 
>> factors, but we found that many small allocations in particular may cause 
>> the process heap segment (hence RSS) to get bloaty. This can cause the VM to 
>> not recover from C-heap usage spikes.
>> 
>> The glibc offers an API, "malloc_trim", which can be used to cause the glibc 
>> to return free'd memory back to the Operating System.
>> 
>> This may cost performance, however, and therefore I hesitate to call 
>> malloc_trim automatically. That may be an idea for another day.
>> 
>> Instead of an automatic trim I propose to add a jcmd which allows to 
>> manually trigger a libc heap trim. Such a command would have two purposes:
>> - when analyzing cases of high memory footprint, it allows to distinguish 
>> "real" footprint, e.g. leaks, from a cases where the glibc just holds on to 
>> memory
>> - as a stop gap measure it allows to release pressure from a high footprint 
>> scenario.
>> 
>> Note that this command also helps with analyzing libc peaks which had 
>> nothing to do with the VM - e.g. peaks created by customer code which just 
>> happens to share the same process as the VM. Such memory does not even have 
>> to show up in NMT.
>> 
>> I propose to introduce this command for Linux only. Other OSes (apart maybe 
>> AIX) do not seem to have this problem, but Linux is arguably important 
>> enough in itself to justify a Linux specific jcmd.
>> 
>> If this finds agreement, I will file a CSR.
>> 
>> Note that an alternative to a Linux-only jcmd would be a command which would 
>> trim the C-heap on all platforms, with implementations to be filled out 
>> later.
>> 
>> =========
>> 
>> This patch:
>> 
>> - introduces a new jcmd, "VM.trim_libc_heap", no arguments, which trims the 
>> glibc heap on glibc platforms.
>> - includes a (rather basic) test
>> - the command calls malloc_trim(3), and additionally prints out its effect 
>> (changes caused in virt size, rss and swap space)
>> - I refactored some code in os_linux.cpp to factor out scanning 
>> /proc/self/status to get kernel memory information.
>> 
>> =========
>> 
>> Example:
>> 
>> A programm causes a temporary peak in C-heap usage (in this case, triggered 
>> via Unsafe.allocateMemory), right away frees the memory again, so its not 
>> leaky. The peak in RSS was ~8G (even though the user allocation was way 
>> smaller - glibc has a lot of overhead). The effects of this peak linger even 
>> after returning that memory to the glibc:
>> 
>> 
>> 
>> thomas@starfish:~$ jjjcmd AllocCHeap VM.info | grep Resident
>> Resident Set Size: 8685896K (peak: 8685896K) (anon: 8648680K, file: 37216K, 
>> shmem: 0K)
>>                    ^^^^^^^^
>> 
>> 
>> We execute the new trim command via jcmd:
>> 
>> 
>> thomas@starfish:~$ jjjcmd AllocCHeap VM.trim_libc_heap
>> 18770:
>> Attempting trim...
>> Done.
>> Virtual size before: 28849744k, after: 28849724k, (-20k)
>> RSS before: 8685896k, after: 920740k, (-7765156k)  <<<<
>> Swap before: 0k, after: 0k, (0k)
>> 
>> 
>> It prints out reduction in virtual size, rss and swap. The virtual size did 
>> not decrease since no mappings had been unmapped by the glibc. However, the 
>> process heap was shrunk heavily by the glibc, resulting in a large drop in 
>> RSS (8.5G->900M), freeing >7G of memory:
>> 
>> 
>> thomas@starfish:~$ jjjcmd AllocCHeap VM.info | grep Resident
>> Resident Set Size: 920740K (peak: 8686004K) (anon: 883460K, file: 37280K, 
>> shmem: 0K)
>>                    ^^^^^^^
>> 
>> 
>> When the VM is started with -Xlog:os, this is also logged:
>> 
>> 
>> [139,068s][info][os] malloc_trim:
>> [139,068s][info][os] Virtual size before: 28849744k, after: 28849724k, (-20k)
>> RSS before: 8685896k, after: 920740k, (-7765156k)
>> Swap before: 0k, after: 0k, (0k)
>
> Thomas Stuefe has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Feedback Severin; renamed query function

> _Mailing list message from [David Holmes](mailto:david.hol...@oracle.com) on 
> [hotspot-runtime-dev](mailto:hotspot-runtime-...@mail.openjdk.java.net):_
> 
> On 21/06/2021 7:58 pm, Thomas Stuefe wrote:
> 
> > On Mon, 21 Jun 2021 09:15:05 GMT, Volker Simonis <simonis at openjdk.org> 
> > wrote:
> > > I like this little new diagnostic command which I think can be quite 
> > > useful in specific situations.
> > 
> > 
> > Thanks a lot Volker!
> > > However, in contrast to other reviewers, I'd rather keep this simple and 
> > > Glibc specific instead of extending it to a more general but mostly empty 
> > > command.
> > 
> > 
> > Yes, that was my thought too. Lets wait for @dholmes-ora to chime in, 
> > whether we can all agree on a glibc specific variant. I also preferred that 
> > one.
> 
> I don't totally oppose the specialised variant, but it certainly isn't
> my ideal solution. Not sure if we already have platform specific dcmds?
> This will need a CSR request anyway.
> 
> Cheers,
> David

I'll prepare the CSR and bring it to review.

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/4510

Reply via email to