On Mon, 23 May 2022 09:24:19 GMT, Severin Gehwolf <sgehw...@openjdk.org> wrote:

>> Also, I think the current PR could produce the wrong answer if systemd is
>> indeed running inside the container and we have:
>>
>>     "/user.slice/user-1000.slice/session-50.scope",    // root_path
>>     "/user.slice/user-1000.slice/session-3.scope",     // cgroup_path
>>
>> The PR gives /sys/fs/cgroup/memory/user.slice/user-1000.slice/, which
>> specifies the overall memory limit for user-1000. However, the correct
>> answer may be
>> /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-3.scope, which may
>> have a smaller memory limit, so the JVM may end up allocating a larger heap
>> than allowed.
>
> Yes, if we can decide which one is the right file. This is largely
> undocumented territory. The correct fix would be to a) find the correct path
> to the namespace hierarchy the process is part of, and b) starting at the
> leaf node, walk up the hierarchy and find the **lowest** limits. Doing this
> would be very expensive!
> 
> Aside: Current container detection in the JVM/JDK is notoriously imprecise.
> It's largely based on common setups (containers like Docker). The heuristics
> assume that memory limits are reported inside the container at the leaf node.
> If that's not the case, the detected limits will be wrong: the container will
> be detected as unlimited even though it is, for example, memory-constrained
> at the parent. This can be reproduced, for example, on a cgroups v2 system
> with a systemd slice that uses memory limits. We've worked around this in
> OpenJDK for cgroups v1 with https://bugs.openjdk.java.net/browse/JDK-8217338
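Step b) above can be sketched roughly as follows. This is a minimal Python illustration, not JDK code: it assumes step a) has already produced the cgroup path relative to a cgroup v1 memory mount (typically /sys/fs/cgroup/memory), reads `memory.limit_in_bytes` at each level from the leaf up to the mount root, and keeps the smallest value. The function name `lowest_memory_limit` and the "effectively unlimited" threshold are hypothetical choices made for this sketch. Note that each call touches one file per level of the hierarchy.

```python
import os

# Hypothetical threshold: cgroup v1 reports "no limit" as a huge value
# (LONG_MAX rounded down to the page size), so anything at or above 2^62
# is treated as unlimited in this sketch.
UNLIMITED_THRESHOLD = 1 << 62


def lowest_memory_limit(mount_point, cgroup_path):
    """Walk from the leaf cgroup up to the mount root and return the
    smallest memory.limit_in_bytes found, or None if no level is limited."""
    parts = [p for p in cgroup_path.split("/") if p]
    lowest = None
    # depth == len(parts) is the leaf; depth == 0 is the mount root itself.
    for depth in range(len(parts), -1, -1):
        limit_file = os.path.join(mount_point, *parts[:depth], "memory.limit_in_bytes")
        try:
            with open(limit_file) as f:
                value = int(f.read().strip())
        except (OSError, ValueError):
            # Level not visible (e.g. outside the mounted hierarchy) or unreadable.
            continue
        if value >= UNLIMITED_THRESHOLD:
            continue  # this level imposes no limit
        if lowest is None or value < lowest:
            lowest = value
    return lowest
```

Applied to the `cgroup_path` from the example earlier in the thread, a walk like this would pick up a smaller limit set on `session-3.scope` as well as any limit on `user-1000.slice` or `user.slice`, and report whichever is lowest.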

> Maybe we should do this instead?
> 
>     * Read /proc/self/cgroup
> 
>     * Find the `10:memory:<path>` line
> 
>     * If `/sys/fs/cgroup/memory/<path>/tasks` contains my PID, this is the path
> 
>     * Otherwise, scan all `tasks` files under `/sys/fs/cgroup/memory/`.
>       Exactly one of them contains my PID.

Something like that seems most promising, but it would have to use
`cgroup.procs` rather than `tasks`, since `tasks` lists task IDs (i.e. Linux
threads), not processes. We could keep the two common cases, i.e. the host and
Docker cases in the test, as short circuits.
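The proposed search, using `cgroup.procs` in place of `tasks` and keeping two common cases as short circuits, might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the JDK implementation: the memory controller is cgroup v1, mounted at a caller-supplied directory (typically /sys/fs/cgroup/memory); the function names are hypothetical; and the "Docker-like" short circuit (the container's own cgroup mounted at the hierarchy root, so the path from /proc/self/cgroup does not exist under the mount) is an assumption about that setup. The fallback returns the first match, relying on the premise above that exactly one cgroup lists the PID.

```python
import os


def memory_path_from_proc(proc_cgroup_text):
    """Return <path> from the cgroup v1 'N:memory:<path>' line, or None."""
    for line in proc_cgroup_text.splitlines():
        # Format: hierarchy-ID:controller-list:path, where the controller
        # list may name several controllers, e.g. "cpu,cpuacct".
        parts = line.split(":", 2)
        if len(parts) == 3 and "memory" in parts[1].split(","):
            return parts[2]
    return None


def procs_contains(cgroup_dir, pid):
    """True if cgroup_dir/cgroup.procs lists pid (process IDs, not thread IDs)."""
    try:
        with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
            return any(line.strip() == str(pid) for line in f)
    except OSError:
        return False


def find_memory_cgroup_dir(mount_point, proc_cgroup_text, pid):
    path = memory_path_from_proc(proc_cgroup_text)
    if path is None:
        return None
    # Short circuit 1 (host-like case): <mount>/<path> lists our PID.
    direct = os.path.normpath(os.path.join(mount_point, path.lstrip("/")))
    if procs_contains(direct, pid):
        return direct
    # Short circuit 2 (assumed Docker-like case): our cgroup is the mount root.
    if procs_contains(mount_point, pid):
        return mount_point
    # Fallback: scan every cgroup.procs below the mount point (expensive).
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        if "cgroup.procs" in filenames and procs_contains(dirpath, pid):
            return dirpath
    return None
```

On a real system this would be called as `find_memory_cgroup_dir("/sys/fs/cgroup/memory", open("/proc/self/cgroup").read(), os.getpid())`. Keeping the scan as the last resort means the common host and container setups never pay for walking the whole hierarchy.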

-------------

PR: https://git.openjdk.java.net/jdk/pull/8629