On Thu, Feb 11, 2021 at 12:59 AM Сергей Мамонов <[email protected]> wrote:
>
> And after migrating all containers to another node it still shows 63745 cgroups
> -
>
> cat /proc/cgroups
> #subsys_name hierarchy num_cgroups enabled
> cpuset 7 2 1
> cpu 10 2 1
> cpuacct 10 2 1
> memory 2 63745 1

Looks like a leak (or a bug in memory accounting which prevents cgroups
from being released). You can check the number of memory cgroups with
something like

  find /sys/fs/cgroup/memory -type d | wc -l

If you see a large number, go explore those cgroups (check cgroup.procs,
memory.usage_in_bytes).
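A rough sketch of that exploration (untested here, and assuming the standard
cgroup-v1 mount under /sys/fs/cgroup/memory): compare the directory count
with what the kernel reports, and list the cgroups that still hold charged
memory but no longer contain any tasks:

  # how many memory cgroup directories actually exist
  find /sys/fs/cgroup/memory -type d | wc -l

  # compare with what the kernel thinks it has
  grep -w memory /proc/cgroups

  # cgroups with charged memory but no tasks left in them
  find /sys/fs/cgroup/memory -type d | while read -r d; do
      procs="$(cat "$d/cgroup.procs" 2>/dev/null)"
      usage="$(cat "$d/memory.usage_in_bytes" 2>/dev/null)"
      if [ -z "$procs" ] && [ "${usage:-0}" -gt 0 ]; then
          printf '%s\t%s\n' "$usage" "$d"
      fi
  done | sort -n | tail -20

If the directory count is far below num_cgroups in /proc/cgroups, the
difference is most likely cgroups that were already removed but are still
pinned by charged page cache or kernel memory - which would also fit the
observation further down the thread that drop_caches sometimes lets a
container start.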
> devices 11 2 1
> freezer 17 2 1
> net_cls 12 2 1
> blkio 1 4 1
> perf_event 13 2 1
> hugetlb 14 2 1
> pids 3 68 1
> ve 6 1 1
> beancounter 4 3 1
> net_prio 12 2 1
>
> On Wed, 10 Feb 2021 at 18:47, Сергей Мамонов <[email protected]> wrote:
>>
>> And that is definitely it -
>> grep -E "memory|num_cgroups" /proc/cgroups
>> #subsys_name hierarchy num_cgroups enabled
>> memory 2 65534 1
>>
>> After migrating some of the containers to another node, num_cgroups went
>> down to 65365, and the stopped container could be started without the
>> `Can't create directory /sys/fs/cgroup/memory/machine.slice/1000133882:
>> Cannot allocate memory` error.
>>
>> But I still don't understand why num_cgroups for memory is so big.
>>
>> Roughly ~460 per container, instead of 60 or less per container on other
>> nodes (with the same kernel version).
>>
>> On Wed, 10 Feb 2021 at 17:48, Сергей Мамонов <[email protected]> wrote:
>>>
>>> Hello!
>>>
>>> Looks like we reproduced this problem too.
>>>
>>> kernel - 3.10.0-1127.18.2.vz7.163.46
>>>
>>> Same error -
>>> Can't create directory /sys/fs/cgroup/memory/machine.slice/1000133882:
>>> Cannot allocate memory
>>>
>>> Same OK output for
>>> /sys/fs/cgroup/memory/*limit_in_bytes
>>> /sys/fs/cgroup/memory/machine.slice/*limit_in_bytes
>>>
>>> There is a lot of free memory on the node (per NUMA node too).
>>>
>>> Only this looks really strange -
>>> grep -E "memory|num_cgroups" /proc/cgroups
>>> #subsys_name hierarchy num_cgroups enabled
>>> memory 2 65534 1
>>>
>>> Huge num_cgroups only on this node.
>>>
>>> cat /proc/cgroups
>>> #subsys_name hierarchy num_cgroups enabled
>>> cpuset 7 144 1
>>> cpu 10 263 1
>>> cpuacct 10 263 1
>>> memory 2 65534 1
>>> devices 11 1787 1
>>> freezer 17 144 1
>>> net_cls 12 144 1
>>> blkio 1 257 1
>>> perf_event 13 144 1
>>> hugetlb 14 144 1
>>> pids 3 2955 1
>>> ve 6 143 1
>>> beancounter 4 143 1
>>> net_prio 12 144 1
>>>
>>> On Thu, 28 Jan 2021 at 14:22, Konstantin Khorenko <[email protected]>
>>> wrote:
>>>>
>>>> Maybe you hit a memory shortage in a particular NUMA node only, for example.
>>>>
>>>> # numactl --hardware
>>>> # numastat -m
>>>>
>>>> Or go the hard way - trace in the kernel where exactly we get -ENOMEM:
>>>>
>>>> trace the kernel function cgroup_mkdir() using /sys/kernel/debug/tracing/
>>>> with the function_graph tracer.
>>>>
>>>> https://lwn.net/Articles/370423/
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Konstantin Khorenko,
>>>> Virtuozzo Linux Kernel Team
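(As an aside on the tracing suggestion quoted above: a minimal version,
assuming cgroup_mkdir shows up in available_filter_functions on this
kernel, would be roughly

  cd /sys/kernel/debug/tracing
  echo cgroup_mkdir > set_graph_function
  echo function_graph > current_tracer
  echo 1 > tracing_on
  # ... reproduce the failing "vzctl start" from another shell ...
  echo 0 > tracing_on
  less trace
  # reset when done
  echo nop > current_tracer
  echo > set_graph_function

and the resulting call graph in "trace" should at least show how far
cgroup_mkdir() gets before it bails out with -ENOMEM.)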
>>>>
>>>> On 01/28/2021 12:43 PM, Joe Dougherty wrote:
>>>>
>>>> I checked that, doesn't appear to be the case.
>>>>
>>>> # pwd
>>>> /sys/fs/cgroup/memory
>>>> # cat *limit_in_bytes
>>>> 9223372036854771712
>>>> 9223372036854767616
>>>> 2251799813685247
>>>> 2251799813685247
>>>> 9223372036854771712
>>>> 9223372036854771712
>>>> 9223372036854771712
>>>> # cat *failcnt
>>>> 0
>>>> 0
>>>> 0
>>>> 0
>>>> 0
>>>>
>>>> # pwd
>>>> /sys/fs/cgroup/memory/machine.slice
>>>> # cat *limit_in_bytes
>>>> 9223372036854771712
>>>> 9223372036854767616
>>>> 9223372036854771712
>>>> 9223372036854771712
>>>> 9223372036854771712
>>>> 9223372036854771712
>>>> 9223372036854771712
>>>> # cat *failcnt
>>>> 0
>>>> 0
>>>> 0
>>>> 0
>>>> 0
>>>>
>>>> On Thu, Jan 28, 2021 at 2:47 AM Konstantin Khorenko
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi Joe,
>>>>>
>>>>> I'd suggest checking the memory limits for the root and "machine.slice"
>>>>> memory cgroups:
>>>>>
>>>>> /sys/fs/cgroup/memory/*limit_in_bytes
>>>>> /sys/fs/cgroup/memory/machine.slice/*limit_in_bytes
>>>>>
>>>>> All of them should be unlimited.
>>>>>
>>>>> If not - find out what limits them.
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Konstantin Khorenko,
>>>>> Virtuozzo Linux Kernel Team
>>>>>
>>>>> On 01/27/2021 10:28 PM, Joe Dougherty wrote:
>>>>>
>>>>> I'm running into an issue on only 1 of my OpenVZ 7 nodes where it's
>>>>> unable to create a directory under /sys/fs/cgroup/memory/machine.slice
>>>>> due to "Cannot allocate memory" whenever I try to start a new container
>>>>> or restart an existing one. I've been trying to research this but I'm
>>>>> unable to find any concrete info on what could cause this. It appears to
>>>>> be memory related because sometimes if I issue
>>>>> "echo 1 > /proc/sys/vm/drop_caches" it allows me to start a container
>>>>> (this only works sometimes), but my RAM usage is extremely low with no
>>>>> swapping (swappiness even set to 0 for testing). Thank you in advance
>>>>> for your help.
>>>>>
>>>>> Example:
>>>>> # vzctl start 9499
>>>>> Starting Container ...
>>>>> Mount image: /vz/private/9499/root.hdd
>>>>> Container is mounted
>>>>> Can't create directory /sys/fs/cgroup/memory/machine.slice/9499: Cannot
>>>>> allocate memory
>>>>> Unmount image: /vz/private/9499/root.hdd (190)
>>>>> Container is unmounted
>>>>> Failed to start the Container
>>>>>
>>>>> Node Info:
>>>>> Uptime: 10 days
>>>>> OS: Virtuozzo 7.0.15
>>>>> Kernel: 3.10.0-1127.18.2.vz7.163.46 GNU/Linux
>>>>> System Load: 3.1
>>>>> /vz Usage: 56% of 37T
>>>>> Swap Usage: 0%
>>>>> RAM Free: 84% of 94.2GB
>>>>>
>>>>> # free -m
>>>>>               total        used        free      shared  buff/cache   available
>>>>> Mem:          96502       14259       49940         413       32303       80990
>>>>> Swap:         32767          93       32674
>>>>
>>>> --
>>>> -Joe Dougherty
>>>> Chief Operating Officer
>>>> Secure Dragon LLC
>>>> www.SecureDragon.net
>
> --
> Best Regards,
> Sergei Mamonov

_______________________________________________
Users mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/users
