Hello! 1st - great work, guys! Dealing with LXC and even LXD makes me miss
my good old OpenVZ box because of its technical excellence! Keep going!
2nd - my 2 cents on the content - I'm not a native speaker, but still
suggest some small fixes below.

On Thu, Jul 23, 2020 at 9:52 PM Konstantin Khorenko <[email protected]> wrote:

> On 07/22/2020 03:04 PM, Daniel Pearson wrote:
>
> >> b) you can disable tcache for this Container
> >>    memcg::memory.disable_cleancache
> >>    (raise your hand if you wish me to explain what tcache is)
> >
> > I'm all for additional information as it can help to form proper
> > opinions if you don't mind providing it.
>
> Hope after reading it you'll catch yourself on the idea that now you are
> aware of one more small feature which makes VZ really cool, and that
> there are a lot of things which just work somewhere in the background,
> simply (and silently) making it possible for you to utilize the hardware
> at maximum. :)
>
> Tcache
> ======
>
> Brief tech explanation:
> =======================
> Transcendent file cache (tcache) is a driver for cleancache
> https://www.kernel.org/doc/html/v4.18/vm/cleancache.html ,
> which stores reclaimed pages in memory unmodified. Its purpose is to
> adopt pages evicted from a memory cgroup on _local_ pressure (inside a
> Container), so that they can be fetched back later without costly disk
> accesses.
>
> Detailed explanation:
> =====================
> Tcache is intended increase the overall Hardware Node performance only

missing "to" - "intended to increase"

> on undercommitted Nodes, i.e. sum of all Containers memory limits on the
> Node

suggest: "i.e. where the total sum of all Containers' memory limit values
on the Node"

> is less than Hardware Node RAM size.
>
> Imagine a situation: you have a Node with 1Tb of RAM,
> you run 500 Containers on it limited by 1Gb of memory each (no swap for
> simplicity).
> Let's consider the Containers to be more or less identical: similar
> load, similar activity inside.
> => normally those Containers must use 500Gb of physical RAM at max,
> right, and 500Gb will be just free on the Node.
>
> You may think it's a simple situation - OK, the Node is underloaded,
> let's put more Containers there - but that's not always true: it depends
> on what the bottleneck on the Node is, which depends on the real
> workload of the Containers running on the Node.
> And most often in real life the disk becomes the bottleneck first, not
> the RAM, not the CPU.
>
> Example: let's assume all those Containers run, say, cPanel, which by
> default collects some stats every, say, 15 minutes - the stat collection
> process is run via crontab.
>
> (Side note: randomizing the times of crontab jobs is a good idea, but
> who usually does this for Containers? We did it for the application
> templates we shipped in Virtuozzo, but a lot of software is just
> installed and configured inside Containers, so we cannot do this. And
> often Hosting Providers are not allowed to touch data in Containers - so
> most often cron jobs are not randomized.)
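A quick sketch of such randomization, in case it's useful - assuming an
/etc/crontab-style entry inside the Container; "collect_stats" is just a
placeholder for the real job:

    SHELL=/bin/bash
    # Sleep a random 0-899 seconds first, so that 500 Containers
    # do not all hit the disk at the same moment.
    */15 * * * * root sleep $((RANDOM % 900)); /usr/local/bin/collect_stats

($RANDOM is a bashism, hence the SHELL line; for a plain sh crontab one
would need e.g. "perl -e 'sleep int(rand(900))'" instead.)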
> OK, it does not matter how, but let's assume we get such a workload:
> every, say, 15 minutes (it's important that the data access is quite
> rare), each Container accesses many small files - let it be just 100
> small files - to gather stats and save them somewhere.
> In 500 Containers.
> Simultaneously.
> In parallel with the other regular i/o workload.
> On HDDs.
>
> It's a nightmare for the disk subsystem: 500 Containers * 100 files =
> 50000 reads, so if an HDD provides 100 IOPS, it will take
> 50000/100/60 = 8.(3) minutes(!) to handle.
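Back-of-envelope check of that figure, with the numbers from the example
above (nothing here is VZ-specific):

    # 500 CTs * 100 files each, on a single 100-IOPS HDD
    containers=500; files=100; iops=100
    echo "scale=1; $containers * $files / $iops / 60" | bc   # -> 8.3 minutes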
> OK, there could be RAID; let it be able to handle 300 IOPS - that
> results in 2.(7) minutes, and we forgot about the other regular i/o.
> So it means that every 15 minutes the Node becomes almost unresponsive
> for several minutes until it handles all that random i/o generated by
> the stats collection.
>
> You can ask - but why _every_ 15 minutes? You've read a file once and it
> resides in the Container pagecache!
> That's true, but here comes the _15 minutes_ period. The larger the
> period, the worse.
> If a Container is active enough, it just reads more and more files:
> website data, pictures, video clips, fileserver files, whatever.
> The thing is, in 15 minutes it's quite possible that a Container reads
> more than its RAM limit (remember - only 1Gb in our case!), and thus all
> the old pagecache is dropped, substituted with the fresh one.
> And thus in 15 minutes it's quite possible you'll have to read all those
> 100 files in each Container from disk again.
>
> And here comes tcache to save us: let's not completely drop the
> pagecache which is reclaimed from a Container (on local(!) reclaim), but
> save this pagecache in a special cache (tcache) on the Host, in case
> there is free RAM on the Host.
>
> And in 15 minutes, when all Containers start to access lots of small
> files again, those files' data will be fetched back into the Container
> pagecache without reading from the physical disk - voila, we save IOPS,
> and the Node does not get stuck anymore.
>
> Q: can a Container be so active (i.e. read so much from disk) that this
> "useful" pagecache is dropped even from tcache.

missing question mark - "?"

> A: Yes. But tcache extends the "safe" period.
>
> Q: mainstream? LXC/Proxmox?
> A: No, it's Virtuozzo/OpenVZ specific.
> "cleancache" - the base for tcache - is in mainstream, it's used for
> Xen. But we (VZ) wrote a driver for it and use it for Containers as
> well.
>
> Q: I use SSD, not HDD, does tcache help me?
> A: An SSD can provide many more IOPS, thus the Node's performance
> increase caused by tcache is less, but reading from RAM (tcache is in
> RAM) is still faster than reading from an SSD.

suggest: "is less significant"

> >> c) you can limit the max amount of memory which can be used for
> >>    pagecache for this Container
> >>    memcg::memory.cache.limit_in_bytes
> >
> > This seems viable to test as well. Currently it seems to be utilizing
> > a high number 'unlimited' default. I assume the only way to set this
> > is to directly interact with the memory cgroup and not via a standard
> > ve config value?
>
> Yes, you are right.
> We use this setting for some internal system cgroups running processes
> which are known to generate a lot of pagecache which won't be used later
> for sure.
>
> From my perspective it's not fair to apply such a setting to a Container
> globally: the CT owner pays for an amount of RAM and should be able to
> use this RAM for whatever he wants - even for pagecache.
> So limiting the pagecache for a Container is not a tweak we advise to be
> used against a Container => no standard config parameter.
>
> Note: disabling tcache for a Container is completely fair:
> you disable just an optimization for the overall Hardware Node
> performance, but all the RAM configured for the Container is still
> available to the Container.
> (But there is also no official config value for that - most often it
> helps, not hurts.)
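For reference, this is how one could poke both knobs from the Node - a
minimal sketch, assuming a VZ7-style memory cgroup layout under
machine.slice (the path and the <CT_UUID> placeholder need to be adjusted
for your setup):

    # disable tcache (cleancache) for one Container: 1 = disabled
    echo 1 > /sys/fs/cgroup/memory/machine.slice/<CT_UUID>/memory.disable_cleancache

    # cap the Container's pagecache at 512 MiB - again, not something
    # advised for a whole Container, per the note above
    echo $((512 * 1024 * 1024)) > \
        /sys/fs/cgroup/memory/machine.slice/<CT_UUID>/memory.cache.limit_in_bytes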
> > I assume regardless if we utilized vSwap or not, we would likely still
> > experience these additional swapping issues, presumably from pagecache
> > applications, or would the usage of vSwap intercept some of these
> > items thus preventing them from being swapped to disk?
>
> vSwap is an optimization of the swapping process _local to a
> Container_: it can prevent some of a Container's anonymous pages from
> being written to the physical swap, if the _local_ Container reclaim
> decides to swap something out.
>
> At the moment you are experiencing swapping at the Node level.
> Even if some Container's processes are put into the physical swap,
> it's a decision of the global reclaim mechanism,
> so it's completely unrelated to vSwap =>
> even if you assign some swappages to Containers and thus enable vSwap
> for those Containers, it should not influence the global Node-level
> memory pressure in any way and will not result in any difference in the
> rate of swapping into the physical swap.
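For completeness - assigning swappages, and thus enabling vSwap, is done
per Container; a sketch assuming legacy vzctl syntax and CT 101 as a
placeholder (double-check the exact option name for your vzctl version):

    # give CT 101 1 GiB of vSwap: 262144 pages * 4 KiB/page
    vzctl set 101 --swappages 262144 --save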
> Hope that helps.
>
> --
> Best regards,
>
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team

--
Best regards,
[COOLCOLD-RIPN]

_______________________________________________
Users mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/users
