> ... and the big challenge is - how do you apply this to memory usage?

Oh, you could. But the general concept behind my unquoted list is a renewing resource. Network throughput is renewing; network bandwidth usually isn't. With swapping you can turn memory into cache, and locality to the CPU is a renewable resource.
Yep, that was my thought too. Memory seems like a static resource, so consider RSS held per second as the renewable resource. Then you could charge tokens as normal.
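To make the "RSS-seconds as a renewable resource" idea concrete, here is a minimal token-bucket sketch in C. It is pure speculation, not code from vserver or anywhere else; all the names (`mem_tokens`, `mem_tokens_tick`, the field names) are invented for illustration. Each accounting tick, the bucket refills at a fixed rate and the context is charged one token per resident page held across the tick.

```c
#include <stdint.h>

/* Hypothetical per-context accounting record - names invented here. */
struct mem_tokens {
    uint64_t tokens;    /* tokens remaining */
    uint64_t refill;    /* tokens granted per accounting tick */
    uint64_t cap;       /* bucket ceiling */
};

/* Charge one tick: pages resident across the tick cost one token each,
 * while the bucket refills at a fixed rate, as with a CPU token bucket.
 * Returns 1 if the context stayed within budget, 0 if it ran dry. */
static int mem_tokens_tick(struct mem_tokens *mt, uint64_t rss_pages)
{
    mt->tokens += mt->refill;
    if (mt->tokens > mt->cap)
        mt->tokens = mt->cap;
    if (rss_pages >= mt->tokens) {
        mt->tokens = 0;
        return 0;           /* over budget: caller may penalise */
    }
    mt->tokens -= rss_pages;
    return 1;               /* within budget */
}
```

A context that keeps a large RSS drains its bucket faster than the refill rate restores it, which is exactly the "renewable resource" shape the CPU token scheduler already uses.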
However, there are some tricky questions:
1) who do you charge shared memory (binaries etc) to?
2) do you count mmap()'d regions in the buffercache?
3) if a process is sitting idle while there is no VM contention, it keeps "using" that memory, so maybe it should be charged more "fast memory" tokens - but it is not really occupying anything scarce, because the memory is not active.
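On question 1, one possible (entirely hypothetical) answer is to split the charge for a shared page evenly across every context currently mapping it, so a page mapped by N contexts costs each of them 1/N of a page. A fixed-point sketch, with made-up names and a made-up scale of 1024 units per page to avoid losing the fractional shares to integer division:

```c
#include <stdint.h>

/* 1024 charge units == one whole page; an invented fixed-point scale. */
#define PAGE_SHARE_UNIT 1024

/* Per-context charge for one shared page mapped by `mappers` contexts.
 * Each mapper pays an equal fraction of the page. */
static uint64_t shared_page_charge(uint32_t mappers)
{
    if (mappers == 0)
        return 0;
    return PAGE_SHARE_UNIT / mappers;
}
```

The catch, of course, is that the per-context charge changes every time another context maps or unmaps the binary, so the accounting has to be revisited on those events.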
Maybe the thing with memory is that it's not important about how much is used per second, but more about how much active memory you are *displacing* per second into other places.
We can find out from the VM subsystem how much RAM is displaced into swap by a context / process. It might also be possible for the MMU to report how much L2/L3 cache is displaced during a given slice. I have a hunch that the best solution to the memory usage problem will have to take into account the multi-tiered nature of memory. So, I think it would be excellent to be able to penalise contexts that thrash the L3 cache. Systems with megabytes of L3 cache were designed to keep the most essential parts of most of the run queue hot - programs that thwart this by being bulky and excessively using pointers waste that cache.
And then, it needs to all be done with no more than a few hundred cycles every reschedule. Hmm.
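To stay within a few hundred cycles per reschedule, any per-slice cache or VM pressure statistic pretty much has to be an O(1) update on counters that are already being maintained. A decaying average computed with shifts and adds (no divides) is one cheap option; this is just a sketch, and the struct and names are invented:

```c
#include <stdint.h>

/* Decaying average of a per-slice displacement/miss count. */
struct pressure {
    uint64_t avg;
};

#define DECAY_SHIFT 3   /* each new sample weighted 1/8 */

/* avg = avg - avg/8 + sample/8, in pure shift/add arithmetic;
 * a handful of instructions per reschedule. */
static void pressure_update(struct pressure *p, uint64_t sample)
{
    p->avg -= p->avg >> DECAY_SHIFT;
    p->avg += sample >> DECAY_SHIFT;
}
```

With a constant input the average converges toward the sample value (modulo truncation), and a context that stops thrashing sees its pressure decay away over a few dozen slices.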
Here's a thought about an algorithm that might work. This is all speculation without much regard to the existing implementations out there, of course. Season with grains of salt to taste.
Each context is assigned a target RSS and VM size. Usage is counted a la disklimits (Herbert - is this already done?), but all complex recalculation happens only when something tries to swap something else out.
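The targets might look something like this - a speculative sketch, not actual vserver or disklimits code, with all names invented. The fast path only maintains the counters; the swap-out path asks how far over target each context is, to pick victims:

```c
#include <stdint.h>

/* Per-context memory targets and usage; hypothetical field names. */
struct ctx_mem {
    uint64_t rss;           /* current resident pages */
    uint64_t vm;            /* current total VM pages */
    uint64_t rss_target;    /* assigned target RSS */
    uint64_t vm_target;     /* assigned target VM size */
};

/* How far over its RSS target a context is; 0 if within target.
 * The swap-out path can use this to rank candidate victims. */
static uint64_t ctx_rss_excess(const struct ctx_mem *c)
{
    return c->rss > c->rss_target ? c->rss - c->rss_target : 0;
}
```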
As well as memory totals, each context also has a score that tracks how good or bad they've been with memory. Let's call that the "Jabba" value.
When swap displacement occurs, memory is first taken from disproportionately fat jabbas that are running on nearby CPUs (for NUMA). Displacing others' memory makes your context a fatter jabba too, but taking from jabbas that are already fat is not as bad as taking from a hungry jabba. When someone takes your memory, that makes you a thinner jabba.
This is not the same as simply a ratio of your context's memory usage to the allocated amount. Depending on the functions used to alter the jabba value, it should hopefully end up measuring something more akin to the amount of system memory turnover a context is inducing. It might also need something to act as a damper to pull a context's jabba nearer towards the zero point during lulls of VM activity.
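The jabba bookkeeping described above might be sketched like this. Everything here is hypothetical - the struct, the constants, and the exact cost function are all placeholders to show the shape: displacing someone's memory fattens your jabba and thins theirs, stealing from an already-fat victim costs less than stealing from a thin one, and a damper pulls scores back toward zero during lulls:

```c
#include <stdint.h>

/* Signed jabba score: >0 means fat (a displacer), <0 means thin
 * (a victim), 0 is neutral. All names and constants invented. */
struct jabba {
    int64_t score;
};

#define JABBA_BASE_COST 16  /* per-page cost when the victim is thin */

/* Called when `who` displaces `pages` of `victim`'s memory to swap. */
static void jabba_displace(struct jabba *who, struct jabba *victim,
                           int64_t pages)
{
    /* The fatter the victim already is, the cheaper the theft:
     * here, simply half price for any already-fat victim. */
    int64_t cost = victim->score > 0 ? JABBA_BASE_COST / 2
                                     : JABBA_BASE_COST;
    who->score += pages * cost;
    victim->score -= pages * cost;
}

/* Damper for lulls in VM activity: pull the score toward zero.
 * C integer division truncates toward zero, so this works for
 * both fat (positive) and thin (negative) jabbas. */
static void jabba_decay(struct jabba *j)
{
    j->score -= j->score / 4;
}
```

Because the charge depends on the victim's current score rather than on a static usage ratio, the number ends up tracking induced turnover: a context that keeps displacing thin victims fattens quickly, while one that only ever nibbles at fat neighbours barely moves.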
Then, if you are a fat jabba, maybe you might end up getting rescheduled instead of getting more memory whenever you want it!

--
Sam Vilain, sam /\T vilain |><>T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)

_______________________________________________
Vserver mailing list
[EMAIL PROTECTED]
http://list.linux-vserver.org/mailman/listinfo/vserver