Re: [Vserver] cpu limits clone vservers

2004-11-24 Thread Herbert Poetzl
On Wed, Nov 24, 2004 at 05:01:47PM +1300, Sam Vilain wrote:
 Jörn Engel wrote:
 ...and the big challenge is - how do you apply this to memory usage?
 Oh, you could.  But the general concept behind my unquoted list is a
 renewing resource.  Network throughput is renewing.  Network bandwidth
 usually isn't.  With swapping, you can turn memory into cache and
 locality to the cpu is a renewable resource.
 
 Yep, that was my thought too.  Memory seems like a static resource, so
 consider RSS used per second the renewable resource.  Then you could
 charge tokens as normal.
 
 However, there are some tricky questions:
 
   1) who do you charge shared memory (binaries etc) to ?
 
   2) do you count mmap()'d regions in the buffercache?
 
   3) if a process is sitting idle, but there is no VM contention, then
  they are using that memory more, so maybe they are using more
  fast memory tokens - but they might not really be occupying it,
  because it is not active.
 
 Maybe the thing with memory is that it's not important about how much is
 used per second, but more about how much active memory you are
 *displacing* per second into other places.
 
 We can find out from the VM subsystem how much RAM is displaced into
 swap by a context / process.  It might also be possible for the MMU to

pages to be swapped out can not easily be assigned
to a context, this is different for pages getting
paged in ...

 report how much L2/L3 cache is displaced during a given slice.  I have a
 hunch that the best solution to the memory usage problem will have to
 take into account the multi-tiered nature of memory.  So, I think it
 would be excellent to be able to penalise contexts that thrash the L3
 cache.  Systems with megabytes of L3 cache were designed to keep the
 most essential parts of most of the run queue hot - programs that thwart
 this by being bulky and excessively using pointers waste that cache.
 
 And then, it needs to all be done with no more than a few hundred cycles
 every reschedule.  Hmm.
 
 Here's a thought about an algorithm that might work.  This is all 
 speculation without much regard to the existing implementations out
 there, of course.  Season with grains of salt to taste.
 
 Each context is assigned a target RSS and VM size.  Usage is counted a
 la disklimits (Herbert - is this already done?), but all complex

yep, not relative as with disklimits, but absolute
and in identical way the kernel accounts RSS and VM

 recalculation happens when something tries to swap something else out.
 
 As well as memory totals, each context also has a score that tracks how
 good or bad they've been with memory.  Let's call that the Jabba
 value.
 
 When swap displacement occurs, it is first taken from disproportionately
 fat jabbas that are running on nearby CPUs (for NUMA).  Displacing
 other's memory makes your context a fatter jabba too, but taking from
 jabbas that are already fat is not as bad as taking it from a hungry
 jabba.  When someone takes your memory, that makes you a thinner jabba.
 
 This is not the same as simply a ratio of your context's memory usage to
 the allocated amount.  Depending on the functions used to alter the
 jabba value, it should hopefully end up measuring something more akin to
 the amount of system memory turnover a context is inducing.  It might
 also need something to act as a damper to pull a context's jabba nearer
 towards the zero point during lulls of VM activity.
 
 Then, if you are a fat jabba, maybe you might end up getting rescheduled
 instead of getting more memory whenever you want it!

thought about a simpler approach, with a TB for the
actual page-ins, so that every page-in will consume
a token, and you get a number per interval, as usual ...

best,
Herbert

 -- 
 Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
 (include my PGP key ID in personal replies to avoid spam filtering)
 ___
 Vserver mailing list
 [EMAIL PROTECTED]
 http://list.linux-vserver.org/mailman/listinfo/vserver


Re: [Vserver] cpu limits clone vservers

2004-11-24 Thread Jörn Engel
On Wed, 24 November 2004 14:02:07 +0100, Herbert Poetzl wrote:
 On Wed, Nov 24, 2004 at 05:01:47PM +1300, Sam Vilain wrote:
 
 pages to be swapped out can not easily be assigned
 to a context, this is different for pages getting
 paged in ...

Or any page-fault done for the context, for that matter.  There is no
fundamental difference between swapping in, faulting in the
recently-started openoffice, a malloc+memset, ...

  Here's a thought about an algorithm that might work.  This is all 
  speculation without much regard to the existing implementations out
  there, of course.  Season with grains of salt to taste.
  
  Each context is assigned a target RSS and VM size.  Usage is counted a
  la disklimits (Herbert - is this already done?), but all complex
 
 yep, not relative as with disklimits, but absolute
 and in identical way the kernel accounts RSS and VM
 
  recalculation happens when something tries to swap something else out.
  
  As well as memory totals, each context also has a score that tracks how
  good or bad they've been with memory.  Let's call that the Jabba
  value.
  
  When swap displacement occurs, it is first taken from disproportionately
  fat jabbas that are running on nearby CPUs (for NUMA).  Displacing
  other's memory makes your context a fatter jabba too, but taking from
  jabbas that are already fat is not as bad as taking it from a hungry
  jabba.  When someone takes your memory, that makes you a thinner jabba.
  
  This is not the same as simply a ratio of your context's memory usage to
  the allocated amount.  Depending on the functions used to alter the
  jabba value, it should hopefully end up measuring something more akin to
  the amount of system memory turnover a context is inducing.  It might
  also need something to act as a damper to pull a context's jabba nearer
  towards the zero point during lulls of VM activity.
  
  Then, if you are a fat jabba, maybe you might end up getting rescheduled
  instead of getting more memory whenever you want it!
 
 thought about a simpler approach, with a TB for the
 actual page-ins, so that every page-in will consume
 a token, and you get a number per interval, as usual ...

It misses a few corner-cases, but I cannot think of anything better.
More complicated approaches would miss different corner-cases, but
that's not a real advantage.

With the simple TB approach, any IO caused on behalf of a process
gets accounted.  Updatedb would be a huge offender, but that looks
more like a feature than a bug.

Jörn

-- 
Victory in war is not repetitious.
-- Sun Tzu


Re: [Vserver] cpu limits clone vservers

2004-11-24 Thread Jörn Engel
On Wed, 24 November 2004 15:58:01 +1300, Sam Vilain wrote:
 
 If you increase the priority of the administrative daemons to -19, then
 you will get what people *actually* want ;-) which is CPU time for
 administration to come before all normal activity on the system.  Even
 with a load over 100 you should still be able to log in via ssh if it is
 niced that high.

Sounds interesting.  Last time I tried my forkbomb on 2.6, it took me
a full hour before I finally gave up and rebooted the system.  But
that was without any ulimit enforcement.

Jörn

-- 
Fantasy is more important than knowledge. Knowledge is limited,
while fantasy embraces the whole world.
-- Albert Einstein


Re: [Vserver] cpu limits clone vservers

2004-11-24 Thread Gregory (Grisha) Trubetskoy

On Wed, 24 Nov 2004, Herbert Poetzl wrote:
Then, if you are a fat jabba, maybe you might end up getting rescheduled
instead of getting more memory whenever you want it!
thought about a simpler approach, with a TB for the
actual page-ins, so that every page-in will consume
a token, and you get a number per interval, as usual ...
There probably still needs to be a target size which, if exceeded, causes 
your bucket to be refilled more slowly. This way small contexts would not 
suffer because of a large and very active context. The sysadmins would 
need to make sure that the sum of all targets does not exceed physical RAM.

So you'd have two additional parameters - target size and fill-interval 
multiplier.

if (is_exceeded(target)) {
    interval *= multiplier;
}
Also - at which point does a malloc actually fail? It seems like context 0 
should have priority over other contexts - a non-0 context should under 
no circumstances be able to exhaust the system memory.

Maybe there should be an additional level in the bucket - a reschedule 
level. If I actually empty the bucket, the malloc fails?

my $0.02
Grisha


Re: [Vserver] cpu limits clone vservers

2004-11-24 Thread Herbert Poetzl
On Wed, Nov 24, 2004 at 11:00:40AM -0500, Gregory (Grisha) Trubetskoy wrote:
 
 
 On Wed, 24 Nov 2004, Herbert Poetzl wrote:
 
 Then, if you are a fat jabba, maybe you might end up getting rescheduled
 instead of getting more memory whenever you want it!
 
 thought about a simpler approach, with a TB for the
 actual page-ins, so that every page-in will consume
 a token, and you get a number per interval, as usual ...
 
 There probably still needs to be a target size which, if exceeded, causes 
 your bucket to be refilled more slowly. This way small contexts would not 
 suffer because of a large and very active context. The sysadmins would 
 need to make sure that the sum of all targets does not exceed physical RAM.
 
 So you'd have two additional parameters - target size and fill-interval 
 multiplier.

well, actually I was more thinking of using something
similar to the scheduler TB (see the vserver paper for
details) and just do the check/dec for every 'page'

 if (is_exceeded(target)) {
     interval *= multiplier;
 }
 
 Also - at which point does a malloc actually fail? It seems like context 0 
 should have priority over other contexts - a non-0 context should under 
 no circumstances be able to exhaust the system memory.

well, malloc is something which will fail if RSS or
VM limits are reached (actually it's brk() and mmap())
but in the typical case the page TB would just cause
the context not to be scheduled (i.e. a page-in would
take a _lot_ longer than usual ;)

best,
Herbert

 Maybe there should be an additional level in the bucket - a reschedule 
 level. If I actually empty the bucket, the malloc fails?
 
 my $0.02
 
 Grisha


[Vserver] cpu limits clone vservers

2004-11-23 Thread Andreea Gansac
Hi,

I am testing the linux vservers development release 1.9.3. 
I have 2 problems:

1. When I check the limits with vlimit I can't see any cpu resource:
[EMAIL PROTECTED] linux-2.6.9]# vlimit -c 49167 --all -d
RSS N/A N/A 10
NPROC   N/A N/A 70
NOFILE  N/A N/A inf
MEMLOCK N/A N/A inf
AS  N/A N/A inf 

Though :

[EMAIL PROTECTED] util-vserver]# vlimit -c 49168 --cpu 30
vc_set_rlimit(): Success

If I run a process that does only while(1){} inside the vserver, the
cpu is used only 25%-30%. So the limitations work but not how I want
them to. No matter how much I set the cpu limit (10,20,50,100,200...)
and no matter how many processes I have in the vserver the cpu is used
25-30%. 
I think I missed a piece of the puzzle.

2. How can I clone a vserver? On 2.4.x I had the newvserver utility.
On 2.6 I have vbuild and vcopy but neither of them works.

[EMAIL PROTECTED] util-vserver]# ./vbuild --debug --test andreea andreea_clone
sh: line 1: /usr/lib/util-vserver/distrib-info: No such file or
directory
Can't lstat file andreea/.vdir (No such file or directory)

[EMAIL PROTECTED] util-vserver]# ./vcopy andreea_clone andreea
unification not configured for source vserver

Reading the error I get at vcopy I understand that vcopy creates vserver
using unification. I don't want unification. I want every vserver to
have its own logical volume, so I can limit the space for every
vserver very easily.
I think vbuild is what I want but it's not working. Is there another
utility I don't know about? Or how can I make vbuild work?

Thanks a lot.

-- 
Andreea Gansac
Web Engineer
iNES Group
tel: 021-232.21.12
fax: 021-232.34.61
www.ines.ro



Re: [Vserver] cpu limits clone vservers

2004-11-23 Thread Gregory (Grisha) Trubetskoy
On Tue, 23 Nov 2004, Andreea Gansac wrote:
[EMAIL PROTECTED] util-vserver]# vlimit -c 49168 --cpu 30
vc_set_rlimit(): Success
If I run a process that does only while(1){} inside the vserver, the
cpu is used only 25%-30%.
If I'm not mistaken, this simply sets the cpu time to 30 seconds, so after 
30 seconds of cpu time is used, processes in your context will be killed.

Take a look at this thread, it describes what you want. (Read the whole 
thread, because the first message from me has some omissions):

http://list.linux-vserver.org/archive/vserver/msg08134.html
Reading the error I get at vcopy I understand that vcopy creates vserver
using unification. I don't want unification. I want every vserver to
have its own logical volume, so I can limit the space for every
vserver very easily.
I think vbuild is what I want but it's not working. Is there another
utility I don't know about? Or how can I make vbuild work?
You can limit the space much easier using the VServer disk limits. google 
for vserver vdlimit. Basically you need xid tagging enabled in the kernel 
(under VServer menu option in kernel config, off by default), need to 
compile the vdlimit tool, then the partition on which vservers reside 
needs to be mounted with the tagxid option, then you can set a limit like 
this:

/usr/local/vdlimit-0.01/vdlimit -a -x 1 \
-S 0,10,0,1,5 /vservers
This means that for context , 0 space is presently used, 10 is 
maximum allowed, 0 inodes presently used, 1 inodes maximum allowed, 5% 
of disk space is reserved for root. Note that these limits exist only 
while the server is up and therefore need to be saved on shutdown and 
restored on startup. The list archives have example scripts of how people 
do this.

Grisha


Re: [Vserver] cpu limits clone vservers

2004-11-23 Thread Herbert Poetzl
On Tue, Nov 23, 2004 at 03:00:06PM +0200, Andreea Gansac wrote:
 Hi,
 
   I am testing the linux vservers development release 1.9.3. 
   I have 2 problems:
 
   1. When I check the limits with vlimit I can't see any cpu resource:
   [EMAIL PROTECTED] linux-2.6.9]# vlimit -c 49167 --all -d
   RSS N/A N/A 10
   NPROC   N/A N/A 70
   NOFILE  N/A N/A inf
   MEMLOCK N/A N/A inf
   AS  N/A N/A inf 

   Though :
 
   [EMAIL PROTECTED] util-vserver]# vlimit -c 49168 --cpu 30
   vc_set_rlimit(): Success

we could do CPU limits (similar to ulimit) but would
you really want to limit a vserver to, let's say 1 minute
of CPU usage in total?

   If I run a process that does only while(1){} inside the vserver, the
 cpu is used only 25%-30%. So the limitations work but not how I want
 them to. No matter how much I set the cpu limit (10,20,50,100,200...)
 and no matter how many processes I have in the vserver the cpu is used
 25-30%. 
   I think I missed a piece of the puzzle.

yep, probably the hard scheduler, the token bucket
and the vsched tool ...

   2. How can I clone a vserver? On 2.4.x I had the newvserver utility.
   On 2.6 I have vbuild and vcopy but neither of them works.
   
 [EMAIL PROTECTED] util-vserver]# ./vbuild --debug --test andreea andreea_clone
 sh: line 1: /usr/lib/util-vserver/distrib-info: No such file or
 directory
 Can't lstat file andreea/.vdir (No such file or directory)
 
 [EMAIL PROTECTED] util-vserver]# ./vcopy andreea_clone andreea
 unification not configured for source vserver
 
 Reading the error I get at vcopy I understand that vcopy creates vserver
 using unification. I don't want unification. I want every vserver to
 have its own logical volume, so I can limit the space for every
 vserver very easily.
 I think vbuild is what I want but it's not working. Is there another
 utility I don't know about? Or how can I make vbuild work?

I think 'vserver <name> build ...' is what you want ...

HTH,
Herbert

 Thanks a lot.
 
 -- 
 Andreea Gansac
 Web Engineer
 iNES Group
 tel: 021-232.21.12
 fax: 021-232.34.61
 www.ines.ro
 


Re: [Vserver] cpu limits clone vservers

2004-11-23 Thread Jörn Engel
On Tue, 23 November 2004 16:44:22 +0100, Herbert Poetzl wrote:
 
 we could do CPU limits (similar to ulimit) but would
 you really want to limit a vserver to, let's say 1minute
 of CPU usage in total?

That's basically the same problem as with any shared resource
consumption.  For networking, HTB is relatively close to what most
people want and I don't see how CPU is a much different resource.

What most people want in plain English:
o Every user gets some guaranteed lower bound.
o Sum of lower bounds doesn't exceed total resources.
o Most of the time, not all resources get consumed.  Add them to the
  'leftover' pool.
o Users that demand more resources than their lower bound get serviced
  from the leftover pool.
o Users that, on average, use less resources get a higher priority
  when accessing the leftover pool.

List could be longer, but everything else is details.  Most
controversy will be over the question of how exactly to prioritize the
nicer users.  But in the end, CPU-hogs will be limited to something
close to their lower bounds and nice users operate well below but can
get a lot more power in a burst, at least sometimes.

Yeah, code doesn't exist.  The usual.

Jörn

-- 
He who knows others is wise.
He who knows himself is enlightened.
-- Lao Tsu


Re: [Vserver] cpu limits clone vservers

2004-11-23 Thread Herbert Poetzl
On Tue, Nov 23, 2004 at 06:18:54PM +0100, Jörn Engel wrote:
 On Tue, 23 November 2004 16:44:22 +0100, Herbert Poetzl wrote:
  
  we could do CPU limits (similar to ulimit) but would
  you really want to limit a vserver to, let's say 1 minute
  of CPU usage in total?
 
 That's basically the same problem as with any shared resource
 consumption.  For networking, HTB is relatively close to what most
 people want and I don't see how CPU is a much different resource.
 
 What most people want in plain English:
 o Every user gets some guaranteed lower bound.
 o Sum of lower bounds doesn't exceed total resources.
 o Most of the time, not all resources get consumed.  Add them to the
   'leftover' pool.
 o Users that demand more resources than their lower bound get serviced
   from the leftover pool.
 o Users that, on average, use less resources get a higher priority
   when accessing the leftover pool.
 
 List could be longer, but everything else is details.  Most
 controversy will be over the question of how exactly to prioritize the
 nicer users.  But in the end, CPU-hogs will be limited to something
 close to their lower bounds and nice users operate well below but can
 get a lot more power in a burst, at least sometimes.
 
 Yeah, code doesn't exist.  The usual.

ahem, maybe you should read up on the TokenBucket
stuff for CPU usage in linux-vserver ...

http://linux-vserver.org/Linux-VServer-Paper-06
06.3. Token Bucket Extensions

or do you mean something different? if not, then
it's already implemented ;)

best,
Herbert

PS: what about linux-vserver CoW?

 Jörn
 
 -- 
 He who knows others is wise.
 He who knows himself is enlightened.
 -- Lao Tsu


Re: [Vserver] cpu limits clone vservers

2004-11-23 Thread Gregory (Grisha) Trubetskoy
On Tue, 23 Nov 2004, Jörn Engel wrote:
What most people want in plain English:
o Every user gets some guaranteed lower bound.
o Sum of lower bounds doesn't exceed total resources.
o Most of the time, not all resources get consumed.  Add them to the
 'leftover' pool.
o Users that demand more resources than their lower bound get serviced
 from the leftover pool.
o Users that, on average, use less resources get a higher priority
 when accessing the leftover pool.
...and the big challenge is - how do you apply this to memory usage?
Grisha


Re: [Vserver] cpu limits clone vservers

2004-11-23 Thread Dariush Pietrzak
 ...and the big challenge is - how do you apply this to memory usage?
oooh! I know! swap out those that are over limit to tape!

-- 
Key fingerprint = 40D0 9FFB 9939 7320 8294  05E0 BCC7 02C4 75CC 50D9


Re: [Vserver] cpu limits clone vservers

2004-11-23 Thread Matthew Nuzum

On Tue, 2004-11-23 at 12:47 -0500, Gregory (Grisha) Trubetskoy wrote:

On Tue, 23 Nov 2004, Jörn Engel wrote:

 What most people want in plain English:
 o Every user gets some guaranteed lower bound.
 o Sum of lower bounds doesn't exceed total resources.
 o Most of the time, not all resources get consumed.  Add them to the
  'leftover' pool.
 o Users that demand more resources than their lower bound get serviced
  from the leftover pool.
 o Users that, on average, use less resources get a higher priority
  when accessing the leftover pool.

...and the big challenge is - how do you apply this to memory usage?

Grisha

This would be a cool thing. We could squabble over the details, but if it did what you said above and left some room for tweaking I'll bet people would be [even more] pleased [though we are already ecstatic now] with the vserver project. Of course, I'm still using CTX 17, so I'm pretty easy to please I guess.

I'd be curious to know what happens when there is contention for that pool of RAM. I've got a nightly batch job that lasts 15 minutes but uses most of the server's RAM during the process. Right now, everything works OK, but I suspect under this vserver panacea edition I would have problems because idle vservers will be allocated their minimum RAM even though they don't need it.

I guess I could just allocate 4MB of RAM as the minimum or some other small number to get the effect of what I do now... still, a bit thought provoking.

Keep up the interesting conversation and work,

-- 
Matthew Nuzum | Makers of Elite Content Management System
www.followers.net | View samples of Elite CMS in action
[EMAIL PROTECTED] | http://www.followers.net/portfolio/



Re: [Vserver] cpu limits clone vservers

2004-11-23 Thread Jörn Engel
On Tue, 23 November 2004 19:08:50 +0100, Jörn Engel wrote:
 
 I love it when someone else already did the work. ;)

Except when it's only partial.  If implementation matches
documentation, the fixed lower bound is 0 (zero).  That's pretty low.
Most people want to say something like "ssh will always get 5% of cpu,
no matter how many forkbombs explode".  And the administrator's shell
will inherit those 5%.

Ok, not many people know they want to say it, but some may learn the
hard way over time. ;)

Jörn

PS:
[EMAIL PROTECTED]:/tmp cat _
head __
. _. _
[EMAIL PROTECTED]:/tmp . _

Have fun!

-- 
Fancy algorithms are slow when n is small, and n is usually small.
Fancy algorithms have big constants. Until you know that n is
frequently going to be big, don't get fancy.
-- Rob Pike


Re: [Vserver] cpu limits clone vservers

2004-11-23 Thread Herbert Poetzl
On Tue, Nov 23, 2004 at 07:39:26PM +0100, Jörn Engel wrote:
 On Tue, 23 November 2004 19:08:50 +0100, Jörn Engel wrote:
  
  I love it when someone else already did the work. ;)
 
 Except when it's only partial.  If implementation matches
 documentation, the fixed lower bound is 0 (zero).  That's pretty low.
 Most people want to say something like "ssh will always get 5% of cpu,
 no matter how many forkbombs explode".  And the administrator's shell
 will inherit those 5%.
 
 Ok, not many people know they want to say it, but some may learn the
 hard way over time. ;)

yep, but taking care that overbooking doesn't
happen can be done in userspace, literally ...

so a 'minimum' of available resources can be
guaranteed only if you limit all other contexts
to 1.0 - Sum[max], which, in turn, is sufficient

unless you have 'better' suggestions to solve
this ...

best,
Herbert

 Jörn
 
 PS:
 [EMAIL PROTECTED]:/tmp cat _
 head __
 . _. _
 [EMAIL PROTECTED]:/tmp . _
 
 Have fun!
 
 -- 
 Fancy algorithms are slow when n is small, and n is usually small.
 Fancy algorithms have big constants. Until you know that n is
 frequently going to be big, don't get fancy.
 -- Rob Pike


Re: [Vserver] cpu limits clone vservers

2004-11-23 Thread Sam Vilain
Jörn Engel wrote:
...and the big challenge is - how do you apply this to memory usage?
Oh, you could.  But the general concept behind my unquoted list is a
renewing resource.  Network throughput is renewing.  Network bandwidth
usually isn't.  With swapping, you can turn memory into cache and
locality to the cpu is a renewable resource.
Yep, that was my thought too.  Memory seems like a static resource, so
consider RSS used per second the renewable resource.  Then you could
charge tokens as normal.
However, there are some tricky questions:
  1) who do you charge shared memory (binaries etc) to ?
  2) do you count mmap()'d regions in the buffercache?
  3) if a process is sitting idle, but there is no VM contention, then
 they are using that memory more, so maybe they are using more
 fast memory tokens - but they might not really be occupying it,
 because it is not active.
Maybe the thing with memory is that it's not important about how much is
used per second, but more about how much active memory you are
*displacing* per second into other places.
We can find out from the VM subsystem how much RAM is displaced into
swap by a context / process.  It might also be possible for the MMU to
report how much L2/L3 cache is displaced during a given slice.  I have a
hunch that the best solution to the memory usage problem will have to
take into account the multi-tiered nature of memory.  So, I think it
would be excellent to be able to penalise contexts that thrash the L3
cache.  Systems with megabytes of L3 cache were designed to keep the
most essential parts of most of the run queue hot - programs that thwart
this by being bulky and excessively using pointers waste that cache.
And then, it needs to all be done with no more than a few hundred cycles
every reschedule.  Hmm.
Here's a thought about an algorithm that might work.  This is all 
speculation without much regard to the existing implementations out
there, of course.  Season with grains of salt to taste.

Each context is assigned a target RSS and VM size.  Usage is counted a
la disklimits (Herbert - is this already done?), but all complex
recalculation happens when something tries to swap something else out.
As well as memory totals, each context also has a score that tracks how
good or bad they've been with memory.  Let's call that the Jabba
value.
When swap displacement occurs, it is first taken from disproportionately
fat jabbas that are running on nearby CPUs (for NUMA).  Displacing
other's memory makes your context a fatter jabba too, but taking from
jabbas that are already fat is not as bad as taking it from a hungry
jabba.  When someone takes your memory, that makes you a thinner jabba.
This is not the same as simply a ratio of your context's memory usage to
the allocated amount.  Depending on the functions used to alter the
jabba value, it should hopefully end up measuring something more akin to
the amount of system memory turnover a context is inducing.  It might
also need something to act as a damper to pull a context's jabba nearer
towards the zero point during lulls of VM activity.
Then, if you are a fat jabba, maybe you might end up getting rescheduled
instead of getting more memory whenever you want it!
--
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)