RE: migrate_set_downtime bug

2009-10-07 Thread Dietmar Maurer
> > What is the reasoning behind such short downtimes? Are there any > applications that will fail with longer downtimes (let's say 1s)? > > > > Note: on a 1Gbit/s net you can transfer only 10MB within 100ms > > which accounts for more than 2 thousand pages, which sounds like enough > for a first pas
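
The figures quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the helper names and the 4 KiB page size are assumptions, not anything taken from the QEMU source. At the raw line rate a 1 Gbit/s link carries 12.5 MB in 100 ms, so the 10 MB in the message is a rounder, more conservative number; at 4 KiB per page 10 MB is still about 2,400 pages.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes a link can carry at raw line rate
 * within a given downtime budget (no protocol overhead modelled). */
uint64_t bytes_within_downtime(uint64_t link_bits_per_sec, uint64_t downtime_ms)
{
    return link_bits_per_sec / 8 * downtime_ms / 1000;
}

/* Hypothetical helper: whole pages that fit into that budget.
 * page_size is an assumption (4096 on typical x86 guests). */
uint64_t pages_within_downtime(uint64_t link_bits_per_sec,
                               uint64_t downtime_ms,
                               uint64_t page_size)
{
    assert(page_size > 0);
    return bytes_within_downtime(link_bits_per_sec, downtime_ms) / page_size;
}
```

With the 30 ms default mentioned elsewhere in the thread, the same helper gives 3.75 MB, which is why a busy guest can dirty memory faster than the final pass is allowed to send it.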

Re: migrate_set_downtime bug

2009-10-07 Thread Glauber Costa
On Wed, Oct 07, 2009 at 06:42:48AM +0200, Dietmar Maurer wrote: > > > The default downtime is set to 30ms. This value triggers the > > convergence problem quite often. Maybe a longer default is more > > reasonable. > > What do you feel about 100 ms? > > What is the reasoning behind such short down

RE: migrate_set_downtime bug

2009-10-06 Thread Dietmar Maurer
> > The default downtime is set to 30ms. This value triggers the > convergence problem quite often. Maybe a longer default is more > reasonable. > What do you feel about 100 ms? What is the reasoning behind such short downtimes? Are there any applications that will fail with longer downtimes (let

Re: migrate_set_downtime bug

2009-10-06 Thread Glauber Costa
On Tue, Oct 06, 2009 at 10:30:14AM +0200, Dietmar Maurer wrote: > > > 'bandwidth' is something that changes dynamically (or by user > > settings), so why don't we simply abort after some amount of > > transferred memory (constant * memory size). This can be implemented by > > the management applica

RE: migrate_set_downtime bug

2009-10-06 Thread Dietmar Maurer
> > 'bandwidth' is something that changes dynamically (or by user > settings), so why don't we simply abort after some amount of > transferred memory (constant * memory size). This can be implemented by > the management application without problems, although it's much easier > inside kvm. > > > Eas

Re: migrate_set_downtime bug

2009-10-05 Thread Glauber Costa
On Mon, Oct 05, 2009 at 04:09:43PM +0200, Dietmar Maurer wrote: > > Heuristics like number of pages, maybe. But since we don't export > > iteration information, we can't expect management tools to stop the > > guest if migration doesn't converge. > > > > I suppose it could issue a 'stop' after so

RE: migrate_set_downtime bug

2009-10-05 Thread Dietmar Maurer
> We used to have a heuristic that said 'if an iteration transfers more > pages than the previous iteration, we've stopped converging'. Why > wouldn't that work? I agree that this is the 'right' approach - but it is just too difficult to detect that we are not 'converging', and it does not set a

Re: migrate_set_downtime bug

2009-10-05 Thread Avi Kivity
On 10/05/2009 04:08 PM, Dietmar Maurer wrote: Well, if each iteration transfers one page less than the previous one, it doesn't. So how long does a migration take in this scenario when you have a VM with 8GB RAM? At 1 Gbps, about 2 years. -- error compiling committee.c: too many a
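
The "about 2 years" answer follows from a simple sum. The sketch below assumes 4 KiB pages, a first pass that sends all of RAM, and every later pass sending exactly one page fewer; the function name is made up for illustration. With n pages the total is n + (n-1) + ... + 1 = n(n+1)/2 pages.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds needed if each pass sends one page fewer than the previous
 * pass, starting from the whole of RAM. Hypothetical helper; the page
 * size and link speed are supplied by the caller. */
double worst_case_seconds(uint64_t ram_bytes, uint64_t page_size,
                          double link_bits_per_sec)
{
    assert(page_size > 0 && link_bits_per_sec > 0.0);
    uint64_t n = ram_bytes / page_size;
    /* n(n+1)/2 pages in total, computed in double to avoid overflow */
    double total_pages = (double)n * (double)(n + 1) / 2.0;
    return total_pages * (double)page_size * 8.0 / link_bits_per_sec;
}
```

For 8 GiB of RAM at 1 Gbit/s this comes to roughly 7.2e7 seconds, which is a little over two years.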

RE: migrate_set_downtime bug

2009-10-05 Thread Dietmar Maurer
> -Original Message- > From: Avi Kivity [mailto:a...@redhat.com] > Sent: Montag, 05. Oktober 2009 16:06 > To: Dietmar Maurer > Cc: Glauber Costa; Anthony Liguori; kvm > Subject: Re: migrate_set_downtime bug > > On 10/05/2009 04:01 PM, Dietmar Maurer wrote

RE: migrate_set_downtime bug

2009-10-05 Thread Dietmar Maurer
> Heuristics like number of pages, maybe. But since we don't export > iteration information, we can't expect management tools to stop the > guest if migration doesn't converge. > > I suppose it could issue a 'stop' after some amount of time (constant * > memory size / bandwidth). 'bandwidth' is
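
The time limit suggested in the quoted text (constant * memory size / bandwidth) could be computed by a management tool along these lines. This is a sketch under assumptions: the function name is hypothetical, and the bandwidth is whatever value the tool believes it has, which is exactly the weak point raised in the reply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: seconds after which a management tool would
 * issue 'stop' so that the migration is forced to finish.
 * factor is the "constant" from the thread (for example 2.0). */
double stop_deadline_seconds(double factor, uint64_t ram_bytes,
                             double bandwidth_bytes_per_sec)
{
    assert(bandwidth_bytes_per_sec > 0.0);
    return factor * (double)ram_bytes / bandwidth_bytes_per_sec;
}
```

Because the result scales inversely with bandwidth, any change in the bandwidth limit during migration moves the deadline, which is the motivation for limiting by transferred bytes instead.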

RE: migrate_set_downtime bug

2009-10-05 Thread Dietmar Maurer
> On 10/05/2009 04:01 PM, Dietmar Maurer wrote: > >> We used to have a heuristic that said 'if an iteration transfers > more > >> pages than the previous iteration, we've stopped converging'. Why > >> wouldn't that work? > >> > > This does not protect you from very long migration times. > > > > >

Re: migrate_set_downtime bug

2009-10-05 Thread Avi Kivity
On 10/05/2009 04:01 PM, Dietmar Maurer wrote: We used to have a heuristic that said 'if an iteration transfers more pages than the previous iteration, we've stopped converging'. Why wouldn't that work? This does not protect you from very long migration times. Well, if each iteratio

RE: migrate_set_downtime bug

2009-10-05 Thread Dietmar Maurer
> We used to have a heuristic that said 'if an iteration transfers more > pages than the previous iteration, we've stopped converging'. Why > wouldn't that work? This does not protect you from very long migration times. - Dietmar
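
The heuristic quoted above, and the objection to it, can be made concrete. The sketch below is not the historical QEMU code; it is a hypothetical helper that scans per-iteration page counts and reports the first iteration that sent more pages than its predecessor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first iteration that transferred more
 * pages than the one before it, or -1 if no iteration did.
 * Hypothetical helper for illustration only. */
int first_non_converging_iteration(const uint64_t *pages_sent,
                                   size_t n_iterations)
{
    assert(pages_sent != NULL || n_iterations == 0);
    for (size_t i = 1; i < n_iterations; i++) {
        if (pages_sent[i] > pages_sent[i - 1]) {
            return (int)i;
        }
    }
    return -1;
}
```

A sequence that shrinks by a single page per iteration never triggers the check, so the heuristic alone puts no upper bound on total migration time.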

Re: migrate_set_downtime bug

2009-10-05 Thread Avi Kivity
On 10/05/2009 03:04 PM, Glauber Costa wrote: On Mon, Oct 05, 2009 at 02:17:30PM +0200, Avi Kivity wrote: On 09/30/2009 08:41 PM, Dietmar Maurer wrote: I just think of common scenarios like 'maintenance mode', where all VMs should migrate to another host. An endless migrate task can make

Re: migrate_set_downtime bug

2009-10-05 Thread Glauber Costa
On Mon, Oct 05, 2009 at 02:17:30PM +0200, Avi Kivity wrote: > On 09/30/2009 08:41 PM, Dietmar Maurer wrote: >> >> I just think of common scenarios like 'maintenance mode', where all VMs should >> migrate to another host. An endless migrate task can make that fail. >> >> For me, it is totally unclear

Re: migrate_set_downtime bug

2009-10-05 Thread Avi Kivity
On 09/30/2009 08:41 PM, Dietmar Maurer wrote: I just think of common scenarios like 'maintenance mode', where all VMs should migrate to another host. An endless migrate task can make that fail. For me, it is totally unclear what value I should set for 'max_downtime' to avoid that behavior?

RE: migrate_set_downtime bug

2009-09-30 Thread Dietmar Maurer
> > > > +if ((stage == 2) && (bytes_transferred > > 2*ram_bytes_total())) { > > > > +return 1; > > > > +} > > > why 2 * ? > > > This means we'll have to transfer the whole contents of RAM at > least > > > twice to hit this condition, right? > > > > Yes, this is just an arbitrary lim
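
The condition in the quoted patch can be read as a cap on total traffic: once more than a fixed multiple of guest RAM has been sent during the iterative stage, stop iterating and finish. The sketch below restates it as a standalone predicate. The function name and the explicit parameters are assumptions made so the example compiles on its own; in the patch the RAM size comes from ram_bytes_total() and the factor is the literal 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the iterative stage (stage 2) has already sent more than
 * factor * RAM size, meaning the migration should be forced into its
 * final stage. Hypothetical standalone form of the quoted check. */
bool transfer_cap_reached(int stage, uint64_t bytes_transferred,
                          uint64_t ram_bytes_total, unsigned factor)
{
    assert(factor > 0);
    return stage == 2 &&
           bytes_transferred > (uint64_t)factor * ram_bytes_total;
}
```

With factor 2 the whole of RAM has to be sent at least twice before the cap applies, which bounds migration time at the cost of a final downtime that may exceed max_downtime.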

Re: migrate_set_downtime bug

2009-09-30 Thread Glauber Costa
On Wed, Sep 30, 2009 at 04:11:32PM +0200, Dietmar Maurer wrote: > > On Wed, Sep 30, 2009 at 10:55:24AM +0200, Dietmar Maurer wrote: > > > Another problem occurs when max_downtime is too short. This can > > result in a never-ending migration task. > > > > > > To reproduce just play a video inside a VM

RE: migrate_set_downtime bug

2009-09-30 Thread Dietmar Maurer
> On Wed, Sep 30, 2009 at 10:55:24AM +0200, Dietmar Maurer wrote: > > Another problem occurs when max_downtime is too short. This can > result in a never-ending migration task. > > > > To reproduce just play a video inside a VM and set max_downtime to > 30ns > > > > Sure, one can argue that this b

Re: migrate_set_downtime bug

2009-09-30 Thread Glauber Costa
twice to hit this condition, right? > > Or do you think that is not reasonable? > > - Dietmar > > > -Original Message- > > From: Glauber Costa [mailto:glom...@redhat.com] > > Sent: Mittwoch, 30. September 2009 06:49 > > To: Dietmar Maurer > > Cc

RE: migrate_set_downtime bug

2009-09-30 Thread Dietmar Maurer
> Cc: Anthony Liguori; kvm > Subject: Re: migrate_set_downtime bug > > On Tue, Sep 29, 2009 at 06:36:57PM +0200, Dietmar Maurer wrote: > > > Also, if this is really the case (buffered), then the bandwidth > capping > > > part > > > of migration is also wrong. >

RE: migrate_set_downtime bug

2009-09-29 Thread Dietmar Maurer
> Since the problem you pinpointed does exist, I would suggest measuring > the average load of the last, > say, 10 iterations. The "last 10 iterations" do not define a fixed time. I guess it is much more reasonable to measure the average of the last '10 seconds'. But usually a migration only tak
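
Averaging over a fixed time span rather than a fixed number of iterations could look like the sketch below. Everything here is hypothetical (the BwWindow type, the sample capacity, millisecond timestamps); it is not the patch discussed in the thread, only one way to realise "average of the last 10 seconds".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BW_MAX_SAMPLES 64  /* arbitrary capacity for this sketch */

/* Ring buffer of (timestamp, cumulative bytes) samples. */
typedef struct {
    uint64_t t_ms[BW_MAX_SAMPLES];
    uint64_t bytes[BW_MAX_SAMPLES];
    size_t head;   /* slot the next sample will be written to */
    size_t count;  /* number of valid samples */
} BwWindow;

/* Record the cumulative byte counter at time now_ms. */
void bw_window_add(BwWindow *w, uint64_t now_ms, uint64_t bytes_total)
{
    assert(w != NULL);
    w->t_ms[w->head] = now_ms;
    w->bytes[w->head] = bytes_total;
    w->head = (w->head + 1) % BW_MAX_SAMPLES;
    if (w->count < BW_MAX_SAMPLES) {
        w->count++;
    }
}

/* Average bandwidth in bytes per millisecond between the newest sample
 * and the oldest sample that is at most window_ms older than it.
 * Returns 0.0 when fewer than two usable samples exist. */
double bw_window_average(const BwWindow *w, uint64_t window_ms)
{
    assert(w != NULL);
    if (w->count < 2) {
        return 0.0;
    }
    size_t newest = (w->head + BW_MAX_SAMPLES - 1) % BW_MAX_SAMPLES;
    size_t oldest = newest;
    for (size_t i = 1; i < w->count; i++) {
        size_t idx = (newest + BW_MAX_SAMPLES - i) % BW_MAX_SAMPLES;
        if (w->t_ms[newest] - w->t_ms[idx] > window_ms) {
            break;  /* older samples fall outside the window */
        }
        oldest = idx;
    }
    uint64_t dt = w->t_ms[newest] - w->t_ms[oldest];
    if (dt == 0) {
        return 0.0;
    }
    return (double)(w->bytes[newest] - w->bytes[oldest]) / (double)dt;
}
```

A short burst into a socket buffer then only shifts the average slightly, instead of dominating a single-iteration estimate.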

Re: migrate_set_downtime bug

2009-09-29 Thread Glauber Costa
On Tue, Sep 29, 2009 at 06:36:57PM +0200, Dietmar Maurer wrote: > > Also, if this is really the case (buffered), then the bandwidth capping > > part > > of migration is also wrong. > > > > Have you compared the reported bandwidth to your actual bandwidth? I > > suspect > > the source of the proble

RE: migrate_set_downtime bug

2009-09-29 Thread Dietmar Maurer
> Also, if this is really the case (buffered), then the bandwidth capping > part > of migration is also wrong. > > Have you compared the reported bandwidth to your actual bandwidth? I > suspect > the source of the problem can be that we're currently ignoring the time > we take > to transfer the st

Re: migrate_set_downtime bug

2009-09-29 Thread Glauber Costa
> >> >>> -Original Message- >>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On >>> Behalf Of Dietmar Maurer >>> Sent: Dienstag, 29. September 2009 16:37 >>> To: kvm >>> Subject: RE: migrate_set_downtime bug >

Re: migrate_set_downtime bug

2009-09-29 Thread Anthony Liguori
: Dienstag, 29. September 2009 16:37 To: kvm Subject: RE: migrate_set_downtime bug Seems the bwidth calculation is the problem. The code simply does: bwidth = (bytes_transferred - bytes_transferred_last) / timediff but I assume network traffic is buffered, so calculated bwidth is sometimes much too

RE: migrate_set_downtime bug

2009-09-29 Thread Dietmar Maurer
This patch solves the problem by calculating an average bandwidth. - Dietmar > -Original Message- > From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On > Behalf Of Dietmar Maurer > Sent: Dienstag, 29. September 2009 16:37 > To: kvm > Subject: RE: mi

RE: migrate_set_downtime bug

2009-09-29 Thread Dietmar Maurer
Seems the bwidth calculation is the problem. The code simply does: bwidth = (bytes_transferred - bytes_transferred_last) / timediff but I assume network traffic is buffered, so calculated bwidth is sometimes much too high. - Dietmar > -Original Message- > From: kvm-ow...@vger.kernel.o
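
To see why an inflated bwidth matters, the sketch below reproduces the quoted formula and pairs it with an expected-downtime estimate. The first function mirrors the formula in the message; the second (remaining bytes divided by bwidth, compared against max_downtime) is an assumption about how the value is consumed, and both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The per-iteration estimate quoted in the message, in bytes per ms. */
double instantaneous_bwidth(uint64_t bytes_transferred,
                            uint64_t bytes_transferred_last,
                            double timediff_ms)
{
    assert(timediff_ms > 0.0);
    return (double)(bytes_transferred - bytes_transferred_last) / timediff_ms;
}

/* Assumed use of bwidth: estimated time (ms) to send what is left. */
double expected_downtime_ms(uint64_t remaining_bytes, double bwidth)
{
    assert(bwidth > 0.0);
    return (double)remaining_bytes / bwidth;
}
```

If 10 MB is handed to a socket buffer within 1 ms, the estimate is 10,000,000 bytes/ms, and 50 MB of remaining data looks like 5 ms of downtime. At the true rate of a 1 Gbit/s link (125,000 bytes/ms) the same 50 MB takes 400 ms.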