On Wed, Oct 07, 2009 at 06:42:48AM +0200, Dietmar Maurer wrote:
The default downtime is set to 30ms. This value triggers the
convergence problem quite often. Maybe a longer default is more
reasonable.
How do you feel about 100 ms?
What is the reasoning behind such short downtimes? Are there any
applications that will fail with longer downtimes (let's say 1s)?
Note: on a 1 Gbit/s net you can transfer only about 10 MB within 100 ms,
which accounts for more than two thousand pages; that sounds like enough
for a first pass to me.
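(For reference, the arithmetic behind that estimate as a quick C check;
the 4 KiB page size and the ~20% framing overhead are my assumptions,
not figures from the thread:)

#include <stdio.h>

int main(void)
{
    const double link_bps   = 1e9;     /* 1 Gbit/s link                 */
    const double downtime_s = 0.100;   /* proposed 100 ms max_downtime  */
    const double page_bytes = 4096;    /* assumed guest page size       */

    double bytes = link_bps / 8 * downtime_s;  /* 12.5 MB raw           */
    bytes *= 0.8;                              /* assumed ~20% overhead */

    printf("~%.0f MB, ~%.0f pages per 100 ms window\n",
           bytes / 1e6, bytes / page_bytes);   /* ~10 MB, ~2441 pages   */
    return 0;
}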
On Tue, Oct 06, 2009 at 10:30:14AM +0200, Dietmar Maurer wrote:
'bandwidth' is something that changes dynamically (or by user
settings), so why don't we simply abort after some amount of
transferred memory (constant * memory size)? This can be implemented by
the management application without problems, although it's much easier
inside kvm.
On 09/30/2009 08:41 PM, Dietmar Maurer wrote:
I just think of common scenarios like 'maintenance mode', where all VMs should
migrate to another host. An endless migration task can make that fail.
For me, it is totally unclear what value I should set for 'max_downtime' to
avoid that behavior.
We used to have a heuristic that said 'if an iteration transfers more
pages than the previous iteration, we've stopped converging'. Why
wouldn't that work?
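(A minimal sketch of that heuristic; the names are hypothetical and this
is not the code qemu actually carried:)

#include <stdint.h>

/* Converging as long as each iteration sends fewer pages than the
 * previous one; otherwise give up on the live phase. */
static uint64_t pages_prev_iter;

static int still_converging(uint64_t pages_this_iter)
{
    int first = (pages_prev_iter == 0);
    int ok = first || (pages_this_iter < pages_prev_iter);
    pages_prev_iter = pages_this_iter;
    return ok;
}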
This does not protect you from very long migration times.
- Dietmar
On 10/05/2009 04:01 PM, Dietmar Maurer wrote:
This does not protect you from very long migration times.
Well, if each iteration transfers one page less than the previous one,
it doesn't.
Heuristics like number of pages, maybe. But since we don't export
iteration information, we can't expect management tools to stop the
guest if migration doesn't converge.
I suppose it could issue a 'stop' after some amount of time (constant *
memory size / bandwidth).
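(Sketched in C with a hypothetical safety factor; 'constant' is whatever
multiple of one full RAM copy you are willing to wait:)

#include <stdint.h>

/* Stop the guest once migration has run longer than
 * constant * memory_size / bandwidth, i.e. a few times the duration
 * of one full copy of RAM at the estimated rate. */
static int64_t stop_deadline_ms(int64_t ram_bytes, int64_t bw_bytes_per_s)
{
    const int64_t constant = 4;  /* assumed safety factor */
    return constant * ram_bytes * 1000 / bw_bytes_per_s;
}

With constant = 4, an 8 GB guest on a 1 Gbit/s link gets a deadline of
roughly four minutes.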
On 10/05/2009 04:08 PM, Dietmar Maurer wrote:
Well, if each iteration transfers one page less than the previous one,
it doesn't.
So how long does a migration take in this scenario when you have a VM with 8GB
RAM?
At 1 Gbps, about 2 years.
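(Checking that figure, assuming 4 KiB pages: one page less per iteration
makes the total an arithmetic series of roughly n^2/2 pages:)

#include <stdio.h>

int main(void)
{
    const double pages = 8e9 / 4096;   /* 8 GB of assumed 4 KiB pages */
    const double bw    = 125e6;        /* 1 Gbps in bytes/s           */

    /* n + (n-1) + ... + 1 ~= n^2 / 2 pages in total */
    double total_bytes = pages * pages / 2 * 4096;
    double years = total_bytes / bw / (365.0 * 24 * 3600);

    printf("~%.1f years\n", years);    /* prints ~2.0 */
    return 0;
}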
We used to have a heuristic that said 'if an iteration transfers more
pages than the previous iteration, we've stopped converging'. Why
wouldn't that work?
I agree that this is the 'right' approach - but it is just too difficult to
detect that we are not 'converging', and it does not set a limit on the
total migration time.
On Mon, Oct 05, 2009 at 04:09:43PM +0200, Dietmar Maurer wrote:
I suppose it could issue a 'stop' after some amount of time (constant *
memory size / bandwidth).
Since the problem you pinpointed does exist, I would suggest measuring
the average load of the last, say, 10 iterations.
The last 10 iterations do not define a fixed time. I guess it is much more
reasonable to measure the average of the last '10 seconds'. But usually a
migration only takes [...]
On Wed, Sep 30, 2009 at 10:55:24AM +0200, Dietmar Maurer wrote:
Another problem occurs when max_downtime is too short. This can
result in a never-ending migration task.
To reproduce, just play a video inside a VM and set max_downtime to
30ns.
Sure, one can argue that this behavior is [...]
On Wed, Sep 30, 2009 at 04:11:32PM +0200, Dietmar Maurer wrote:
+    if ((stage == 2) && (bytes_transferred > 2 * ram_bytes_total())) {
+        return 1;
+    }
why 2 * ?
This means we'll have to transfer the whole contents of RAM at least
twice to hit this condition, right?
Yes, this is just an arbitrary limit. I don't know. If we are [...]
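(For context, a sketch of where such a check sits. Only the
bytes_transferred / ram_bytes_total() test is from the patch above; the
surrounding loop and every helper below are paraphrased stand-ins, not
qemu's actual ram_save_live():)

#include <stdint.h>

/* Hypothetical stand-ins for qemu internals. */
extern uint64_t bytes_transferred;          /* total bytes sent so far  */
extern uint64_t bwidth;                     /* est. bandwidth, bytes/ms */
extern uint64_t max_downtime_ms;
extern uint64_t ram_bytes_total(void);
extern uint64_t remaining_dirty_bytes(void);
extern uint64_t send_next_dirty_page(void); /* returns bytes written    */
extern int more_dirty_pages(void);
extern int buffer_full(void);

static int ram_save_live_sketch(int stage)
{
    /* The proposed bail-out: after sending twice the guest's RAM
     * without converging, claim completion so the final (stopped)
     * stage runs and the migration ends. */
    if ((stage == 2) && (bytes_transferred > 2 * ram_bytes_total())) {
        return 1;
    }

    while (more_dirty_pages() && !buffer_full()) {
        bytes_transferred += send_next_dirty_page();
    }

    /* Normal convergence test: remaining dirty RAM must fit into
     * the allowed downtime at the estimated bandwidth. */
    return remaining_dirty_bytes() <= bwidth * max_downtime_ms;
}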
On Tue, Sep 29, 2009 at 04:37PM +0200, Dietmar Maurer wrote:
Seems the bwidth calculation is the problem. The code simply does:
bwidth = (bytes_transferred - bytes_transferred_last) / timediff
but I assume network traffic is buffered, so the calculated bwidth is
sometimes much too high.
- Dietmar
This patch solves the problem by calculating an average bandwidth.
- Dietmar
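(Not Dietmar's actual patch, but the general shape of such an averaging
fix, sketched in C: a write that merely lands in the socket buffer looks
instantaneously fast, so smooth the estimate across iterations instead
of trusting a single sample:)

#include <stdint.h>

static double bwidth_avg;   /* smoothed bytes per ms */

static double update_bwidth(uint64_t bytes_transferred,
                            uint64_t bytes_transferred_last,
                            double timediff_ms)
{
    double sample = (bytes_transferred - bytes_transferred_last)
                    / timediff_ms;

    /* Exponentially-weighted average: buffered bursts are damped. */
    if (bwidth_avg == 0.0)
        bwidth_avg = sample;
    else
        bwidth_avg = 0.875 * bwidth_avg + 0.125 * sample;

    return bwidth_avg;
}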
On Wed, Sep 30, 2009 at 06:49AM +0200, Glauber Costa wrote:
Also, if this is really the case (buffered), then the bandwidth capping
part of migration is also wrong.
Have you compared the reported bandwidth to your actual bandwidth? I
suspect the source of the problem can be that we're currently ignoring
the time we take to transfer the state of [...]