Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete

2015-01-12 Thread Mike Latimer
On Monday, January 12, 2015 05:29:25 PM George Dunlap wrote:
> When I said "10s seems very conservative", I meant, "10s should be by
> far long enough for something to happen".  If you can't free up at least
> 1k in 30s, then there is certainly something very unusual with your
> system.  So I was recommending leaving that the default.

Thanks for the clarification. I definitely agree that some memory should be 
freed in that timeframe - or you have bigger issues to deal with.

> I'm not really sure under what conditions an admin might want to fiddle
> with the timeout, however, so I'm having second thoughts about adding
> new options.

Ok. Given Ian's comment about ballooning down being serialized, should I send 
an official patch for further review?

Thanks,
Mike

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete

2015-01-12 Thread George Dunlap
On 01/12/2015 05:04 PM, Mike Latimer wrote:
>> I'm inclined to say we could add an option to say "wait forever", or
>> to increase the period of the checks; but ultimately at some point
>> someone (either xl or the human) needs to timeout and say, "This is
>> never going to finish".  10s seems like a very conservative default.
> 
> Agreed. Is a better solution to increase the timeout to some larger number 
> and 
> add an option to "wait forever"?

When I said "10s seems very conservative", I meant, "10s should be by
far long enough for something to happen".  If you can't free up at least
1k in 30s, then there is certainly something very unusual with your
system.  So I was recommending leaving that the default.

I'm not really sure under what conditions an admin might want to fiddle
with the timeout, however, so I'm having second thoughts about adding
new options.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete

2015-01-12 Thread Ian Campbell
On Mon, 2015-01-12 at 10:04 -0700, Mike Latimer wrote:
> On Monday, January 12, 2015 12:36:01 PM George Dunlap wrote:
> > I would:
> > 1. Reset the retries after a successful increase
> > 2. Not allow free_memkb_prev to go down.
> 
> Thanks, George. Good points, which definitely improve the situation.
> 
> > So maybe something like the following?
> > 
> >  if (free_memkb <= free_memkb_prev) {
> >retries--;
> >  } else {
> >retries = MAX_RETRIES;
> >free_memkb_prev = free_memkb;
> >  }
> 
> Even with a lot of changes in free memory, it seems like the situation should 
> be better with this change. If free memory does not increase, we will still 
> encounter the 33 second timeout. On the other hand, as long as memory is 
> being 
> freed, we continue to wait.
> 
> One case to consider is multiple VMs waiting for memory at the same time. For 
> example, a 16GB guest attempting to start immediately after a 512GB guest has 
> triggered the ballooning process. If the 16GB guest starts as soon as 16GB of 
> memory is free, there is a window of time that free memory would decrease 
> before the 512GB is freed. With two very large guests, there could be a 
> situation where the 33 second timeout would be exceeded before free_memkb is 
> finally greater than free_memkb_prev again.
> 
> I'm not sure whether the above theoretical example would ever be seen in 
> practice, and even if it could be seen, I think the situation is improved 
> with 
> the simple solution above.

There is a lock at the xl level so the ballooning down should be
serialised.

> > I'm inclined to say we could add an option to say "wait forever", or
> > to increase the period of the checks; but ultimately at some point
> > someone (either xl or the human) needs to timeout and say, "This is
> > never going to finish".  10s seems like a very conservative default.
> 
> Agreed. Is a better solution to increase the timeout to some larger number 
> and 
> add an option to "wait forever"?
> 
> Thanks,
> Mike
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete

2015-01-12 Thread Mike Latimer
On Monday, January 12, 2015 12:36:01 PM George Dunlap wrote:
> I would:
> 1. Reset the retries after a successful increase
> 2. Not allow free_memkb_prev to go down.

Thanks, George. Good points, which definitely improve the situation.

> So maybe something like the following?
> 
>  if (free_memkb <= free_memkb_prev) {
>retries--;
>  } else {
>retries = MAX_RETRIES;
>free_memkb_prev = free_memkb;
>  }

Even with a lot of changes in free memory, it seems like the situation should 
be better with this change. If free memory does not increase, we will still 
encounter the 33 second timeout. On the other hand, as long as memory is being 
freed, we continue to wait.

One case to consider is multiple VMs waiting for memory at the same time. For 
example, a 16GB guest attempting to start immediately after a 512GB guest has 
triggered the ballooning process. If the 16GB guest starts as soon as 16GB of 
memory is free, there is a window of time that free memory would decrease 
before the 512GB is freed. With two very large guests, there could be a 
situation where the 33 second timeout would be exceeded before free_memkb is 
finally greater than free_memkb_prev again.

I'm not sure whether the above theoretical example would ever be seen in 
practice, and even if it could be seen, I think the situation is improved with 
the simple solution above.

> I'm inclined to say we could add an option to say "wait forever", or
> to increase the period of the checks; but ultimately at some point
> someone (either xl or the human) needs to timeout and say, "This is
> never going to finish".  10s seems like a very conservative default.

Agreed. Is a better solution to increase the timeout to some larger number and 
add an option to "wait forever"?

Thanks,
Mike

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete

2015-01-12 Thread George Dunlap
On Thu, Jan 8, 2015 at 1:11 AM, Mike Latimer  wrote:
> On Wednesday, January 07, 2015 09:38:31 AM Ian Campbell wrote:
>> That's exactly what I was about to suggest as I read the penultimate
>> paragraph, i.e. keep waiting so long as some reasonable delta occurs on
>> each iteration.
>
> Thanks, Ian.
>
> I wonder if there is a future-safe threshold on the amount of delta that
> indicates progress is being made. Should some minimum safe progress amount or
> percentage be set, or is it better to just make sure free memory is increasing
> at the end of each iteration of the loop?
>
> For example, the following simple change just tracks free_memkb and only
> decrements the retry count if it has not increased since the last check:
>
> --
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index ed0d478..4cf2991 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -2196,7 +2196,7 @@ static int preserve_domain(uint32_t *r_domid,
> libxl_event *event,
>  static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
>  {
>  int rc, retries = 3;
> -uint32_t need_memkb, free_memkb;
> +uint32_t need_memkb, free_memkb, free_memkb_prev = 0;
>
>  if (!autoballoon)
>  return 0;
> @@ -2229,7 +2229,10 @@ static int freemem(uint32_t domid,
> libxl_domain_build_info *b_info)
>  if (rc < 0)
>  return rc;
>
> -retries--;
> +/* only decrement retry count if free_memkb is not increasing */
> +if (free_memkb <= free_memkb_prev)
> +retries--;
> +free_memkb_prev = free_memkb;

I would:
1. Reset the retries after a successful increase
2. Not allow free_memkb_prev to go down.

So maybe something like the following?

 if (free_memkb <= free_memkb_prev) {
   retries--;
 } else {
   retries = MAX_RETRIES;
   free_memkb_prev = free_memkb;
 }

I'm inclined to say we could add an option to say "wait forever", or
to increase the period of the checks; but ultimately at some point
someone (either xl or the human) needs to timeout and say, "This is
never going to finish".  10s seems like a very conservative default.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete

2015-01-08 Thread Ian Campbell
CCing Stefano, who was involved in the original xl ballooning stuff and
the other toolstack maintainers.

On Wed, 2015-01-07 at 18:11 -0700, Mike Latimer wrote:
> On Wednesday, January 07, 2015 09:38:31 AM Ian Campbell wrote:
> > That's exactly what I was about to suggest as I read the penultimate
> > paragraph, i.e. keep waiting so long as some reasonable delta occurs on
> > each iteration.
> 
> Thanks, Ian.
> 
> I wonder if there is a future-safe threshold on the amount of delta that 
> indicates progress is being made. Should some minimum safe progress amount or 
> percentage be set, or is it better to just make sure free memory is 
> increasing 
> at the end of each iteration of the loop?

I'm not sure. It seems like the balloon ought to be able to make *some*
progress over the course of 10s, even on a heavily loaded system.

The reason for my uncertainty is that there is a certain amount of noise
in the amount of current free memory in a system, since backend driver
operation (at least with some kernels) causes it to fluctuate a bit as
things are grant mapped unmapped. 

So using free_memkb as you have might take a bit of care.

I'm more inclined to suggest you'd be better off checking that dom0's
actual allocation is shrinking, but that might be subject to the same
noise (I'm not sure if/how grants are accounted to a guest...).

Which sort of leads me towards thinking that the loop should tolerate
slight *increases* in dom0's allocation, but that simply can't be right!

> 
> For example, the following simple change just tracks free_memkb and only 
> decrements the retry count if it has not increased since the last check:
> 
> --
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index ed0d478..4cf2991 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -2196,7 +2196,7 @@ static int preserve_domain(uint32_t *r_domid, 
> libxl_event *event,
>  static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
>  {
>  int rc, retries = 3;
> -uint32_t need_memkb, free_memkb;
> +uint32_t need_memkb, free_memkb, free_memkb_prev = 0;
> 
>  if (!autoballoon)
>  return 0;
> @@ -2229,7 +2229,10 @@ static int freemem(uint32_t domid, 
> libxl_domain_build_info *b_info)
>  if (rc < 0)
>  return rc;
> 
> -retries--;
> +/* only decrement retry count if free_memkb is not increasing */
> +if (free_memkb <= free_memkb_prev)
> +retries--;
> +free_memkb_prev = free_memkb;
>  } while (retries > 0);
> 
>  return ERROR_NOMEM;
> --
> 
> I'm not sure if the above approach is always safe, but it works in my 
> testing. 
> I'd appreciate any other thoughts you might have before I try submitting an 
> official patch for this...
> 
> Thanks,
> Mike



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete

2015-01-07 Thread Mike Latimer
On Wednesday, January 07, 2015 09:38:31 AM Ian Campbell wrote:
> That's exactly what I was about to suggest as I read the penultimate
> paragraph, i.e. keep waiting so long as some reasonable delta occurs on
> each iteration.

Thanks, Ian.

I wonder if there is a future-safe threshold on the amount of delta that 
indicates progress is being made. Should some minimum safe progress amount or 
percentage be set, or is it better to just make sure free memory is increasing 
at the end of each iteration of the loop?

For example, the following simple change just tracks free_memkb and only 
decrements the retry count if it has not increased since the last check:

--
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index ed0d478..4cf2991 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2196,7 +2196,7 @@ static int preserve_domain(uint32_t *r_domid, 
libxl_event *event,
 static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
 {
 int rc, retries = 3;
-uint32_t need_memkb, free_memkb;
+uint32_t need_memkb, free_memkb, free_memkb_prev = 0;

 if (!autoballoon)
 return 0;
@@ -2229,7 +2229,10 @@ static int freemem(uint32_t domid, 
libxl_domain_build_info *b_info)
 if (rc < 0)
 return rc;

-retries--;
+/* only decrement retry count if free_memkb is not increasing */
+if (free_memkb <= free_memkb_prev)
+retries--;
+free_memkb_prev = free_memkb;
 } while (retries > 0);

 return ERROR_NOMEM;
--

I'm not sure if the above approach is always safe, but it works in my testing. 
I'd appreciate any other thoughts you might have before I try submitting an 
official patch for this...

Thanks,
Mike

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete

2015-01-07 Thread Ian Campbell
On Tue, 2015-01-06 at 14:17 -0700, Mike Latimer wrote:
> Hi,
> 
> In a previous post (1), I mentioned issues seen while ballooning a large 
> amount of memory. In the current code, the ballooning process only has 33 
> seconds to complete, or the xl operation (i.e. domain create) will fail. When 
> a lot of ballooning is required, or the host is very slow to balloon memory, 
> this delay is not sufficient.
> 
> The code involved is tools/libxl/xl_cmdimpl.c:freemem. This function retries 
> 3 
> times, and each retry includes a 10 second delay in 
> libxl_wait_for_free_memory 
> and a 1 second delay in libxl_wait_for_memory_target.
> 
> Is there a better approach, which would account for ballooning operations 
> that 
> take a much longer time to complete?
> 
> The easiest option is to simply increase the retry count, but that would 
> again 
> leave us with a fixed window of time for an operation to complete. It seems 
> like something that monitors the balloon process, and continues to wait if it 
> is progressing, might be a better approach.

That's exactly what I was about to suggest as I read the penultimate
paragraph, i.e. keep waiting so long as some reasonable delta occurs on
each iteration.

Ian.

> 
> Any ideas?
> 
> Thanks,
> Mike
> 
> 1. http://lists.xen.org/archives/html/xen-devel/2014-12/msg01443.html
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] xl only waits 33 seconds for ballooning to complete

2015-01-06 Thread Mike Latimer
Hi,

In a previous post (1), I mentioned issues seen while ballooning a large 
amount of memory. In the current code, the ballooning process only has 33 
seconds to complete, or the xl operation (i.e. domain create) will fail. When 
a lot of ballooning is required, or the host is very slow to balloon memory, 
this delay is not sufficient.

The code involved is tools/libxl/xl_cmdimpl.c:freemem. This function retries 3 
times, and each retry includes a 10 second delay in libxl_wait_for_free_memory 
and a 1 second delay in libxl_wait_for_memory_target.

Is there a better approach, which would account for ballooning operations that 
take a much longer time to complete?

The easiest option is to simply increase the retry count, but that would again 
leave us with a fixed window of time for an operation to complete. It seems 
like something that monitors the balloon process, and continues to wait if it 
is progressing, might be a better approach.

Any ideas?

Thanks,
Mike

1. http://lists.xen.org/archives/html/xen-devel/2014-12/msg01443.html


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel