Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete
On Monday, January 12, 2015 05:29:25 PM George Dunlap wrote: > When I said "10s seems very conservative", I meant, "10s should be by > far long enough for something to happen". If you can't free up at least > 1k in 30s, then there is certainly something very unusual with your > system. So I was recommending leaving that the default. Thanks for the clarification. I definitely agree that some memory should be freed in that timeframe - or you have bigger issues to deal with. > I'm not really sure under what conditions an admin might want to fiddle > with the timeout, however, so I'm having second thoughts about adding > new options. Ok. Given Ian's comment about ballooning down being serialized, should I send an official patch for further review? Thanks, Mike ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete
On 01/12/2015 05:04 PM, Mike Latimer wrote: >> I'm inclined to say we could add an option to say "wait forever", or >> to increase the period of the checks; but ultimately at some point >> someone (either xl or the human) needs to timeout and say, "This is >> never going to finish". 10s seems like a very conservative default. > > Agreed. Is a better solution to increase the timeout to some larger number > and > add an option to "wait forever"? When I said "10s seems very conservative", I meant, "10s should be by far long enough for something to happen". If you can't free up at least 1k in 30s, then there is certainly something very unusual with your system. So I was recommending leaving that the default. I'm not really sure under what conditions an admin might want to fiddle with the timeout, however, so I'm having second thoughts about adding new options. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete
On Mon, 2015-01-12 at 10:04 -0700, Mike Latimer wrote: > On Monday, January 12, 2015 12:36:01 PM George Dunlap wrote: > > I would: > > 1. Reset the retries after a successful increase > > 2. Not allow free_memkb_prev to go down. > > Thanks, George. Good points, which definitely improve the situation. > > > So maybe something like the following? > > > > if (free_memkb <= free_memkb_prev) { > >retries--; > > } else { > >retries = MAX_RETRIES; > >free_memkb_prev = free_memkb; > > } > > Even with a lot of changes in free memory, it seems like the situation should > be better with this change. If free memory does not increase, we will still > encounter the 33 second timeout. On the other hand, as long as memory is > being > freed, we continue to wait. > > One case to consider is multiple VMs waiting for memory at the same time. For > example, a 16GB guest attempting to start immediately after a 512GB guest has > triggered the ballooning process. If the 16GB guest starts as soon as 16GB of > memory is free, there is a window of time that free memory would decrease > before the 512GB is freed. With two very large guests, there could be a > situation where the 33 second timeout would be exceeded before free_memkb is > finally greater than free_memkb_prev again. > > I'm not sure whether the above theoretical example would ever be seen in > practice, and even if it could be seen, I think the situation is improved > with > the simple solution above. There is a lock at the xl level so the ballooning down should be serialised. > > I'm inclined to say we could add an option to say "wait forever", or > > to increase the period of the checks; but ultimately at some point > > someone (either xl or the human) needs to timeout and say, "This is > > never going to finish". 10s seems like a very conservative default. > > Agreed. Is a better solution to increase the timeout to some larger number > and > add an option to "wait forever"? > > Thanks, > Mike > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete
On Monday, January 12, 2015 12:36:01 PM George Dunlap wrote: > I would: > 1. Reset the retries after a successful increase > 2. Not allow free_memkb_prev to go down. Thanks, George. Good points, which definitely improve the situation. > So maybe something like the following? > > if (free_memkb <= free_memkb_prev) { >retries--; > } else { >retries = MAX_RETRIES; >free_memkb_prev = free_memkb; > } Even with a lot of changes in free memory, it seems like the situation should be better with this change. If free memory does not increase, we will still encounter the 33 second timeout. On the other hand, as long as memory is being freed, we continue to wait. One case to consider is multiple VMs waiting for memory at the same time. For example, a 16GB guest attempting to start immediately after a 512GB guest has triggered the ballooning process. If the 16GB guest starts as soon as 16GB of memory is free, there is a window of time that free memory would decrease before the 512GB is freed. With two very large guests, there could be a situation where the 33 second timeout would be exceeded before free_memkb is finally greater than free_memkb_prev again. I'm not sure whether the above theoretical example would ever be seen in practice, and even if it could be seen, I think the situation is improved with the simple solution above. > I'm inclined to say we could add an option to say "wait forever", or > to increase the period of the checks; but ultimately at some point > someone (either xl or the human) needs to timeout and say, "This is > never going to finish". 10s seems like a very conservative default. Agreed. Is a better solution to increase the timeout to some larger number and add an option to "wait forever"? Thanks, Mike ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete
On Thu, Jan 8, 2015 at 1:11 AM, Mike Latimer wrote: > On Wednesday, January 07, 2015 09:38:31 AM Ian Campbell wrote: >> That's exactly what I was about to suggest as I read the penultimate >> paragraph, i.e. keep waiting so long as some reasonable delta occurs on >> each iteration. > > Thanks, Ian. > > I wonder if there is a future-safe threshold on the amount of delta that > indicates progress is being made. Should some minimum safe progress amount or > percentage be set, or is it better to just make sure free memory is increasing > at the end of each iteration of the loop? > > For example, the following simple change just tracks free_memkb and only > decrements the retry count if it has not increased since the last check: > > -- > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index ed0d478..4cf2991 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -2196,7 +2196,7 @@ static int preserve_domain(uint32_t *r_domid, > libxl_event *event, > static int freemem(uint32_t domid, libxl_domain_build_info *b_info) > { > int rc, retries = 3; > -uint32_t need_memkb, free_memkb; > +uint32_t need_memkb, free_memkb, free_memkb_prev = 0; > > if (!autoballoon) > return 0; > @@ -2229,7 +2229,10 @@ static int freemem(uint32_t domid, > libxl_domain_build_info *b_info) > if (rc < 0) > return rc; > > -retries--; > +/* only decrement retry count if free_memkb is not increasing */ > +if (free_memkb <= free_memkb_prev) > +retries--; > +free_memkb_prev = free_memkb; I would: 1. Reset the retries after a successful increase 2. Not allow free_memkb_prev to go down. So maybe something like the following? if (free_memkb <= free_memkb_prev) { retries--; } else { retries = MAX_RETRIES; free_memkb_prev = free_memkb; } I'm inclined to say we could add an option to say "wait forever", or to increase the period of the checks; but ultimately at some point someone (either xl or the human) needs to timeout and say, "This is never going to finish". 10s seems like a very conservative default. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete
CCing Stefano, who was involved in the original xl ballooning stuff and the other toolstack maintainers. On Wed, 2015-01-07 at 18:11 -0700, Mike Latimer wrote: > On Wednesday, January 07, 2015 09:38:31 AM Ian Campbell wrote: > > That's exactly what I was about to suggest as I read the penultimate > > paragraph, i.e. keep waiting so long as some reasonable delta occurs on > > each iteration. > > Thanks, Ian. > > I wonder if there is a future-safe threshold on the amount of delta that > indicates progress is being made. Should some minimum safe progress amount or > percentage be set, or is it better to just make sure free memory is > increasing > at the end of each iteration of the loop? I'm not sure. It seems like the balloon ought to be able to make *some* progress over the course of 10s, even on a heavily loaded system. The reason for my uncertainty is that there is a certain amount of noise in the amount of current free memory in a system, since backend driver operation (at least with some kernels) causes it to fluctuate a bit as things are grant mapped unmapped. So using free_memkb as you have might take a bit of care. I'm more inclined to suggest you'd be better off checking that dom0's actual allocation is shrinking, but that might be subject to the same noise (I'm not sure if/how grants are accounted to a guest...). Which sort of leads me towards thinking that the loop should tolerate slight *increases* in dom0's allocation, but that simply can't be right! > > For example, the following simple change just tracks free_memkb and only > decrements the retry count if it has not increased since the last check: > > -- > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index ed0d478..4cf2991 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -2196,7 +2196,7 @@ static int preserve_domain(uint32_t *r_domid, > libxl_event *event, > static int freemem(uint32_t domid, libxl_domain_build_info *b_info) > { > int rc, retries = 3; > -uint32_t need_memkb, free_memkb; > +uint32_t need_memkb, free_memkb, free_memkb_prev = 0; > > if (!autoballoon) > return 0; > @@ -2229,7 +2229,10 @@ static int freemem(uint32_t domid, > libxl_domain_build_info *b_info) > if (rc < 0) > return rc; > > -retries--; > +/* only decrement retry count if free_memkb is not increasing */ > +if (free_memkb <= free_memkb_prev) > +retries--; > +free_memkb_prev = free_memkb; > } while (retries > 0); > > return ERROR_NOMEM; > -- > > I'm not sure if the above approach is always safe, but it works in my > testing. > I'd appreciate any other thoughts you might have before I try submitting an > official patch for this... > > Thanks, > Mike ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete
On Wednesday, January 07, 2015 09:38:31 AM Ian Campbell wrote: > That's exactly what I was about to suggest as I read the penultimate > paragraph, i.e. keep waiting so long as some reasonable delta occurs on > each iteration. Thanks, Ian. I wonder if there is a future-safe threshold on the amount of delta that indicates progress is being made. Should some minimum safe progress amount or percentage be set, or is it better to just make sure free memory is increasing at the end of each iteration of the loop? For example, the following simple change just tracks free_memkb and only decrements the retry count if it has not increased since the last check: -- diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index ed0d478..4cf2991 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -2196,7 +2196,7 @@ static int preserve_domain(uint32_t *r_domid, libxl_event *event, static int freemem(uint32_t domid, libxl_domain_build_info *b_info) { int rc, retries = 3; -uint32_t need_memkb, free_memkb; +uint32_t need_memkb, free_memkb, free_memkb_prev = 0; if (!autoballoon) return 0; @@ -2229,7 +2229,10 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info) if (rc < 0) return rc; -retries--; +/* only decrement retry count if free_memkb is not increasing */ +if (free_memkb <= free_memkb_prev) +retries--; +free_memkb_prev = free_memkb; } while (retries > 0); return ERROR_NOMEM; -- I'm not sure if the above approach is always safe, but it works in my testing. I'd appreciate any other thoughts you might have before I try submitting an official patch for this... Thanks, Mike ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete
On Tue, 2015-01-06 at 14:17 -0700, Mike Latimer wrote: > Hi, > > In a previous post (1), I mentioned issues seen while ballooning a large > amount of memory. In the current code, the ballooning process only has 33 > seconds to complete, or the xl operation (i.e. domain create) will fail. When > a lot of ballooning is required, or the host is very slow to balloon memory, > this delay is not sufficient. > > The code involved is tools/libxl/xl_cmdimpl.c:freemem. This function retries > 3 > times, and each retry includes a 10 second delay in > libxl_wait_for_free_memory > and a 1 second delay in libxl_wait_for_memory_target. > > Is there a better approach, which would account for ballooning operations > that > take a much longer time to complete? > > The easiest option is to simply increase the retry count, but that would > again > leave us with a fixed window of time for an operation to complete. It seems > like something that monitors the balloon process, and continues to wait if it > is progressing, might be a better approach. That's exactly what I was about to suggest as I read the penultimate paragraph, i.e. keep waiting so long as some reasonable delta occurs on each iteration. Ian. > > Any ideas? > > Thanks, > Mike > > 1. http://lists.xen.org/archives/html/xen-devel/2014-12/msg01443.html > > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] xl only waits 33 seconds for ballooning to complete
Hi, In a previous post (1), I mentioned issues seen while ballooning a large amount of memory. In the current code, the ballooning process only has 33 seconds to complete, or the xl operation (i.e. domain create) will fail. When a lot of ballooning is required, or the host is very slow to balloon memory, this delay is not sufficient. The code involved is tools/libxl/xl_cmdimpl.c:freemem. This function retries 3 times, and each retry includes a 10 second delay in libxl_wait_for_free_memory and a 1 second delay in libxl_wait_for_memory_target. Is there a better approach, which would account for ballooning operations that take a much longer time to complete? The easiest option is to simply increase the retry count, but that would again leave us with a fixed window of time for an operation to complete. It seems like something that monitors the balloon process, and continues to wait if it is progressing, might be a better approach. Any ideas? Thanks, Mike 1. http://lists.xen.org/archives/html/xen-devel/2014-12/msg01443.html ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel