On Thu, Apr 16, 2020 at 10:18:49AM +0200, David Hildenbrand wrote: > >> > >> Postcopy is a very good point, bought! > >> > >> But (what you wrote above) it sounds like that this is really what we > >> *have to* do, not an optimization. I‘ll double check the spec tomorrow > >> (hopefully it was documented). We should rephrase the comment then. > > > > Do you have a link to the spec that I could look at? I am not hopeful > > that this will be listed there since the poison side of QEMU was never > > implemented. The flag is only here because it was copied over in the > > kernel header. > > Introducing a feature without > > a) specification what it does > b) implementation (of both sides) showing what has to be done > c) any kind of documentation of what needs to be done > > just horrible. > > The latest-greatest spec lives in > > https://github.com/oasis-tcs/virtio-spec.git > > I can't spot any signs of free page hinting/page poisioning. :(
Right. I merged the hinting support in Linux on the hope that spec and qemu upstream will surface later. It seemed so since IIRC there were some drafts (even though I don't have any links to hand). Unfortunately neither happened. > We should document our result of page poisoning, free page hinting, and > free page reporting there as well. I hope you'll have time for the latter. > > ------------------------------------------------------------------------- > Semantics of VIRTIO_BALLOON_F_PAGE_POISON > ------------------------------------------------------------------------- > > "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the > guest is using page poisoning. Guest writes to the poison_val config > field to tell host about the page poisoning value that is in use." > -> Very little information, no signs about what has to be done. I think it's an informational field. Knowing that free pages are full of a specific pattern can be handy for the hypervisor for a variety of reasons. E.g. compression/deduplication? > "Let the hypervisor know that we are expecting a specific value to be > written back in balloon pages." > -> Okay, that talks about "balloon pages", which would include right now > -- pages "inflated" and then "deflated" using free page hinting > -- pages "inflated" and then "deflated" using oridnary inflate/deflate > queue ATM, in this case driver calls "free" and that fills page with the poison value. > -- pages "inflated" using free page reporting and automatically > "deflated" on access > > So VIRTIO_BALLOON_F_PAGE_POISON really means "whenever the guest > deflates a page (either explicitly, or implicitly with free page > reporting), it is filled with "poison_val". It might be a valid optimization to allow driver to skip poisoning of freed pages in this case. > And I would add > > "However, if the inflated page was not filled with "poison_val" when > inflating, it's not predictable if the original page or a page filled > with "poison_val" is returned." > > Which would cover the "we did not discard the page in the hypervisor, so > the original page is still there". > > > We should also document what is expected to happen if "poison_val" is > suddenly changed by the guest at one point in time again. (e.g., not > supported, unexpected things can happen, etc.) Right. I think we should require that this can only be changed before features have been negotiated. That is the only point where hypervisor can still fail gracefully (i.e. fail FEATURES_OK). > Also, we might have to > document that a device reset resets the poison_val to 0. (not sure if > that's really necessary) Probably yes, for things like kexec. > ------------------------------------------------------------------------- > What we have to do in the guest/Linux: > ------------------------------------------------------------------------- > > A guest which relies on this (esp., free page reporting in Linux only, > right?), has to not use/advertise VIRTIO_BALLOON_F_REPORTING *in case > VIRTIO_BALLOON_F_PAGE_POISON is not provided* by the host. AFAIKS, > ordinary inflation/deflation and free page hinting does currently not > care, as we go via free_page(), so that is currently fine AFAIKs. > > ------------------------------------------------------------------------- > What we have to do in the hypervisor/QEMU: > ------------------------------------------------------------------------- > > With VIRTIO_BALLOON_F_PAGE_POISON, we could provide free page reporting > easily IFF "page_val"==0. However, as you said, it will currently be > expensive in case of postcopy, as the guest still zeroes out all pages. > Document that properly. > > With VIRTIO_BALLOON_F_PAGE_POISON, don't inflate any pages right now > (not discarding anything), OR fill the pages with poison_val when > deflating. I guess the latter would be better - even if current Linux > does not need it, it's what we are expected to do AFAIKS. > > With VIRTIO_BALLOON_F_PAGE_POISON and page_val != 0, discard all free > page reporting attempts. (== what your patch #3 does). Document that > properly. > > > Makes sense? See below for guest migration thingies. > > > > >> I could imagine that we also have to care about ordinary page > >> inflation/deflation when handling page poisoning. (Iow, don‘t > >> inflate/deflate if set for now). > > > > The problem is this was broken from the start for that. The issue is > > that the poison feature was wrapped up within the page hinting > > feature. So as a result enabling it for a VM that doesn't recognize > > the feature independently would likely leave the poison value > > uninitialized and flagging as though a value of 0 is used. In addition > > the original poison checking wasn't making sure that the page wasn't > > init_on_free which has the same effect as page poisoning but isn't > > page poisoning. > > If the guest agrees to have VIRTIO_BALLOON_F_PAGE_POISON but does not > initialize poison_val, then it's a guest bug, I wouldn't worry about > that for now. > > > > >>> > >>> The worst case scenario would be one in which the VM was suspended for > >>> migration while there were still pages in the balloon that needed to > >>> be drained. In such a case you would have them in an indeterminate > >>> state since the last page poisoning for them might have been ignored > >>> since they were marked as "free", so they are just going to be > >>> whatever value they were if they had been migrated at all. > >>> > >>>>> > >>>>>> > >>>>>>> + return; > >>>>>>> + } > >>>>>>> + > >>>>>>> if (s->free_page_report_cmd_id == UINT_MAX) { > >>>>>>> s->free_page_report_cmd_id = > >>>>>>> VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN; > >>>>>> > >>>>>> We should rename all "free_page_report" stuff we can to > >>>>>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my > >>>>>> side :) ) before adding free page reporting . > >>>>>> > >>>>>> (looking at the virtio-balloon linux header, it's also confusing but > >>>>>> we're stuck with that - maybe we should add better comments) > >>>>> > >>>>> Are we stuck? Couldn't we just convert it to an anonymous union with > >>>>> free_page_hint_cmd_id and then use that where needed? > >>>> > >>>> Saw your patch already :) > >>>> > >>>>> > >>>>>>> @@ -618,12 +627,10 @@ static size_t > >>>>>>> virtio_balloon_config_size(VirtIOBalloon *s) > >>>>>>> if (s->qemu_4_0_config_size) { > >>>>>>> return sizeof(struct virtio_balloon_config); > >>>>>>> } > >>>>>>> - if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) { > >>>>>>> + if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) || > >>>>>>> + virtio_has_feature(features, > >>>>>>> VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { > >>>>>>> return sizeof(struct virtio_balloon_config); > >>>>>>> } > >>>>>>> - if (virtio_has_feature(features, > >>>>>>> VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { > >>>>>>> - return offsetof(struct virtio_balloon_config, poison_val); > >>>>>>> - } > >>>>>> > >>>>>> I am not sure this change is completely sane. Why is that necessary at > >>>>>> all? > >>>>> > >>>>> The poison_val is stored at the end of the structure and is required > >>>>> in order to make free page hinting to work. What this change is doing > >>>> > >>>> Required to make it work? In the kernel, > >>>> > >>>> commit 2e991629bcf55a43681aec1ee096eeb03cf81709 > >>>> Author: Wei Wang <wei.w.w...@intel.com> > >>>> Date: Mon Aug 27 09:32:19 2018 +0800 > >>>> > >>>> virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON > >>>> > >>>> was merged after > >>>> > >>>> commit 86a559787e6f5cf662c081363f64a20cad654195 > >>>> Author: Wei Wang <wei.w.w...@intel.com> > >>>> Date: Mon Aug 27 09:32:17 2018 +0800 > >>>> > >>>> virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT > >>>> > >>>> So I assume it's perfectly fine to not have it. > >>>> > >>>> I'd say it's the responsibility of the guest to *not* use > >>>> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning > >>>> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself > >>>> into the foot. > >>> > >>> Based on the timing I am guessing the page poisoning was in the same > >>> patch set as the free page hinting since there is only 2 seconds > >>> between when the two are merged. If I recall the page poisoning logic > >>> was added after the issue was pointed out that it needed to address > >>> it. > >>> > >> > >> Yeah, but any other OS implementing this, not caring about page poisoning > >> wouldn‘t need it. Maybe there is something in the spec. > >> > >> Mental note: we‘ll have to migrate the poison value if not already done > >> (writing on my mobile). > > > > Right. We need to make sure any poison or init on free is migrated > > over to the destination before we can say we are going to skip the > > migration. If anything what I probably ought to look into would be if > > we could change the code so that if we have a hint the page is unused, > > poison is enabled, and the value is 0 we just ship over a 0 page > > instead of giving up on hinting entirely. > > > > Yeah, we have to migrate poison_val from source to destination. Also, we > should worry about us losing the page poisoning feature when migrating > from source to destination. > > Thinking about it, it might make sense to completely decouple page > poisoning here. Assign it a separate property (as you did), default > enable it, but disable it via QEMU compat machines. > > Then, we won't lose the feature when migrating. > > -- > Thanks, > > David / dhildenb --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org