[Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
After an ioreq server has been unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously with the current p2m_change_entry_type_global() interface.

A new field, entry_count, is introduced in struct p2m_domain to record the
number of p2m_ioreq_server p2m page table entries. One property of these
entries is that they only point to 4K sized page frames, because all
p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(). We therefore do not need to worry about counting
2M/1G sized pages.

This patch disallows mapping of an ioreq server while p2m_ioreq_server
entries are still left in the p2m table, in case another mapping occurs
right after the current one is unmapped and releases its lock, with the
p2m table not yet synced.

This patch also disallows live migration when there are remaining
p2m_ioreq_server entries in the p2m table. The core reason is that our
current implementation of p2m_change_entry_type_global() lacks the
information needed to resync p2m_ioreq_server entries correctly if
global_logdirty is on.

Signed-off-by: Yu Zhang
Reviewed-by: Paul Durrant
---
Cc: Paul Durrant
Cc: Jan Beulich
Cc: Andrew Cooper
Cc: George Dunlap
Cc: Jun Nakajima
Cc: Kevin Tian

changes in v5:
  - According to comments from Jan: use unsigned long for entry_count;
  - According to comments from Jan: refuse a mapping request when there
    are p2m_ioreq_server entries remaining in the p2m table.
  - Added "Reviewed-by: Paul Durrant"

changes in v4:
  - According to comments from Jan: use ASSERT() instead of an 'if'
    condition in p2m_change_type_one().
  - According to comments from Jan: commit message changes to mention
    that the p2m_ioreq_server entries are all based on 4K sized pages.

changes in v3:
  - Move the synchronous resetting logic into patch 5.
  - According to comments from Jan: introduce p2m_check_changeable()
    to clarify the p2m type change code.
  - According to comments from George: use locks in the same order to
    avoid deadlock; call p2m_change_entry_type_global() after the unmap
    of the ioreq server is finished.

changes in v2:
  - Move the calculation of the ioreq server page entry_count into
    p2m_change_type_one() so that we do not need a separate lock.
    Note: entry_count is also calculated in resolve_misconfig()/
    do_recalc(); fortunately, callers of both routines already hold
    the p2m lock.
  - Simplify the logic in hvmop_set_mem_type().
  - Introduce routine p2m_finish_type_change() to walk the p2m table
    and do the p2m reset.
---
 xen/arch/x86/hvm/ioreq.c  |  8
 xen/arch/x86/mm/hap/hap.c |  9 +
 xen/arch/x86/mm/p2m-ept.c |  8 +++-
 xen/arch/x86/mm/p2m-pt.c  | 13 +++--
 xen/arch/x86/mm/p2m.c     | 29 +
 xen/include/asm-x86/p2m.h |  9 -
 6 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 5bf3b6d..07a6c26 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,

     spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);

+    if ( rc == 0 && flags == 0 )
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+    }
+
     return rc;
 }

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a57b385..6ec950a 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -187,6 +187,15 @@ out:
  */
 static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
 {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /*
+     * Refuse to turn on global log-dirty mode if
+     * there's outstanding p2m_ioreq_server pages.
+     */
+    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
+        return -EBUSY;
+
     /* turn on PG_log_dirty bit in paging mode */
     paging_lock(d);
     d->arch.paging.mode |= PG_log_dirty;

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index cc1eb21..62e9ddf 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
             e.ipat = ipat;
             if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
             {
+                if ( e.sa_p2mt == p2m_ioreq_server )
+                {
+                    ASSERT(p2m->ioreq.entry_count > 0);
+                    p2m->ioreq.entry_count--;
+                }
+
                 e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
                             ? p2m_ram_logdirty : p2m_ram_rw;
                 ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
@@ -965,7 +971,7 @@ static mfn_t ept_get_entry(struct p2m_doma
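To illustrate the scheme the patch implements, here is a toy model (not Xen code; the types and function names are simplified assumptions for illustration only): p2m_change_type_one() is the only place p2m_ioreq_server entries are created, so it maintains entry_count; the global type change is asynchronous and merely marks entries for recalculation, and the count is decremented lazily when an entry is next fixed up, as in the resolve_misconfig() hunk above:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the patch's entry_count accounting -- not Xen code. */
enum p2m_type { p2m_ram_rw, p2m_ram_logdirty, p2m_ioreq_server };

struct p2m_entry {
    enum p2m_type type;
    bool recalc;               /* set by change_entry_type_global() */
};

struct p2m_domain {
    struct p2m_entry e[8];
    unsigned long entry_count; /* # of p2m_ioreq_server entries */
};

/* Models p2m_change_type_one(): the only creator of ioreq_server entries. */
static void change_type_one(struct p2m_domain *p2m, unsigned int gfn,
                            enum p2m_type ot, enum p2m_type nt)
{
    assert(p2m->e[gfn].type == ot);
    p2m->e[gfn].type = nt;
    if ( nt == p2m_ioreq_server )
        p2m->entry_count++;
    else if ( ot == p2m_ioreq_server )
        p2m->entry_count--;
}

/* Models p2m_change_entry_type_global(): asynchronous -- only marks. */
static void change_entry_type_global(struct p2m_domain *p2m)
{
    for ( unsigned int i = 0; i < 8; i++ )
        p2m->e[i].recalc = true;
}

/* Models resolve_misconfig()/do_recalc(): lazy fixup on next access. */
static void resolve_misconfig(struct p2m_domain *p2m, unsigned int gfn)
{
    struct p2m_entry *e = &p2m->e[gfn];

    if ( e->recalc && e->type == p2m_ioreq_server )
    {
        assert(p2m->entry_count > 0);
        p2m->entry_count--;
        e->type = p2m_ram_rw; /* log-dirty handling omitted in this model */
    }
    e->recalc = false;
}
```

Note how unmapping stays cheap: no p2m walk happens at unmap time; each stale entry pays for its own reset on first touch.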
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
>>> On 02.04.17 at 14:24, wrote:
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
>          if ( p2m->ioreq.server != NULL )
>              goto out;
>
> +        /*
> +         * It is possible that an ioreq server has just been unmapped,
> +         * released the spin lock, with some p2m_ioreq_server entries
> +         * in p2m table remained. We shall refuse another ioreq server
> +         * mapping request in such case.
> +         */
> +        if ( read_atomic(&p2m->ioreq.entry_count) )
> +            goto out;

So this produces the same -EINVAL as the earlier check in context
above. I think it would be nice if neither did: -EINUSE for the first
(which we don't have, so -EOPNOTSUPP would seem the second best option
there) and -EBUSY for the second would seem more appropriate. If you
agree, the respective adjustments could be done while committing, if no
other reason for a v11 arises.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
>>> On 03.04.17 at 16:36, wrote:
>>>> On 02.04.17 at 14:24, wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
>>          if ( p2m->ioreq.server != NULL )
>>              goto out;
>>
>> +        /*
>> +         * It is possible that an ioreq server has just been unmapped,
>> +         * released the spin lock, with some p2m_ioreq_server entries
>> +         * in p2m table remained. We shall refuse another ioreq server
>> +         * mapping request in such case.
>> +         */
>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>> +            goto out;
>
> So this produces the same -EINVAL as the earlier check in context
> above. I think it would be nice if neither did: -EINUSE for the first
> (which we don't have, so -EOPNOTSUPP would seem the second best option
> there) and -EBUSY for the second would seem more appropriate. If you
> agree, the respective adjustments could be done while committing, if
> no other reason for a v11 arises.

Oh, and with that:

Reviewed-by: Jan Beulich

Jan
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
On 4/3/2017 10:36 PM, Jan Beulich wrote:
> So this produces the same -EINVAL as the earlier check in context
> above. I think it would be nice if neither did: -EINUSE for the first
> (which we don't have, so -EOPNOTSUPP would seem the second best option
> there) and -EBUSY for the second would seem more appropriate. If you
> agree, the respective adjustments could be done while committing, if
> no other reason for a v11 arises.

Thanks, Jan. But my code shows both will return -EBUSY, not -EINVAL,
for the mapping request:

    /* Unmap ioreq server from p2m type by passing flags with 0. */
    if ( flags == 0 )
    {
        rc = -EINVAL;
        if ( p2m->ioreq.server != s )
            goto out;

        p2m->ioreq.server = NULL;
        p2m->ioreq.flags = 0;
    }
    else
    {
        rc = -EBUSY;
        if ( p2m->ioreq.server != NULL )
            goto out;

        /*
         * It is possible that an ioreq server has just been unmapped,
         * released the spin lock, with some p2m_ioreq_server entries
         * in p2m table remained. We shall refuse another ioreq server
         * mapping request in such case.
         */
        if ( read_atomic(&p2m->ioreq.entry_count) )
            goto out;

        p2m->ioreq.server = s;
        p2m->ioreq.flags = flags;
    }

Do you really want to use -EOPNOTSUPP when p2m->ioreq.server is not
NULL for the mapping code?

For the unmapping code, -EINVAL is used when the ioreq server to be
unmapped is not the designated one. But for this one, I am not sure
which is better: -EINVAL or -EBUSY?

Yu
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
>>> On 05.04.17 at 09:18, wrote:
> On 4/3/2017 10:36 PM, Jan Beulich wrote:
>> So this produces the same -EINVAL as the earlier check in context
>> above. I think it would be nice if neither did: -EINUSE for the first
>> (which we don't have, so -EOPNOTSUPP would seem the second best
>> option there) and -EBUSY for the second would seem more appropriate.
>> If you agree, the respective adjustments could be done while
>> committing, if no other reason for a v11 arises.
>
> Thanks Jan.

First of all - deleting irrelevant context from replies is always
appreciated, but I think you went too far here. It is no longer
possible to re-associate your comments with the change you've made (not
visible from the code fragment below).

> But my code shows both will return -EBUSY, instead of -EINVAL for the
> mapping request:
>
>     /* Unmap ioreq server from p2m type by passing flags with 0. */
>     if ( flags == 0 )
>     {
>         rc = -EINVAL;
>         if ( p2m->ioreq.server != s )
>             goto out;
>
>         p2m->ioreq.server = NULL;
>         p2m->ioreq.flags = 0;
>     }
>     else
>     {
>         rc = -EBUSY;
>         if ( p2m->ioreq.server != NULL )
>             goto out;
>
>         /*
>          * It is possible that an ioreq server has just been unmapped,
>          * released the spin lock, with some p2m_ioreq_server entries
>          * in p2m table remained. We shall refuse another ioreq server
>          * mapping request in such case.
>          */
>         if ( read_atomic(&p2m->ioreq.entry_count) )
>             goto out;

Oh, I see: I had mistakenly assumed the addition was to the if() branch.
I clearly didn't look closely enough, I'm sorry.

>         p2m->ioreq.server = s;
>         p2m->ioreq.flags = flags;
>     }
>
> Do you really want to use -EOPNOTSUPP when p2m->ioreq.server is not
> NULL for the mapping code?

As per above, I really did refer to the wrong piece of code, so all is
fine the way the code is.

Jan
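For readers following the error-code discussion above, the control flow in question can be condensed into a small standalone model (a sketch paraphrasing the fragment Yu quoted, not the exact Xen source; `void *` stands in for `struct hvm_ioreq_server *`): unmapping the wrong server yields -EINVAL, while both "another server already mapped" and "stale entries outstanding" yield -EBUSY:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified model of the p2m_set_ioreq_server() rc logic -- not the
 * exact Xen source; fields and names are assumptions for illustration. */
struct p2m_state {
    void *server;              /* currently mapped ioreq server, if any */
    unsigned int flags;
    unsigned long entry_count; /* outstanding p2m_ioreq_server entries */
};

static int set_ioreq_server(struct p2m_state *p2m, void *s, unsigned int flags)
{
    if ( flags == 0 )              /* unmap request */
    {
        if ( p2m->server != s )
            return -EINVAL;        /* not the currently mapped server */
        p2m->server = NULL;
        p2m->flags = 0;
    }
    else                           /* map request */
    {
        if ( p2m->server != NULL )
            return -EBUSY;         /* another server is already mapped */
        if ( p2m->entry_count )
            return -EBUSY;         /* stale p2m_ioreq_server entries left */
        p2m->server = s;
        p2m->flags = flags;
    }
    return 0;
}
```

This matches Yu's point: the map path already sets rc to -EBUSY before both checks, so Jan's concern about a duplicated -EINVAL did not apply.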
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang wrote:
> After an ioreq server has unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> asynchronously with the current p2m_change_entry_type_global()
> interface.
>
> New field entry_count is introduced in struct p2m_domain, to record
> the number of p2m_ioreq_server p2m page table entries. One nature of
> these entries is that they only point to 4K sized page frames, because
> all p2m_ioreq_server entries are originated from p2m_ram_rw ones in
> p2m_change_type_one(). We do not need to worry about the counting for
> 2M/1G sized pages.

Assuming that all p2m_ioreq_server entries are *created* by
p2m_change_type_one() may be valid, but can you assume that they are
only ever *removed* by p2m_change_type_one() (or recalculation)?

What happens, for instance, if a guest balloons out one of the ram
pages? I don't immediately see anything preventing a p2m_ioreq_server
page from being ballooned out, nor anything on the
decrease_reservation() path decreasing p2m->ioreq.entry_count. Or did
I miss something?

Other than that, only one minor comment...

> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index a57b385..6ec950a 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -187,6 +187,15 @@ out:
>   */
>  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>  {
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    /*
> +     * Refuse to turn on global log-dirty mode if
> +     * there's outstanding p2m_ioreq_server pages.

Grammar nit: if *there are* outstanding pages.

 -George
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
On 4/5/2017 10:41 PM, George Dunlap wrote:
> On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang wrote:
>> After an ioreq server has unmapped, the remaining p2m_ioreq_server
>> entries need to be reset back to p2m_ram_rw. [...]
>
> Assuming that all p2m_ioreq_server entries are *created* by
> p2m_change_type_one() may be valid, but can you assume that they are
> only ever *removed* by p2m_change_type_one() (or recalculation)?
>
> What happens, for instance, if a guest balloons out one of the ram
> pages? I don't immediately see anything preventing a p2m_ioreq_server
> page from being ballooned out, nor anything on the
> decrease_reservation() path decreasing p2m->ioreq.entry_count. Or did
> I miss something?
>
> Other than that, only one minor comment...

Thanks for your thorough consideration, George. But I do not think we
need to worry about this:

If the emulation is in process, the balloon driver cannot get a
p2m_ioreq_server page, because it is already allocated.

And even when emulation is finished and the balloon driver successfully
gets this page and triggers decrease_reservation, the purpose is to
remove the current mapping relation between the gfn and mfn in the p2m.
So IIUC, p2m_remove_page() will be triggered if everything goes fine,
and then p2m_set_entry(), which will trigger the recalc logic
eventually, either in ept_set_entry() or p2m_pt_set_entry(). Then the
entry_count will be updated in the recalc logic.

>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>> index a57b385..6ec950a 100644
>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -187,6 +187,15 @@ out:
>>   */
>>  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>>  {
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +    /*
>> +     * Refuse to turn on global log-dirty mode if
>> +     * there's outstanding p2m_ioreq_server pages.
>
> Grammar nit: if *there are* outstanding pages.

Oh, right. Thanks.

B.R.
Yu

> -George
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
On 05/04/17 17:22, Yu Zhang wrote:
> On 4/5/2017 10:41 PM, George Dunlap wrote:
>> On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang wrote:
>>> After an ioreq server has unmapped, the remaining p2m_ioreq_server
>>> entries need to be reset back to p2m_ram_rw. [...]
>>
>> Assuming that all p2m_ioreq_server entries are *created* by
>> p2m_change_type_one() may be valid, but can you assume that they are
>> only ever *removed* by p2m_change_type_one() (or recalculation)?
>>
>> What happens, for instance, if a guest balloons out one of the ram
>> pages? I don't immediately see anything preventing a
>> p2m_ioreq_server page from being ballooned out, nor anything on the
>> decrease_reservation() path decreasing p2m->ioreq.entry_count. Or
>> did I miss something?
>>
>> Other than that, only one minor comment...
>
> Thanks for your thorough consideration, George. But I do not think we
> need to worry about this:
>
> If the emulation is in process, the balloon driver cannot get a
> p2m_ioreq_server page - because it is already allocated.

In theory, yes, the guest *shouldn't* do this. But what if the guest
OS makes a mistake? Or, what if the ioreq server makes a mistake and
places a watch on a page that *isn't* allocated by the device driver,
or forgets to change a page type back to ram when the device driver
frees it back to the guest kernel?

It's the hypervisor's job to do the right thing even if the guest and
the device model don't.

> And even when emulation is finished and the balloon driver
> successfully gets this page and triggers decrease_reservation, the
> purpose is to remove the current mapping relation between the gfn and
> mfn in p2m. So IIUC, p2m_remove_page() will be triggered if
> everything goes fine, and then p2m_set_entry(), which will trigger
> the recalc logic eventually, either in ept_set_entry() or
> p2m_pt_set_entry(). Then the entry_count will be updated in the
> recalc logic.

Yes, once the lazy type change has been made, we can rely on the
recalculation logic to make sure that the types are changed
appropriately.

-George
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
On 4/6/2017 12:35 AM, George Dunlap wrote:
> On 05/04/17 17:22, Yu Zhang wrote:
>> On 4/5/2017 10:41 PM, George Dunlap wrote:
>>> [...]
>>
>> Thanks for your thorough consideration, George. But I do not think
>> we need to worry about this:
>>
>> If the emulation is in process, the balloon driver cannot get a
>> p2m_ioreq_server page - because it is already allocated.
>
> In theory, yes, the guest *shouldn't* do this. But what if the guest
> OS makes a mistake? Or, what if the ioreq server makes a mistake and
> places a watch on a page that *isn't* allocated by the device driver,
> or forgets to change a page type back to ram when the device driver
> frees it back to the guest kernel?

Then the lazy p2m change code will be triggered, and this page is reset
to p2m_ram_rw before being set to p2m_invalid, just like the normal
path. Will this be a problem?

> It's the hypervisor's job to do the right thing even if the guest and
> the device model don't.
>
>> And even when emulation is finished and the balloon driver
>> successfully gets this page and triggers decrease_reservation, the
>> purpose is to remove the current mapping relation between the gfn
>> and mfn in p2m. So IIUC, p2m_remove_page() will be triggered if
>> everything goes fine, and then p2m_set_entry(), which will trigger
>> the recalc logic eventually, either in ept_set_entry() or
>> p2m_pt_set_entry(). Then the entry_count will be updated in the
>> recalc logic.
>
> Yes, once the lazy type change has been made, we can rely on the
> recalculation logic to make sure that the types are changed
> appropriately.

Yep. So my understanding is that as long as p2m_set_entry() is used to
change the p2m, we do not need to worry about this.

B.R.
Yu

> -George
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
On 05/04/17 17:32, Yu Zhang wrote:
> On 4/6/2017 12:35 AM, George Dunlap wrote:
>> On 05/04/17 17:22, Yu Zhang wrote:
>>> [...]
>>>
>>> If the emulation is in process, the balloon driver cannot get a
>>> p2m_ioreq_server page - because it is already allocated.
>>
>> In theory, yes, the guest *shouldn't* do this. But what if the guest
>> OS makes a mistake? Or, what if the ioreq server makes a mistake and
>> places a watch on a page that *isn't* allocated by the device
>> driver, or forgets to change a page type back to ram when the device
>> driver frees it back to the guest kernel?
>
> Then the lazy p2m change code will be triggered, and this page is
> reset to p2m_ram_rw before being set to p2m_invalid, just like the
> normal path. Will this be a problem?

No, I'm talking about before the ioreq server detaches.

Scenario 1: Bug in driver
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest driver accidentally frees A to the kernel
4. Guest kernel balloons out page A; ioreq.entry_count is wrong

Scenario 2: Bug in the kernel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest kernel tries to balloon out page B, but makes a calculation
   mistake and balloons out A instead; now ioreq.entry_count is wrong

Scenario 3: Off-by-one bug in devicemodel
1. Guest driver allocates pages A-D
2. dm makes a mistake and marks pages A-E as p2m_ioreq_server (one
   extra page)
3. Guest kernel balloons out page E; now ioreq.entry_count is wrong

Scenario 4: "Leak" in devicemodel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest driver is done with page A, but DM forgets to reset it to
   p2m_ram_rw
4. Guest driver frees A to guest kernel
5. Guest kernel balloons out page A; now ioreq.entry_count is wrong

I could keep going on; there are *lots* of bugs in the driver, the
kernel, or the devicemodel which could cause pages marked
p2m_ioreq_server to end up being ballooned out, which under the current
code would make ioreq.entry_count wrong. It's the hypervisor's job to
do the right thing even when other components have bugs in them.

This is why I initially suggested keeping count in
atomic_write_ept_entry() -- no matter how the entry is changed, we
always know exactly how many entries of type p2m_ioreq_server we have.

-George
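George's alternative, counting at the single entry-write choke point rather than at the type-change call sites, can be sketched as a toy model (assumed names, not Xen code): because every update to an entry funnels through one function, the counter is derived from the old and new types and stays correct regardless of which path (type change, balloon-out, misconfig fixup) rewrites the entry:

```c
#include <assert.h>

/* Toy model of counting at the write point, in the spirit of George's
 * atomic_write_ept_entry() suggestion -- not Xen code. */
enum p2m_type { p2m_ram_rw, p2m_invalid, p2m_ioreq_server };

struct p2m {
    enum p2m_type e[4];
    unsigned long entry_count;
};

/* Single choke point for entry updates: adjust the counter from the
 * old/new types, so *any* caller keeps it consistent. */
static void write_entry(struct p2m *p2m, unsigned int gfn, enum p2m_type nt)
{
    if ( p2m->e[gfn] == p2m_ioreq_server )
        p2m->entry_count--;
    if ( nt == p2m_ioreq_server )
        p2m->entry_count++;
    p2m->e[gfn] = nt;
}
```

Under this design, Scenario 1 above cannot desynchronize the counter: the balloon path's write to p2m_invalid passes through the same function and decrements the count.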
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
On 4/6/2017 1:01 AM, George Dunlap wrote:
> On 05/04/17 17:32, Yu Zhang wrote:
>> [...]
>>
>> Then the lazy p2m change code will be triggered, and this page is
>> reset to p2m_ram_rw before being set to p2m_invalid, just like the
>> normal path. Will this be a problem?
>
> No, I'm talking about before the ioreq server detaches.

Sorry, I do not get it. Take scenario 1 for example:

> Scenario 1: Bug in driver
> 1. Guest driver allocates page A
> 2. dm marks A as p2m_ioreq_server

Here in step 2 the ioreq.entry_count increases;

> 3. Guest driver accidentally frees A to the kernel
> 4. Guest kernel balloons out page A; ioreq.entry_count is wrong

Here in step 4 the ioreq.entry_count decreases. Isn't this what we are
expecting?

Yu

> Scenario 2: Bug in the kernel
> [...]
>
> I could keep going on; there are *lots* of bugs in the driver, the
> kernel, or the devicemodel which could cause pages marked
> p2m_ioreq_server to end up being ballooned out, which under the
> current code would make ioreq.entry_count wrong. It's the
> hypervisor's job to do the right thing even when other components
> have bugs in them.
>
> This is why I initially suggested keeping count in
> atomic_write_ept_entry() -- no matter how the entry is changed, we
> always know exactly how many entries of type p2m_ioreq_server we
> have.
>
> -George
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
On 05/04/17 18:18, Yu Zhang wrote:
> On 4/6/2017 1:01 AM, George Dunlap wrote:
>> [...]
>>
>> No, I'm talking about before the ioreq server detaches.
>
> Sorry, I do not get it. Take scenario 1 for example:
>
>> Scenario 1: Bug in driver
>> 1. Guest driver allocates page A
>> 2. dm marks A as p2m_ioreq_server
>
> Here in step 2 the ioreq.entry_count increases;
>
>> 3. Guest driver accidentally frees A to the kernel
>> 4. Guest kernel balloons out page A; ioreq.entry_count is wrong
>
> Here in step 4 the ioreq.entry_count decreases.
> Isn't this what we are expecting?

So step 4 happens via xen/common/memory.c:decrease_reservation().
Where along that path will ioreq.entry_count be decreased for the page
freed? Maybe it is, but I didn't see it when I looked.

As far as I can tell, after step 4, ioreq.entry_count would still be 1,
but the actual number of p2m_ioreq_server entries would be 0.

-George
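The discrepancy George describes here can be shown with a toy model (assumed names, not Xen code): if the counter is only adjusted by the type-change and recalc paths, then a plain entry rewrite, such as the balloon path before any change_entry_type_global() has marked the entry for recalculation, bypasses the accounting and leaves the count stale:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of counting only on the recalc path -- not Xen code. */
enum p2m_type { p2m_ram_rw, p2m_invalid, p2m_ioreq_server };

struct p2m {
    enum p2m_type e[4];
    bool recalc[4];            /* set by change_entry_type_global() */
    unsigned long entry_count;
};

/* Models a set_entry() that only adjusts the counter via the pending
 * recalc fixup (as in do_recalc()); a plain rewrite, e.g. the
 * decrease_reservation() balloon path with no recalc pending, bypasses
 * the accounting entirely. */
static void set_entry(struct p2m *p2m, unsigned int gfn, enum p2m_type nt)
{
    if ( p2m->recalc[gfn] && p2m->e[gfn] == p2m_ioreq_server )
    {
        p2m->entry_count--;    /* lazy fixup */
        p2m->recalc[gfn] = false;
    }
    p2m->e[gfn] = nt;
}
```

With no recalc pending, ballooning out the page leaves entry_count at 1 while zero p2m_ioreq_server entries remain, exactly the step-4 mismatch George points out; once a global recalc is pending, the same write does fix up the count.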
Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
On 4/6/2017 1:18 AM, Yu Zhang wrote:
On 4/6/2017 1:01 AM, George Dunlap wrote:
On 05/04/17 17:32, Yu Zhang wrote:
On 4/6/2017 12:35 AM, George Dunlap wrote:
On 05/04/17 17:22, Yu Zhang wrote:
On 4/5/2017 10:41 PM, George Dunlap wrote:
On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang wrote:

> After an ioreq server has unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> asynchronously with the current p2m_change_entry_type_global()
> interface.
>
> A new field, entry_count, is introduced in struct p2m_domain to record
> the number of p2m_ioreq_server p2m page table entries. One property of
> these entries is that they only point to 4K-sized page frames, because
> all p2m_ioreq_server entries originate from p2m_ram_rw ones in
> p2m_change_type_one(). We therefore do not need to worry about the
> counting for 2M/1G-sized pages.

George Dunlap:
Assuming that all p2m_ioreq_server entries are *created* by
p2m_change_type_one() may be valid, but can you assume that they are only
ever *removed* by p2m_change_type_one() (or recalculation)? What happens,
for instance, if a guest balloons out one of the ram pages? I don't
immediately see anything preventing a p2m_ioreq_server page from being
ballooned out, nor anything on the decrease_reservation() path decreasing
p2m->ioreq.entry_count. Or did I miss something? Other than that, only
one minor comment...

Yu Zhang:
Thanks for your thorough consideration, George. But I do not think we
need to worry about this: if the emulation is in process, the balloon
driver cannot get a p2m_ioreq_server page, because it is already
allocated.

George Dunlap:
In theory, yes, the guest *shouldn't* do this. But what if the guest OS
makes a mistake? Or what if the ioreq server makes a mistake and places a
watch on a page that *isn't* allocated by the device driver, or forgets
to change a page type back to ram when the device driver frees it back to
the guest kernel?

Yu Zhang:
Then the lazy p2m change code will be triggered, and this page is reset
to p2m_ram_rw before being set to p2m_invalid, just like the normal path.
Will this be a problem?

George Dunlap:
No, I'm talking about before the ioreq server detaches.

Yu Zhang:
Sorry, I do not get it. Take scenario 1 for example:

> Scenario 1: Bug in driver
> 1. Guest driver allocates page A
> 2. dm marks A as p2m_ioreq_server

Here in step 2 the ioreq.entry_count increases;

> 3. Guest driver accidentally frees A to the kernel
> 4. Guest kernel balloons out page A; ioreq.entry_count is wrong

Here in step 4 the ioreq.entry_count decreases. Oh, I figured it out:
this entry is not invalidated yet if the ioreq server is not unmapped.
Sorry. Isn't this what we are expecting?

Yu

George Dunlap:
Scenario 2: Bug in the kernel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest kernel tries to balloon out page B, but makes a calculation
mistake and balloons out A instead; now ioreq.entry_count is wrong

Scenario 3: Off-by-one bug in the device model
1. Guest driver allocates pages A-D
2. dm makes a mistake and marks pages A-E as p2m_ioreq_server (one extra
page)
3. Guest kernel balloons out page E; now ioreq.entry_count is wrong

Scenario 4: "Leak" in the device model
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest driver is done with page A, but the dm forgets to reset it to
p2m_ram_rw
4. Guest driver frees A to the guest kernel
5. Guest kernel balloons out page A; now ioreq.entry_count is wrong

I could keep going; there are *lots* of bugs in the driver, the kernel,
or the device model which could cause pages marked p2m_ioreq_server to
end up being ballooned out, which under the current code would make
ioreq.entry_count wrong. It's the hypervisor's job to do the right thing
even when other components have bugs in them. This is why I initially
suggested keeping the count in atomic_write_ept_entry() -- no matter how
the entry is changed, we always know exactly how many entries of type
p2m_ioreq_server we have.

Yu Zhang:
Well, counting in atomic_write_ept_entry() only works for EPT. Besides,
it requires interface changes - we'd need to pass in the p2m. Another
thought: in XenGT, PoD is already disabled to make sure gfn->mfn
mappings do not change. So how about we disable ballooning if
ioreq.entry_count is not 0?

Yu

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
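George's suggestion above — keeping the count at the single point where entries are written — can be illustrated with a small self-contained sketch. This is a toy model, not Xen code: the type names, the array-based p2m, and `write_entry()` are all invented for illustration. The point it demonstrates is that when every entry change funnels through one function, the counter cannot drift no matter which call path (type change, balloon-out, recalculation) performed the change:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only - not Xen code. A miniature "p2m" as a flat array of
 * page types, with the p2m_ioreq_server entry count maintained at the
 * one choke point through which every entry write passes. */
typedef enum { RAM_RW, IOREQ_SERVER, INVALID } page_type_t;

#define NR_PAGES 8
static page_type_t p2m_type[NR_PAGES];   /* all RAM_RW (0) initially */
static unsigned long ioreq_entry_count;

static void write_entry(size_t gfn, page_type_t new_type)
{
    /* Adjust the count from the old and new type of this one entry,
     * regardless of who asked for the change. */
    if ( p2m_type[gfn] == IOREQ_SERVER && new_type != IOREQ_SERVER )
        ioreq_entry_count--;
    else if ( p2m_type[gfn] != IOREQ_SERVER && new_type == IOREQ_SERVER )
        ioreq_entry_count++;
    p2m_type[gfn] = new_type;
}
```

Even a page ballooned out while still marked p2m_ioreq_server (George's scenario 4) leaves the count correct in this model, because the removal also goes through `write_entry()`.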
On 4/6/2017 1:28 AM, Yu Zhang wrote:
> [...]
> So how about we disable ballooning if ioreq.entry_count is not 0?

Or maybe just change the p2m_ioreq_server to p2m_ram_rw before it is set
to p2m_invalid? Like the code below:

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 7dbddda..40e5f63 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
         put_gfn(d, gmfn);
         return 1;
     }
+    if ( unlikely(p2mt == p2m_ioreq_server) )
+        p2m_change_type_one(d, gmfn,
+                            p2m_ioreq_server, p2m_ram_rw);
+
 #else
     mfn = gfn_to_mfn(d, _gfn(gmfn));
 #endif

Yu
On 4/6/2017 2:02 AM, Yu Zhang wrote:
> [...]
> Or maybe just change the p2m_ioreq_server to p2m_ram_rw before it is
> set to p2m_invalid? Like the code below: [...]

Sorry for the format. Please ignore the above code, and see the one
below:

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 7dbddda..40e5f63 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
         put_gfn(d, gmfn);
         return 1;
     }
+    if ( unlikely(p2mt == p2m_ioreq_server) )
+        p2m_change_type_one(d, gmfn,
+                            p2m_ioreq_server, p2m_ram_rw);
+
 #else
     mfn = gfn_to_mfn(d, _gfn(gmfn));
 #endif

Yu
>>> On 05.04.17 at 19:28, wrote:
> Well, count in atomic_write_ept_entry() only works for ept. Besides, it
> requires interface changes - need to pass the p2m. Another thought is -
> now in XenGT, PoD is disabled to make sure gfn->mfn does not change. So
> how about we disable ballooning if ioreq.entry_count is not 0?

You can't disable ballooning - the guest is free to make decisions
regarding what memory to return to the hypervisor all by itself. Tool
stack directions (via xenstore) are merely a hint for well behaved
guests.

Jan
>>> On 05.04.17 at 20:04, wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
>          put_gfn(d, gmfn);
>          return 1;
>      }
> +    if ( unlikely(p2mt == p2m_ioreq_server) )
> +        p2m_change_type_one(d, gmfn,
> +                            p2m_ioreq_server, p2m_ram_rw);
> +
>  #else
>      mfn = gfn_to_mfn(d, _gfn(gmfn));
>  #endif

To be honest, this looks more like a quick hack than a proper solution at
first glance. To me it would seem preferable if the count was adjusted at
the point the P2M entry is being replaced (i.e. down the call stack from
guest_physmap_remove_page()). The closer to the actual changing of the
P2M entry, the less likely you are to miss any call paths (as was already
explained by George while suggesting to put the accounting into the
leaf-most EPT/NPT functions).

Jan
On 4/6/2017 3:48 PM, Jan Beulich wrote:
> To be honest, this looks more like a quick hack than a proper solution
> at first glance. To me it would seem preferable if the count was

Yeah, right. :)

> adjusted at the point the P2M entry is being replaced (i.e. down the
> call stack from guest_physmap_remove_page()). The closer to the actual
> changing of the P2M entry, the less likely you are to miss any call
> paths (as was already explained by George while suggesting to put the
> accounting into the leaf-most EPT/NPT functions).

Well, I thought I had explained why I have always been hesitant to do the
count in atomic_write_ept_entry(), but it seems I did not make it clear:

1> atomic_write_ept_entry() is used each time a p2m entry is written, but
sweeping p2m_ioreq_server entries is only supposed to happen when an
ioreq server unmaps. Checking the p2m type here impacts performance, and
I do not think it is worth it - consider that there may be very limited
usage of p2m_ioreq_server;

2> atomic_write_ept_entry() does not have a p2m parameter, and some of
its callers do not have one either;

3> the lazy p2m type change is triggered at the p2m level, by
p2m_change_entry_type_global(). It is not used only for Intel EPT, and
supporting the count at the lowest level on non-EPT platforms is more
complex than changes in atomic_write_ept_entry();

4> fortunately, we have both resolve_misconfig() and do_recalc(), so I
thought it would be enough to do the count in these two routines,
together with the count in p2m_change_type_one().

But I have to admit I did not think of the extreme scenarios raised by
George - I had always assumed a p2m_ioreq_server page will not be
allocated to the balloon driver while it is in use.

So here I have another proposal - we shall not allow a p2m_ioreq_server
page to be ballooned out. I mean, if some bug in the kernel really hands
a p2m_ioreq_server page to the balloon driver, or if the driver is a
malicious one which does not tell the device model that this gfn shall no
longer be emulated, the hypervisor shall let the ballooning fail for this
gfn. After all, if such a situation happens, the guest or the device
model already has bugs, and these last 2 patches are to make sure that
even if there's a bug in the guest/device model, Xen will help do the
cleanup, instead of tolerating guest bugs.

If you think this is reasonable, I have drafted a patch, like this:

diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index d5fea72..ff726ad 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -606,7 +606,8 @@ p2m_pod_decrease_reservation(struct domain *d,
             BUG_ON(p2m->pod.entry_count < 0);
             pod -= n;
         }
-        else if ( steal_for_cache && p2m_is_ram(t) )
+        else if ( steal_for_cache && p2m_is_ram(t) &&
+                  (t != p2m_ioreq_server) )
         {
             /*
              * If we need less than 1 << cur_order, we may end up stealing
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 7dbddda..40d5545 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -288,6 +288,14 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
         put_gfn(d, gmfn);
         return 1;
     }
+    if ( unlikely(p2mt == p2m_ioreq_server) )
+    {
+        put_gfn(d, gmfn);
+        gdprintk(XENLOG_INFO, "Domain %u page %lx cannot be removed.\n",
+                 d->domain_id, gmfn);
+        return 0;
+    }
+
 #else
     mfn = gfn_to_mfn(d, _gfn(gmfn));
 #endif

The change in p2m_pod_decrease_reservation() is to not let the balloon
driver steal a p2m_ioreq_server page in PoD; the change in
guest_remove_page() is to disallow a p2m_ioreq_server page being removed.

Thanks
Yu
>>> On 06.04.17 at 10:27, wrote:
> Well, I thought I had explained why I have always been hesitant to do
> the count in atomic_write_ept_entry(), but it seems I did not make it
> clear:

Well, there was no reason to re-explain. I understand your reasoning, and
I understand George's. Hence my request to move it _closer_ to the leaf
function, not specifically to move it _into_ there.

> But I have to admit I did not think of the extreme scenarios raised by
> George - I had always assumed a p2m_ioreq_server page will not be
> allocated to the balloon driver while it is in use.
>
> So here I have another proposal - we shall not allow a p2m_ioreq_server
> page to be ballooned out. I mean, if some bug in the kernel really
> hands a p2m_ioreq_server page to the balloon driver, or if the driver
> is a malicious one which does not tell the device model that this gfn
> shall no longer be emulated, the hypervisor shall let the ballooning
> fail for this gfn. After all, if such a situation happens, the guest or
> the device model already has bugs, and these last 2 patches are to make
> sure that even if there's a bug in the guest/device model, Xen will
> help do the cleanup, instead of tolerating guest bugs.
>
> If you think this is reasonable, I have drafted a patch, like this:

Well, that's still the same hack as before (merely extended to PoD code),
isn't it? I'd still prefer the accounting to be got right instead of the
page remove attempt being failed.

Jan
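Jan's preference — adjusting the accounting at the point the P2M entry is actually replaced, rather than failing the removal — can be sketched with another toy model. Again, none of this is Xen code: the struct, the type names, and the two functions are assumptions made up for illustration. The design point is that the remove path reuses the same accounting code as the type-change path, so no caller (balloon driver, decrease_reservation, or anything else) can bypass it:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only - not Xen code. */
typedef enum { RAM_RW, IOREQ_SERVER, INVALID } page_type_t;

#define NR_PAGES 8

struct p2m_model {
    page_type_t type[NR_PAGES];
    unsigned long ioreq_entry_count;
};

/* All type transitions, including removal, run through here, so the
 * p2m_ioreq_server entry count is adjusted exactly where the entry is
 * replaced - the accounting stays right instead of the remove failing. */
static void model_change_type(struct p2m_model *p2m, size_t gfn,
                              page_type_t new_type)
{
    if ( p2m->type[gfn] == IOREQ_SERVER && new_type != IOREQ_SERVER )
        p2m->ioreq_entry_count--;
    else if ( p2m->type[gfn] != IOREQ_SERVER && new_type == IOREQ_SERVER )
        p2m->ioreq_entry_count++;
    p2m->type[gfn] = new_type;
}

/* Stand-in for the path below guest_physmap_remove_page(): removal is
 * expressed as a type change, so it inherits the accounting for free. */
static void model_remove_page(struct p2m_model *p2m, size_t gfn)
{
    model_change_type(p2m, gfn, INVALID);
}
```

In this shape, a buggy guest ballooning out a page that is still marked p2m_ioreq_server simply decrements the count as a side effect of the removal, which is the behavior Jan asks for above.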