On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2009,6 +2009,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>          goto out_put_gfn;
>      }
>
> +    if ( (p2mt == p2m_mmio_direct) && npfec.write_access && npfec.present &&
> +         subpage_mmio_write_accept(mfn, gla) &&

Afaics subpage_mmio_write_accept() is unreachable then when CONFIG_HVM=n?

> +         (hvm_emulate_one_mmio(mfn_x(mfn), gla) == X86EMUL_OKAY) )
> +    {
> +        rc = 1;
> +        goto out_put_gfn;
> +    }

Overall this new if() is pretty similar to the immediate preceding one.
So similar that I wonder whether the two shouldn't be folded. In fact
it looks as if the new one is needed only for the case where you'd pass
through (to a DomU) a device partially used by Xen. That could certainly
do with mentioning explicitly.

> +static void __iomem *subpage_mmio_get_page(struct subpage_ro_range *entry)

Considering what the function does and what it returns, perhaps better
s/get/map/? The "get_page" part of the name generally has a different
meaning in Xen's memory management.

> +{
> +    void __iomem *mapped_page;
> +
> +    if ( entry->mapped )
> +        return entry->mapped;
> +
> +    mapped_page = ioremap(mfn_x(entry->mfn) << PAGE_SHIFT, PAGE_SIZE);
> +
> +    spin_lock(&subpage_ro_lock);
> +    /* Re-check under the lock */
> +    if ( entry->mapped )
> +    {
> +        spin_unlock(&subpage_ro_lock);
> +        iounmap(mapped_page);

The only unmap is on an error path here and on another error path
elsewhere. IOW it looks as if devices with such marked pages are meant
to never be hot unplugged. I can see that being intentional for the XHCI
console, but imo such a restriction also needs prominently calling out
in a comment next to e.g. the function declaration.

> +        return entry->mapped;
> +    }
> +
> +    entry->mapped = mapped_page;
> +    spin_unlock(&subpage_ro_lock);
> +    return entry->mapped;
> +}
> +
> +static void subpage_mmio_write_emulate(
> +    mfn_t mfn,
> +    unsigned int offset,
> +    const void *data,
> +    unsigned int len)
> +{
> +    struct subpage_ro_range *entry;
> +    void __iomem *addr;

Wouldn't this better be pointer-to-volatile, with ...

> +    list_for_each_entry(entry, &subpage_ro_ranges, list)
> +    {
> +        if ( mfn_eq(entry->mfn, mfn) )
> +        {
> +            if ( test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, entry->ro_qwords) )
> +            {
> + write_ignored:
> +                gprintk(XENLOG_WARNING,
> +                        "ignoring write to R/O MMIO 0x%"PRI_mfn"%03x len %u\n",
> +                        mfn_x(mfn), offset, len);
> +                return;
> +            }
> +
> +            addr = subpage_mmio_get_page(entry);
> +            if ( !addr )
> +            {
> +                gprintk(XENLOG_ERR,
> +                        "Failed to map page for MMIO write at 0x%"PRI_mfn"%03x\n",
> +                        mfn_x(mfn), offset);
> +                return;
> +            }
> +
> +            switch ( len )
> +            {
> +            case 1:
> +                writeb(*(const uint8_t*)data, addr);
> +                break;
> +            case 2:
> +                writew(*(const uint16_t*)data, addr);
> +                break;
> +            case 4:
> +                writel(*(const uint32_t*)data, addr);
> +                break;
> +            case 8:
> +                writeq(*(const uint64_t*)data, addr);
> +                break;

... this being how it's written? (If so, volatile suitably carried
through to other places as well.)

> +            default:
> +                /* mmio_ro_emulated_write() already validated the size */
> +                ASSERT_UNREACHABLE();
> +                goto write_ignored;
> +            }
> +            return;
> +        }
> +    }
> +    /* Do not print message for pages without any writable parts. */
> +}
> +
> +bool subpage_mmio_write_accept(mfn_t mfn, unsigned long gla)
> +{
> +    unsigned int offset = PAGE_OFFSET(gla);
> +    const struct subpage_ro_range *entry;
> +
> +    list_for_each_entry_rcu(entry, &subpage_ro_ranges, list)

Considering the other remark about respective devices impossible to go
away, is the RCU form here really needed? Its use gives the (false)
impression of entry removal being possible.

> +        if ( mfn_eq(entry->mfn, mfn) &&
> +             !test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, entry->ro_qwords) )

Btw, "qwords" in the field name is kind of odd when SUBPAGE_MMIO_RO_ALIGN
in principle suggests that changing granularity ought to be possible by
simply adjusting that #define. Maybe "->ro_elems"?

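To illustrate the naming point, here is a minimal standalone sketch (plain
C, not Xen code) of a per-page read-only bitmap whose size and indexing
follow entirely from the granularity #define. The values of PAGE_SIZE and
SUBPAGE_MMIO_RO_ALIGN below are assumptions chosen for the example, and
struct ro_page_sketch, mark_ro() and offset_is_ro() are hypothetical names
that do not appear in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed values, for illustration only; not taken from the patch. */
#define PAGE_SIZE             4096u
#define SUBPAGE_MMIO_RO_ALIGN 8u

/* Everything below is derived from the two #defines above. */
#define RO_ELEMS_PER_PAGE (PAGE_SIZE / SUBPAGE_MMIO_RO_ALIGN)
#define BITS_PER_WORD     (sizeof(unsigned long) * CHAR_BIT)
#define RO_WORDS          ((RO_ELEMS_PER_PAGE + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for the bitmap part of struct subpage_ro_range. */
struct ro_page_sketch {
    unsigned long ro_elems[RO_WORDS]; /* one bit per aligned element */
};

/* Mark every element overlapping [start, start + size) as read-only. */
static void mark_ro(struct ro_page_sketch *p, unsigned int start,
                    unsigned int size)
{
    unsigned int elem, last;

    assert(size > 0 && start < PAGE_SIZE && size <= PAGE_SIZE - start);

    last = (start + size - 1) / SUBPAGE_MMIO_RO_ALIGN;
    for ( elem = start / SUBPAGE_MMIO_RO_ALIGN; elem <= last; elem++ )
        p->ro_elems[elem / BITS_PER_WORD] |= 1UL << (elem % BITS_PER_WORD);
}

/* Report whether the element containing 'offset' is read-only. */
static bool offset_is_ro(const struct ro_page_sketch *p, unsigned int offset)
{
    unsigned int elem;

    assert(offset < PAGE_SIZE);

    elem = offset / SUBPAGE_MMIO_RO_ALIGN;
    return p->ro_elems[elem / BITS_PER_WORD] &
           (1UL << (elem % BITS_PER_WORD));
}
```

With a layout like this, changing the granularity would only need the
#define adjusted, which is why a size-neutral field name reads more
naturally than one that hard-codes "qwords".
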

> --- a/xen/arch/x86/pv/ro-page-fault.c
> +++ b/xen/arch/x86/pv/ro-page-fault.c
> @@ -330,6 +330,7 @@ static int mmio_ro_do_page_fault(struct x86_emulate_ctxt *ctxt,
>          return X86EMUL_UNHANDLEABLE;
>      }
>
> +    mmio_ro_ctxt.mfn = mfn;
>      ctxt->data = &mmio_ro_ctxt;
>      if ( pci_ro_mmcfg_decode(mfn_x(mfn), &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
>          return x86_emulate(ctxt, &mmcfg_intercept_ops);

Wouldn't you better set .mfn only on the "else" path, just out of
context? Suggesting that the new field in the struct could actually
overlay the (seg,bdf) tuple (being of relevance only to MMCFG intercept
handling). This would be more for documentation purposes than to
actually save space. (If so, perhaps the "else" itself would also better
be dropped while making the adjustment.)

Jan