On 4/9/2016 6:28 AM, Jan Beulich wrote:
On 31.03.16 at 12:53, <yu.c.zh...@linux.intel.com> wrote:
+static int mem_write(const struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint32_t size,
+ uint64_t data)
+{
+ struct domain *currd = current->domain;
+ unsigned long gmfn = paddr_to_pfn(addr);
+ unsigned long offset = addr & ~PAGE_MASK;
+ struct page_info *page = get_page_from_gfn(currd, gmfn, NULL, P2M_UNSHARE);
+ uint8_t *p;
+
+ if ( !page )
+ return X86EMUL_UNHANDLEABLE;
+
+ p = __map_domain_page(page);
+ p += offset;
+ memcpy(p, &data, size);
What if the page is a r/o one? Not having found an ioreq server, I'm
not sure assumptions on the page being writable can validly be made.
@@ -168,13 +226,72 @@ static int hvmemul_do_io(
break;
case X86EMUL_UNHANDLEABLE:
{
- struct hvm_ioreq_server *s =
- hvm_select_ioreq_server(curr->domain, &p);
+ struct hvm_ioreq_server *s;
+ p2m_type_t p2mt;
+
+ if ( is_mmio )
+ {
+ unsigned long gmfn = paddr_to_pfn(addr);
+
+ (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
+
+ switch ( p2mt )
+ {
+ case p2m_ioreq_server:
+ {
+ unsigned long flags;
+
+ p2m_get_ioreq_server(currd, &flags, &s);
As the function apparently returns no value right now, please avoid
the indirection on both values you're after - one of the two
(presumably s) can be the function's return value.
Well, the current implementation of p2m_get_ioreq_server() wraps the
reads of both flags and s in a spin_lock/spin_unlock pair, but I believe
we can still use s as the function's return value.
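A minimal sketch of the suggested shape, with stand-in types (struct names and fields here are placeholders, not Xen's real definitions, and the lock is shown only as comments):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration; the real ones live in Xen's headers. */
struct hvm_ioreq_server { int id; };

struct ioreq_state {
    unsigned long flags;
    struct hvm_ioreq_server *server;
};

/*
 * Variant returning s directly, as suggested: both fields are still read
 * in one critical section, so the locking requirements do not change.
 */
static struct hvm_ioreq_server *
p2m_get_ioreq_server(struct ioreq_state *p2m, unsigned long *flags)
{
    struct hvm_ioreq_server *s;

    /* spin_lock(&p2m->ioreq.lock); */
    *flags = p2m->flags;
    s = p2m->server;
    /* spin_unlock(&p2m->ioreq.lock); */

    return s;
}
```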
+ if ( !s )
+ break;
+
+ if ( (dir == IOREQ_READ &&
+ !(flags & P2M_IOREQ_HANDLE_READ_ACCESS)) ||
+ (dir == IOREQ_WRITE &&
+ !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS)) )
I think this would be easier to read using a conditional expression
with the condition being dir == IOREQ_<one-of-the-two>, just
selecting either of the two possible bit masks.
+ s = NULL;
+
+ break;
+ }
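Jan's conditional-expression suggestion could look like the helper below (the constant values are placeholders; the real ones come from Xen's public headers):

```c
#include <assert.h>

/* Placeholder values; take the real ones from Xen's ioreq/p2m headers. */
#define IOREQ_WRITE 0
#define IOREQ_READ  1
#define P2M_IOREQ_HANDLE_READ_ACCESS  (1u << 0)
#define P2M_IOREQ_HANDLE_WRITE_ACCESS (1u << 1)

/* Nonzero if the ioreq server has claimed this direction of access. */
static int ioreq_server_handles(int dir, unsigned long flags)
{
    return !!(flags & (dir == IOREQ_READ ? P2M_IOREQ_HANDLE_READ_ACCESS
                                         : P2M_IOREQ_HANDLE_WRITE_ACCESS));
}
```

With that, the test in hvmemul_do_io() reduces to a single `if ( !ioreq_server_handles(dir, flags) ) s = NULL;`.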
+ case p2m_ram_rw:
Blank line above here please.
/* If there is no suitable backing DM, just ignore accesses */
if ( !s )
{
- rc = hvm_process_io_intercept(&null_handler, &p);
+ switch ( p2mt )
+ {
+ case p2m_ioreq_server:
+ /*
+ * Race conditions may exist when access to a gfn with
+ * p2m_ioreq_server is intercepted by hypervisor, during
+ * which time p2m type of this gfn is recalculated back
+ * to p2m_ram_rw. mem_handler is used to handle this
+ * corner case.
+ */
Now if there is such a race condition, the race could also be with a
page changing first to ram_rw and then immediately further to e.g.
ram_ro. See the earlier comment about assuming the page to be
writable.
Thanks, Jan. After rechecking the code, I believe the race condition
cannot happen. In hvmemul_do_io(), get_gfn_query_unlocked() is only used
to peek at the p2mt of the gfn, but get_gfn_type_access() is called
inside hvm_hap_nested_page_fault(), and this guarantees that no p2m
change can occur during the emulation.
Is this understanding correct?
+ case p2m_ram_rw:
+ rc = hvm_process_io_intercept(&mem_handler, &p);
+ break;
+
+ default:
+ rc = hvm_process_io_intercept(&null_handler, &p);
Along with the above, I doubt it is correct to have e.g. ram_ro come
here.
So if no race condition can happen, there is no need to treat p2m_ram_rw
specially.
+static int hvm_map_mem_type_to_ioreq_server(struct domain *d,
+ ioservid_t id,
+ hvmmem_type_t type,
+ uint32_t flags)
+{
+ struct hvm_ioreq_server *s;
+ int rc;
+
+ /* For now, only HVMMEM_ioreq_server is supported */
+ if ( type != HVMMEM_ioreq_server )
+ return -EINVAL;
+
+ if ( flags & ~(HVMOP_IOREQ_MEM_ACCESS_READ |
+ HVMOP_IOREQ_MEM_ACCESS_WRITE) )
+ return -EINVAL;
+
+ spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+ rc = -ENOENT;
+ list_for_each_entry ( s,
+ &d->arch.hvm_domain.ioreq_server.list,
+ list_entry )
+ {
+ if ( s == d->arch.hvm_domain.default_ioreq_server )
+ continue;
+
+ if ( s->id == id )
+ {
+ rc = p2m_set_ioreq_server(d, flags, s);
+ if ( rc == 0 )
+ gdprintk(XENLOG_DEBUG, "%u %s type HVMMEM_ioreq_server.\n",
+ s->id, (flags != 0) ? "mapped to" : "unmapped from");
Why gdprintk()? I don't think the current domain is of much
interest here. What would be of interest is the subject domain.
s->id is not the domain id, but the id of the ioreq server.
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -132,6 +132,19 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
entry->r = entry->w = entry->x = 1;
entry->a = entry->d = !!cpu_has_vmx_ept_ad;
break;
+ case p2m_ioreq_server:
+ entry->r = !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_READ_ACCESS);
+ /*
+ * write access right is disabled when entry->r is 0, but whether
+ * write accesses are emulated by hypervisor or forwarded to an
+ * ioreq server depends on the setting of p2m->ioreq.flags.
+ */
+ entry->w = (entry->r &&
+ !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS));
+ entry->x = entry->r;
Why would we want to allow instruction execution from such pages?
And with all three bits now possibly being clear, aren't we risking the
entries to be mis-treated as not-present ones?
Hah. You got me. Thanks! :)
Now I realize it would be difficult if we want to emulate read
operations for HVM. According to the Intel manual, entry->r is to be
cleared, and so should entry->w be if we do not want an EPT
misconfiguration. And with both read and write permissions forbidden,
entry->x can be set only on processors with the EXECUTE_ONLY capability.
To avoid any entry being mis-treated as not-present, we have several
solutions:
a> do not support the read emulation for now - we have no such usage
case;
b> add a check of the p2m type against p2m_ioreq_server in
is_epte_present - a bit weird to me.
Which one do you prefer? Or any other suggestions?
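For reference, option b> could be sketched as below, with a simplified stand-in for ept_entry_t (the real bitfield layout and p2m_type_t values differ; this only illustrates the shape of the check):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; real definitions live under xen/include/asm-x86/. */
typedef enum { p2m_ram_rw, p2m_invalid, p2m_ioreq_server } p2m_type_t;

typedef struct {
    uint64_t r:1, w:1, x:1;
    uint64_t sa_p2mt:6;   /* software-available bits caching the p2m type */
} ept_entry_t;

/*
 * Option b>: an entry whose r/w/x bits are all clear is still treated as
 * present when its saved p2m type is p2m_ioreq_server.
 */
static int is_epte_present(const ept_entry_t *e)
{
    return e->r || e->w || e->x || e->sa_p2mt == p2m_ioreq_server;
}
```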
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -72,8 +72,8 @@ static const unsigned long pgt[] = {
PGT_l3_page_table
};
-static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
- unsigned int level)
+static unsigned long p2m_type_to_flags(struct p2m_domain *p2m, p2m_type_t t,
+int p2m_set_ioreq_server(struct domain *d,
+ unsigned long flags,
+ struct hvm_ioreq_server *s)
+{
+ struct p2m_domain *p2m = p2m_get_hostp2m(d);
+ int rc;
+
+ spin_lock(&p2m->ioreq.lock);
+
+ rc = -EBUSY;
+ if ( (flags != 0) && (p2m->ioreq.server != NULL) )
+ goto out;
+
+ rc = -EINVAL;
+ /* unmap ioreq server from p2m type by passing flags with 0 */
Comment style (also elsewhere).
+ if ( (flags == 0) && (p2m->ioreq.server != s) )
+ goto out;
The two flags checks above are redundant with ...
+ if ( flags == 0 )
+ {
+ p2m->ioreq.server = NULL;
+ p2m->ioreq.flags = 0;
+ }
+ else
+ {
+ p2m->ioreq.server = s;
+ p2m->ioreq.flags = flags;
+ }
... this - I think the earlier ones should be folded into this.
Ok. I'll have a try. :)
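The folded version might look like this, again with stand-in types (the real function also takes the p2m->ioreq.lock and a struct domain, omitted here):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in types; the real ones come from Xen's p2m/ioreq headers. */
struct hvm_ioreq_server { int id; };

struct ioreq_state {
    unsigned long flags;
    struct hvm_ioreq_server *server;
};

/*
 * One flags test selecting the map/unmap arm, with the validity check
 * folded inside each arm, as Jan suggests.
 */
static int set_ioreq_server(struct ioreq_state *p2m, unsigned long flags,
                            struct hvm_ioreq_server *s)
{
    if ( flags == 0 )
    {
        if ( p2m->server != s )      /* only the current owner may unmap */
            return -EINVAL;
        p2m->server = NULL;
        p2m->flags = 0;
    }
    else
    {
        if ( p2m->server != NULL )   /* already claimed by some server */
            return -EBUSY;
        p2m->server = s;
        p2m->flags = flags;
    }
    return 0;
}
```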
+ /*
+ * Each time we map/unmap an ioreq server to/from p2m_ioreq_server,
+ * we mark the p2m table to be recalculated, so that gfns which were
+ * previously marked with p2m_ioreq_server can be resynced.
+ */
+ p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
What does "resynced" here mean? I.e. I can see why this is wanted
when unmapping a server, but when mapping a server there shouldn't
be any such pages in the first place.
There shouldn't be. But if there are (due to misbehavior on the device
model side), they can be recalculated back to p2m_ram_rw (though this is
not as necessary as in the unmapping case).
+ rc = 0;
+
+out:
Labels indented by at least one space please.
@@ -320,6 +321,27 @@ struct p2m_domain {
struct ept_data ept;
/* NPT-equivalent structure could be added here. */
};
+
+ struct {
+ spinlock_t lock;
+ /*
+ * ioreq server who's responsible for the emulation of
+ * gfns with specific p2m type(for now, p2m_ioreq_server).
+ * Behaviors of gfns with p2m_ioreq_server set but no
+ * ioreq server mapped in advance should be the same as
+ * p2m_ram_rw.
+ */
+ struct hvm_ioreq_server *server;
+ /*
+ * flags specifies whether read, write or both operations
+ * are to be emulated by an ioreq server.
+ */
+ unsigned long flags;
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -489,6 +489,43 @@ struct xen_hvm_altp2m_op {
typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
Instead of adding yet another such section, couldn't this be added
to an already existing one?
Yes.
+struct xen_hvm_map_mem_type_to_ioreq_server {
+ domid_t domid; /* IN - domain to be serviced */
+ ioservid_t id; /* IN - ioreq server id */
+ hvmmem_type_t type; /* IN - memory type */
You can't use this type for public interface structure fields - this
must be uintXX_t.
Got it.
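A fixed-width layout along those lines could be (the exact field widths chosen here are illustrative, not a final ABI; domid_t and ioservid_t are typedef'd to uint16_t in Xen's public headers):

```c
#include <stdint.h>

typedef uint16_t domid_t;     /* as in xen/include/public/xen.h */
typedef uint16_t ioservid_t;  /* as in xen/include/public/hvm/ioreq.h */

/* hvmmem_type_t replaced with a fixed-width field for the public ABI. */
struct xen_hvm_map_mem_type_to_ioreq_server {
    domid_t domid;    /* IN - domain to be serviced */
    ioservid_t id;    /* IN - ioreq server id */
    uint32_t type;    /* IN - memory type */
    uint32_t flags;   /* IN - HVMOP_IOREQ_MEM_ACCESS_* */
};
```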
Thanks
Yu
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel