Hi,

I was looking at using xen-hptool (tools/misc/xen-hptool.c) to take one page
of a guest domain offline.

I created a guest domain on Xen unstable and dumped its p2m:
# xen-mfndump dump-p2m 1
This gives dom1's mfn for pfn 0x1d:
pfn=0x1d ==> mfn=0x14ee17 (type 0x0)

Run `lookup-pte` to find the mfn of the page table holding the PTE that maps
mfn 0x14ee17:
# xen-mfndump lookup-pte 1 0x14ee17
 --- Lookig for PTEs mapping mfn 0x14ee17 for domain 1 ---
 Guest Width: 8, PT Levels: 4 P2M size: = 262144
  0x14ee17 <-- [0xd948e][29]: 0x1000014ee17027

Now I use xen-hptool to take mfn 0x14ee17 offline:
# xen-hptool mem-offline 0x14ee17
Prepare to offline MEMORY mfn 14ee17
DOM1: No suspend port, try live migration
Failed to suspend guest 1 for mfn 14ee17
(Comment: I modified the code to bypass the suspension of dom1. I should
instead use libxl to suspend dom1, or use the event channel to notify dom1
to suspend as the original code does. But that is not the issue I am asking
about here, and I don't think it affects the following discussion or
conclusion.)
xc: error: Failure when submitting mmu updates: Internal error
xc: error: clear pte failed: Internal error
Memory mfn 14ee17 offlined successfully , this page is DOM1 page yet failed
to be exchanged. current state is [PG_OFFLINE_PENDING, PG_OFFLINE_OWNED]
(XEN) mm.c:2004:d0v0 Error pfn d948e: rd=ffff83015d446000,
od=ffff83017d8d0000, caf=8000000000000004, taf=1400000000000002
(XEN) mm.c:3544:d0v0 Could not get page for normal update

I looked into do_mmu_update() in xen/arch/x86/mm.c. The mmu_update fails
because the page-table owner that do_mmu_update() derives from pt_dom for the
page table mapping mfn 0x14ee17 is domain 0, while the owner of the page at
mfn 0x14ee17 is domain 1.

After digging into it, I found the following code confusing/suspicious:

Inside do_mmu_update() in xen/arch/x86/mm.c, pt_dom is assigned by this line:
    if ( (pt_dom = foreigndom >> 16) != 0 )
However, in flush_mmu_updates() in tools/libxc/xc_private.c, foreigndom is
assigned by the following line:
    hypercall.arg[3] = mmu->subject;
where mmu->subject is the guest domain id of the page table.

My first question is:
Why should we use "foreigndom >> 16" instead of "foreigndom" to get
pt_dom?
(When a page is marked offline, we can get the domid of the page from
status, using status >> PG_OFFLINE_OWNER_SHIFT. But why should we right
shift by 16 bits again in do_mmu_update()?)
(I think this explains why pt_owner is treated as domain 0: with
foreigndom = 1 the shift yields 0, so pt_owner just keeps its default
value, which is the domain of the current vcpu that issues the hypercall.)

pt_owner is retrieved by the following line:
    if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
My second question is:
Why should we use "pt_dom - 1" instead of "pt_dom" here?
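
To make both questions concrete, here is a minimal standalone sketch of how I
read those two quoted lines. It is not the hypervisor code: the function name
and the `caller` parameter are placeholders I made up, and `caller` stands in
for the domain of the vcpu issuing the hypercall.

```c
#include <stdint.h>

/*
 * Sketch of the decoding as I understand it from the quoted lines.
 * Upper 16 bits of foreigndom == 0: keep the default owner (the caller).
 * Upper 16 bits == x, x != 0:      look up domain x - 1.
 * The name and signature are hypothetical, not taken from Xen.
 */
static uint16_t pt_owner_from_foreigndom(uint32_t foreigndom, uint16_t caller)
{
    uint32_t pt_dom = foreigndom >> 16;

    if (pt_dom == 0)
        return caller;              /* default: domain of the current vcpu */

    return (uint16_t)(pt_dom - 1);  /* mirrors rcu_lock_domain_by_id(pt_dom - 1) */
}
```

With foreigndom = mmu->subject = 1 and dom0 as the caller, the upper half is
0, so this returns 0, which matches the failure I see. Under this reading,
selecting domain 1 as the page-table owner would need 2 in the upper half.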

If I replace the old foreigndom (1) with (foreigndom << 16 | foreigndom),
pass that new value as the last parameter of do_mmu_update(), and change
"pt_dom - 1" to "pt_dom", xen-hptool successfully takes the mfn offline.
Here is the output after issuing the command:
Memory mfn 0x14ee17 offlined successfully, this page is DOM1 page and being
swapped successfully, current state is [PG_OFFLINE_OFFLINED,
PG_OFFLINE_OWNED]
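
For reference, the change I tested amounts to the following standalone
sketch. Both function names are placeholders of mine, and the second function
models my locally modified decode (with "pt_dom - 1" changed to "pt_dom"),
not the behaviour of unmodified Xen.

```c
#include <stdint.h>

/* Tool-side change I tried: put the same domid in both 16-bit halves. */
static uint32_t pack_foreigndom_workaround(uint16_t domid)
{
    return ((uint32_t)domid << 16) | domid;
}

/*
 * Hypervisor-side change I tried: use the upper half directly,
 * without subtracting 1. A value of 0 still means "the caller".
 */
static uint16_t pt_owner_workaround(uint32_t foreigndom, uint16_t caller)
{
    uint32_t pt_dom = foreigndom >> 16;

    return pt_dom != 0 ? (uint16_t)pt_dom : caller;
}
```

For dom1 this packs foreigndom as 0x10001, and the modified decode then
resolves the page-table owner to domain 1 instead of domain 0.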

I'm wondering whether this is a bug in do_mmu_update(), or at least an
inconsistency in the do_mmu_update() code?
Of course, this could also be because I misunderstood something. If so,
could you please let me know what I misunderstood and how I should correct
it?

Thank you very much for your time!

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
