Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 10/3/18 1:56 PM, Сергей wrote: >> No yet, we're working on it. > Could You point me to the branch with Your patches please? I Could not find > it in https://xenbits.xen.org/gitweb/?p=xen.git I've attached the current version of the patch, which needs George's comments addressed (so we already know it needs more work) - but it does fix the frozen display issue 99% of the time. You need to apply "[PATCH V3] x86/altp2m: propagate ept.ad changes to all active altp2ms" first. Razvan diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c index 3d228c2..7317e50 100644 --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -657,6 +657,9 @@ bool_t ept_handle_misconfig(uint64_t gpa) bool_t spurious; int rc; +if ( altp2m_active(curr->domain) ) +p2m = p2m_get_altp2m(curr); + p2m_lock(p2m); spurious = curr->arch.hvm.vmx.ept_spurious_misconfig; @@ -1436,6 +1439,11 @@ void p2m_init_altp2m_ept(struct domain *d, unsigned int i) struct ept_data *ept; p2m->ept.ad = hostp2m->ept.ad; +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; +p2m->default_access = hostp2m->default_access; +p2m->domain = hostp2m->domain; +p2m->logdirty_ranges = hostp2m->logdirty_ranges; +p2m->global_logdirty = hostp2m->global_logdirty; p2m->min_remapped_gfn = gfn_x(INVALID_GFN); p2m->max_remapped_gfn = 0; ept = &p2m->ept; diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index d90c624..2d59295 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -255,7 +256,6 @@ int p2m_init(struct domain *d) int p2m_is_logdirty_range(struct p2m_domain *p2m, unsigned long start, unsigned long end) { -ASSERT(p2m_is_hostp2m(p2m)); if ( p2m->global_logdirty || rangeset_contains_range(p2m->logdirty_ranges, start, end) ) return 1; @@ -264,11 +264,9 @@ int p2m_is_logdirty_range(struct p2m_domain *p2m, unsigned long start, return 0; } -void p2m_change_entry_type_global(struct domain *d, - p2m_type_t ot, p2m_type_t nt) +static void _p2m_change_entry_type_global(struct p2m_domain *p2m, + p2m_type_t ot, p2m_type_t nt) { -struct p2m_domain *p2m = p2m_get_hostp2m(d); - ASSERT(ot != nt); ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt)); @@ -278,10 +276,25 @@ void p2m_change_entry_type_global(struct domain *d, p2m_unlock(p2m); } -void p2m_memory_type_changed(struct domain *d) +void p2m_change_entry_type_global(struct domain *d, + p2m_type_t ot, p2m_type_t nt) { -struct p2m_domain *p2m = p2m_get_hostp2m(d); +#ifdef CONFIG_HVM +if ( unlikely(altp2m_active(d)) ) +{ +unsigned int i; +for ( i = 0; i < MAX_ALTP2M; i++ ) +if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) ) +_p2m_change_entry_type_global(d->arch.altp2m_p2m[i], ot, nt); +} +#endif + +_p2m_change_entry_type_global(p2m_get_hostp2m(d), ot, nt); +} + +void _p2m_memory_type_changed(struct p2m_domain *p2m) +{ if ( p2m->memory_type_changed ) { p2m_lock(p2m); @@ -290,6 +303,22 @@ void p2m_memory_type_changed(struct domain *d) } } +void p2m_memory_type_changed(struct domain *d) +{ +#ifdef CONFIG_HVM +if ( unlikely(altp2m_active(d)) ) +{ +unsigned int i; + +for ( i = 0; i < MAX_ALTP2M; i++ ) +if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) ) +_p2m_memory_type_changed(d->arch.altp2m_p2m[i]); +} +#endif + +_p2m_memory_type_changed(p2m_get_hostp2m(d)); +} + int p2m_set_ioreq_server(struct domain *d, unsigned int flags, struct hvm_ioreq_server *s) @@ -967,12 +996,12 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn_l, } /* Modify the p2m type of a range of gfns from ot to nt. */ -void p2m_change_type_range(struct domain *d, - unsigned long start, unsigned long end, - p2m_type_t ot, p2m_type_t nt) +static void _p2m_change_type_range(struct p2m_domain *p2m, + unsigned long start, unsigned long end, + p2m_type_t ot, p2m_type_t nt) { +struct domain *d = p2m->domain; unsigned long gfn = start; -struct p2m_domain *p2m = p2m_get_hostp2m(d); int rc = 0; ASSERT(ot != nt); @@ -1025,6 +1054,25 @@ void p2m_change_type_range(struct domain *d, p2m_unlock(p2m); } +void p2m_change_type_range(struct domain *d, + unsigned long start, unsigned long end, + p2m_type_t ot, p2m_type_t nt) +{ +#ifdef CONFIG_HVM +if ( unlikely(altp2m_active(d)) ) +{ +unsigned int i; + +for ( i = 0; i < MAX_A
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 10/3/18 1:56 PM, Сергей wrote: >> No yet, we're working on it. > Could You point me to the branch with Your patches please? I Could not find > it in https://xenbits.xen.org/gitweb/?p=xen.git There's no public branch with my patches, I'm working locally. The original patch has now split into two patches, the first one of which is currently "[PATCH V3] x86/altp2m: propagate ept.ad changes to all active altp2ms" (pending review on xen-devel), and the second one depends on the final form of the first one and is not yet completely written. Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
> No yet, we're working on it. Could You point me to the branch with Your patches please? I Could not find it in https://xenbits.xen.org/gitweb/?p=xen.git With best regards Sergey Kovalev. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Weird altp2m behaviour when switching early to a new view
On 10/2/18 7:29 PM, Сергей wrote: > Hello Razvan. > > Have Your patch been accepted in Xen hypervisor? > > Searching through git I have found commit > "61bdddb82151fbf51c58f6ebc1b4a687942c45a8" on "Thu Jun 28 10:54:01 2018 > +0300". Is that commit deals with the error? No yet, we're working on it. The commit you mention deals with something else. Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
Hello Razvan. Have Your patch been accepted in Xen hypervisor? Searching through git I have found commit "61bdddb82151fbf51c58f6ebc1b4a687942c45a8" on "Thu Jun 28 10:54:01 2018 +0300". Is that commit deals with the error? With best regards Sergey Kovalev. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/18/2018 01:45 PM, George Dunlap wrote: > On 04/18/2018 09:20 AM, Razvan Cojocaru wrote: >> On 04/17/2018 04:53 PM, George Dunlap wrote: >>> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote: void p2m_init_altp2m_ept(struct domain *d, unsigned int i) { struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); struct ept_data *ept; +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; +p2m->default_access = hostp2m->default_access; +p2m->domain = hostp2m->domain; +p2m->logdirty_ranges = hostp2m->logdirty_ranges; +p2m->global_logdirty = hostp2m->global_logdirty; >>> >>> This would certainly be one approach. But then we'd need to keep a lot >>> more of these things in sync -- for instance, we'd have to have similar >>> code to enable and disable global_logdirty on all active altp2m entries. >>> >>> [...] >>> >>> The other thing that seems to be missing from synchronization is that in >>> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being >>> part of the eptp). The code seems to indicate that this is required for >>> PML (hardware-assisted logdirty), but I don't see anywhere this is set >>> or copied from the host p2m. >>> >>> It might be nice to have a more structured way of keeping all these >>> changes in sync, rather than relying on this open-coding everywhere. >> >> For logdirty_ranges and global_logdirty, I propose that we put them both >> in a dynamically allocated struct, and have all p2ms share a pointer to >> them. That way, all that's required is for the pointer to be set up in >> p2m_init_altp2m_ept() and the actual data will be automatically shared >> between p2ms. If I've read the code correctly, the hostp2m is the last >> to be destroyed and the first to be initialized, so it should work well >> as long as all p2ms synchronize access to logdirty_ranges and >> global_logdirty (which I assume they already do). > > That's an interesting idea; one potential disadvantage is that it would > make locking even more complex than it already is. Enabling / disabling > logdirty isn't a hot path, so looping through the altp2ms shouldn't have > much of a performance impact; *reading* is much more common, so having > to grab an extra set of locks is more likely to have a performance > impact. And it's not clear to me that the complexity of keeping several > copies in sync is that much higher than the complexity of adding Yet > Another MM Lock. > > Those are just initial reactions though -- feel free to explore the > solution space and/or argue otherwise. :-) No, you're right, there's no point in complicating things, I'll just loop over the p2ms. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/18/2018 09:20 AM, Razvan Cojocaru wrote: > On 04/17/2018 04:53 PM, George Dunlap wrote: >> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote: >>> void p2m_init_altp2m_ept(struct domain *d, unsigned int i) >>> { >>> struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; >>> +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); >>> struct ept_data *ept; >>> >>> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; >>> +p2m->default_access = hostp2m->default_access; >>> +p2m->domain = hostp2m->domain; >>> +p2m->logdirty_ranges = hostp2m->logdirty_ranges; >>> +p2m->global_logdirty = hostp2m->global_logdirty; >> >> This would certainly be one approach. But then we'd need to keep a lot >> more of these things in sync -- for instance, we'd have to have similar >> code to enable and disable global_logdirty on all active altp2m entries. >> >> [...] >> >> The other thing that seems to be missing from synchronization is that in >> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being >> part of the eptp). The code seems to indicate that this is required for >> PML (hardware-assisted logdirty), but I don't see anywhere this is set >> or copied from the host p2m. >> >> It might be nice to have a more structured way of keeping all these >> changes in sync, rather than relying on this open-coding everywhere. > > For logdirty_ranges and global_logdirty, I propose that we put them both > in a dynamically allocated struct, and have all p2ms share a pointer to > them. That way, all that's required is for the pointer to be set up in > p2m_init_altp2m_ept() and the actual data will be automatically shared > between p2ms. If I've read the code correctly, the hostp2m is the last > to be destroyed and the first to be initialized, so it should work well > as long as all p2ms synchronize access to logdirty_ranges and > global_logdirty (which I assume they already do). That's an interesting idea; one potential disadvantage is that it would make locking even more complex than it already is. Enabling / disabling logdirty isn't a hot path, so looping through the altp2ms shouldn't have much of a performance impact; *reading* is much more common, so having to grab an extra set of locks is more likely to have a performance impact. And it's not clear to me that the complexity of keeping several copies in sync is that much higher than the complexity of adding Yet Another MM Lock. Those are just initial reactions though -- feel free to explore the solution space and/or argue otherwise. :-) -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/17/2018 04:53 PM, George Dunlap wrote: > On 04/17/2018 11:50 AM, Razvan Cojocaru wrote: >> void p2m_init_altp2m_ept(struct domain *d, unsigned int i) >> { >> struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; >> +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); >> struct ept_data *ept; >> >> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; >> +p2m->default_access = hostp2m->default_access; >> +p2m->domain = hostp2m->domain; >> +p2m->logdirty_ranges = hostp2m->logdirty_ranges; >> +p2m->global_logdirty = hostp2m->global_logdirty; > > This would certainly be one approach. But then we'd need to keep a lot > more of these things in sync -- for instance, we'd have to have similar > code to enable and disable global_logdirty on all active altp2m entries. > > [...] > > The other thing that seems to be missing from synchronization is that in > p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being > part of the eptp). The code seems to indicate that this is required for > PML (hardware-assisted logdirty), but I don't see anywhere this is set > or copied from the host p2m. > > It might be nice to have a more structured way of keeping all these > changes in sync, rather than relying on this open-coding everywhere. For logdirty_ranges and global_logdirty, I propose that we put them both in a dynamically allocated struct, and have all p2ms share a pointer to them. That way, all that's required is for the pointer to be set up in p2m_init_altp2m_ept() and the actual data will be automatically shared between p2ms. If I've read the code correctly, the hostp2m is the last to be destroyed and the first to be initialized, so it should work well as long as all p2ms synchronize access to logdirty_ranges and global_logdirty (which I assume they already do). ept.ad probably needs to be propagated on each change and copied from the hostp2m on altp2m init. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On Tue, Apr 17, 2018 at 9:13 AM, Razvan Cojocaru wrote: > On 04/17/2018 05:58 PM, George Dunlap wrote: It might be nice to have a more structured way of keeping all these changes in sync, rather than relying on this open-coding everywhere. >>> >>> Very true. It has also occured to me that some of these issues would be >>> at least partially mitigated if altp2m was always on, but of course I >>> can also see why that would be frowned upon at this time. >> >> How would having it enabled all the time have helped in this situation? > > Well, the default p2m would just be > d->arch.altp2m_p2m[0], all altp2m_active(d) checks would be gone, the > active p2m for a given VCPU would always be p2m_get_altp2m(v), and so > on. If I understand the design correctly, that is. :) I would also like this setup, it would simplify things a lot. Tamas ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/17/2018 05:58 PM, George Dunlap wrote: >>> It might be nice to have a more structured way of keeping all these >>> changes in sync, rather than relying on this open-coding everywhere. >> >> Very true. It has also occured to me that some of these issues would be >> at least partially mitigated if altp2m was always on, but of course I >> can also see why that would be frowned upon at this time. > > How would having it enabled all the time have helped in this situation? Well, the default p2m would just be d->arch.altp2m_p2m[0], all altp2m_active(d) checks would be gone, the active p2m for a given VCPU would always be p2m_get_altp2m(v), and so on. If I understand the design correctly, that is. :) Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/17/2018 03:21 PM, Razvan Cojocaru wrote: > On 04/17/2018 04:53 PM, George Dunlap wrote: >> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote: >>> void p2m_init_altp2m_ept(struct domain *d, unsigned int i) >>> { >>> struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; >>> +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); >>> struct ept_data *ept; >>> >>> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; >>> +p2m->default_access = hostp2m->default_access; >>> +p2m->domain = hostp2m->domain; >>> +p2m->logdirty_ranges = hostp2m->logdirty_ranges; >>> +p2m->global_logdirty = hostp2m->global_logdirty; >> >> This would certainly be one approach. But then we'd need to keep a lot >> more of these things in sync -- for instance, we'd have to have similar >> code to enable and disable global_logdirty on all active altp2m entries. >> >> I also don't think the max_mapped_pfn should be copied here; the fact >> that updates got filtered out before was a red herring I think. > > I initially thought so too, and now I've commented out just that one > line to remember why I couldn't remove, and the reason is this: > > (XEN) Assertion 's <= e' failed at rangeset.c:121 > (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] > (XEN) CPU:0 > (XEN) RIP:e008:[] rangeset_add_range+0x5/0x1e6 > (XEN) RFLAGS: 00010206 CONTEXT: hypervisor (d0v0) > (XEN) rax: rbx: 830b577f92e0 rcx: 000f > (XEN) rdx: rsi: 000f rdi: 830ad6a1ce50 > (XEN) rbp: 83007ce87c78 rsp: 83007ce87c20 r8: > (XEN) r9: r10: 006f r11: 0028 > (XEN) r12: 0002 r13: r14: 0001 > (XEN) r15: 830ad6ddd000 cr0: 80050033 cr4: 003526e0 > (XEN) cr3: 000ad714f000 cr2: 00c12000 > (XEN) fsb: 7f794c7b2700 gsb: 880276c0 gss: > (XEN) ds: es: fs: gs: ss: e010 cs: e008 > (XEN) Xen code around (rangeset_add_range+0x5/0x1e6): > (XEN) 00 5d c3 48 39 d6 76 02 <0f> 0b 55 48 89 e5 41 56 41 55 41 54 53 > 49 89 d5 > (XEN) Xen stack trace from rsp=83007ce87c20: > (XEN)82d08032c644 0206 0010 0028 > (XEN)000f 82c000217000 830ad6ddd000 > (XEN)000f0240 0002 83007ce87cc8 > (XEN)82d08032c7ac 0240 000f 82c000217000 > (XEN)830ad6ddd000 000f0240 0048 82c000217000 > (XEN)000f 83007ce87d38 82d080362ade 830c5bb2 > (XEN)830ad6ddd650 7f794c7bd004 > (XEN)0048 830c5bb2 83007ce87e00 > (XEN)ffea 82d0802e9c64 deadbeefdeadf00d 83007ce87de8 > (XEN)82d0802e92c1 830c5bb2 83007d616000 > (XEN)83007ce87dd8 83007ce87df8 83007ce87df4 > (XEN)82e016a5de40 83007d616000 0007 0240 > (XEN)000f 83007ce87dc8 830ad6ddd000 83007ce87dc8 > (XEN)0002 0001 7f794c7bf004 82d0802e9c64 > (XEN)deadbeefdeadf00d 83007ce87e48 82d0802e9ce1 82d080374434 > (XEN)000280370001 7f794c7be004 0020 7f794c7bd004 > (XEN)0048 82d080374434 83007ce87ef8 83007d616000 > (XEN)0029 83007ce87ee8 82d08036d8a6 03ff82d080374434 > (XEN)0001 0002 7f794c7bf004 deadbeefdeadf00d > (XEN)deadbeefdeadf00d 82d080374434 82d080374428 82d080374434 > (XEN) Xen call trace: > (XEN)[] rangeset_add_range+0x5/0x1e6 > (XEN)[] p2m_change_type_range+0x66/0x7f > (XEN)[] hap_track_dirty_vram+0x240/0x491 > (XEN)[] dm.c#dm_op+0x45c/0xd06 > (XEN)[] do_dm_op+0x7d/0xb3 > (XEN)[] pv_hypercall+0x1f4/0x440 > (XEN)[] lstar_enter+0x115/0x120 > (XEN) > (XEN) > (XEN) > (XEN) Panic on CPU 0: > (XEN) Assertion 's <= e' failed at rangeset.c:121 > (XEN) Right, but this is basically a bug in p2m_change_type_range(), where it handles end > max_mapped_pfn, but not start > max_mapped_pfn. It should check the latter just after grabbing the lock and bail if true. (This should probably be in a separate patch, as it's really a generic bug in p2m_change_type_range().) >> Another approach would be to maintain the logdirty_ranges and >> global_logdirty only for the host p2m, but to misconfigure entries for >> all the p2ms; and then on a misconfiguration, handle the >> misconfiguration for the hostp2m and then do a lazy propagate for the >> altp2m. On the whole that's probably more error-prone than ju
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/17/2018 04:53 PM, George Dunlap wrote: > On 04/17/2018 11:50 AM, Razvan Cojocaru wrote: >> void p2m_init_altp2m_ept(struct domain *d, unsigned int i) >> { >> struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; >> +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); >> struct ept_data *ept; >> >> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; >> +p2m->default_access = hostp2m->default_access; >> +p2m->domain = hostp2m->domain; >> +p2m->logdirty_ranges = hostp2m->logdirty_ranges; >> +p2m->global_logdirty = hostp2m->global_logdirty; > > This would certainly be one approach. But then we'd need to keep a lot > more of these things in sync -- for instance, we'd have to have similar > code to enable and disable global_logdirty on all active altp2m entries. > > I also don't think the max_mapped_pfn should be copied here; the fact > that updates got filtered out before was a red herring I think. I initially thought so too, and now I've commented out just that one line to remember why I couldn't remove, and the reason is this: (XEN) Assertion 's <= e' failed at rangeset.c:121 (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] (XEN) CPU:0 (XEN) RIP:e008:[] rangeset_add_range+0x5/0x1e6 (XEN) RFLAGS: 00010206 CONTEXT: hypervisor (d0v0) (XEN) rax: rbx: 830b577f92e0 rcx: 000f (XEN) rdx: rsi: 000f rdi: 830ad6a1ce50 (XEN) rbp: 83007ce87c78 rsp: 83007ce87c20 r8: (XEN) r9: r10: 006f r11: 0028 (XEN) r12: 0002 r13: r14: 0001 (XEN) r15: 830ad6ddd000 cr0: 80050033 cr4: 003526e0 (XEN) cr3: 000ad714f000 cr2: 00c12000 (XEN) fsb: 7f794c7b2700 gsb: 880276c0 gss: (XEN) ds: es: fs: gs: ss: e010 cs: e008 (XEN) Xen code around (rangeset_add_range+0x5/0x1e6): (XEN) 00 5d c3 48 39 d6 76 02 <0f> 0b 55 48 89 e5 41 56 41 55 41 54 53 49 89 d5 (XEN) Xen stack trace from rsp=83007ce87c20: (XEN)82d08032c644 0206 0010 0028 (XEN)000f 82c000217000 830ad6ddd000 (XEN)000f0240 0002 83007ce87cc8 (XEN)82d08032c7ac 0240 000f 82c000217000 (XEN)830ad6ddd000 000f0240 0048 82c000217000 (XEN)000f 83007ce87d38 82d080362ade 830c5bb2 (XEN)830ad6ddd650 7f794c7bd004 (XEN)0048 830c5bb2 83007ce87e00 (XEN)ffea 82d0802e9c64 deadbeefdeadf00d 83007ce87de8 (XEN)82d0802e92c1 830c5bb2 83007d616000 (XEN)83007ce87dd8 83007ce87df8 83007ce87df4 (XEN)82e016a5de40 83007d616000 0007 0240 (XEN)000f 83007ce87dc8 830ad6ddd000 83007ce87dc8 (XEN)0002 0001 7f794c7bf004 82d0802e9c64 (XEN)deadbeefdeadf00d 83007ce87e48 82d0802e9ce1 82d080374434 (XEN)000280370001 7f794c7be004 0020 7f794c7bd004 (XEN)0048 82d080374434 83007ce87ef8 83007d616000 (XEN)0029 83007ce87ee8 82d08036d8a6 03ff82d080374434 (XEN)0001 0002 7f794c7bf004 deadbeefdeadf00d (XEN)deadbeefdeadf00d 82d080374434 82d080374428 82d080374434 (XEN) Xen call trace: (XEN)[] rangeset_add_range+0x5/0x1e6 (XEN)[] p2m_change_type_range+0x66/0x7f (XEN)[] hap_track_dirty_vram+0x240/0x491 (XEN)[] dm.c#dm_op+0x45c/0xd06 (XEN)[] do_dm_op+0x7d/0xb3 (XEN)[] pv_hypercall+0x1f4/0x440 (XEN)[] lstar_enter+0x115/0x120 (XEN) (XEN) (XEN) (XEN) Panic on CPU 0: (XEN) Assertion 's <= e' failed at rangeset.c:121 (XEN) > Another approach would be to maintain the logdirty_ranges and > global_logdirty only for the host p2m, but to misconfigure entries for > all the p2ms; and then on a misconfiguration, handle the > misconfiguration for the hostp2m and then do a lazy propagate for the > altp2m. On the whole that's probably more error-prone than just doing a > for() loop, though, and not that much faster. :-) We can try that too. > The other thing that seems to be missing from synchronization is that in > p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being > part of the eptp). The code seems to indicate that this is required for > PML (hardware-assisted logdirty), but I don't see anywhere this is set > or copied from the host p2m. > > It might be nice to have a more structured way of keeping all these > changes i
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/17/2018 11:50 AM, Razvan Cojocaru wrote: > void p2m_init_altp2m_ept(struct domain *d, unsigned int i) > { > struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; > +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); > struct ept_data *ept; > > +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; > +p2m->default_access = hostp2m->default_access; > +p2m->domain = hostp2m->domain; > +p2m->logdirty_ranges = hostp2m->logdirty_ranges; > +p2m->global_logdirty = hostp2m->global_logdirty; This would certainly be one approach. But then we'd need to keep a lot more of these things in sync -- for instance, we'd have to have similar code to enable and disable global_logdirty on all active altp2m entries. I also don't think the max_mapped_pfn should be copied here; the fact that updates got filtered out before was a red herring I think. Another approach would be to maintain the logdirty_ranges and global_logdirty only for the host p2m, but to misconfigure entries for all the p2ms; and then on a misconfiguration, handle the misconfiguration for the hostp2m and then do a lazy propagate for the altp2m. On the whole that's probably more error-prone than just doing a for() loop, though, and not that much faster. :-) The other thing that seems to be missing from synchronization is that in p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being part of the eptp). The code seems to indicate that this is required for PML (hardware-assisted logdirty), but I don't see anywhere this is set or copied from the host p2m. It might be nice to have a more structured way of keeping all these changes in sync, rather than relying on this open-coding everywhere. -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/17/2018 01:49 PM, Razvan Cojocaru wrote: > On 04/17/2018 11:24 AM, Razvan Cojocaru wrote: >> On 04/16/2018 11:21 PM, George Dunlap wrote: >>> On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru >>> wrote: On 04/16/2018 08:47 PM, George Dunlap wrote: > On 04/13/2018 03:44 PM, Razvan Cojocaru wrote: >> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote: >>> Debugging continues. >> >> Finally, the attached patch seems to get the display unstuck in my >> scenario, although for one guest I get: >> >> (XEN) d2v0 Unexpected vmexit: reason 49 >> (XEN) domain_crash called from vmx.c:4120 >> (XEN) Domain 2 (vcpu#0) crashed on cpu#1: >> (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] >> (XEN) CPU:1 >> (XEN) RIP:0010:[] >> (XEN) RFLAGS: 00010246 CONTEXT: hvm guest (d2v0) >> (XEN) rax: f8800300 rbx: f900c0083db0 rcx: >> aa55aa55 >> (XEN) rdx: fa80041bdc41 rsi: f900c00c69a0 rdi: >> 0001 >> (XEN) rbp: rsp: f88002ee9ef0 r8: >> fa80041bdc40 >> (XEN) r9: f80001810e80 r10: fa800342aa70 r11: >> f88002ee9e80 >> (XEN) r12: 0005 r13: 0001 r14: >> f900c00c08b0 >> (XEN) r15: 0001 cr0: 80050031 cr4: >> 000406f8 >> (XEN) cr3: ef771000 cr2: f900c00c8000 >> (XEN) fsb: fffde000 gsb: f80001810d00 gss: >> 07fdc000 >> (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010 >> >> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping >> somebody more familiar with the code can point to a more elegant >> solution if one exists. > > I think I have an idea what's going on, but it's complicated. :-) > > Basically, the logdirty functionality isn't simple, and needs careful > thought on how to integrate it. I'll write some more tomorrow, and see > if I can come up with a solution. I think I know why this happens for the one guest - the other guests start at a certain resolution display-wise and stay that way until shutdown. This particular guest starts with a larger screen, then goes to roughly 2/3rds of it, then tries to go back to the initial larger one - at which point the above happens. I assume this corresponds to some pages being removed and/or added. I'll test this theory more tomorrow - if it's correct I should be able to reproduce the crash (with the patch) by simply resetting the screen resolution (increasing it). >>> >>> The trick is that p2m_change_type doesn't actually iterate over the >>> entire p2m range, individually changing entries as it goes. Instead >>> it misconfigures the entries at the top-level, which causes the kinds >>> of faults shown above. As it gets faults for each entry, it checks >>> the current type, the logdirty ranges, and the global logdirty bit to >>> determine what the new types should be. >>> >>> Your patch makes it so that all the altp2ms now get the >>> misconfiguration when the logdirty range is changed; but clearly >>> handling the misconfiguration isn't integrated properly with the >>> altp2m system yet. Doing it right may take some thought. >> >> FWIW, the attached patch has solved the misconfig-related domain crash >> for me (though I'm very likely missing some subtleties). It all seems to >> work as expected when enabling altp2m and switching early to a new view. >> However, now I have domUs with a frozen display when I disconnect the >> introspection application (that is, after I switch back to the default >> view and disable altp2m on the domain). > > The for() loop in the previous patch is unnecessary, so here's a new > (cleaner) patch. I can't get the guest to freeze the display when > detaching anymore - unrelated to the for() - (so it might have been > something else in my setup), but I'll watch for it in the following days. > > Hopefully this is either a reasonable fix or a basis for one. Apologies, forgot to actually attach the patch. Here it is. diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c index 14b5939..8495039 100644 --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -17,6 +17,7 @@ #include #include +#include #include #include #include @@ -656,6 +657,9 @@ bool_t ept_handle_misconfig(uint64_t gpa) bool_t spurious; int rc; +if ( altp2m_active(curr->domain) ) +p2m = p2m_get_altp2m(curr); + p2m_lock(p2m); spurious = curr->arch.hvm_vmx.ept_spurious_misconfig; @@ -1375,8 +1379,15 @@ void setup_ept_dump(void) void p2m_init_altp2m_ept(struct domain *d, unsigned int i) { struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); struct ept_data *ept; +p2m->max_mapp
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/17/2018 11:24 AM, Razvan Cojocaru wrote: > On 04/16/2018 11:21 PM, George Dunlap wrote: >> On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru >> wrote: >>> On 04/16/2018 08:47 PM, George Dunlap wrote: On 04/13/2018 03:44 PM, Razvan Cojocaru wrote: > On 04/11/2018 11:04 AM, Razvan Cojocaru wrote: >> Debugging continues. > > Finally, the attached patch seems to get the display unstuck in my > scenario, although for one guest I get: > > (XEN) d2v0 Unexpected vmexit: reason 49 > (XEN) domain_crash called from vmx.c:4120 > (XEN) Domain 2 (vcpu#0) crashed on cpu#1: > (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] > (XEN) CPU:1 > (XEN) RIP:0010:[] > (XEN) RFLAGS: 00010246 CONTEXT: hvm guest (d2v0) > (XEN) rax: f8800300 rbx: f900c0083db0 rcx: > aa55aa55 > (XEN) rdx: fa80041bdc41 rsi: f900c00c69a0 rdi: > 0001 > (XEN) rbp: rsp: f88002ee9ef0 r8: > fa80041bdc40 > (XEN) r9: f80001810e80 r10: fa800342aa70 r11: > f88002ee9e80 > (XEN) r12: 0005 r13: 0001 r14: > f900c00c08b0 > (XEN) r15: 0001 cr0: 80050031 cr4: > 000406f8 > (XEN) cr3: ef771000 cr2: f900c00c8000 > (XEN) fsb: fffde000 gsb: f80001810d00 gss: > 07fdc000 > (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010 > > i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping > somebody more familiar with the code can point to a more elegant > solution if one exists. I think I have an idea what's going on, but it's complicated. :-) Basically, the logdirty functionality isn't simple, and needs careful thought on how to integrate it. I'll write some more tomorrow, and see if I can come up with a solution. >>> >>> I think I know why this happens for the one guest - the other guests >>> start at a certain resolution display-wise and stay that way until shutdown. >>> >>> This particular guest starts with a larger screen, then goes to roughly >>> 2/3rds of it, then tries to go back to the initial larger one - at which >>> point the above happens. I assume this corresponds to some pages being >>> removed and/or added. I'll test this theory more tomorrow - if it's >>> correct I should be able to reproduce the crash (with the patch) by >>> simply resetting the screen resolution (increasing it). >> >> The trick is that p2m_change_type doesn't actually iterate over the >> entire p2m range, individually changing entries as it goes. Instead >> it misconfigures the entries at the top-level, which causes the kinds >> of faults shown above. As it gets faults for each entry, it checks >> the current type, the logdirty ranges, and the global logdirty bit to >> determine what the new types should be. >> >> Your patch makes it so that all the altp2ms now get the >> misconfiguration when the logdirty range is changed; but clearly >> handling the misconfiguration isn't integrated properly with the >> altp2m system yet. Doing it right may take some thought. > > FWIW, the attached patch has solved the misconfig-related domain crash > for me (though I'm very likely missing some subtleties). It all seems to > work as expected when enabling altp2m and switching early to a new view. > However, now I have domUs with a frozen display when I disconnect the > introspection application (that is, after I switch back to the default > view and disable altp2m on the domain). The for() loop in the previous patch is unnecessary, so here's a new (cleaner) patch. I can't get the guest to freeze the display when detaching anymore - unrelated to the for() - (so it might have been something else in my setup), but I'll watch for it in the following days. Hopefully this is either a reasonable fix or a basis for one. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/16/2018 11:21 PM, George Dunlap wrote: > On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru > wrote: >> On 04/16/2018 08:47 PM, George Dunlap wrote: >>> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote: On 04/11/2018 11:04 AM, Razvan Cojocaru wrote: > Debugging continues. Finally, the attached patch seems to get the display unstuck in my scenario, although for one guest I get: (XEN) d2v0 Unexpected vmexit: reason 49 (XEN) domain_crash called from vmx.c:4120 (XEN) Domain 2 (vcpu#0) crashed on cpu#1: (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] (XEN) CPU:1 (XEN) RIP:0010:[] (XEN) RFLAGS: 00010246 CONTEXT: hvm guest (d2v0) (XEN) rax: f8800300 rbx: f900c0083db0 rcx: aa55aa55 (XEN) rdx: fa80041bdc41 rsi: f900c00c69a0 rdi: 0001 (XEN) rbp: rsp: f88002ee9ef0 r8: fa80041bdc40 (XEN) r9: f80001810e80 r10: fa800342aa70 r11: f88002ee9e80 (XEN) r12: 0005 r13: 0001 r14: f900c00c08b0 (XEN) r15: 0001 cr0: 80050031 cr4: 000406f8 (XEN) cr3: ef771000 cr2: f900c00c8000 (XEN) fsb: fffde000 gsb: f80001810d00 gss: 07fdc000 (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010 i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping somebody more familiar with the code can point to a more elegant solution if one exists. >>> >>> I think I have an idea what's going on, but it's complicated. :-) >>> >>> Basically, the logdirty functionality isn't simple, and needs careful >>> thought on how to integrate it. I'll write some more tomorrow, and see >>> if I can come up with a solution. >> >> I think I know why this happens for the one guest - the other guests >> start at a certain resolution display-wise and stay that way until shutdown. >> >> This particular guest starts with a larger screen, then goes to roughly >> 2/3rds of it, then tries to go back to the initial larger one - at which >> point the above happens. I assume this corresponds to some pages being >> removed and/or added. I'll test this theory more tomorrow - if it's >> correct I should be able to reproduce the crash (with the patch) by >> simply resetting the screen resolution (increasing it). > > The trick is that p2m_change_type doesn't actually iterate over the > entire p2m range, individually changing entries as it goes. Instead > it misconfigures the entries at the top-level, which causes the kinds > of faults shown above. As it gets faults for each entry, it checks > the current type, the logdirty ranges, and the global logdirty bit to > determine what the new types should be. > > Your patch makes it so that all the altp2ms now get the > misconfiguration when the logdirty range is changed; but clearly > handling the misconfiguration isn't integrated properly with the > altp2m system yet. Doing it right may take some thought. FWIW, the attached patch has solved the misconfig-related domain crash for me (though I'm very likely missing some subtleties). It all seems to work as expected when enabling altp2m and switching early to a new view. However, now I have domUs with a frozen display when I disconnect the introspection application (that is, after I switch back to the default view and disable altp2m on the domain). Thanks, Razvan diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c index 14b5939..4530689 100644 --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -17,6 +17,7 @@ #include #include +#include #include #include #include @@ -652,17 +653,38 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn) bool_t ept_handle_misconfig(uint64_t gpa) { struct vcpu *curr = current; -struct p2m_domain *p2m = p2m_get_hostp2m(curr->domain); +struct domain *d = curr->domain; +struct p2m_domain *p2m = p2m_get_hostp2m(d); bool_t spurious; -int rc; - -p2m_lock(p2m); +int rc = 0; +unsigned int i; spurious = curr->arch.hvm_vmx.ept_spurious_misconfig; -rc = resolve_misconfig(p2m, PFN_DOWN(gpa)); -curr->arch.hvm_vmx.ept_spurious_misconfig = 0; -p2m_unlock(p2m); +if ( altp2m_active(d) ) +{ + for ( i = 0; i < MAX_ALTP2M; i++ ) +if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) ) +{ +p2m = d->arch.altp2m_p2m[i]; + +p2m_lock(p2m); + +rc = resolve_misconfig(p2m, PFN_DOWN(gpa)); +curr->arch.hvm_vmx.ept_spurious_misconfig = 0; + +p2m_unlock(p2m); +} +} +else +{ +p2m_lock(p2m); + +rc = resolve_misconfig(p2m, PFN_DOWN(gpa)); +curr->arch.hvm_vmx.ept_spurious_misconfig = 0; + +p2m_unlock(p2m); +}
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/16/2018 11:21 PM, George Dunlap wrote: > On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru > wrote: >> On 04/16/2018 08:47 PM, George Dunlap wrote: >>> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote: On 04/11/2018 11:04 AM, Razvan Cojocaru wrote: > Debugging continues. Finally, the attached patch seems to get the display unstuck in my scenario, although for one guest I get: (XEN) d2v0 Unexpected vmexit: reason 49 (XEN) domain_crash called from vmx.c:4120 (XEN) Domain 2 (vcpu#0) crashed on cpu#1: (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] (XEN) CPU:1 (XEN) RIP:0010:[] (XEN) RFLAGS: 00010246 CONTEXT: hvm guest (d2v0) (XEN) rax: f8800300 rbx: f900c0083db0 rcx: aa55aa55 (XEN) rdx: fa80041bdc41 rsi: f900c00c69a0 rdi: 0001 (XEN) rbp: rsp: f88002ee9ef0 r8: fa80041bdc40 (XEN) r9: f80001810e80 r10: fa800342aa70 r11: f88002ee9e80 (XEN) r12: 0005 r13: 0001 r14: f900c00c08b0 (XEN) r15: 0001 cr0: 80050031 cr4: 000406f8 (XEN) cr3: ef771000 cr2: f900c00c8000 (XEN) fsb: fffde000 gsb: f80001810d00 gss: 07fdc000 (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010 i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping somebody more familiar with the code can point to a more elegant solution if one exists. >>> >>> I think I have an idea what's going on, but it's complicated. :-) >>> >>> Basically, the logdirty functionality isn't simple, and needs careful >>> thought on how to integrate it. I'll write some more tomorrow, and see >>> if I can come up with a solution. >> >> I think I know why this happens for the one guest - the other guests >> start at a certain resolution display-wise and stay that way until shutdown. >> >> This particular guest starts with a larger screen, then goes to roughly >> 2/3rds of it, then tries to go back to the initial larger one - at which >> point the above happens. I assume this corresponds to some pages being >> removed and/or added. I'll test this theory more tomorrow - if it's >> correct I should be able to reproduce the crash (with the patch) by >> simply resetting the screen resolution (increasing it). > > The trick is that p2m_change_type doesn't actually iterate over the > entire p2m range, individually changing entries as it goes. Instead > it misconfigures the entries at the top-level, which causes the kinds > of faults shown above. As it gets faults for each entry, it checks > the current type, the logdirty ranges, and the global logdirty bit to > determine what the new types should be. > > Your patch makes it so that all the altp2ms now get the > misconfiguration when the logdirty range is changed; but clearly > handling the misconfiguration isn't integrated properly with the > altp2m system yet. Doing it right may take some thought. Thanks for investigating! Sure enough, my suspicion was correct - all it takes to get that domain crash is to change the VM screen resolution. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru wrote: > On 04/16/2018 08:47 PM, George Dunlap wrote: >> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote: >>> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote: Debugging continues. >>> >>> Finally, the attached patch seems to get the display unstuck in my >>> scenario, although for one guest I get: >>> >>> (XEN) d2v0 Unexpected vmexit: reason 49 >>> (XEN) domain_crash called from vmx.c:4120 >>> (XEN) Domain 2 (vcpu#0) crashed on cpu#1: >>> (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] >>> (XEN) CPU:1 >>> (XEN) RIP:0010:[] >>> (XEN) RFLAGS: 00010246 CONTEXT: hvm guest (d2v0) >>> (XEN) rax: f8800300 rbx: f900c0083db0 rcx: aa55aa55 >>> (XEN) rdx: fa80041bdc41 rsi: f900c00c69a0 rdi: 0001 >>> (XEN) rbp: rsp: f88002ee9ef0 r8: fa80041bdc40 >>> (XEN) r9: f80001810e80 r10: fa800342aa70 r11: f88002ee9e80 >>> (XEN) r12: 0005 r13: 0001 r14: f900c00c08b0 >>> (XEN) r15: 0001 cr0: 80050031 cr4: 000406f8 >>> (XEN) cr3: ef771000 cr2: f900c00c8000 >>> (XEN) fsb: fffde000 gsb: f80001810d00 gss: 07fdc000 >>> (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010 >>> >>> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping >>> somebody more familiar with the code can point to a more elegant >>> solution if one exists. >> >> I think I have an idea what's going on, but it's complicated. :-) >> >> Basically, the logdirty functionality isn't simple, and needs careful >> thought on how to integrate it. I'll write some more tomorrow, and see >> if I can come up with a solution. > > I think I know why this happens for the one guest - the other guests > start at a certain resolution display-wise and stay that way until shutdown. > > This particular guest starts with a larger screen, then goes to roughly > 2/3rds of it, then tries to go back to the initial larger one - at which > point the above happens. I assume this corresponds to some pages being > removed and/or added. I'll test this theory more tomorrow - if it's > correct I should be able to reproduce the crash (with the patch) by > simply resetting the screen resolution (increasing it). The trick is that p2m_change_type doesn't actually iterate over the entire p2m range, individually changing entries as it goes. Instead it misconfigures the entries at the top-level, which causes the kinds of faults shown above. As it gets faults for each entry, it checks the current type, the logdirty ranges, and the global logdirty bit to determine what the new types should be. Your patch makes it so that all the altp2ms now get the misconfiguration when the logdirty range is changed; but clearly handling the misconfiguration isn't integrated properly with the altp2m system yet. Doing it right may take some thought. -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/16/2018 08:47 PM, George Dunlap wrote: > On 04/13/2018 03:44 PM, Razvan Cojocaru wrote: >> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote: >>> Debugging continues. >> >> Finally, the attached patch seems to get the display unstuck in my >> scenario, although for one guest I get: >> >> (XEN) d2v0 Unexpected vmexit: reason 49 >> (XEN) domain_crash called from vmx.c:4120 >> (XEN) Domain 2 (vcpu#0) crashed on cpu#1: >> (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] >> (XEN) CPU:1 >> (XEN) RIP:0010:[] >> (XEN) RFLAGS: 00010246 CONTEXT: hvm guest (d2v0) >> (XEN) rax: f8800300 rbx: f900c0083db0 rcx: aa55aa55 >> (XEN) rdx: fa80041bdc41 rsi: f900c00c69a0 rdi: 0001 >> (XEN) rbp: rsp: f88002ee9ef0 r8: fa80041bdc40 >> (XEN) r9: f80001810e80 r10: fa800342aa70 r11: f88002ee9e80 >> (XEN) r12: 0005 r13: 0001 r14: f900c00c08b0 >> (XEN) r15: 0001 cr0: 80050031 cr4: 000406f8 >> (XEN) cr3: ef771000 cr2: f900c00c8000 >> (XEN) fsb: fffde000 gsb: f80001810d00 gss: 07fdc000 >> (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010 >> >> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping >> somebody more familiar with the code can point to a more elegant >> solution if one exists. > > I think I have an idea what's going on, but it's complicated. :-) > > Basically, the logdirty functionality isn't simple, and needs careful > thought on how to integrate it. I'll write some more tomorrow, and see > if I can come up with a solution. I think I know why this happens for the one guest - the other guests start at a certain resolution display-wise and stay that way until shutdown. This particular guest starts with a larger screen, then goes to roughly 2/3rds of it, then tries to go back to the initial larger one - at which point the above happens. I assume this corresponds to some pages being removed and/or added. I'll test this theory more tomorrow - if it's correct I should be able to reproduce the crash (with the patch) by simply resetting the screen resolution (increasing it). Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/13/2018 03:44 PM, Razvan Cojocaru wrote: > On 04/11/2018 11:04 AM, Razvan Cojocaru wrote: >> Debugging continues. > > Finally, the attached patch seems to get the display unstuck in my > scenario, although for one guest I get: > > (XEN) d2v0 Unexpected vmexit: reason 49 > (XEN) domain_crash called from vmx.c:4120 > (XEN) Domain 2 (vcpu#0) crashed on cpu#1: > (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] > (XEN) CPU:1 > (XEN) RIP:0010:[] > (XEN) RFLAGS: 00010246 CONTEXT: hvm guest (d2v0) > (XEN) rax: f8800300 rbx: f900c0083db0 rcx: aa55aa55 > (XEN) rdx: fa80041bdc41 rsi: f900c00c69a0 rdi: 0001 > (XEN) rbp: rsp: f88002ee9ef0 r8: fa80041bdc40 > (XEN) r9: f80001810e80 r10: fa800342aa70 r11: f88002ee9e80 > (XEN) r12: 0005 r13: 0001 r14: f900c00c08b0 > (XEN) r15: 0001 cr0: 80050031 cr4: 000406f8 > (XEN) cr3: ef771000 cr2: f900c00c8000 > (XEN) fsb: fffde000 gsb: f80001810d00 gss: 07fdc000 > (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010 > > i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping > somebody more familiar with the code can point to a more elegant > solution if one exists. I think I have an idea what's going on, but it's complicated. :-) Basically, the logdirty functionality isn't simple, and needs careful thought on how to integrate it. I'll write some more tomorrow, and see if I can come up with a solution. -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/13/2018 07:38 PM, Tamas K Lengyel wrote: > On Fri, Apr 13, 2018 at 8:44 AM, Razvan Cojocaru > wrote: >> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote: >>> Debugging continues. >> >> Finally, the attached patch seems to get the display unstuck in my >> scenario, although for one guest I get: >> >> (XEN) d2v0 Unexpected vmexit: reason 49 >> (XEN) domain_crash called from vmx.c:4120 >> (XEN) Domain 2 (vcpu#0) crashed on cpu#1: >> (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] >> (XEN) CPU:1 >> (XEN) RIP:0010:[] >> (XEN) RFLAGS: 00010246 CONTEXT: hvm guest (d2v0) >> (XEN) rax: f8800300 rbx: f900c0083db0 rcx: aa55aa55 >> (XEN) rdx: fa80041bdc41 rsi: f900c00c69a0 rdi: 0001 >> (XEN) rbp: rsp: f88002ee9ef0 r8: fa80041bdc40 >> (XEN) r9: f80001810e80 r10: fa800342aa70 r11: f88002ee9e80 >> (XEN) r12: 0005 r13: 0001 r14: f900c00c08b0 >> (XEN) r15: 0001 cr0: 80050031 cr4: 000406f8 >> (XEN) cr3: ef771000 cr2: f900c00c8000 >> (XEN) fsb: fffde000 gsb: f80001810d00 gss: 07fdc000 >> (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010 >> >> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping >> somebody more familiar with the code can point to a more elegant >> solution if one exists. > > I dare say at this point you might just be that person :) I looked at > the patch, one issue with p2m_change_type_range in its current form is > that it's unclear how it would behave when the range contains remapped > gfns in an altp2m (either a page that remaps to this range, or a page > that remaps from the range). According to the current altp2m approach > I would say the altp2m views should probably just get completely reset > if there is such a remapping. Yes, it's all quite complex. Actually, for one I think I should set p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; at switch-time rather than at p2m_init_altp2m_ept() time, for obvious reasons. I wonder if the sane behaviour here isn't to always copy everything from the active view to the view we're switching to (whatever "copy everything" means, I'm fuzzy on this myself). Also, looking at the code I think that the default_access parameter to the libxc function we were discussing earlier should simply reach p2m_init_altp2m_ept() and be assigned to p2m->default_access - that probably clears the mystery. You're saying that p2m_change_type_range() should act more like p2m_altp2m_propagate_change() for p2ms that are not the hostp2m, right? Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On Fri, Apr 13, 2018 at 8:44 AM, Razvan Cojocaru wrote: > On 04/11/2018 11:04 AM, Razvan Cojocaru wrote: >> Debugging continues. > > Finally, the attached patch seems to get the display unstuck in my > scenario, although for one guest I get: > > (XEN) d2v0 Unexpected vmexit: reason 49 > (XEN) domain_crash called from vmx.c:4120 > (XEN) Domain 2 (vcpu#0) crashed on cpu#1: > (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] > (XEN) CPU:1 > (XEN) RIP:0010:[] > (XEN) RFLAGS: 00010246 CONTEXT: hvm guest (d2v0) > (XEN) rax: f8800300 rbx: f900c0083db0 rcx: aa55aa55 > (XEN) rdx: fa80041bdc41 rsi: f900c00c69a0 rdi: 0001 > (XEN) rbp: rsp: f88002ee9ef0 r8: fa80041bdc40 > (XEN) r9: f80001810e80 r10: fa800342aa70 r11: f88002ee9e80 > (XEN) r12: 0005 r13: 0001 r14: f900c00c08b0 > (XEN) r15: 0001 cr0: 80050031 cr4: 000406f8 > (XEN) cr3: ef771000 cr2: f900c00c8000 > (XEN) fsb: fffde000 gsb: f80001810d00 gss: 07fdc000 > (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010 > > i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping > somebody more familiar with the code can point to a more elegant > solution if one exists. I dare say at this point you might just be that person :) I looked at the patch, one issue with p2m_change_type_range in its current form is that it's unclear how it would behave when the range contains remapped gfns in an altp2m (either a page that remaps to this range, or a page that remaps from the range). According to the current altp2m approach I would say the altp2m views should probably just get completely reset if there is such a remapping. Tamas ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/11/2018 11:04 AM, Razvan Cojocaru wrote: > Debugging continues. Finally, the attached patch seems to get the display unstuck in my scenario, although for one guest I get: (XEN) d2v0 Unexpected vmexit: reason 49 (XEN) domain_crash called from vmx.c:4120 (XEN) Domain 2 (vcpu#0) crashed on cpu#1: (XEN) [ Xen-4.11-unstable x86_64 debug=y Not tainted ] (XEN) CPU:1 (XEN) RIP:0010:[] (XEN) RFLAGS: 00010246 CONTEXT: hvm guest (d2v0) (XEN) rax: f8800300 rbx: f900c0083db0 rcx: aa55aa55 (XEN) rdx: fa80041bdc41 rsi: f900c00c69a0 rdi: 0001 (XEN) rbp: rsp: f88002ee9ef0 r8: fa80041bdc40 (XEN) r9: f80001810e80 r10: fa800342aa70 r11: f88002ee9e80 (XEN) r12: 0005 r13: 0001 r14: f900c00c08b0 (XEN) r15: 0001 cr0: 80050031 cr4: 000406f8 (XEN) cr3: ef771000 cr2: f900c00c8000 (XEN) fsb: fffde000 gsb: f80001810d00 gss: 07fdc000 (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010 i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping somebody more familiar with the code can point to a more elegant solution if one exists. Thanks, Razvan diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c index 14b5939..3be02ca 100644 --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -1375,8 +1375,15 @@ void setup_ept_dump(void) void p2m_init_altp2m_ept(struct domain *d, unsigned int i) { struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); struct ept_data *ept; +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; +p2m->default_access = hostp2m->default_access; +p2m->domain = hostp2m->domain; +p2m->logdirty_ranges = hostp2m->logdirty_ranges; +p2m->global_logdirty = hostp2m->global_logdirty; + p2m->min_remapped_gfn = gfn_x(INVALID_GFN); p2m->max_remapped_gfn = 0; ept = &p2m->ept; diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index c53cab4..00f85e1 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -248,7 +249,6 @@ int p2m_init(struct domain *d) int p2m_is_logdirty_range(struct p2m_domain *p2m, unsigned long start, unsigned long end) { -ASSERT(p2m_is_hostp2m(p2m)); if ( p2m->global_logdirty || rangeset_contains_range(p2m->logdirty_ranges, start, end) ) return 1; @@ -964,12 +964,12 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn_l, } /* Modify the p2m type of a range of gfns from ot to nt. */ -void p2m_change_type_range(struct domain *d, - unsigned long start, unsigned long end, - p2m_type_t ot, p2m_type_t nt) +static void _p2m_change_type_range(struct p2m_domain *p2m, + unsigned long start, unsigned long end, + p2m_type_t ot, p2m_type_t nt) { +struct domain *d = p2m->domain; unsigned long gfn = start; -struct p2m_domain *p2m = p2m_get_hostp2m(d); int rc = 0; ASSERT(ot != nt); @@ -1022,6 +1022,23 @@ void p2m_change_type_range(struct domain *d, p2m_unlock(p2m); } +void p2m_change_type_range(struct domain *d, + unsigned long start, unsigned long end, + p2m_type_t ot, p2m_type_t nt) +{ +unsigned int i; + +if ( !altp2m_active(d) ) +{ +_p2m_change_type_range(p2m_get_hostp2m(d), start, end, ot, nt); +return; +} + +for ( i = 0; i < MAX_ALTP2M; i++ ) +if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) ) +_p2m_change_type_range(d->arch.altp2m_p2m[i], start, end, ot, nt); +} + /* * Finish p2m type change for gfns which are marked as need_recalc in a range. * Returns: 0/1 for success, negative for failure ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/11/2018 11:04 AM, Razvan Cojocaru wrote: >> After much debugging, it turns out that the >> "p2m_is_ram(p2mt)" test in hvm_hap_nested_page_fault() fails if I switch >> to the new altp2m view fast enough, and that in turn disables the >> logdirty processing gated on it > > Actually as it turns out the exit doesn't happen at all anymore so > hvm_hap_nested_page_fault() doesn't get called (I've added a printk() in > hvm_hap_nested_page_fault() just before "/* Check access permissions > first, then handle faults */" and it doesn't appear). This is what seems to be happening, the following call traces end up calling p2m_altp2m_propagate_change() after switching to the new altp2m view early: (XEN) Xen call trace: (XEN)[] p2m_altp2m_propagate_change+0x4e/0x508 (XEN)[] p2m-ept.c#ept_set_entry+0x7e4/0x8c4 (XEN)[] p2m_set_entry+0xe2/0x124 (XEN)[] p2m.c#p2m_remove_page+0x1f3/0x209 (XEN)[] guest_physmap_remove_page+0x18c/0x214 (XEN)[] guest_remove_page+0x27b/0x2d3 (XEN)[] do_memory_op+0x502/0x22f8 (XEN)[] pv_hypercall+0x1f4/0x440 (XEN)[] lstar_enter+0x115/0x120 and (XEN) Xen call trace: (XEN)[] p2m_altp2m_propagate_change+0x4e/0x508 (XEN)[] p2m-ept.c#ept_set_entry+0x7e4/0x8c4 (XEN)[] p2m_set_entry+0xe2/0x124 (XEN)[] guest_physmap_add_entry+0x7c3/0xacd (XEN)[] do_memory_op+0x8f8/0x22f8 (XEN)[] pv_hypercall+0x1f4/0x440 (XEN)[] lstar_enter+0x115/0x120 So clearly it's the external pages described earlier by George and Alexey landing. p2m_altp2m_propagate_change() then proceeds to do nothing (because gfn > p2m->max_remapped_gfn, _and_ m == INVALID_MFN). Next, in hvm_hap_nested_page_fault() p2m_altp2m_lazy_copy() returns 1, which leads to the function just exiting with no further logdirty checks (there's a goto out; that jumps them), and then that's it, I don't see any more page faults for those gfns. I've tried an assorted array of strategies: change p2m_altp2m_propagate_change() to always call set_entry() on the altp2m view, doing set_entry() for every gfn in the hostp2m into the new altp2m view in p2m_init_altp2m_ept() (a combination of these), trying to run the logdirty code in hvm_hap_nested_page_fault() before the goto out corresponding to p2m_altp2m_lazy_copy(). None of these things has worked. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/11/2018 11:17 PM, Tamas K Lengyel wrote: > On Wed, Apr 11, 2018 at 12:39 AM, Razvan Cojocaru > wrote: >> Also related to this part of the altp2m design, I've also noticed that >> creating a new EPT view with libxc involves this function: >> >> int xc_altp2m_create_view(xc_interface *handle, uint32_t domid, >> xenmem_access_t default_access, >> uint16_t *view_id); >> >> However, default_access is completely ignored. If this was meant to >> reset all pages to that access it's trivial to wire it in into the >> hypervisor (we could just iterate through all guest pages from 0 to >> max_gpfn and set the access) - but in light of this issue that's not a >> good thing to do. Or, rather, it was meant to specify the default_access >> for new pages where it is not explicitly specified? > > I don't think the function of default_access for an altp2m was > settled, so both of these would sound to me like a valid setup. I > personally would prefer that if the default_access is specified and > it's not rwx it would result in the entire altp2m view getting > populated right there instead of it being done lazily. The whole > lazy-population of the altp2m views has been nothing but a headache. > Add to that the fact that views get completely reset under certain > situations without any indication being given to the application using > it.. might be time we rethink these parts of altp2m to make it more > robust. Do you mean the p2m_altp2m_lazy_copy() logic in hvm_hap_nested_page_fault()? That took a while to wrap my head around. Ideally a new view would immediately upon creation copy the default view. Later on, it would be updated (kept in sync) by p2m_altp2m_propagate_change(). That's the theory anyway. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On Wed, Apr 11, 2018 at 12:39 AM, Razvan Cojocaru wrote: > On 04/09/2018 05:12 PM, George Dunlap wrote: >> The obvious place to look is the logdirtyvram functionality, which is >> used to make it easier for QEMU to figure out which bits of the display >> buffer have been modified. One of the big reasons the altp2m >> functionality is still considered "experimental" is that these sorts of >> interactions were never carefully thought out. > > George (and Alexey), thanks for the help! > > I've managed to find out why this is happening, though I don't yet have > a solution for it. After much debugging, it turns out that the > "p2m_is_ram(p2mt)" test in hvm_hap_nested_page_fault() fails if I switch > to the new altp2m view fast enough, and that in turn disables the > logdirty processing gated on it: > > /* Spurious fault? PoD and log-dirty also take this path. */ > if ( p2m_is_ram(p2mt) ) > { > rc = 1; > /* > * Page log dirty is always done with order 0. If this mfn resides > * in a large page, we do not change other pages type within that > * large page. > */ > if ( npfec.write_access ) > { > paging_mark_pfn_dirty(currd, _pfn(gfn)); > /* > * If p2m is really an altp2m, unlock here to avoid lock > * ordering violation when the change below is propagated > * from host p2m. > */ > if ( ap2m_active ) > __put_gfn(p2m, gfn); > p2m_change_type_one(currd, gfn, p2m_ram_logdirty, p2m_ram_rw); > __put_gfn(ap2m_active ? hostp2m : p2m, gfn); > > goto out; > } > goto out_put_gfn; > } > > I think this is likely because switching to the new EPT view loses > information stored when previously working with the default one, in this > case while that work has not finished (if that's the case, creating a > new altp2m view should also somehow inherit the history of view 0 - when > doing set_access() on view 0 the changes propagate to all existent > altp2m views, but I don't think that happens when a new view is created: > that starts with a clean slate). > > Also related to this part of the altp2m design, I've also noticed that > creating a new EPT view with libxc involves this function: > > int xc_altp2m_create_view(xc_interface *handle, uint32_t domid, > xenmem_access_t default_access, > uint16_t *view_id); > > However, default_access is completely ignored. If this was meant to > reset all pages to that access it's trivial to wire it in into the > hypervisor (we could just iterate through all guest pages from 0 to > max_gpfn and set the access) - but in light of this issue that's not a > good thing to do. Or, rather, it was meant to specify the default_access > for new pages where it is not explicitly specified? I don't think the function of default_access for an altp2m was settled, so both of these would sound to me like a valid setup. I personally would prefer that if the default_access is specified and it's not rwx it would result in the entire altp2m view getting populated right there instead of it being done lazily. The whole lazy-population of the altp2m views has been nothing but a headache. Add to that the fact that views get completely reset under certain situations without any indication being given to the application using it.. might be time we rethink these parts of altp2m to make it more robust. Tamas ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
> After much debugging, it turns out that the > "p2m_is_ram(p2mt)" test in hvm_hap_nested_page_fault() fails if I switch > to the new altp2m view fast enough, and that in turn disables the > logdirty processing gated on it Actually as it turns out the exit doesn't happen at all anymore so hvm_hap_nested_page_fault() doesn't get called (I've added a printk() in hvm_hap_nested_page_fault() just before "/* Check access permissions first, then handle faults */" and it doesn't appear). Debugging continues. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/09/2018 05:12 PM, George Dunlap wrote: > The obvious place to look is the logdirtyvram functionality, which is > used to make it easier for QEMU to figure out which bits of the display > buffer have been modified. One of the big reasons the altp2m > functionality is still considered "experimental" is that these sorts of > interactions were never carefully thought out. George (and Alexey), thanks for the help! I've managed to find out why this is happening, though I don't yet have a solution for it. After much debugging, it turns out that the "p2m_is_ram(p2mt)" test in hvm_hap_nested_page_fault() fails if I switch to the new altp2m view fast enough, and that in turn disables the logdirty processing gated on it: /* Spurious fault? PoD and log-dirty also take this path. */ if ( p2m_is_ram(p2mt) ) { rc = 1; /* * Page log dirty is always done with order 0. If this mfn resides * in a large page, we do not change other pages type within that * large page. */ if ( npfec.write_access ) { paging_mark_pfn_dirty(currd, _pfn(gfn)); /* * If p2m is really an altp2m, unlock here to avoid lock * ordering violation when the change below is propagated * from host p2m. */ if ( ap2m_active ) __put_gfn(p2m, gfn); p2m_change_type_one(currd, gfn, p2m_ram_logdirty, p2m_ram_rw); __put_gfn(ap2m_active ? hostp2m : p2m, gfn); goto out; } goto out_put_gfn; } I think this is likely because switching to the new EPT view loses information stored when previously working with the default one, in this case while that work has not finished (if that's the case, creating a new altp2m view should also somehow inherit the history of view 0 - when doing set_access() on view 0 the changes propagate to all existent altp2m views, but I don't think that happens when a new view is created: that starts with a clean slate). Also related to this part of the altp2m design, I've also noticed that creating a new EPT view with libxc involves this function: int xc_altp2m_create_view(xc_interface *handle, uint32_t domid, xenmem_access_t default_access, uint16_t *view_id); However, default_access is completely ignored. If this was meant to reset all pages to that access it's trivial to wire it in into the hypervisor (we could just iterate through all guest pages from 0 to max_gpfn and set the access) - but in light of this issue that's not a good thing to do. Or, rather, it was meant to specify the default_access for new pages where it is not explicitly specified? Either way, the argument is not used: case HVMOP_altp2m_create_p2m: if ( !(rc = p2m_init_next_altp2m(d, &a.u.view.view)) ) rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0; break; That's the progress so far. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On Sun, 8 Apr 2018 23:38:31 +0300 Razvan Cojocaru wrote: >I've noticed altp2m behaviour I can't explain yet - I'm not all that >familiar with all the ways the new views corellate with the previous >EPT-based "view 0". > >In short, if we create a new view and simply switch to it early in the >boot process of the guest, something goes wrong and the guest either >freezes, becomes unresponsive to input, or has something wrong with the >display (most often the latter, with a black band on top of the image): > >https://ibb.co/eUPJ6c >https://ibb.co/etCXXH > >That guest is a 64-bit Windows 7 system with nothing special about it. >It's easy to reproduce this: > >1. Start the guest paused (I've used "xl create -p myguest.conf"). >2. Patch xen-access like this: https://pastebin.com/67PpQ9fu (just >remove the part of the code that modifies the new view before switching >to it). >3. Hook xen-access to the guest ("./xen-access altp2m_write"). >4. Unpause the guest ("xl unpause "). > >I think that's a valid scenario and supposed to work. > >I've also noticed that if I wait to do this until the OS is >up-and-running (e.g. after logging into Windows), there seems to be no >problem. I don't know if this is just coincidence (as is bound to >happen with race-condition situations), or means something, but I can >get the problems every time when switching views early, and never when >switching the views late. > >Suggestions on what the problem could be are, as always, greatly >appreciated. The difference between starting the domain paused and switching altp2m views later (in this particular case) is likely the fact whether the emulated LFB backing storage (aka VRAM) was "deployed" to the guest domain or not. QEMU uses special ranges to maintain the state of some emulated devices with memories. These are VRAM and Option ROMs, but latter are emulated as MMIO currently for some reason (either a bug or intentionally). VRAM area is used to hold the LFB content for the emulated videocard (stdvga/cirrus/etc) and its usage depends on whether the emulated LFB is currently mapped or not. When you start the domain paused, VRAM is not mapped to the guest address space yet -- it is allocated to the location outside the guest memory map, known to QEMU. In order to map VRAM to the guest's address space, the emulated QEMU videocard needs to be initialized, which happens when hvmloader or guest OS allocates MMIO BARs for the emulated videocard. Upon writing the corresponding BAR register, the VRAM area is relocated to the guest's visible address space (via add_to_physmap hypercall). Disabling MMIO decoding may cause to relocate the VRAM range back to its hidden backing storage (again via add_to_physmap). So, it's basically the same appearance of the problem of different Xen/QEMU views on guest's memory layout. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/08/2018 09:38 PM, Razvan Cojocaru wrote: > Hello, > > I've noticed altp2m behaviour I can't explain yet - I'm not all that > familiar with all the ways the new views corellate with the previous > EPT-based "view 0". > > In short, if we create a new view and simply switch to it early in the > boot process of the guest, something goes wrong and the guest either > freezes, becomes unresponsive to input, or has something wrong with the > display (most often the latter, with a black band on top of the image): > > https://ibb.co/eUPJ6c > https://ibb.co/etCXXH > > That guest is a 64-bit Windows 7 system with nothing special about it. > It's easy to reproduce this: > > 1. Start the guest paused (I've used "xl create -p myguest.conf"). > 2. Patch xen-access like this: https://pastebin.com/67PpQ9fu (just > remove the part of the code that modifies the new view before switching > to it). > 3. Hook xen-access to the guest ("./xen-access altp2m_write"). > 4. Unpause the guest ("xl unpause "). > > I think that's a valid scenario and supposed to work. > > I've also noticed that if I wait to do this until the OS is > up-and-running (e.g. after logging into Windows), there seems to be no > problem. I don't know if this is just coincidence (as is bound to happen > with race-condition situations), or means something, but I can get the > problems every time when switching views early, and never when switching > the views late. > > Suggestions on what the problem could be are, as always, greatly > appreciated. The obvious place to look is the logdirtyvram functionality, which is used to make it easier for QEMU to figure out which bits of the display buffer have been modified. One of the big reasons the altp2m functionality is still considered "experimental" is that these sorts of interactions were never carefully thought out. -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Weird altp2m behaviour when switching early to a new view
Hello, I've noticed altp2m behaviour I can't explain yet - I'm not all that familiar with all the ways the new views corellate with the previous EPT-based "view 0". In short, if we create a new view and simply switch to it early in the boot process of the guest, something goes wrong and the guest either freezes, becomes unresponsive to input, or has something wrong with the display (most often the latter, with a black band on top of the image): https://ibb.co/eUPJ6c https://ibb.co/etCXXH That guest is a 64-bit Windows 7 system with nothing special about it. It's easy to reproduce this: 1. Start the guest paused (I've used "xl create -p myguest.conf"). 2. Patch xen-access like this: https://pastebin.com/67PpQ9fu (just remove the part of the code that modifies the new view before switching to it). 3. Hook xen-access to the guest ("./xen-access altp2m_write"). 4. Unpause the guest ("xl unpause "). I think that's a valid scenario and supposed to work. I've also noticed that if I wait to do this until the OS is up-and-running (e.g. after logging into Windows), there seems to be no problem. I don't know if this is just coincidence (as is bound to happen with race-condition situations), or means something, but I can get the problems every time when switching views early, and never when switching the views late. Suggestions on what the problem could be are, as always, greatly appreciated. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel