Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
>>> On 11.09.15 at 02:59, wrote:
> If you want a formula I would do:
>
> #define MAX_SOCKETS 8
>
> max_pfns = pow(2, MAX_SOCKETS - min(nr_iommus(), MAX_SOCKETS)) * 64;
>
> Where nr_iommus would have to be somehow implemented, ditto for pow.
>
> This should give you:
> 8 -> 64
> 7 -> 128
> 6 -> 256
> 5 -> 512
> 4 -> 1024
> 3 -> 2048
> 2 -> 4096
> 1 -> 16384

16k seems excessive as a default. Also - why would this be related
to the number of sockets? I don't think there's a one-IOMMU-per-socket
rule; fixed-number-of-IOMMUs-per-node might come closer, but there
we'd have the problem of what "fixed number" is. Wouldn't something as
simple as 1024 / nr_iommus() do?

I also don't follow what cache flushes you talked about earlier: I
don't think the IOMMUs drive any global cache flushes, and I would
have thought the size-limited IOTLB and (CPU-side) cache ones should
be pretty limited in terms of bus load (unless the TLB ones would get
converted to global ones due to lacking IOMMU capabilities). Is that
not the case?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
>>> On 11.09.15 at 14:05, wrote:
> The flush_all(FLUSH_CACHE) in mtrr.c will result in a flush_area_mask
> for all CPUs in the host. It will take more time to issue an IPI to
> all logical cores the more cores there are. I admit that
> x2apic_cluster mode may speed this up but not all hosts will have
> that enabled.
>
> The data flush will force all data out to memory controllers, and
> it's possible that CPUs in a different package have cached data all
> corresponding to a particular memory controller, which will become a
> bottleneck.
>
> In the worst case, with a large delay between XEN_DOMCTL_memory_mapping
> hypercalls and on an 8 socket system, you may end up writing out 45MB
> (L3 cache) * 8 = 360MB to a single memory controller for every 64
> pages (256KiB) of domU p2m updated.

True. Considering that BARs need to be properly aligned in both guest
and host address spaces, I wonder why we aren't using large pages to
map such huge BARs then. As it looks this would require redefining the
semantics of the domctl once again, but that's not a big problem since
it's a domctl. I'll see if I can cook up something (assuming that
hosts used for passing through devices with such huge BARs will have
support for at least 2Mb pages in both EPT [NPT always has] and IOMMU).

Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
On 11/09/15 12:11, Jan Beulich wrote:
>>>> On 11.09.15 at 12:28, wrote:
>> On 11/09/15 10:17, Jan Beulich wrote:
>>>>> On 11.09.15 at 02:59, wrote:
>>>> If you want a formula I would do:
>>>>
>>>> #define MAX_SOCKETS 8
>>>>
>>>> max_pfns = pow(2, MAX_SOCKETS - min(nr_iommus(), MAX_SOCKETS)) * 64;
>>>>
>>>> Where nr_iommus would have to be somehow implemented, ditto for pow.
>>>>
>>>> This should give you:
>>>> 8 -> 64
>>>> 7 -> 128
>>>> 6 -> 256
>>>> 5 -> 512
>>>> 4 -> 1024
>>>> 3 -> 2048
>>>> 2 -> 4096
>>>> 1 -> 16384
>>>
>>> 16k seems excessive as a default. Also - why would this be related
>>> to the number of sockets? I don't think there's a one-IOMMU-per-socket
>>> rule; fixed-number-of-IOMMUs-per-node might come closer, but there
>>> we'd have the problem of what "fixed number" is. Wouldn't something
>>> as simple as 1024 / nr_iommus() do?
>>>
>>> I also don't follow what cache flushes you talked about earlier: I
>>> don't think the IOMMUs drive any global cache flushes, and I would
>>> have thought the size-limited IOTLB and (CPU-side) cache ones should
>>> be pretty limited in terms of bus load (unless the TLB ones would
>>> get converted to global ones due to lacking IOMMU capabilities). Is
>>> that not the case?
>>
>> The data cache flushes are caused by the memory_type_changed()
>> call at the bottom of the XEN_DOMCTL_memory_mapping hypercall,
>> not by the IOMMU code itself.
>
> In which case - contrary to what Konrad said he measured - their
> impact on overall throughput shouldn't scale with socket (or node)
> count (unless the hardware implementation of the flushes is bad).

The flush_all(FLUSH_CACHE) in mtrr.c will result in a flush_area_mask
for all CPUs in the host. It will take more time to issue an IPI to all
logical cores the more cores there are. I admit that x2apic_cluster
mode may speed this up but not all hosts will have that enabled.

The data flush will force all data out to memory controllers, and it's
possible that CPUs in a different package have cached data all
corresponding to a particular memory controller, which will become a
bottleneck.

In the worst case, with a large delay between XEN_DOMCTL_memory_mapping
hypercalls and on an 8 socket system, you may end up writing out 45MB
(L3 cache) * 8 = 360MB to a single memory controller for every 64 pages
(256KiB) of domU p2m updated.

Malcolm

> Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
On 11/09/15 10:17, Jan Beulich wrote:
>>>> On 11.09.15 at 02:59, wrote:
>> If you want a formula I would do:
>>
>> #define MAX_SOCKETS 8
>>
>> max_pfns = pow(2, MAX_SOCKETS - min(nr_iommus(), MAX_SOCKETS)) * 64;
>>
>> Where nr_iommus would have to be somehow implemented, ditto for pow.
>>
>> This should give you:
>> 8 -> 64
>> 7 -> 128
>> 6 -> 256
>> 5 -> 512
>> 4 -> 1024
>> 3 -> 2048
>> 2 -> 4096
>> 1 -> 16384
>
> 16k seems excessive as a default. Also - why would this be related
> to the number of sockets? I don't think there's a one-IOMMU-per-socket
> rule; fixed-number-of-IOMMUs-per-node might come closer, but there
> we'd have the problem of what "fixed number" is. Wouldn't something as
> simple as 1024 / nr_iommus() do?
>
> I also don't follow what cache flushes you talked about earlier: I
> don't think the IOMMUs drive any global cache flushes, and I would
> have thought the size-limited IOTLB and (CPU-side) cache ones should
> be pretty limited in terms of bus load (unless the TLB ones would get
> converted to global ones due to lacking IOMMU capabilities). Is that
> not the case?

The data cache flushes are caused by the memory_type_changed()
call at the bottom of the XEN_DOMCTL_memory_mapping hypercall,
not by the IOMMU code itself.

Malcolm

> Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
>>> On 11.09.15 at 12:28, wrote:
> On 11/09/15 10:17, Jan Beulich wrote:
>>>>> On 11.09.15 at 02:59, wrote:
>>> If you want a formula I would do:
>>>
>>> #define MAX_SOCKETS 8
>>>
>>> max_pfns = pow(2, MAX_SOCKETS - min(nr_iommus(), MAX_SOCKETS)) * 64;
>>>
>>> Where nr_iommus would have to be somehow implemented, ditto for pow.
>>>
>>> This should give you:
>>> 8 -> 64
>>> 7 -> 128
>>> 6 -> 256
>>> 5 -> 512
>>> 4 -> 1024
>>> 3 -> 2048
>>> 2 -> 4096
>>> 1 -> 16384
>>
>> 16k seems excessive as a default. Also - why would this be related
>> to the number of sockets? I don't think there's a one-IOMMU-per-socket
>> rule; fixed-number-of-IOMMUs-per-node might come closer, but there
>> we'd have the problem of what "fixed number" is. Wouldn't something
>> as simple as 1024 / nr_iommus() do?
>>
>> I also don't follow what cache flushes you talked about earlier: I
>> don't think the IOMMUs drive any global cache flushes, and I would
>> have thought the size-limited IOTLB and (CPU-side) cache ones should
>> be pretty limited in terms of bus load (unless the TLB ones would get
>> converted to global ones due to lacking IOMMU capabilities). Is that
>> not the case?
>
> The data cache flushes are caused by the memory_type_changed()
> call at the bottom of the XEN_DOMCTL_memory_mapping hypercall,
> not by the IOMMU code itself.

In which case - contrary to what Konrad said he measured - their
impact on overall throughput shouldn't scale with socket (or node)
count (unless the hardware implementation of the flushes is bad).

Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
> Sort of (the patch has the intended effect, but for its size very
> many rough edges).

I guess we need to amend the original parameter, once_mapping_mfns,
like this:

/* xen_once_mapping_mfns: memory mapping mfn numbers once. */
unsigned int xen_once_mapping_mfns;
size_param("once_mapping_mfns", xen_once_mapping_mfns);

static void xen_once_mapping_mfns_setup(void)
{
    if ( xen_once_mapping_mfns < 64 )
        xen_once_mapping_mfns = 64;
    else if ( xen_once_mapping_mfns > 1024 )
        xen_once_mapping_mfns = 1024;
}

Or what is your expected rule?

Thanks
Tiejun
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
>>> On 10.09.15 at 07:28, wrote:
>>> If the 64 limit was arbitrary then I would suggest increasing it to
>>> at least 1024 so that at least 4M of BAR can be mapped in one go and
>>> it reduces the overhead by a factor of 16.
>>
>> 1024 may be a little much, but 256 is certainly a possibility, plus
>> Konrad's suggestion to allow this limit to be controlled via command
>> line option.
>
> Are you guys talking this way?

Sort of (the patch has the intended effect, but for its size very many
rough edges).

Jan

> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 3946e4c..a9671bb 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -88,6 +88,10 @@ boolean_param("noapic", skip_ioapic_setup);
>  s8 __read_mostly xen_cpuidle = -1;
>  boolean_param("cpuidle", xen_cpuidle);
>
> +/* once_mapping_mfns: memory mapping mfn numbers once. */
> +unsigned int xen_once_mapping_mfns;
> +integer_param("once_mapping_mfns", xen_once_mapping_mfns);
> +
>  #ifndef NDEBUG
>  unsigned long __initdata highmem_start;
>  size_param("highmem-start", highmem_start);
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 3bf39f1..82c85e3 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -33,6 +33,8 @@
>  #include
>  #include
>
> +extern unsigned int xen_once_mapping_mfns;
> +
>  static DEFINE_SPINLOCK(domctl_lock);
>  DEFINE_SPINLOCK(vcpu_alloc_lock);
>
> @@ -1035,7 +1037,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>
>      ret = -E2BIG;
>      /* Must break hypercall up as this could take a while. */
> -    if ( nr_mfns > 64 )
> +    if ( nr_mfns > xen_once_mapping_mfns )
>          break;
>
>      ret = -EPERM;
>
> Thanks
> Tiejun
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
>>> On 10.09.15 at 10:55, wrote:
>> Sort of (the patch has the intended effect, but for its size very
>> many rough edges).
>
> I guess we need to amend the original parameter, once_mapping_mfns,
> like this:
>
> /* xen_once_mapping_mfns: memory mapping mfn numbers once. */
> unsigned int xen_once_mapping_mfns;
> size_param("once_mapping_mfns", xen_once_mapping_mfns);
>
> static void xen_once_mapping_mfns_setup(void)
> {
>     if ( xen_once_mapping_mfns < 64 )
>         xen_once_mapping_mfns = 64;
>     else if ( xen_once_mapping_mfns > 1024 )
>         xen_once_mapping_mfns = 1024;
> }

Right, that's one of the things that would need taking care of.
(Whether enforcing an upper limit is actually needed I'm not sure -
we generally allow the admin to shoot himself in the foot if he wants
to. And whether the lower limit should be 64 instead of just ensuring
the limit is not zero is another question.)

Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
On Thu, Sep 10, 2015 at 02:59:17AM -0600, Jan Beulich wrote:
> >>> On 10.09.15 at 10:55, wrote:
> >> Sort of (the patch has the intended effect, but for its size very
> >> many rough edges).
> >
> > I guess we need to amend the original parameter, once_mapping_mfns,
> > like this:
> >
> > /* xen_once_mapping_mfns: memory mapping mfn numbers once. */
> > unsigned int xen_once_mapping_mfns;
> > size_param("once_mapping_mfns", xen_once_mapping_mfns);
> >
> > static void xen_once_mapping_mfns_setup(void)
> > {
> >     if ( xen_once_mapping_mfns < 64 )
> >         xen_once_mapping_mfns = 64;
> >     else if ( xen_once_mapping_mfns > 1024 )
> >         xen_once_mapping_mfns = 1024;
> > }
>
> Right, that's one of the things that would need taking care of.
> (Whether enforcing an upper limit is actually needed I'm not
> sure - we generally allow the admin to shoot himself in the foot
> if he wants to. And whether the lower limit should be 64 instead
> of just ensuring the limit is not zero is another question.)

64 was semi-arbitrary - it ended up giving good latency on highly
scalar machines (8 socket). Higher numbers ended up affecting the
latency.

But higher numbers on small socket machines were OK. (As they do not
have 8 IOMMU VT-d chipsets all potentially flooding the QPI with
serialized cache flushes.)

> Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
On Fri, Sep 11, 2015 at 08:44:50AM +0800, Chen, Tiejun wrote:
> >> Right, that's one of the things that would need taking care of.
> >> (Whether enforcing an upper limit is actually needed I'm not
> >> sure - we generally allow the admin to shoot himself in the foot
> >> if he wants to. And whether the lower limit should be 64 instead
> >> of just ensuring the limit is not zero is another question.)
> >
> > 64 was semi-arbitrary - it ended up giving good latency on highly
> > scalar machines (8 socket). Higher numbers ended up affecting the
> > latency.
> >
> > But higher numbers on small socket machines were OK. (As they do
> > not have 8 IOMMU VT-d chipsets all potentially flooding the QPI
> > with serialized cache flushes.)
>
> So we should make this range [8, ] here, but 64 by default. Right?

Not sure I follow you. The 8 was based on 8 socket machines having 8
IOMMUs. If you want a formula I would do:

#define MAX_SOCKETS 8

max_pfns = pow(2, MAX_SOCKETS - min(nr_iommus(), MAX_SOCKETS)) * 64;

Where nr_iommus would have to be somehow implemented, ditto for pow.

This should give you:
8 -> 64
7 -> 128
6 -> 256
5 -> 512
4 -> 1024
3 -> 2048
2 -> 4096
1 -> 16384

> Thanks
> Tiejun
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
>> Right, that's one of the things that would need taking care of.
>> (Whether enforcing an upper limit is actually needed I'm not
>> sure - we generally allow the admin to shoot himself in the foot
>> if he wants to. And whether the lower limit should be 64 instead
>> of just ensuring the limit is not zero is another question.)
>
> 64 was semi-arbitrary - it ended up giving good latency on highly
> scalar machines (8 socket). Higher numbers ended up affecting the
> latency.
>
> But higher numbers on small socket machines were OK. (As they do not
> have 8 IOMMU VT-d chipsets all potentially flooding the QPI with
> serialized cache flushes.)

So we should make this range [8, ] here, but 64 by default. Right?

Thanks
Tiejun
[Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
We should lower the loglevel to XENLOG_G_DEBUG while mapping or
unmapping memory via XEN_DOMCTL_memory_mapping, since it's fair
enough to check this info just while debugging.

CC: Ian Campbell
CC: Ian Jackson
CC: Jan Beulich
CC: Keir Fraser
CC: Tim Deegan
Signed-off-by: Tiejun Chen
---
 xen/common/domctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 7f959f3..3bf39f1 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1049,7 +1049,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)

     if ( add )
     {
-        printk(XENLOG_G_INFO
+        printk(XENLOG_G_DEBUG
                "memory_map:add: dom%d gfn=%lx mfn=%lx nr=%lx\n",
                d->domain_id, gfn, mfn, nr_mfns);

@@ -1061,7 +1061,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     }
     else
     {
-        printk(XENLOG_G_INFO
+        printk(XENLOG_G_DEBUG
                "memory_map:remove: dom%d gfn=%lx mfn=%lx nr=%lx\n",
                d->domain_id, gfn, mfn, nr_mfns);

--
1.9.1
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
>>> On 09.09.15 at 16:20, wrote:
> Perhaps the solution is remove the first printk(s) and just have them
> once the operation has completed? That may fix the outstanding
> tasklet problem?

Considering that this is a tool stack based retry, how would the
hypervisor know when the _whole_ operation is done?

Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
On Wed, Sep 09, 2015 at 02:50:25PM +0800, Tiejun Chen wrote:
> We should lower the loglevel to XENLOG_G_DEBUG while mapping or
> unmapping memory via XEN_DOMCTL_memory_mapping, since it's fair
> enough to check this info just while debugging.

The issue you folks are hitting where it takes eons to boot with a
large BAR _may_ be related to tasklets and triggering the
hypercall_preempt_check().

I think you may be using 'console_to_ring' or such and collecting the
hypervisor logs. That will schedule a tasklet whenever printk is used
- and the tasklet will cause hypercall_preempt_check() to return true.

So what we get is that on the entrance (even before calling
map_mmio_region) of this function we have an outstanding softirq we
must process - which means that this function exits as soon as
possible to service it.

Perhaps the solution is remove the first printk(s) and just have them
once the operation has completed? That may fix the outstanding tasklet
problem? Could you try that (or perhaps you have already tried it?)
please?

> CC: Ian Campbell
> CC: Ian Jackson
> CC: Jan Beulich
> CC: Keir Fraser
> CC: Tim Deegan
> Signed-off-by: Tiejun Chen
> ---
>  xen/common/domctl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 7f959f3..3bf39f1 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -1049,7 +1049,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>
>      if ( add )
>      {
> -        printk(XENLOG_G_INFO
> +        printk(XENLOG_G_DEBUG
>                 "memory_map:add: dom%d gfn=%lx mfn=%lx nr=%lx\n",
>                 d->domain_id, gfn, mfn, nr_mfns);
>
> @@ -1061,7 +1061,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>      }
>      else
>      {
> -        printk(XENLOG_G_INFO
> +        printk(XENLOG_G_DEBUG
>                 "memory_map:remove: dom%d gfn=%lx mfn=%lx nr=%lx\n",
>                 d->domain_id, gfn, mfn, nr_mfns);
>
> --
> 1.9.1
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
>>> On 09.09.15 at 16:50, wrote:
> On Wed, Sep 09, 2015 at 08:33:52AM -0600, Jan Beulich wrote:
>> >>> On 09.09.15 at 16:20, wrote:
>> > Perhaps the solution is remove the first printk(s) and just have
>> > them once the operation has completed? That may fix the
>> > outstanding tasklet problem?
>>
>> Considering that this is a tool stack based retry, how would the
>> hypervisor know when the _whole_ operation is done?
>
> I was merely thinking of moving the printk _after_ the
> map_mmio_regions so there wouldn't be any outstanding preemption
> points in map_mmio_regions (so it can at least do the 64 PFNs).

But there are no preemption points. That's why you needed to use
tool stack based retries in the first place.

Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
>>> On 09.09.15 at 17:19, wrote:
> On 09/09/15 15:50, Konrad Rzeszutek Wilk wrote:
>> On Wed, Sep 09, 2015 at 08:33:52AM -0600, Jan Beulich wrote:
>>> >>> On 09.09.15 at 16:20, wrote:
>>>> Perhaps the solution is remove the first printk(s) and just have
>>>> them once the operation has completed? That may fix the
>>>> outstanding tasklet problem?
>>>
>>> Considering that this is a tool stack based retry, how would the
>>> hypervisor know when the _whole_ operation is done?
>>
>> I was merely thinking of moving the printk _after_ the
>> map_mmio_regions so there wouldn't be any outstanding preemption
>> points in map_mmio_regions (so it can at least do the 64 PFNs).
>>
>> But going forward a couple of ideas:
>>
>> - The 64 limit was arbitrary. It could have been 42 or
>>   PFNs / num_online_cpus(), or actually finding out the size of the
>>   BAR and figuring the optimal case so that it will be done under
>>   1ms. Or perhaps just provide a boot time parameter for those that
>>   really are struggling with this.
>
> The issue of it taking a long time to map a large BAR is caused by
> the unconditional memory_type_changed call at the bottom of the
> XEN_DOMCTL_memory_mapping function.
>
> The memory_type_changed call results in a full data cache flush on a
> VM with a PCI device assigned.
>
> We have seen this overhead cause a 1GB BAR to take 20 seconds to map
> its MMIO.
>
> If the 64 limit was arbitrary then I would suggest increasing it to
> at least 1024 so that at least 4M of BAR can be mapped in one go and
> it reduces the overhead by a factor of 16.

1024 may be a little much, but 256 is certainly a possibility, plus
Konrad's suggestion to allow this limit to be controlled via command
line option.

> Currently the toolstack attempts lower and lower powers of 2 sizes
> of the MMIO region to be mapped until the hypercall succeeds.
>
> It is not clear to me why we have an unconditional call to
> memory_type_changed in the domctl. Can somebody explain why it can't
> be made conditional on errors?

You mean only on success? That may be possible (albeit as you can see
from the commit introducing it I wasn't sure, and I'm still not sure),
but wouldn't buy you anything.

Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
On Wed, Sep 09, 2015 at 08:55:38AM -0600, Jan Beulich wrote:
> >>> On 09.09.15 at 16:50, wrote:
> > On Wed, Sep 09, 2015 at 08:33:52AM -0600, Jan Beulich wrote:
> >> >>> On 09.09.15 at 16:20, wrote:
> >> > Perhaps the solution is remove the first printk(s) and just have
> >> > them once the operation has completed? That may fix the
> >> > outstanding tasklet problem?
> >>
> >> Considering that this is a tool stack based retry, how would the
> >> hypervisor know when the _whole_ operation is done?
> >
> > I was merely thinking of moving the printk _after_ the
> > map_mmio_regions so there wouldn't be any outstanding preemption
> > points in map_mmio_regions (so it can at least do the 64 PFNs).
>
> But there are no preemption points. That's why you needed to use
> tool stack based retries in the first place.

You are totally right. I recall seeing it, but I realize it was an RFC
patch I wrote that would have made map_mmio_regions call
hypercall_preempt_check()!

Sorry about the noise!

> Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
On Wed, Sep 09, 2015 at 08:33:52AM -0600, Jan Beulich wrote:
> >>> On 09.09.15 at 16:20, wrote:
> > Perhaps the solution is remove the first printk(s) and just have
> > them once the operation has completed? That may fix the
> > outstanding tasklet problem?
>
> Considering that this is a tool stack based retry, how would the
> hypervisor know when the _whole_ operation is done?

I was merely thinking of moving the printk _after_ the map_mmio_regions
so there wouldn't be any outstanding preemption points in
map_mmio_regions (so it can at least do the 64 PFNs).

But going forward, a couple of ideas:

- The 64 limit was arbitrary. It could have been 42 or
  PFNs / num_online_cpus(), or actually finding out the size of the BAR
  and figuring the optimal case so that it will be done under 1ms. Or
  perhaps just provide a boot time parameter for those that really are
  struggling with this.

- Perhaps add a new API to the P2M code so it can do the operations in
  batches. Our map_mmio_region iterates over every PFN, which is surely
  not efficient. Some batching could help? Maybe? What other code could
  benefit from this? Would the boot-time creation of IOMMU page-tables
  also help with this (which takes 10 minutes on a 1TB box, BTW)?

- This printk can be altogether removed from the hypervisor and moved
  into the toolstack. That is, libxl's xc_domain_add_to_physmap could
  itself use the logging facility (xc_report?) for the action.

> Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
On 09/09/15 15:50, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 09, 2015 at 08:33:52AM -0600, Jan Beulich wrote:
>> >>> On 09.09.15 at 16:20, wrote:
>>> Perhaps the solution is remove the first printk(s) and just have
>>> them once the operation has completed? That may fix the outstanding
>>> tasklet problem?
>>
>> Considering that this is a tool stack based retry, how would the
>> hypervisor know when the _whole_ operation is done?
>
> I was merely thinking of moving the printk _after_ the
> map_mmio_regions so there wouldn't be any outstanding preemption
> points in map_mmio_regions (so it can at least do the 64 PFNs).
>
> But going forward, a couple of ideas:
>
> - The 64 limit was arbitrary. It could have been 42 or
>   PFNs / num_online_cpus(), or actually finding out the size of the
>   BAR and figuring the optimal case so that it will be done under
>   1ms. Or perhaps just provide a boot time parameter for those that
>   really are struggling with this.

The issue of it taking a long time to map a large BAR is caused by the
unconditional memory_type_changed call at the bottom of the
XEN_DOMCTL_memory_mapping function.

The memory_type_changed call results in a full data cache flush on a
VM with a PCI device assigned.

We have seen this overhead cause a 1GB BAR to take 20 seconds to map
its MMIO.

If the 64 limit was arbitrary then I would suggest increasing it to at
least 1024 so that at least 4M of BAR can be mapped in one go and it
reduces the overhead by a factor of 16.

Currently the toolstack attempts lower and lower powers of 2 sizes of
the MMIO region to be mapped until the hypercall succeeds.

It is not clear to me why we have an unconditional call to
memory_type_changed in the domctl. Can somebody explain why it can't
be made conditional on errors?

Malcolm

> - Perhaps add a new API to the P2M code so it can do the operations
>   in batches. Our map_mmio_region iterates over every PFN, which is
>   surely not efficient. Some batching could help? Maybe? What other
>   code could benefit from this? Would the boot-time creation of IOMMU
>   page-tables also help with this (which takes 10 minutes on a 1TB
>   box, BTW)?
>
> - This printk can be altogether removed from the hypervisor and moved
>   into the toolstack. That is, libxl's xc_domain_add_to_physmap could
>   itself use the logging facility (xc_report?) for the action.
>
>> Jan
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
>> If the 64 limit was arbitrary then I would suggest increasing it to
>> at least 1024 so that at least 4M of BAR can be mapped in one go and
>> it reduces the overhead by a factor of 16.
>
> 1024 may be a little much, but 256 is certainly a possibility, plus
> Konrad's suggestion to allow this limit to be controlled via command
> line option.

Are you guys talking this way?

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3946e4c..a9671bb 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -88,6 +88,10 @@ boolean_param("noapic", skip_ioapic_setup);
 s8 __read_mostly xen_cpuidle = -1;
 boolean_param("cpuidle", xen_cpuidle);

+/* once_mapping_mfns: memory mapping mfn numbers once. */
+unsigned int xen_once_mapping_mfns;
+integer_param("once_mapping_mfns", xen_once_mapping_mfns);
+
 #ifndef NDEBUG
 unsigned long __initdata highmem_start;
 size_param("highmem-start", highmem_start);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 3bf39f1..82c85e3 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -33,6 +33,8 @@
 #include
 #include

+extern unsigned int xen_once_mapping_mfns;
+
 static DEFINE_SPINLOCK(domctl_lock);
 DEFINE_SPINLOCK(vcpu_alloc_lock);

@@ -1035,7 +1037,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)

     ret = -E2BIG;
     /* Must break hypercall up as this could take a while. */
-    if ( nr_mfns > 64 )
+    if ( nr_mfns > xen_once_mapping_mfns )
         break;

     ret = -EPERM;

Thanks
Tiejun