On 08/06/17 20:09, Stefano Stabellini wrote:
> On Thu, 8 Jun 2017, Juergen Gross wrote:
>> On 07/06/17 20:19, Stefano Stabellini wrote:
>>> On Wed, 7 Jun 2017, Juergen Gross wrote:
>>>> On 06/06/17 21:08, Stefano Stabellini wrote:
>>>>> On Tue, 6 Jun 2017, Juergen Gross wrote:
>>>>>> On 06/06/17 18:39, Stefano Stabellini wrote:
>>>>>>> On Tue, 6 Jun 2017, Juergen Gross wrote:
>>>>>>>> On 26/05/17 21:01, Stefano Stabellini wrote:
>>>>>>>>> On Fri, 26 May 2017, Juergen Gross wrote:
>>>>>>>>>> On 26/05/17 18:19, Ian Jackson wrote:
>>>>>>>>>>> Juergen Gross writes ("HVM guest performance regression"):
>>>>>>>>>>>> Looking for the reason of a performance regression of HVM
>>>>>>>>>>>> guests under Xen 4.7 against 4.5 I found the cause to be
>>>>>>>>>>>> commit c26f92b8fce3c9df17f7ef035b54d97cbe931c7a ("libxl:
>>>>>>>>>>>> remove freemem_slack") in Xen 4.6.
>>>>>>>>>>>>
>>>>>>>>>>>> The problem occurred when dom0 had to be ballooned down when
>>>>>>>>>>>> starting the guest. The performance of some micro benchmarks
>>>>>>>>>>>> dropped by about a factor of 2 with the above commit.
>>>>>>>>>>>>
>>>>>>>>>>>> An interesting point is that the performance of the guest
>>>>>>>>>>>> depends on the amount of free memory available at guest
>>>>>>>>>>>> creation time. When there was barely enough memory available
>>>>>>>>>>>> for starting the guest, the performance remained low even if
>>>>>>>>>>>> memory was freed later.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like to suggest we either revert the commit or add some
>>>>>>>>>>>> other mechanism to keep some free memory in reserve when
>>>>>>>>>>>> starting a domain.
>>>>>>>>>>>
>>>>>>>>>>> Oh, dear. The memory accounting swamp again. Clearly we are
>>>>>>>>>>> not going to drain that swamp now, but I don't like
>>>>>>>>>>> regressions.
>>>>>>>>>>>
>>>>>>>>>>> I am not opposed to reverting that commit. I was a bit iffy
>>>>>>>>>>> about it at the time; and according to the removal commit
>>>>>>>>>>> message, it was basically removed because it was a piece of
>>>>>>>>>>> cargo cult for which we had no justification in any of our
>>>>>>>>>>> records.
>>>>>>>>>>>
>>>>>>>>>>> Indeed I think fixing this is a candidate for 4.9.
>>>>>>>>>>>
>>>>>>>>>>> Do you know the mechanism by which the freemem slack helps? I
>>>>>>>>>>> think that would be a prerequisite for reverting this. That
>>>>>>>>>>> way we can have an understanding of why we are doing things,
>>>>>>>>>>> rather than just flailing at random...
>>>>>>>>>>
>>>>>>>>>> I wish I understood it.
>>>>>>>>>>
>>>>>>>>>> One candidate would be 2M/1G pages being possible with enough
>>>>>>>>>> free memory, but I haven't proven this yet. I can have a try
>>>>>>>>>> by disabling big pages in the hypervisor.
>>>>>>>>>
>>>>>>>>> Right, if I had to bet, I would put my money on superpage
>>>>>>>>> shattering being the cause of the problem.
>>>>>>>>
>>>>>>>> Seems you would have lost your money...
>>>>>>>>
>>>>>>>> Meanwhile I've found a way to get the "good" performance in the
>>>>>>>> micro benchmark. Unfortunately this requires switching off the
>>>>>>>> PV interfaces in the HVM guest via the "xen_nopv" kernel boot
>>>>>>>> parameter.
>>>>>>>>
>>>>>>>> I have verified that PV spinlocks are not to blame (via the
>>>>>>>> "xen_nopvspin" kernel boot parameter). Switching to clocksource
>>>>>>>> TSC in the running system doesn't help either.
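For reference, the experiments mentioned above boil down to a handful of
command line and sysfs tweaks along the following lines (a rough sketch;
the option spellings should be double-checked against the kernel and Xen
versions in use):

    # guest kernel command line: disable all PVHVM enhancements, or only
    # the PV spinlocks
    xen_nopv
    xen_nopvspin

    # switch the guest clocksource to TSC at runtime
    cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource

    # Xen command line: disable 1G/2M HAP superpages to test the
    # superpage theory (assuming HAP is in use)
    hap_1gb=0 hap_2mb=0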
>>>>>>>
>>>>>>> What about xen_hvm_exit_mmap (an optimization for shadow
>>>>>>> pagetables) and xen_hvm_smp_init (PV IPI)?
>>>>>>
>>>>>> xen_hvm_exit_mmap isn't active (a kernel message telling me so was
>>>>>> issued).
>>>>>>
>>>>>>>> Unfortunately the kernel no longer seems to be functional when I
>>>>>>>> try to tweak it not to use the PVHVM enhancements.
>>>>>>>
>>>>>>> I guess you are not talking about regular PV drivers like
>>>>>>> netfront and blkfront, right?
>>>>>>
>>>>>> The plan was to be able to use PV drivers without having to use PV
>>>>>> callbacks and PV timers. This isn't possible right now.
>>>>>
>>>>> I think the code to handle that scenario was gradually removed over
>>>>> time to simplify the code base.
>>>>
>>>> Hmm, too bad.
>>>>
>>>>>>>> I'm wondering now whether there have ever been any benchmarks to
>>>>>>>> prove PVHVM really is faster than non-PVHVM. My findings seem to
>>>>>>>> suggest there might be a huge performance gap with PVHVM. OTOH
>>>>>>>> this might depend on hardware and other factors.
>>>>>>>>
>>>>>>>> Stefano, didn't you do the PVHVM stuff back in 2010? Do you have
>>>>>>>> any data from then regarding performance figures?
>>>>>>>
>>>>>>> Yes, I still have these slides:
>>>>>>>
>>>>>>> https://www.slideshare.net/xen_com_mgr/linux-pv-on-hvm
>>>>>>
>>>>>> Thanks. So you measured the overall package, not the single items
>>>>>> like callbacks, timers and time source? I'm asking because I'm
>>>>>> starting to believe some of those are slower than their non-PV
>>>>>> variants.
>>>>>
>>>>> There isn't much left in terms of individual optimizations: you
>>>>> already tried switching the clocksource and removing PV spinlocks.
>>>>> xen_hvm_exit_mmap is not used. Only the following are left (you
>>>>> might want to double check I haven't missed anything):
>>>>>
>>>>> 1) PV IPI
>>>>
>>>> It's a 1 vcpu guest.
>>>>
>>>>> 2) PV suspend/resume
>>>>> 3) vector callback
>>>>> 4) interrupt remapping
>>>>>
>>>>> 2) is not on the hot path.
>>>>> I did individual measurements of 3) at some point and it was a
>>>>> clear win.
>>>>
>>>> That might depend on the hardware. Could it be newer processors are
>>>> faster here?
>>>
>>> I don't think so: the alternative is an emulated interrupt. It's
>>> slower from every point of view.
>>
>> What about the APIC virtualization of modern processors? Are you sure
>> e.g. timer interrupts aren't handled completely by the processor? I
>> guess this might be faster than letting them be handled by the
>> hypervisor and then using the callback into the guest.
>>
>>> I would try to run the test with xen_emul_unplug="never", which means
>>> that you are going to end up using the emulated network card and
>>> emulated IDE controller, but some of the other optimizations (like
>>> the vector callback) will still be active.
>>
>> Now this is something I wouldn't like to do. My test isn't using any
>> I/O at all and is showing bad performance with PV interfaces being
>> used. The only remedy right now seems to be to switch off the PV
>> interfaces, leading to bad I/O performance but good non-I/O
>> performance.
>>
>> You are suggesting a mode with bad I/O performance _and_ bad non-I/O
>> performance.
>
> I was only suggesting this for debugging, to better understand the
> problem, not as a solution.
>
>>> If the cause of the problem is ballooning, for example, using
>>> emulated interfaces for I/O will reduce the amount of ballooned-out
>>> pages significantly.
>>
>> No I/O is involved in my benchmark.
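Again for reference, the guest-side knobs discussed here are roughly the
following (a sketch; the exact messages printed by the kernel differ
between versions):

    # guest kernel command line: keep the emulated NIC and IDE controller
    # instead of unplugging them in favour of the PV drivers
    xen_emul_unplug=never

    # check which Xen PV features the guest kernel actually enabled
    dmesg | grep -i xen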
>
> I admit that if your test doesn't do any I/O, it is not likely that
> xen_emul_unplug="never" will help us understand the problem.
>
> Nonetheless, I believe that a simple blkfront/blkback or
> netfront/netback connection, even without any I/O being done, leads to
> a couple of calls into the ballooning code (xenbus_map_ring_valloc_hvm
> -> alloc_xenballooned_pages).
Only if the backend lives in an HVM domain. So in my case there is no
problem, as I have a classic PV dom0 hosting the backends.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel