On Wed, Oct 21, 2015 at 06:26:04PM +0200, Dario Faggioli wrote:
> Hi everyone,
>
> I managed running again the benchmarks I had already showed off here:
Hey!
Thank you for doing that.
>
> [PATCH RFC] xen: if on Xen, "flatten" the scheduling domain hierarchy
> https://lkml.org/lkml/2015/8/18/302
>
> Basically, this is about Linux guests using topology information for
> scheduling, while they just don't make any sense when on Xen as (unless
> static and guest-lifetime long pinning is used) vCPUs do move around!
>
> Some more context is also available here:
>
> http://lists.xen.org/archives/html/xen-devel/2015-07/msg03241.html
>
> This email is still about numbers obtained by running things in Dom0,
> and without overloading the host pCPUs at the Xen level (i.e., I'm
> using nr. dom0 vCPUs == nr. host pCPUs).
>
> With respect to previous round:
> - I've added results for hackbench
> - I've run the benches with both my patch[0] and Juergen's patch[1].
>   My patch is 'dariof', in the spreadsheet; Juergen's is 'jgross'.
>
> Here are the numbers:
>
>
> https://docs.google.com/spreadsheets/d/17djcVV3FkmHmv1FKFBe9CQFnNgVumnM2U64MNvjzAn8/edit?usp=sharing
>
> (If anyone has issues with googledocs, tell me, and I'll try
> cutting-&-pasting in email, as I did the other time.)
>
> A few comments:
> * both patches bring performance improvements. The only regression
>   seems to happen in hackbench, when running with -g1. That is
>   certainly not the typical use case of the benchmark, but we can
>   certainly try to figure out what happens in that case;
> * the two patches were supposed to provide almost identical results,
>   and they actually do, in most cases (e.g., in all the instances of
>   Unixbench);
> * when there are differences, it is hard to see a trend or, in
>   general, to identify a possible reason by looking at the
>   differences between the patches themselves, at least as far as
>   these data are concerned. In fact, in the "make xen" case, for
>   instance, 'jgross' is better when building with -j20 and -j24,
>   while 'dariof' is better when building with -j48 and -j62 (the
>   host having 48 pCPUs). In the hackbench case, 'dariof' is better
>   in the least concurrent case, 'jgross' is better in the other
>   three. This may all well be due to some different and independent
>   factor... Perhaps a little more investigation is necessary (and
>   I'm up for it).
>
> IMO, future steps are:
> a) running benchmarks in a guest
> b) running benchmarks in more guests, and when overloading at the Xen
> level (i.e., having more vCPUs around than the host has pCPUs)
> c) tracing and/or collecting stats (e.g., from perf and xenalyze)
>
> I'm already working on a) and b).
>
> As far as which approach (mine or Juergen's) to adopt, I'm not sure,
> and it does not seem to make much difference, at least from the
> performance point of view. I don't have any particular issues with
> Juergen's patch, apart from the fact that I'm not yet sure how it makes
> the scheduling domains creation code behave. I can look into that and
> report.
>
> Also, this is all for PV guests. Any thoughts on what the best route
> would be for HVM ones?
Perhaps the same? What I presume we want is for each CPU to look
exactly the same from the scheduling perspective, i.e., there should
be no penalty in moving a task from one CPU to another. Right now the
Linux scheduler will not move certain tasks, because of how the
topology looks on bare metal: some migrations are considered
prohibitive (say, moving a task from one core to another costs more
than moving it between SMT siblings of the same core).
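
To make that concrete, below is a minimal sketch (mine, for
illustration only; it is not either of the two patches, whose details
differ) of how a guest could "flatten" the hierarchy: when running on
Xen, install a single-level topology via set_sched_topology(), so the
scheduler builds one flat domain spanning all vCPUs and no SMT/MC
specific migration costs are ever applied.

/*
 * Illustrative sketch only, not taken from either patch under
 * discussion: give a Xen guest a single, flat scheduling domain.
 */
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/topology.h>
#include <xen/xen.h>

/*
 * One topology level only: cpu_cpu_mask() spans all CPUs of the
 * (virtual) node, so no SMT or MC levels are ever built.
 */
static struct sched_domain_topology_level xen_flat_topology[] = {
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};

static int __init xen_flatten_sched_topology(void)
{
	if (!xen_domain())
		return 0;

	/*
	 * Replace the default SMT/MC/DIE hierarchy before the
	 * scheduling domains are built in sched_init_smp().
	 */
	set_sched_topology(xen_flat_topology);
	return 0;
}
early_initcall(xen_flatten_sched_topology);

With CONFIG_SCHED_DEBUG, the resulting hierarchy can be checked from
inside the guest under /proc/sys/kernel/sched_domain/cpu*/domain*/
(the 'name' and 'flags' files), which should show that only a single
domain level is left.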
>
> [0] http://pastebin.com/KF5WyPKz
> [1] http://pastebin.com/xSFLbLwn
>
> Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel