I did the whole process through scripts just because cset doesn't work on Debian :)
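Roughly, the build side of the shield does something like the sketch below. This assumes the cgroup-v1 cpuset controller is mounted at /sys/fs/cgroup/cpuset and that core 0 stays with the host — treat it only as a sketch, the real scripts are in the repo linked further down:

    #!/bin/sh
    # shieldbuild sketch: confine every movable task to core 0,
    # leaving cores 1-3 to the VM. Mount point and core split are
    # assumptions, not the exact contents of my scripts.
    cd /sys/fs/cgroup/cpuset || exit 1
    mkdir -p system
    echo 0 > system/cpuset.cpus      # host keeps core 0 only
    echo 0 > system/cpuset.mems      # single memory node assumed
    for pid in $(cat tasks); do
        # per-CPU kernel threads refuse to move; ignore the errors
        echo "$pid" > system/tasks 2>/dev/null
    done

Breaking the shield is just the reverse: write 0-3 back to system/cpuset.cpus (or move the tasks back to the root cpuset) and the OS gets all cores again.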
rcu_nocbs should move most of the kernel threads off the selected CPUs. A good
way to check that your VM CPUs are really unused is, on Intel, to use
turbostat: it shows the activity of each thread and the average / top
frequency. Create your shield, run a heavy program, and if your selected CPUs
show 0% usage / 0 MHz average frequency, I think you are good to go :) For my
part, during a kernel compilation with my shield up, the threads of cores
1,2,3 sit at an average of 0 ~ 7 MHz. I don't know if the peak of 7 MHz is
residual or real...
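For the check itself, something like this is enough (exact flags and column names vary between turbostat versions, so again just a sketch):

    # run your workload, then watch the per-CPU stats refresh every 5s
    sudo turbostat --interval 5
    # shielded cores (1-3 here) should stay near 0 in the Busy% /
    # Avg_MHz columns; only the host core should show real activity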
--
Deldycke Quentin

On 29 February 2016 at 11:56, Rokas Kupstys <[email protected]> wrote:
> I tried nohz_full/rcu_nocbs in the past but it did not show any visible
> improvement. I will try again, I guess. I did, however, have a similar
> setup with cset, although done rather manually:
>
> cset set -c 0-1 system
> cset proc -m -f root -t system -k
>
> This essentially moves all tasks to cores 0 and 1. Since libvirt uses CPU
> pinning, it stays on the cores assigned in the XML. However, it did not
> show much of an improvement either. One thing that bothers me is the
> kernel threads that can't be moved off the cores I would like to dedicate
> to the VM. Any idea if there is some kernel parameter which would prevent
> kernel thread creation on certain cores but allow userspace to run on
> those same cores? That would be a perfect substitute for the isolcpus
> param: we could clear the cores of any tasks when needed and use them
> while the VM is offline. Can't do that with isolcpus.
>
> I'll analyze that repo, seems interesting, thanks for the link.
>
>
> On 2016.02.29 12:16, Quentin Deldycke wrote:
>
> Nearly as efficient as isolcpus, but usable dynamically, at runtime:
>
> Use nohz_full / rcu_nocbs to offload all RCU callbacks of your VM cores
> to your OS-only cores.
> Use cgroups: when you start the VM, you keep only x cores for the OS;
> when you shut it down, let the OS have all cores again.
>
> If the VM is started and you need a power boost on Linux, just use
> "echo $$ | sudo tee /cgroups/cgroup.procs", and you will have all cores
> for programs run from this shell :)
>
> Linux only: all cores (but cores 1,2,3 are in nohz mode, offloaded by
> core 0)
> Linux + Windows: 1 core for Linux, 3 cores for Windows
> Need a boost on Linux: the little command line above for this shell
>
>
> Example of cgroup usage:
> https://github.com/qdel/scripts/tree/master/vfio/scripts => shieldbuild /
> shieldbreak
>
> Which are called through qemu hooks:
> https://github.com/qdel/scripts/tree/master/vfio/hooks
>
> I do not configure my IO; I let qemu manage it.
>
>
> One fun behavior: while idle, I am completely steady at ~1000μs; if I
> run a game, it goes down to a completely steady 500μs.
>
> Example: http://b.qdel.fr/test.png
>
> Sorry for the quality — VNC to a 4k screen from 1080p, all this...
>
>
> --
> Deldycke Quentin
>
>
> On 29 February 2016 at 10:55, Rokas Kupstys <[email protected]> wrote:
>
>> Yes, currently I am actually booted with the vanilla Arch Linux kernel,
>> no NO_HZ and other stuff.
>>
>> Why is 2 cores for the host unacceptable? Do you plan to run heavy
>> workloads on it while gaming?
>>
>> The problem with isolcpus is that it exempts cores from the Linux CPU
>> scheduler. This means that even when the VM is offline they stand idle.
>> While I don't do anything on the host while gaming, I do plenty when not
>> gaming, and just throwing away 6 cores of an already disadvantaged AMD
>> CPU is a real waste.
>>
>> This config is not good actually.
>>
>> Well... it does look bad on paper; however, it is the only one that
>> yields bearable DPC latency. I tried what you mentioned, in various
>> combinations:
>>
>> * cores 0,2,4,6 to the VM, 1,3 to the emulator, 5,7 for IO
>> * cores 1,3,5,7 to the VM, 0,2 to the emulator, 4,6 for IO
>> * cores 0,1,2,3 to the VM, 4,5 to the emulator, 6,7 for IO
>> * cores 4,5,6,7 to the VM, 0,1 to the emulator, 2,3 for IO
>>
>> All of them yield terrible latency.
>>
>> It would be interesting to hear from someone who has an AMD build, how
>> (and if) they solved this.
>>
>>
>> On 2016.02.29 11:10, Bronek Kozicki wrote:
>>
>> Two things you can improve, IMO:
>>
>> * disable NO_HZ
>> * use isolcpus to dedicate your pinned CPUs to guest only - this
>>   will also ensure they are not used for guest IO.
>>
>> B.
>>
>> On 29/02/2016 08:45, Rokas Kupstys wrote:
>>
>> Yesterday I figured out my latency problem. All the things listed
>> everywhere on the internet failed. The last thing I tried was pinning
>> each vcpu to two physical cores, and that brought latency down. Now, I
>> have an FX-8350 CPU, which has a shared FPU for each pair of cores, so
>> maybe that's why. With just this pinning, latency is now just above
>> 1000μs most of the time. Under load, however, latency increases. I threw
>> out iothreads and emulator pinning and it did not change much. Superior
>> latency could be achieved using isolcpus=2-7; however, leaving just two
>> cores to the host is unacceptable. With that setting latency was around
>> 500μs without load. The good part is that Battlefield 3 no longer lags,
>> although I observed increased texture loading times compared to bare
>> metal. The not-so-good part is that there is still minor sound
>> skipping/crackling, since latency spikes under load. That is very
>> disappointing. I also tried two VM cores pinned to 4 host cores — bf3
>> lagged enough to be unplayable. 3 VM cores pinned to 6 host cores was
>> already playable, but sound was still crackling. I noticed little
>> difference between that and 4 VM cores pinned to 8 host cores. It would
>> be nice if the sound could be cleaned up. If anyone has any ideas I'm
>> all ears. The libvirt xml I use now:
>>
>> <vcpu placement='static'>4</vcpu>
>> <cputune>
>>   <vcpupin vcpu='0' cpuset='0-1'/>
>>   <vcpupin vcpu='1' cpuset='2-3'/>
>>   <vcpupin vcpu='2' cpuset='4-5'/>
>>   <vcpupin vcpu='3' cpuset='6-7'/>
>> </cputune>
>> <features>
>>   <acpi/>
>>   <apic/>
>>   <pae/>
>>   <hap/>
>>   <viridian/>
>>   <hyperv>
>>     <relaxed state='on'/>
>>     <vapic state='on'/>
>>     <spinlocks state='on' retries='8191'/>
>>   </hyperv>
>>   <kvm>
>>     <hidden state='on'/>
>>   </kvm>
>>   <pvspinlock state='on'/>
>> </features>
>> <cpu mode='host-passthrough'>
>>   <topology sockets='1' cores='4' threads='1'/>
>> </cpu>
>> <clock offset='utc'>
>>   <timer name='rtc' tickpolicy='catchup'/>
>>   <timer name='pit' tickpolicy='delay'/>
>>   <timer name='hpet' present='no'/>
>>   <timer name='hypervclock' present='yes'/>
>> </clock>
>>
>> Kernel configs:
>>
>> CONFIG_NO_HZ_FULL=y
>> CONFIG_RCU_NOCB_CPU_ALL=y
>> CONFIG_HZ_1000=y
>> CONFIG_HZ=1000
>>
>> I am not convinced the 1000 Hz tick rate is needed. The default one
>> (300) seems to perform equally well judging by the latency charts. I did
>> not get a chance to test it with bf3 yet, however.
>>
>> On 2016.01.12 11:12, thibaut noah wrote:
>>
>> [cut]
_______________________________________________
vfio-users mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/vfio-users
