[Bug 1838575] Re: passthrough devices cause >17min boot delay

2021-10-07 Thread Colin Ian King
** Changed in: linux (Ubuntu) Assignee: Colin Ian King (colin-king) => (unassigned) -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1838575 Title: passthrough devices cause >17min boot delay To

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-10-23 Thread Christian Ehrhardt 
As outlined in the past, conceptually there is nothing that qemu can do. The kernel can in theory make memory zeroing concurrent and thereby scale with CPUs, but that effort was already started twice and didn't get into the kernel yet. Workarounds are known to shrink that size

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-30 Thread Christian Ehrhardt 
As qemu seems to be unable to do much, I'll set it to triaged (we understand what is going on) and low (can't do much). ** Changed in: qemu (Ubuntu) Status: Incomplete => Triaged ** Changed in: qemu (Ubuntu) Importance: Medium => Low

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-29 Thread Christian Ehrhardt 
I modified the kernel to have a few functions non-inlined so they are easier to trace: vfio_dma_do_map vfio_dma_do_unmap mutex_lock mutex_unlock kzalloc vfio_link_dma vfio_pin_map_dma vfio_pin_pages_remote vfio_iommu_map Then I ran tracing on this load, limited to the functions in my focus: $ sudo tra

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-29 Thread Christian Ehrhardt 
(systemtap) probe module("vfio_iommu_type1").function("vfio_iommu_type1_ioctl") { printf("New vfio_iommu_type1_ioctl\n"); start_stopwatch("vfioioctl"); } probe module("vfio_iommu_type1").function("vfio_iommu_type1_ioctl").return { timer=read_stopwatch_ns("vfioioctl") printf("Complet

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-29 Thread Christian Ehrhardt 
This is a silly but useful distribution check with log10 of the allocation sizes (count, then log10 bucket):
Fast:
    108 3
   1293 4
  12133 5
 113330 6
  27794 7
   1119 8
Slow:
    194 3
   1738 4
  17375 5
 143411 6
     55 7
      3 8
I got no warnings about missed call
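A minimal sketch of how such a log10 bucket count could be produced, assuming the sizes are available as plain integers and that each bucket is floor(log10(size)). The function name `log10_histogram` and the sample sizes are made up for illustration; they are not from the traces in this thread.

```python
import math
from collections import Counter


def log10_histogram(sizes):
    """Count positive sizes per floor(log10(size)) bucket (assumed bucketing)."""
    buckets = Counter()
    for size in sizes:
        if size <= 0:
            continue  # log10 is undefined for zero or negative sizes
        buckets[int(math.floor(math.log10(size)))] += 1
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    # Hypothetical sizes in bytes, only to show the output shape.
    sample = [4096, 8192, 65536, 2 * 1024 * 1024, 1024 ** 3]
    for exponent, count in log10_histogram(sample).items():
        print(f"{count:7d} {exponent}")
```

Printing count first and bucket second matches the layout of the numbers quoted above.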

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-29 Thread Christian Ehrhardt 
The iommu is locked in there early and the iommu element is what is passed from userspace. That represents the vfio container for this device (container->fd) qemu: if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 kernel: static long vfio_iommu_type1_ioctl(void *iommu_data, unsigne

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-27 Thread Christian Ehrhardt 
Each qemu (version) is slightly different in the road to this, but then seems to behave. This one is slightly better to get "in front" of the slow call to map all the memory. $ virsh nodedev-detach pci__21_00_1 --driver vfio $ gdb /usr/bin/qemu-system-x86_64 (gdb) b vfio_dma_map (gdb) command

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-27 Thread Christian Ehrhardt 
I could next build a test kernel with some debug around the vfio iommu dma map to check how time below that call is spent. I'm sure that data already is hidden in some of my trace data, but to eventually change/experiment I need to build one anyway. I expect anyway to summarize and go into a dis

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-27 Thread Christian Ehrhardt 
Reference: the call from qemu that I think we see above (on x86) is at [1]. If the assumption is correct this time, the kernel place would be vfio_iommu_type1_ioctl. For debugging: $ gdb qemu/x86_64-softmmu/qemu-system-x86_64 (gdb) catch syscall 16 (gdb) run -m 131072 -smp 1 -no-user-co

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-27 Thread Christian Ehrhardt 
Many ioctls (as expected) but they are all fast and match what we knew from strace. Thread 1 "qemu-system-x86" hit Catchpoint 1 (call to syscall ioctl), 0x772fae0b in ioctl () at ../sysdeps/unix/syscall-template.S:78 78 in ../sysdeps/unix/syscall-template.S (gdb) bt #0 0x772

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-27 Thread Christian Ehrhardt 
The above was through libvirt, doing that directly in qemu now to throw it into debugging more easily: $ virsh nodedev-detach pci__21_00_1 --driver vfio $ qemu/x86_64-softmmu/qemu-system-x86_64 -name guest=test-vfio-slowness -m 131072 -smp 1 -no-user-config -drive file=/var/lib/uvtool/libvirt

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-27 Thread Christian Ehrhardt 
Just when I thought I understood the pattern. Sixth run (again kill and restart) 6384 9.826097 <... ioctl resumed> , 0x7ffcc8ed6e20) = 0 <19.495688> So for now let's summarize that it varies :-/ But it always seems slow.
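To pick the slow calls out of a long strace log like this, a small filter on the trailing `<seconds>` field can help. This is only a sketch, assuming the log was taken with per-call timing so every completed call ends in `<duration>`; the name `slow_calls` and the 1 second threshold are illustrative choices, not something used in this thread.

```python
import re

# Matches the time-spent field that ends a completed syscall line, e.g. "<19.495688>".
_DURATION = re.compile(r"<(\d+(?:\.\d+)?)>\s*$")


def slow_calls(lines, threshold_s=1.0):
    """Return (seconds, line) pairs whose trailing duration exceeds threshold_s."""
    hits = []
    for line in lines:
        match = _DURATION.search(line)
        if match and float(match.group(1)) > threshold_s:
            hits.append((float(match.group(1)), line.rstrip()))
    return sorted(hits, reverse=True)  # slowest call first


if __name__ == "__main__":
    import sys

    # Usage: python3 slow_calls.py < strace.log
    for seconds, line in slow_calls(sys.stdin):
        print(f"{seconds:12.6f}s  {line}")
```

The `<... ioctl resumed>` marker is not mistaken for a duration because the pattern only accepts a number at the very end of the line.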

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-27 Thread Christian Ehrhardt 
On x86 this looks pretty similar and at the place we have seen before: 45397 0.73 readlink("/sys/bus/pci/devices/:21:00.1/iommu_group", "../../../../kernel/iommu_groups/"..., 4096) = 34 <0.20> 45397 0.53 openat(AT_FDCWD, "/dev/vfio/45", O_RDWR|O_CLOEXEC) = 31 <0.33>

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-27 Thread Christian Ehrhardt 
I built qemu head from git $ export CFLAGS="-O0 -g" $ ./configure --disable-user --disable-linux-user --disable-docs --disable-guest-agent --disable-sdl --disable-gtk --disable-vnc --disable-xen --disable-brlapi --enable-fdt --disable-bluez --disable-vde --disable-rbd --disable-libiscsi --disab

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-26 Thread Christian Ehrhardt 
Hmm, with strace showing almost a hang on a single one of those ioctl calls you'd think that would be easy to spot :-/ But this isn't as clear as expected: sudo trace-cmd record -p function_graph -l vfio_pci_ioctl -O graph-time Disable all but 1 CPU to have less concurrency in the trace. => Not much bet

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-22 Thread Christian Ehrhardt 
On this platform strace still confirms the same paths: And perf as well (slight arch differences, but still mem setup). 46.85% [kernel] [k] lruvec_lru_size 16.89% [kernel] [k] clear_user_page 5.74% [kernel] [k] inacti

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-22 Thread Christian Ehrhardt 
As assumed, this really seems to be cross-arch and to hold for all sizes. Here 16 PU, 128G on ppc64el: #1: 54 seconds #2: 7 seconds #3: 23 seconds Upped to 192GB this has: #1: 75 seconds #2: 5 seconds #3: 23 seconds As a note, in this case I checked there are ~7 seconds before it goes into thi

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-01 Thread Christian Ehrhardt 
You can do so even per-size via e.g. /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages As discussed, the later the allocation, the higher the chance that it fails, so re-check the sysfs file after each change to see whether it actually got that much memory. The default size is only a boot time parameter. Bu
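The write-then-re-check step could be scripted roughly like this. It is a sketch only: the helper name `set_nr_hugepages` and its `base` parameter are assumptions added for illustration (the latter so the logic can be tried against a scratch directory), and writing to the real sysfs file needs root.

```python
from pathlib import Path


def set_nr_hugepages(wanted, size_kb=1048576, base="/sys/kernel/mm/hugepages"):
    """Request `wanted` pages of the given size, then read back what was granted.

    Returns (granted, ok), where ok is False if the kernel could not
    allocate as many pages as requested.
    """
    path = Path(base) / f"hugepages-{size_kb}kB" / "nr_hugepages"
    path.write_text(f"{wanted}\n")
    # Re-read the same file: a late allocation may only partially succeed.
    granted = int(path.read_text().strip())
    return granted, granted >= wanted
```

For example, `set_nr_hugepages(16)` as root would request sixteen 1G pages and report how many the kernel actually reserved.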

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-01 Thread Colin Ian King
** Changed in: linux (Ubuntu) Importance: Undecided => Medium ** Changed in: linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king)

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-01 Thread Colin Ian King
Naive question: can we tweak the hugepage file settings at run time via /proc/sys/vm/nr_hugepages and not require the kernel parameters?

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-01 Thread Christian Ehrhardt 
A trace into the early phase (first some other init with ~16 cpus) then the long phase of 1 thread blocking. So we will see it "enter" the slow phase as well as iterating in it. There seem to be two phases one around alloc_pages_current and then one around slot_rmap_walk_next/rmap_get_first. I'd

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-01 Thread Christian Ehrhardt 
Summary: As I mentioned before (on the other bug that I referred to), the problem is that with a PT device it needs to reset and map the VFIO devices. So with >0 PT devices attached it needs an init that scales with memory size of the guest (see my fast results with PT but small guest memory). As I

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-01 Thread Christian Ehrhardt 
I mentioned in the last discussion around this, that the one thing that could be done is to make this single thread mem-init a multi thread action (in the kernel). I doubt that we can make it omit the initialization. Even though it is faster, even the 1G Huge Page setup could be more efficient. To

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-01 Thread Christian Ehrhardt 
I was bumping up to the config you had (but with one PT device). - Host phys bits machine type for larger mappings - more CPUS 1->32 Adding/removing a PT device in the configs above doesn't change a lot. As assumed none of these increased the time tremendously. Then I went to bump up the memory

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-01 Thread Christian Ehrhardt 
Now to gain a good result, let's use 1G Huge Pages. Kernel cmdline: default_hugepagesz=1G hugepagesz=1G hugepages=1210 Gives: HugePages_Total:1210 HugePages_Free: 1210 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize:1048576 kB Guest config extra: Slightly ch
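To check that the reserved pool is large enough for a guest, the hugepage counters can be turned into a size. A small sketch, assuming the text comes from /proc/meminfo in its usual "Key: value [unit]" layout; the function name `hugepage_pool_kib` is made up here.

```python
def hugepage_pool_kib(meminfo_text):
    """Return (total_kib, free_kib) of the default-size hugepage pool."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.split()
    total = int(fields["HugePages_Total"][0])
    free = int(fields["HugePages_Free"][0])
    page_kib = int(fields["Hugepagesize"][0])  # reported in kB
    return total * page_kib, free * page_kib


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        total_kib, free_kib = hugepage_pool_kib(handle.read())
    print(f"pool: {total_kib / 1024**2:.0f} GiB, free: {free_kib / 1024**2:.0f} GiB")
```

With the values quoted above (1210 pages of 1048576 kB) the pool comes to 1210 GiB.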

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-01 Thread Christian Ehrhardt 
IMHO this is the same we discussed around March this year => https://bugs.launchpad.net/nvidia-dgx-2/+bug/1818891/comments/5. In an associated mail thread we even discussed the pro/con of changes like We see above this is about (transparent huge page) setup for all the memory. We can disable h

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-08-01 Thread Christian Ehrhardt 
T3: use 1.2 TB with one PT device - THP=off #1: 476 seconds #2: 31 seconds #3: 20 seconds ubuntu@akis:~$ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled never ubuntu@akis:~$ cat /sys/kernel/mm/transparent_hugepage/enabled always madvise [never] Samples: 88K of event 'cyc
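When scripting test runs like these it is easy to forget which THP mode is active. A minimal sketch that extracts the bracketed mode from the `enabled` file; the helper names `active_thp_mode` and `read_thp_mode` are illustrative and not part of any tool used in this thread.

```python
import re


def active_thp_mode(text):
    """Return the mode shown in brackets, e.g. 'never' for 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active mode found in {text!r}")
    return match.group(1)


def read_thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Read the sysfs file and return the currently active THP mode."""
    with open(path) as handle:
        return active_thp_mode(handle.read())
```

A test harness could call `read_thp_mode()` before each run and record the result next to the timing.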

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-07-31 Thread Christian Ehrhardt 
Since I knew memory often is more painful - start with 512MB, 1CPU, 1 PCI Passthrough Note: I installed debug symbols for glibc and qemu On init I initially find the guest's CPU thread rather busy (well, booting up) 80.66% CPU 0/KVM[kernel] Passthrough is successful - lspci from guest:

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-07-31 Thread Christian Ehrhardt 
I have seen slow boots before, but they scaled with the number of devices and the amount of memory, reaching ~5min (for >1TB mem init) and ~2.5min for 16 device pass-through. Those times (the ones I mentioned) Nvidia has seen and sort of accepted. I discussed with Anish about how using HugePages helps to reduc

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-07-31 Thread Rafael David Tinoco
Attached file is already the stdio, sorry for prev message.

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-07-31 Thread Rafael David Tinoco
> > sudo perf record -a -F $((250*100)) -e cycles:u -- > > /usr/bin/qemu-system-x86_64 -name guest="guest" > > I instead attached perf to the qemu process after it was spawned by > libvirt so I didn't have to worry about producing a working qemu > cmdline. I let it run for several seconds whil

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-07-31 Thread Rafael David Tinoco
** Changed in: qemu (Ubuntu) Importance: Undecided => Medium ** Changed in: qemu (Ubuntu) Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco) ** Changed in: qemu (Ubuntu) Status: New => Confirmed

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-07-31 Thread dann frazier
** Attachment added: "sample xml w/ devices passed through" https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1838575/+attachment/5280232/+files/config5-new.xml

[Bug 1838575] Re: passthrough devices cause >17min boot delay

2019-07-31 Thread dann frazier
Here's a perf report of a 'sudo perf record -p 6949 -a -F 25000 -e cycles', after libvirt spawned qemu w/ pid 6949. ** Attachment added: "perf.report" https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1838575/+attachment/5280233/+files/perf.report