My apologies for the duplicate posts - they initially got stuck, and I
really wanted to reach the group with the query to uncover any unknowns.

Passing through the whole PCI NVMe device is fine, because the VM is locked
to the host by the GPU PCI passthrough anyway. I will implement a
mechanism to protect the data on the single disk in both cases.

I'm not exactly sure what type of disk writes are being used; it's a
learning model being trained by the GPUs. I'll try to find out more.
After I finished the config I searched online for a basic throughput
test for the disk. Here are the commands and results taken at that time
(below).

*Test on host with "local storage" (using a disk image on the nvme drive)*
# dd if=/dev/zero of=test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.92561 s, 558 MB/s

*Test on host with NVMe passthrough*

# dd if=/dev/zero of=/mnt/nvme/tmpflag bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.42554 s, 753 MB/s

In both cases the NVMe was used as a mounted additional drive. The OS boots
from a different disk image, which is located in a Storage Domain over
iSCSI.
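
Following the block-size suggestion in the reply below, these are the sort
of dd variations I intend to try next. I haven't run them yet, and the
4k / 1M block sizes and counts are just illustrative:

# dd if=/dev/zero of=/mnt/nvme/tmpflag bs=4k count=100000 oflag=dsync
# dd if=/dev/zero of=/mnt/nvme/tmpflag bs=1M count=1024 oflag=direct

The first should expose per-write sync latency with tiny blocks; the second
is a larger sequential write that bypasses the page cache via direct I/O.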

I'm nowhere near a storage expert, but I understand the gist of the
descriptions I find when searching for the dd parameters. Since it looks
like both configurations will be fine for longevity, I'll aim to test
both scenarios live and choose the one that gives the best result for the
workload.
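
For the live comparison I'm thinking of something closer to the training
workload than a single sequential dd, e.g. a random-write fio run. This is
only a rough sketch, assuming fio is installed; the job parameters are
placeholders I'll tune once I know more about the actual write pattern:

# fio --name=randwrite-test --filename=/mnt/nvme/fio-test --rw=randwrite \
      --bs=4k --direct=1 --ioengine=libaio --size=1G --runtime=60 \
      --time_based --numjobs=4 --group_reporting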

Thanks a lot for your reply and help :)

Tony Pearce


On Fri, 6 Aug 2021 at 03:28, Thomas Hoberg <tho...@hoberg.net> wrote:

> You gave some different details in your other post, but here you mention
> use of GPU pass through.
>
> Any pass through will lose you the live migration ability, but
> unfortunately with GPUs, that's just how it is these days: while those
> could in theory be moved when the GPUs were identical (because their amount
> of state is limited to VRAM size), the support code (and kernel
> interfaces?) simply does not exist today.
>
> In that scenario a pass-through storage device won't lose you anything you
> still have.
>
> But you'll have to remember that PCI pass-through works only at the
> granularity of a whole PCI device. That's fine with (an entire) NVMe,
> because these combine "disks" and "controller", not so fine with individual
> disks on a SATA or SCSI controller. And you certainly can't pass through
> partitions!
>
> It gets to be really fun with cascaded USB, and I haven't really tried
> Thunderbolt either (mostly because I have given up on CentOS8/oVirt 4.4).
>
> But generally the VirtIOSCSI interface imposes so little overhead, it only
> becomes noticeable when you run massive amounts of tiny I/O on NVMe. Play
> with the block sizes and the sync flag on your DD tests to see the
> differences, I've had lots of fun (and some disillusions) with that, but
> mostly with Gluster storage over TCP/IP on Ethernet.
>
> If that's really where your bottlenecks are coming from, you may want to
> look at architecture rather than pass-through.