Thanks again for the info. You're probably right about the testing method. The reason I'm down this path in the first place, though, is that I'm seeing a problem in real-world workloads. Many of my VMs are used in development environments where working with small files is common, such as npm installs churning through large node_modules folders and CI/CD jobs doing lots of mixed I/O and compute.
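As a rough approximation of that kind of small-file pattern, something like the following fio run (an untested sketch -- it assumes fio is installed in the guest, that the target directory already exists, and the directory path, file counts and sizes are only placeholders) would exercise many small synced writes more realistically than a single dd stream:

# fio --name=smallfile --directory=/var/tmp/fio-smallfile --rw=randwrite \
      --bs=4k --size=64M --nrfiles=1024 --fsync=1 --numjobs=2 \
      --ioengine=psync --group_reporting

With size=64M split across 1024 files per job, each file is about 64 KB, and fsync=1 forces a sync after every 4 KB write, which is roughly the pattern that the dsync dd tests below were trying to capture.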
I started testing some of these things by comparing side by side with a
VM of the same specs, the only difference being Gluster vs NFS storage.
NFS-backed storage is performing about 3x better in real-world use. The
Gluster version is the stock one that ships with oVirt 4.3.7; I haven't
attempted updating it outside of official oVirt updates. I'd like to see
if I can improve it to handle my workloads better. I also understand that
replication adds overhead. I do wonder how much difference in performance
there would be between replica 3 and replica 3 with an arbiter. I'd
assume the arbiter setup would be faster, but perhaps not by a
considerable margin. I will check into C-states as well.

On Sat, Mar 7, 2020 at 2:52 AM Strahil Nikolov <hunter86...@yahoo.com> wrote:

> On March 7, 2020 1:09:37 AM GMT+02:00, Jayme <jay...@gmail.com> wrote:
> >Strahil,
> >
> >Thanks for your suggestions. The config is a pretty standard HCI setup
> >with cockpit, and the hosts are oVirt Node. XFS was handled by the
> >deployment automatically. The gluster volumes were optimized for virt
> >store.
> >
> >I tried noop on the SSDs; that made zero difference in the tests I was
> >running above. I took a look at the rhgs-random-io profile and it looks
> >like it really only sets vm.dirty_background_ratio = 2 and
> >vm.dirty_ratio = 5 -- my hosts already appear to have those sysctl
> >values, and by default are using the virtual-host tuned profile.
> >
> >I'm curious what a test like "dd if=/dev/zero of=test2.img bs=512
> >count=1000 oflag=dsync" on one of your VMs would show for results?
> >
> >I haven't done much with gluster profiling but will take a look and see
> >if I can make sense of it. Otherwise, the setup is a pretty stock oVirt
> >HCI deployment with SSD-backed storage and a 10GbE storage network. I'm
> >not coming anywhere close to maxing network throughput.
> >
> >The NFS export I was testing was an export from a local server
> >exporting a single SSD (same type as in the oVirt hosts).
> >
> >I might end up switching storage to NFS and ditching gluster if
> >performance is really this much better...
> >
> >
> >On Fri, Mar 6, 2020 at 5:06 PM Strahil Nikolov <hunter86...@yahoo.com>
> >wrote:
> >
> >> On March 6, 2020 6:02:03 PM GMT+02:00, Jayme <jay...@gmail.com> wrote:
> >> >I have a 3-server HCI setup with Gluster replica 3 storage (10GbE
> >> >and SSD disks). Small-file performance inside the VMs is pretty
> >> >terrible compared to a similarly spec'ed VM using an NFS mount
> >> >(10GbE network, SSD disk).
> >> >
> >> >VM with gluster storage:
> >> >
> >> ># dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
> >> >1000+0 records in
> >> >1000+0 records out
> >> >512000 bytes (512 kB) copied, 53.9616 s, 9.5 kB/s
> >> >
> >> >VM with NFS:
> >> >
> >> ># dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
> >> >1000+0 records in
> >> >1000+0 records out
> >> >512000 bytes (512 kB) copied, 2.20059 s, 233 kB/s
> >> >
> >> >This is a very big difference: 2 seconds to write 1000 blocks on the
> >> >NFS-backed VM vs 53 seconds on the other.
> >> >
> >> >Aside from enabling libgfapi, is there anything I can tune on the
> >> >gluster or VM side to improve small-file performance? I have seen
> >> >some guides by Red Hat on small-file performance, but I'm not sure
> >> >what, if any, of it applies to oVirt's implementation of gluster in
> >> >HCI.
> >>
> >> You can use the rhgs-random-io tuned profile from
> >> ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.4.2.0-1.el7rhgs.src.rpm
> >> and try with that on your hosts.
> >> In my case, I have modified it so it's a mixture between
> >> rhgs-random-io and the profile for Virtualization Host.
> >>
> >> Also, ensure that your bricks are using XFS with the relatime/noatime
> >> mount option and that your scheduler for the SSDs is either 'noop' or
> >> 'none'. The default I/O scheduler for RHEL7 is deadline, which gives
> >> preference to reads, and your workload is definitely 'write'.
> >>
> >> Ensure that the virt settings are enabled for your gluster volumes:
> >> 'gluster volume set <volname> group virt'
> >>
> >> Also, are you running on fully allocated disks for the VMs, or did
> >> you start thin?
> >> I'm asking because creation of new shards at the gluster level is a
> >> slow task.
> >>
> >> Have you tried profiling the gluster volume? It can clarify what is
> >> going on.
> >>
> >> Also, are you comparing apples to apples?
> >> For example, one SSD mounted and exported as NFS versus a replica 3
> >> volume on the same type of SSD? If not, the NFS server can have more
> >> IOPS due to multiple disks behind it, while Gluster has to write the
> >> same thing on all nodes.
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
>
> Hi Jayme,
>
> My tests are not a great comparison, as I have a different setup:
>
> NVMe - VDO - 4 thin LVs - XFS - 4 Gluster volumes (replica 2 arbiter 1)
> - 4 storage domains - striped LV in each VM
>
> RHEL7 VM (fully stock):
> [root@node1 ~]# dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
> 1000+0 records in
> 1000+0 records out
> 512000 bytes (512 kB) copied, 19.8195 s, 25.8 kB/s
> [root@node1 ~]#
>
> Brick:
> [root@ovirt1 data_fast]# dd if=/dev/zero of=test2.img bs=512 count=1000
> oflag=dsync
> 1000+0 records in
> 1000+0 records out
> 512000 bytes (512 kB) copied, 1.41192 s, 363 kB/s
>
> As I use VDO with compression (on 1/4 of the NVMe), I cannot expect much
> performance from it.
>
> Is your app really using dsync? I have seen many times that performance
> testing with the wrong tools/tests causes more trouble than it should.
>
> I would recommend you test with a real workload before deciding to
> change the architecture.
>
> I forgot to mention that you need to disable C-states on your systems
> if you are chasing performance.
> Run a gluster profile while you run a real workload in your VMs and
> then provide that for analysis.
>
> Which version of Gluster are you using?
>
> Best Regards,
> Strahil Nikolov
>
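For reference, gathering the volume profile Strahil mentions would look roughly like this (a sketch only; <volname> is a placeholder for the data volume, and the commands are run from one of the gluster hosts):

# gluster volume profile <volname> start
  ... run the real workload inside the VMs for a few minutes ...
# gluster volume profile <volname> info > /tmp/gluster-profile-workload.txt
# gluster volume profile <volname> stop

The info output breaks down per-brick latency by file operation (LOOKUP, WRITE, FSYNC, etc.), which is usually where small-file overhead shows up.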
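Similarly, checking and temporarily limiting C-states on a host can be sketched as follows (assumes the cpupower utility from kernel-tools is installed; the latency threshold is just an example value):

# cpupower idle-info        <- show which C-states the CPUs can enter
# cpupower idle-set -D 10   <- disable idle states deeper than 10 us, until reboot

For a persistent change, the usual approach is adding processor.max_cstate=1 intel_idle.max_cstate=0 to the kernel command line, though that is worth verifying against the hardware and OS documentation first.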