No worries at all about the length of the email; the details are much appreciated. You've given me lots to look into and consider.
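Mostly as a note to myself, here is roughly what I plan to check first on the hosts. This is only a rough sketch: "data" stands in for my actual volume name, "sda" for whichever device backs the bricks, and the paths may differ slightly on oVirt Node.

# Current tuned profile and I/O scheduler on the SSDs
tuned-adm active
cat /sys/block/sda/queue/scheduler        # "sda" is a placeholder; expect noop/none rather than deadline

# Confirm the virt option group is applied to the gluster volume ("data" is a placeholder volume name)
gluster volume info data

# Brick mount options (looking for noatime/relatime)
mount | grep -i brick

# Transparent huge pages and C-states
cat /sys/kernel/mm/transparent_hugepage/enabled
cpupower idle-info | head

# Capture a profile while a real workload (npm installs, CI jobs) runs in the VMs
gluster volume profile data start
# ... run the workload ...
gluster volume profile data info > /tmp/data-profile.txt
gluster volume profile data stop
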
On Sat, Mar 7, 2020 at 10:02 AM Strahil Nikolov <hunter86...@yahoo.com> wrote:

> On March 7, 2020 1:12:58 PM GMT+02:00, Jayme <jay...@gmail.com> wrote:
> > Thanks again for the info. You're probably right about the testing method.
> > Though the reason I'm down this path in the first place is because I'm
> > seeing a problem in real-world workloads. Many of my VMs are used in
> > development environments where working with small files is common, such as
> > npm installs working with large node_modules folders and CI/CD doing lots
> > of mixed I/O and compute operations.
> >
> > I started testing some of these things by comparing side by side with a VM
> > using the same specs, the only difference being gluster vs NFS storage.
> > NFS-backed storage is performing about 3x better in the real world.
> >
> > The Gluster version is the stock one that comes with 4.3.7. I haven't
> > attempted updating it outside of official oVirt updates.
> >
> > I'd like to see if I could improve it to handle my workloads better. I also
> > understand that replication adds overhead.
> >
> > I do wonder how much difference in performance there would be with replica
> > 3 vs replica 3 arbiter. I'd assume the arbiter setup would be faster, but
> > perhaps not by a considerable difference.
> >
> > I will check into C-states as well.
> >
> > On Sat, Mar 7, 2020 at 2:52 AM Strahil Nikolov <hunter86...@yahoo.com> wrote:
> >
> > > On March 7, 2020 1:09:37 AM GMT+02:00, Jayme <jay...@gmail.com> wrote:
> > > > Strahil,
> > > >
> > > > Thanks for your suggestions. The config is a pretty standard HCI setup
> > > > with cockpit, and the hosts are oVirt Node. XFS was handled by the
> > > > deployment automatically. The gluster volumes were optimized for virt
> > > > store.
> > > >
> > > > I tried noop on the SSDs; that made zero difference in the tests I was
> > > > running above. I took a look at the random-io profile and it looks like
> > > > it really only sets vm.dirty_background_ratio = 2 & vm.dirty_ratio = 5 --
> > > > my hosts already appear to have those sysctl values, and by default are
> > > > using the virtual-host tuned profile.
> > > >
> > > > I'm curious what a test like "dd if=/dev/zero of=test2.img bs=512
> > > > count=1000 oflag=dsync" on one of your VMs would show for results?
> > > >
> > > > I haven't done much with gluster profiling but will take a look and see
> > > > if I can make sense of it. Otherwise, the setup is a pretty stock oVirt
> > > > HCI deployment with SSD-backed storage and a 10GbE storage network. I'm
> > > > not coming anywhere close to maxing network throughput.
> > > >
> > > > The NFS export I was testing was an export from a local server exporting
> > > > a single SSD (same type as in the oVirt hosts).
> > > >
> > > > I might end up switching storage to NFS and ditching gluster if
> > > > performance is really this much better...
> > > >
> > > > On Fri, Mar 6, 2020 at 5:06 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
> > > >
> > > > > On March 6, 2020 6:02:03 PM GMT+02:00, Jayme <jay...@gmail.com> wrote:
> > > > > > I have a 3-server HCI with Gluster replica 3 storage (10GbE and SSD
> > > > > > disks).
> > > > > > Small-file performance inside the VM is pretty terrible compared to
> > > > > > a similarly spec'ed VM using an NFS mount (10GbE network, SSD disk).
> > > > > >
> > > > > > VM with gluster storage:
> > > > > >
> > > > > > # dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
> > > > > > 1000+0 records in
> > > > > > 1000+0 records out
> > > > > > 512000 bytes (512 kB) copied, 53.9616 s, 9.5 kB/s
> > > > > >
> > > > > > VM with NFS:
> > > > > >
> > > > > > # dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
> > > > > > 1000+0 records in
> > > > > > 1000+0 records out
> > > > > > 512000 bytes (512 kB) copied, 2.20059 s, 233 kB/s
> > > > > >
> > > > > > This is a very big difference: 2 seconds to write 1000 blocks on the
> > > > > > NFS VM vs 53 seconds on the other.
> > > > > >
> > > > > > Aside from enabling libgfapi, is there anything I can tune on the
> > > > > > gluster or VM side to improve small-file performance? I have seen
> > > > > > some guides by Red Hat regarding small-file performance, but I'm not
> > > > > > sure what, if any, of it applies to oVirt's implementation of
> > > > > > gluster in HCI.
> > > > >
> > > > > You can use the rhgs-random-io tuned profile from
> > > > > ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.4.2.0-1.el7rhgs.src.rpm
> > > > > and try with that on your hosts.
> > > > > In my case, I have modified it so it's a mixture between
> > > > > rhgs-random-io and the profile for Virtualization Host.
> > > > >
> > > > > Also, ensure that your bricks are using XFS with the relatime/noatime
> > > > > mount option and that your scheduler for the SSDs is either 'noop' or
> > > > > 'none'. The default I/O scheduler for RHEL7 is deadline, which gives
> > > > > preference to reads, and your workload is definitely 'write'.
> > > > >
> > > > > Ensure that the virt settings are enabled for your gluster volumes:
> > > > > 'gluster volume set <volname> group virt'
> > > > >
> > > > > Also, are you running on fully allocated disks for the VM, or did you
> > > > > start thin?
> > > > > I'm asking because creation of new shards at the gluster level is a
> > > > > slow task.
> > > > >
> > > > > Have you checked gluster profiling of the volume? It can clarify what
> > > > > is going on.
> > > > >
> > > > > Also, are you comparing apples to apples?
> > > > > For example, 1 SSD mounted and exported as NFS vs a replica 3 volume
> > > > > on the same type of SSD? If not, the NFS side can have more IOPS due
> > > > > to multiple disks behind it, while Gluster has to write the same thing
> > > > > on all nodes.
> > > > >
> > > > > Best Regards,
> > > > > Strahil Nikolov
> > >
> > > Hi Jayme,
> > >
> > > My tests are not a great comparison, as I have a different setup:
> > >
> > > NVMe - VDO - 4 thin LVs - XFS - 4 Gluster volumes (replica 2 arbiter 1)
> > > - 4 storage domains - striped LV in each VM
> > >
> > > RHEL7 VM (fully stock):
> > > [root@node1 ~]# dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
> > > 1000+0 records in
> > > 1000+0 records out
> > > 512000 bytes (512 kB) copied, 19.8195 s, 25.8 kB/s
> > > [root@node1 ~]#
> > >
> > > Brick:
> > > [root@ovirt1 data_fast]# dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
> > > 1000+0 records in
> > > 1000+0 records out
> > > 512000 bytes (512 kB) copied, 1.41192 s, 363 kB/s
> > >
> > > As I use VDO with compression (on 1/4 of the NVMe), I cannot expect much
> > > performance from it.
> > >
> > > Is your app really using dsync?
> > > I have seen many times that performance testing with the wrong
> > > tools/tests causes more trouble than it should.
> > >
> > > I would recommend you test with a real workload before deciding to
> > > change the architecture.
> > >
> > > I forgot to mention that you need to disable C-states on your systems if
> > > you are chasing performance.
> > > Run a gluster profile while you run a real workload in your VMs and then
> > > provide that for analysis.
> > >
> > > Which version of Gluster are you using?
> > >
> > > Best Regards,
> > > Strahil Nikolov
>
> Hm...
> Then you do have a real workload scenario - pick one of the most often used
> tasks and use its time of completion for reference.
> Synthetic benchmarking is not good.
>
> As far as I know, oVirt is actually running on gluster v6.x.
> @Sandro,
> Can you hint us at the highest supported gluster version on oVirt? I'm
> running v7.0, so I'm a little bit off track.
>
> Jayme,
>
> Next steps are to check:
> 1. Did you disable C-states? There are very good articles for RHEL/CentOS 7.
> 2. Check the firmware of your HCI nodes - I've seen numerous network/SAN
> issues due to old firmware, including stuck processes.
> 3. Check the articles for RHV and hugepages. If your VMs are memory-dynamic
> and need lots of RAM, hugepages will bring more performance. Second,
> transparent huge pages must be disabled.
> 4. Create a High Performance VM for testing purposes, with fully allocated
> disks.
> 5. Check if 'noatime' or 'relatime' is set for the bricks. If SELinux is in
> enforcing mode (which I highly recommend), you can use the mount option
> 'context=system_u:object_r:glusterd_brick_t:s0', which lets the kernel skip
> looking up the SELinux context of every file in the brick, increasing
> performance.
> 6. Consider switching to the 'noop'/'none' I/O scheduler, or tuning
> 'deadline' to match your needs.
> 7. Run a gluster profile while the VM from step 4 is being tested, as it is
> needed for analysis.
> 8. Consider using 'Pass-through host CPU', which is enabled in the UI via
> VM -> Edit -> Host -> Start on specific host -> select all hosts with the
> same CPU -> allow manual and automatic migration -> OK.
> This mode makes all instructions of the host CPU available to the guest,
> greatly increasing performance for a lot of software.
>
> The difference between 'replica 3' and 'replica 3 arbiter 1' (the old name
> was 'replica 2 arbiter 1', but it means the same) is that the arbitrated
> volume requires less bandwidth (because the files on the arbiter hold 0
> bytes of data) and stores only metadata to prevent split-brain.
> The drawback of the arbiter is that you have only 2 sources to read from,
> while replica 3 provides three sources to read from.
> With glusterd 2.0 (I think it was introduced in gluster v7) the arbiter
> doesn't need to be local (which means higher latencies are no longer an
> issue) and is only needed when one of the data bricks is unavailable. Still,
> the remote arbiter is too new for prod.
>
> Next: you can consider a clustered 2-node NFS Ganesha (with a quorum device
> for the third vote) as an NFS source. The good thing about NFS Ganesha is
> the primary focus from the Gluster community, and it uses libgfapi to
> connect to the backend (replica volume).
>
> I think that's enough for now, but I guess other stuff could come to mind
> at a later stage.
>
> Edit: This e-mail is way longer than I initially thought it would be. Sorry
> about that.
>
> Best Regards,
> Strahil Nikolov
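P.S. Sketching here, again mostly as a note to myself, how I plan to time one of the real tasks Strahil suggests instead of leaning on dd. The project path below is just a placeholder; the identical steps would run in both the gluster-backed and NFS-backed VM so the only variable is storage.

# Same steps in both VMs; only the storage backend differs
cd /srv/projects/example-app                  # placeholder path for a real node project
rm -rf node_modules
sync && echo 3 > /proc/sys/vm/drop_caches     # start from a cold page cache (as root)
time npm ci                                   # reproducible install from package-lock.json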