No worries at all about the length of the email, the details are highly
appreciated. You've given me lots to look into and consider.



On Sat, Mar 7, 2020 at 10:02 AM Strahil Nikolov <hunter86...@yahoo.com>
wrote:

> On March 7, 2020 1:12:58 PM GMT+02:00, Jayme <jay...@gmail.com> wrote:
> >Thanks again for the info. You're probably right about the testing
> >method. Though the reason I'm down this path in the first place is
> >because I'm seeing a problem in real-world workloads. Many of my VMs
> >are used in development environments where working with small files
> >is common, such as npm installs producing large node_modules folders
> >and CI/CD pipelines doing lots of mixed I/O and compute operations.
> >
> >I started testing some of these things by comparing side by side with
> >a VM of the same specs, the only difference being Gluster vs NFS
> >storage. NFS-backed storage is performing about 3x better in
> >real-world use.
> >
> >The Gluster version is the stock one that ships with oVirt 4.3.7. I
> >haven't attempted updating it outside of official oVirt updates.
> >
> >I'd like to see if I could improve it to handle my workloads better.
> >I also understand that replication adds overhead.
> >
> >I do wonder how much difference in performance there would be with
> >replica 3 vs replica 3 arbiter 1. I'd assume the arbiter setup would
> >be faster, but perhaps not by a considerable margin.
> >
> >I will check into C-states as well.
> >
> >On Sat, Mar 7, 2020 at 2:52 AM Strahil Nikolov <hunter86...@yahoo.com>
> >wrote:
> >
> >> On March 7, 2020 1:09:37 AM GMT+02:00, Jayme <jay...@gmail.com>
> >> wrote:
> >> >Strahil,
> >> >
> >> >Thanks for your suggestions. The config is a pretty standard HCI
> >> >setup done with Cockpit, and the hosts are oVirt Node. XFS was
> >> >handled by the deployment automatically, and the Gluster volumes
> >> >were optimized for virt store.
> >> >
> >> >I tried noop on the SSDs; that made zero difference in the tests I
> >> >was running above. I took a look at the random-io profile and it
> >> >looks like it really only sets vm.dirty_background_ratio = 2 and
> >> >vm.dirty_ratio = 5 -- my hosts already appear to have those sysctl
> >> >values, and by default they are using the virtual-host tuned
> >> >profile.
> >> >
> >> >I'm curious what results a test like "dd if=/dev/zero of=test2.img
> >> >bs=512 count=1000 oflag=dsync" would show on one of your VMs?
> >> >
> >> >I haven't done much with Gluster profiling but will take a look and
> >> >see if I can make sense of it. Otherwise, the setup is a pretty
> >> >stock oVirt HCI deployment with SSD-backed storage and a 10GbE
> >> >storage network. I'm not coming anywhere close to maxing out
> >> >network throughput.
> >> >
> >> >The NFS export I was testing was an export from a local server
> >> >exporting a single SSD (same type as in the oVirt hosts).
> >> >
> >> >I might end up switching storage to NFS and ditching Gluster if
> >> >performance is really this much better...
> >> >
> >> >
> >> >On Fri, Mar 6, 2020 at 5:06 PM Strahil Nikolov <hunter86...@yahoo.com>
> >> >wrote:
> >> >
> >> >> On March 6, 2020 6:02:03 PM GMT+02:00, Jayme <jay...@gmail.com>
> >> >> wrote:
> >> >> >I have a 3-server HCI setup with Gluster replica 3 storage
> >> >> >(10GbE and SSD disks). Small-file performance inside the VMs is
> >> >> >pretty terrible compared to a similarly spec'ed VM using an NFS
> >> >> >mount (10GbE network, SSD disk).
> >> >> >
> >> >> >VM with gluster storage:
> >> >> >
> >> >> ># dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
> >> >> >1000+0 records in
> >> >> >1000+0 records out
> >> >> >512000 bytes (512 kB) copied, 53.9616 s, 9.5 kB/s
> >> >> >
> >> >> >VM with NFS:
> >> >> >
> >> >> ># dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
> >> >> >1000+0 records in
> >> >> >1000+0 records out
> >> >> >512000 bytes (512 kB) copied, 2.20059 s, 233 kB/s
> >> >> >
> >> >> >This is a very big difference: about 2 seconds to complete 1000
> >> >> >synchronous 512-byte writes on the NFS-backed VM vs. 53 seconds
> >> >> >on the Gluster-backed one.
> >> >> >
> >> >> >Aside from enabling libgfapi, is there anything I can tune on
> >> >> >the Gluster or VM side to improve small-file performance? I have
> >> >> >seen some guides by Red Hat regarding small-file performance,
> >> >> >but I'm not sure what (if any) of it applies to oVirt's
> >> >> >implementation of Gluster in HCI.
> >> >>
> >> >> You can use the rhgs-random-io tuned profile from
> >> >> ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.4.2.0-1.el7rhgs.src.rpm
> >> >> and try with that on your hosts. In my case, I have modified it
> >> >> so it's a mixture between rhgs-random-io and the profile for the
> >> >> Virtualization Host.
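> >> >>
> >> >> For example, assuming the profile from that package ends up under
> >> >> /usr/lib/tuned/rhgs-random-io, activating and verifying it would
> >> >> look roughly like this:
> >> >>
> >> >> # tuned-adm profile rhgs-random-io
> >> >> # tuned-adm active
> >> >> Current active profile: rhgs-random-io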
> >> >>
> >> >> Also, ensure that your bricks are using XFS with the
> >> >> relatime/noatime mount option and that the scheduler for the SSDs
> >> >> is either 'noop' or 'none'. The default I/O scheduler on RHEL 7
> >> >> is deadline, which gives preference to reads, while your workload
> >> >> is definitely write-heavy.
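> >> >>
> >> >> A quick way to check and switch the scheduler (sdX is a
> >> >> placeholder for your brick device; the echo does not survive a
> >> >> reboot, so persist it via a udev rule or tuned):
> >> >>
> >> >> # cat /sys/block/sdX/queue/scheduler
> >> >> noop [deadline] cfq
> >> >> # echo noop > /sys/block/sdX/queue/scheduler
> >> >> # grep brick /proc/mounts    # verify noatime/relatime on the brick mounts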
> >> >>
> >> >> Ensure that the virt settings are enabled for your gluster
> >> >> volumes:
> >> >> 'gluster volume set <volname> group virt'
> >> >>
> >> >> Also, are you running on fully allocated (preallocated) disks for
> >> >> the VM, or did you start thin-provisioned? I'm asking because the
> >> >> creation of new shards at the Gluster level is a slow task.
> >> >>
> >> >> Have you tried profiling the Gluster volume? It can clarify what
> >> >> is going on.
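> >> >>
> >> >> Roughly like this (run your workload between 'start' and 'info'):
> >> >>
> >> >> # gluster volume profile <volname> start
> >> >> # gluster volume profile <volname> info
> >> >> # gluster volume profile <volname> stop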
> >> >>
> >> >>
> >> >> Also, are you comparing apples to apples? For example, one SSD
> >> >> mounted and exported as NFS versus a replica 3 volume on the same
> >> >> type of SSD? If not, the NFS server can deliver more IOPS due to
> >> >> multiple disks behind it, while Gluster has to write the same
> >> >> data on all nodes.
> >> >>
> >> >> Best Regards,
> >> >> Strahil Nikolov
> >> >>
> >> >>
> >>
> >> Hi Jayme,
> >>
> >>
> >> My tests are not a great comparison, as I have a different setup:
> >>
> >> NVMe - VDO - 4 thin LVs - XFS - 4 Gluster volumes (replica 2
> >> arbiter 1) - 4 storage domains - striped LV in each VM
> >>
> >> RHEL7 VM (fully stock):
> >> [root@node1 ~]# dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
> >> 1000+0 records in
> >> 1000+0 records out
> >> 512000 bytes (512 kB) copied, 19.8195 s, 25.8 kB/s
> >> [root@node1 ~]#
> >>
> >> Brick:
> >> [root@ovirt1 data_fast]# dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
> >> 1000+0 records in
> >> 1000+0 records out
> >> 512000 bytes (512 kB) copied, 1.41192 s, 363 kB/s
> >>
> >> As I use VDO with compression (on 1/4 of the NVMe), I cannot expect
> >> much performance from it.
> >>
> >>
> >> Is your app really using dsync? I have seen many times that
> >> performance testing with the wrong tools/tests causes more trouble
> >> than it should.
> >>
> >> I would recommend testing with a real workload before deciding to
> >> change the architecture.
> >>
> >> I forgot to mention that you need to disable C-states on your
> >> systems if you are chasing performance.
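> >>
> >> One common way to limit C-states on RHEL/CentOS 7 (a sketch - the
> >> exact values depend on your CPUs and latency requirements) is via
> >> the kernel command line:
> >>
> >> # grubby --update-kernel=ALL --args="processor.max_cstate=1 intel_idle.max_cstate=0"
> >> # reboot
> >>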
> >> Run a Gluster profile while you run a real workload in your VMs and
> >> then provide that for analysis.
> >>
> >> Which version of Gluster are you using?
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
>
> Hm...
> Then you do have a real workload scenario - pick one of the most
> frequently used tasks and use its time to completion as a reference.
> Synthetic benchmarking is not a good guide here.
>
> As far as I know, oVirt is currently running on Gluster v6.x.
> @Sandro,
> Can you tell us the highest supported Gluster version on oVirt? I'm
> running v7.0, so I'm a little bit off the beaten track.
>
> Jayme,
>
> Next steps are to check:
> 1. Did you disable C-states? There are very good articles on this for
> RHEL/CentOS 7.
> 2. Check the firmware of your HCI nodes - I've seen numerous
> network/SAN issues due to old firmware, including stuck processes.
> 3. Check the articles for RHV and hugepages. If your VMs have dynamic
> memory and need lots of RAM, hugepages will bring more performance.
> Also, transparent huge pages must be disabled (see the sketch after
> this list).
> 4. Create a High Performance VM for testing purposes, with fully
> allocated (preallocated) disks.
> 5. Check that 'noatime' or 'relatime' is set for the bricks. If SELinux
> is in enforcing mode (which I highly recommend), you can use the
> 'context=system_u:object_r:glusterd_brick_t:s0' mount option, which
> lets the kernel skip looking up the SELinux context of every file in
> the brick, increasing performance (see the example fstab line after
> this list).
>
> 6. Consider switching to the 'noop'/'none' I/O scheduler, or tuning
> 'deadline' to match your needs.
>
> 7. Capture a Gluster profile while the VM from step 4 is being tested,
> as it will be needed.
>
> 8. Consider using 'Pass-Through Host CPU', which is enabled in the UI
> via VM -> Edit -> Host -> Start on specific host -> select all hosts
> with the same CPU -> allow manual and automatic migration -> OK.
> This mode makes all instructions of the host CPU available to the
> guest, greatly increasing performance for a lot of software.
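>
> For item 3, a minimal way to disable transparent huge pages until the
> next reboot (persist it via a tuned profile or the
> 'transparent_hugepage=never' kernel parameter):
>
> # echo never > /sys/kernel/mm/transparent_hugepage/enabled
> # echo never > /sys/kernel/mm/transparent_hugepage/defrag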
>
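> For item 5, an example /etc/fstab entry for a brick (device and mount
> point are placeholders for your own layout):
>
> /dev/gluster_vg/brick1 /gluster_bricks/brick1 xfs inode64,noatime,context="system_u:object_r:glusterd_brick_t:s0" 0 0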
>
> The difference between 'replica 3' and 'replica 3 arbiter 1' (the old
> name was 'replica 2 arbiter 1', but it means the same) is that the
> arbitrated volume requires less bandwidth, because the files on the
> arbiter hold 0 bytes of data - it stores only metadata to prevent
> split-brain.
> The drawback of the arbiter is that you have only 2 sources to read
> from, while replica 3 provides three sources to read from.
> With glusterd 2.0 (I think it was introduced in Gluster v7) the arbiter
> doesn't need to be local (which means higher latencies are no longer an
> issue) and is only consulted when one of the data bricks is
> unavailable. Still, the remote arbiter is too new for production.
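>
> For reference, the arbiter variant is created like this (hostnames and
> brick paths are only examples; the third brick in the set becomes the
> arbiter):
>
> # gluster volume create data replica 3 arbiter 1 \
>     host1:/bricks/data host2:/bricks/data host3:/bricks/data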
>
> Next: you can consider a clustered 2-node NFS-Ganesha setup (with a
> quorum device for the third vote) as an NFS source. The good thing
> about NFS-Ganesha is that it is the primary focus of the Gluster
> community and it uses libgfapi to connect to the backend (replica
> volume).
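>
> For a rough idea, a Gluster-backed export in ganesha.conf looks
> something like this (volume name, hostname and paths are
> placeholders):
>
> EXPORT {
>     Export_Id = 1;
>     Path = "/data";
>     Pseudo = "/data";
>     Access_Type = RW;
>     Squash = No_root_squash;
>     FSAL {
>         Name = GLUSTER;
>         Hostname = "localhost";
>         Volume = "data";
>     }
> }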
>
> I think that's enough for now, but I guess other things may come to
> mind at a later stage.
>
> Edit: This e-mail is way longer than I initially intended it to be.
> Sorry about that.
>
>
> Best Regards,
> Strahil Nikolov
>
