Thank you, Alex! I will try these performance settings. If someone from the dev team could validate and recommend them as a good standard configuration, it would be just great. If they are OK, wouldn't it be nice to have them applied from within the UI with an "Optimize for VirtStore" button? Thank you!
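Until then, for anyone who wants to try them, something like the loop below could apply Alex's volume options (quoted further down) from the CLI in one shot. Only a sketch: "data" is my assumed volume name, and the option list should be trimmed to whatever the devs actually endorse.

#!/bin/bash
# Hypothetical helper: apply a set of Gluster tuning options to one volume.
VOLUME=data   # assumption -- replace with your actual volume name

while read -r opt value; do
    # "gluster volume set" applies one option at a time
    gluster volume set "$VOLUME" "$opt" "$value"
done <<'EOF'
performance.write-behind-window-size 64MB
performance.flush-behind on
server.event-threads 4
client.event-threads 8
performance.io-thread-count 32
performance.strict-o-direct on
EOF

# Confirm what actually got applied
gluster volume info "$VOLUME"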
On Mon, Apr 15, 2019 at 7:39 PM Alex McWhirter <a...@triadic.us> wrote:

> On 2019-04-14 23:22, Leo David wrote:
> Hi,
> Thank you Alex, I was looking for some optimisation settings as well,
> since I am pretty much in the same boat, using SSD-based
> replicate-distributed volumes across 12 hosts.
> Could anyone else (maybe even from the oVirt or RHEV team) validate these
> settings or add some other tweaks as well, so we can use them as a standard?
> Thank you very much again!
>
> On Mon, Apr 15, 2019, 05:56 Alex McWhirter <a...@triadic.us> wrote:
>
>> On 2019-04-14 20:27, Jim Kusznir wrote:
>>
>> Hi all:
>>
>> I've had I/O performance problems pretty much since the beginning of
>> using oVirt. I've applied several upgrades as time went on, but strangely,
>> none of them have alleviated the problem. VM disk I/O is still so slow
>> that running VMs is often painful; it notably affects nearly all my VMs,
>> and makes me leery of starting any more. I'm currently running 12 VMs and
>> the hosted engine on the stack.
>>
>> My configuration started out with 1Gbps networking and hyperconverged
>> Gluster running on a single SSD on each node. It worked, but I/O was
>> painfully slow. I also started running out of space, so I added an SSHD on
>> each node, created another Gluster volume, and moved VMs over to it. I
>> also ran that on a dedicated 1Gbps network. I had recurring disk failures
>> (it seems that disks only lasted about 3-6 months; I warrantied all three
>> at least once, and some twice, before giving up). I suspect the Dell PERC
>> 6/i was partly to blame: the RAID card refused to see/acknowledge the
>> disk, but plugging it into a normal PC showed no signs of problems. In any
>> case, performance on that storage was notably bad, even though the gig-e
>> interface was rarely taxed.
>>
>> I put in 10Gbps Ethernet and moved all the storage onto it nonetheless,
>> as several people here said that 1Gbps just wasn't fast enough. Some
>> aspects improved a bit, but disk I/O is still slow. And I was still having
>> problems with the SSHD data Gluster volume eating disks, so I bought a
>> dedicated NAS server (a Supermicro 12-disk dedicated FreeNAS NFS storage
>> system on 10Gbps Ethernet) and set that up. I found that it was actually
>> FASTER than the SSD-based Gluster volume, but still slow. Lately it's been
>> getting slower, too... I don't know why. The FreeNAS server reports
>> network loads around 4MB/s on its 10GbE interface, so it's not network
>> constrained. At 4MB/s, I'd sure hope the 12-spindle SAS interface wasn't
>> constrained either... (and disk I/O operations on the NAS itself complete
>> much faster).
>>
>> So, running a test on my NAS against an ISO file I haven't accessed in
>> months:
>>
>> # dd if=en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_x64_dvd_x15-59754.iso of=/dev/null bs=1024k count=500
>> 500+0 records in
>> 500+0 records out
>> 524288000 bytes transferred in 2.459501 secs (213168465 bytes/sec)
>>
>> Running it on one of my hosts:
>>
>> root@unifi:/home/kusznir# time dd if=/dev/sda of=/dev/null bs=1024k count=500
>> 500+0 records in
>> 500+0 records out
>> 524288000 bytes (524 MB, 500 MiB) copied, 7.21337 s, 72.7 MB/s
>>
>> (I don't know if this is a true apples-to-apples comparison, as I don't
>> have a large file inside this VM's image.) Even this is faster than I
>> often see.
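>> For a more apples-to-apples number, a direct-I/O tool such as fio might
>> be more telling than dd, since dd against a cached file largely measures
>> the page cache. A sketch, run identically on the NAS and inside a guest
>> (the test-file path is just a placeholder):
>>
>> # O_DIRECT sequential read, 1MiB blocks, bypassing the page cache
>> fio --name=seqread --filename=/path/to/testfile --rw=read --bs=1M \
>>     --size=1G --direct=1 --ioengine=libaio --iodepth=16 --group_reporting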
>> I have a VoIP phone server running as a VM. Voicemail and other
>> recordings usually fail due to I/O issues opening and writing the files.
>> Often, the first 4 or so seconds of the recording are missed; sometimes
>> the entire thing just fails. I didn't use to have this problem, but it's
>> definitely been getting worse. I finally bit the bullet and ordered a
>> physical server dedicated to my VoIP system... but I still want to figure
>> out why I'm having all these I/O problems. I read on the list of people
>> running 30+ VMs... I feel that my I/O can't take any more VMs with any
>> semblance of reliability. We have a QuickBooks server on here too
>> (Windows), and the performance is abysmal; my CPA is charging me extra
>> because of all the lost staff time waiting on the system to respond and
>> generate reports...
>>
>> I'm at my wits' end... I started with Gluster on SSD with a 1Gbps
>> network, migrated to a 10Gbps network, and now to a dedicated
>> high-performance NAS box over NFS, and still have performance issues... I
>> don't know how to troubleshoot the issue any further, but I've never had
>> these kinds of issues when I was playing with other VM technologies. I'd
>> like to get to the point where I can resell virtual servers to customers,
>> but I can't do so with my current performance levels.
>>
>> I'd greatly appreciate help troubleshooting this further.
>>
>> --Jim
>>
>> Been working on optimizing the same. This is where I'm at currently.
>>
>> Gluster volume settings:
>>
>> diagnostics.count-fop-hits: on
>> diagnostics.latency-measurement: on
>> performance.write-behind-window-size: 64MB
>> performance.flush-behind: on
>> performance.stat-prefetch: on
>> server.event-threads: 4
>> client.event-threads: 8
>> performance.io-thread-count: 32
>> network.ping-timeout: 30
>> cluster.granular-entry-heal: enable
>> performance.strict-o-direct: on
>> storage.owner-gid: 36
>> storage.owner-uid: 36
>> features.shard: on
>> cluster.shd-wait-qlength: 10000
>> cluster.shd-max-threads: 8
>> cluster.locking-scheme: granular
>> cluster.data-self-heal-algorithm: full
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.eager-lock: enable
>> network.remote-dio: off
>> performance.low-prio-threads: 32
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> auth.allow: *
>> user.cifs: off
>> transport.address-family: inet
>> nfs.disable: off
>> performance.client-io-threads: on
>>
>> sysctl options:
>>
>> net.core.rmem_max = 134217728
>> net.core.wmem_max = 134217728
>> net.ipv4.tcp_rmem = 4096 87380 134217728
>> net.ipv4.tcp_wmem = 4096 65536 134217728
>> net.core.netdev_max_backlog = 300000
>> net.ipv4.tcp_moderate_rcvbuf = 1
>> net.ipv4.tcp_no_metrics_save = 1
>> net.ipv4.tcp_congestion_control = htcp
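>> One way to keep those sysctls across reboots is a sysctl.d snippet
>> instead of setting them by hand -- the file name below is just a choice:
>>
>> cat > /etc/sysctl.d/90-storage-net.conf <<'EOF'
>> net.core.rmem_max = 134217728
>> net.core.wmem_max = 134217728
>> net.ipv4.tcp_rmem = 4096 87380 134217728
>> net.ipv4.tcp_wmem = 4096 65536 134217728
>> net.core.netdev_max_backlog = 300000
>> net.ipv4.tcp_moderate_rcvbuf = 1
>> net.ipv4.tcp_no_metrics_save = 1
>> net.ipv4.tcp_congestion_control = htcp
>> EOF
>> sysctl --system   # reload every sysctl.d snippet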
>> Custom /sbin/ifup-local file; Storage is the bridge name, which ==
>> ens3f0/1 in bond2:
>>
>> #!/bin/bash
>> case "$1" in
>> Storage)
>>     /sbin/ethtool -K ens3f0 tx off rx off tso off gso off
>>     /sbin/ethtool -K ens3f1 tx off rx off tso off gso off
>>     /sbin/ip link set dev ens3f0 txqueuelen 10000
>>     /sbin/ip link set dev ens3f1 txqueuelen 10000
>>     /sbin/ip link set dev bond2 txqueuelen 10000
>>     /sbin/ip link set dev Storage txqueuelen 10000
>>     ;;
>> *)
>>     ;;
>> esac
>> exit 0
>>
>> I still have some latency issues, but my writes are up to 264MB/s
>> sequential on HDDs.
>>
>> Output of CrystalDiskMark on a Windows 10 VM:
>>
>> Sequential Read   (Q=32, T=1): 688.536 MB/s
>> Sequential Write  (Q=32, T=1): 264.254 MB/s
>> Random Read 4KiB  (Q=8,  T=8): 176.069 MB/s [42985.6 IOPS]
>> Random Write 4KiB (Q=8,  T=8):  63.217 MB/s [15433.8 IOPS]
>> Random Read 4KiB  (Q=32, T=1): 159.598 MB/s [38964.4 IOPS]
>> Random Write 4KiB (Q=32, T=1):  54.212 MB/s [13235.4 IOPS]
>> Random Read 4KiB  (Q=1,  T=1):   3.488 MB/s [  851.6 IOPS]
>> Random Write 4KiB (Q=1,  T=1):   3.006 MB/s [  733.9 IOPS]
>>
>> Also, enabling libgfapi on the engine was the best performance option I
>> ever tweaked; it easily doubled reads/writes.

> Also, with all of that said, I've mostly solved the rest of my issues by
> enabling performance.read-ahead on the Gluster volume. I am saturating my
> 10G network, which translates to 700MB/s reads, 350MB/s writes (replica 2).
>
> Just make sure your local read-ahead settings on the bricks are sane, i.e.
> "blockdev --getra /dev/sdx"; mine is 8192.
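> To make that read-ahead stick across reboots, a udev rule is one option.
> Sketch only -- match it to your actual brick disks; 8192 sectors = 4096 KB:
>
> blockdev --setra 8192 /dev/sdx    # one-off, lost on reboot
>
> # /etc/udev/rules.d/99-brick-readahead.rules (file name is arbitrary)
> ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{bdi/read_ahead_kb}="4096"
>
> And for the libgfapi bit above, the switch lives in engine-config on the
> engine host; the exact version flag depends on your release:
>
> engine-config -s LibgfApiSupported=true --cver=4.2
> systemctl restart ovirt-engine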
--
Best regards, Leo David

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/UXNMJO5Z7WXDCEESECTD5VNDZK5BSZX6/