Here's gluster volume info:

[root@ovirt2 ~]# gluster volume info
Volume Name: data
Type: Replicate
Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
server.allow-insecure: on
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Volume Name: data-hdd
Type: Replicate
Volume ID: d342a3ab-16f3-49f0-bbcf-f788be8ac5f1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.172.1.11:/gluster/brick3/data-hdd
Brick2: 172.172.1.12:/gluster/brick3/data-hdd
Brick3: 172.172.1.13:/gluster/brick3/data-hdd
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
transport.address-family: inet
performance.readdir-ahead: on

Volume Name: engine
Type: Replicate
Volume ID: 87ad86b9-d88b-457e-ba21-5d3173c612de
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick1/engine
Brick2: ovirt2.nwfiber.com:/gluster/brick1/engine
Brick3: ovirt3.nwfiber.com:/gluster/brick1/engine (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Volume Name: iso
Type: Replicate
Volume ID: b1ba15f5-0f0f-4411-89d0-595179f02b92
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick4/iso
Brick2: ovirt2.nwfiber.com:/gluster/brick4/iso
Brick3: ovirt3.nwfiber.com:/gluster/brick4/iso (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

--------------

When I try to turn on profiling, I get:

[root@ovirt2 ~]# gluster volume profile data-hdd start
Another transaction is in progress for data-hdd. Please try again after sometime.

I don't know what that other transaction is, but I am having some "odd behavior" this morning, like a VM disk move between data and data-hdd that stuck at 84% overnight. I've been asking on IRC how to "un-stick" this transfer, as the VM cannot be started and I can't seem to do anything about it. (A sketch of the profiling sequence I plan to run once this clears is at the end of this message.)

--Jim

On Mon, Mar 19, 2018 at 2:14 AM, Sahina Bose <sab...@redhat.com> wrote:

>
>
> On Mon, Mar 19, 2018 at 7:39 AM, Jim Kusznir <j...@palousetech.com> wrote:
>
>> Hello:
>>
>> This past week, I created a new gluster store, as I was running out of
>> disk space on my main, SSD-backed storage pool. I used 2TB Seagate
>> FireCuda drives (hybrid SSD/spinning). Hardware is Dell R610's with
>> integral PERC 6/i cards. I placed one disk per machine, exported the disk
>> as a single-disk volume from the RAID controller, formatted it XFS,
>> mounted it, and dedicated it to a new replica 3 gluster volume.
>>
>> Since doing so, I've been having major performance problems. One of my
>> Windows VMs sits at 100% disk utilization nearly continuously, and it's
>> painful to do anything on it. A Zabbix install on CentOS using MySQL as
>> the backing store has 70%+ iowait nearly all the time, and I can't seem
>> to get graphs loaded from the web console. It's also always spewing
>> errors that ultimately come down to insufficient disk performance.
>>
>> All of this was working OK before the changes. There are two:
>>
>> Old storage was SSD-backed, replica 2 + arbiter, and running on the same
>> GigE network as management and the main VM network.
>>
>> New storage was created on the dedicated Gluster network (running on em4
>> on these servers, on a completely different subnet: 172.x vs. 192.x), and
>> was created replica 3 (no arbiter) on the FireCuda disks (they seemed to
>> be the fastest I could afford for non-SSD, as I needed a lot more
>> storage).
>>
>> My attempts to watch so far have NOT shown maxed network interfaces
>> (using bwm-ng on the command line); in fact, the gluster interface is
>> usually below 20% utilized.
>>
>> I'm not sure how to meaningfully measure the performance of the disk
>> itself, and I'm not sure what else to look at. My cluster is not very
>> usable currently, though. IOWait on my hosts appears to be below 0.5%,
>> usually 0.0 to 0.1. Inside the VMs it's a whole different story.
>>
>> My cluster is currently running oVirt 4.1. I'm interested in going to
>> 4.2, but I think I need to fix this first.
>>
>
> Can you provide the info of the volume using "gluster volume info" and
> also profile the volume while running the tests where you experience the
> performance issue, and share the results?
>
> For info on how to profile (server-side profiling), see
> https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/
>
>
>> Thanks!
>> --Jim
>>
>> _______________________________________________
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>
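P.S. For my own reference, here is the server-side profiling sequence I plan to run against data-hdd once the stuck transaction clears, pieced together from the Gluster performance-testing doc Sahina linked (the output file name is just an example; the workload step is whatever reproduces the problem, e.g. booting the slow Windows VM):

[root@ovirt2 ~]# gluster volume profile data-hdd start
  ... run the problem workload for a few minutes ...
[root@ovirt2 ~]# gluster volume profile data-hdd info > /tmp/data-hdd-profile.txt
[root@ovirt2 ~]# gluster volume profile data-hdd stop

And to get a rough baseline of the raw disk speed underneath Gluster, I'm thinking of a simple direct-I/O dd run on each host, writing to the XFS filesystem that holds the brick but outside the brick directory itself (this assumes /gluster/brick3 is the mount point and data-hdd is the brick subdirectory; fio would give more meaningful random-I/O numbers if it's installed):

[root@ovirt2 ~]# dd if=/dev/zero of=/gluster/brick3/ddtest.bin bs=1M count=1024 oflag=direct
[root@ovirt2 ~]# rm -f /gluster/brick3/ddtest.bin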