Hello folks,

I am running a 3-way, no-arbiter Gluster setup using oVirt and the included Gluster 6.7. After a crash we are unable to start any VMs due to a storage I/O error. After much backtracking and debugging we are closing in on the symptoms, albeit not the cause.


 - gluster volume is healthy,
 - No outstanding heal or split-brain files,
 - 3 way without arbiter nodes (3 copies),
 - I already ran several "heal full" commands.

 Gluster Volume Info
    Volume Name: ssd_storage
    Type: Replicate
    Volume ID: d84ec99a-5db9-49c6-aab4-c7481a1dc57b
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Brick1: node01.company.com:/gluster_bricks/ssd_storage/ssd_storage
    Brick2: node02.company.com:/gluster_bricks/ssd_storage/ssd_storage
    Brick3: node03.company.com:/gluster_bricks/ssd_storage/ssd_storage
    Options Reconfigured:
    cluster.self-heal-daemon: enable
    cluster.granular-entry-heal: enable
    storage.owner-gid: 36
    storage.owner-uid: 36
    network.ping-timeout: 30
    server.event-threads: 4
    client.event-threads: 4
    cluster.choose-local: off
    user.cifs: off
    features.shard: on
    cluster.shd-wait-qlength: 10000
    cluster.shd-max-threads: 8
    cluster.locking-scheme: granular
    cluster.data-self-heal-algorithm: full
    cluster.server-quorum-type: server
    cluster.quorum-type: auto
    cluster.eager-lock: enable
    network.remote-dio: off
    performance.low-prio-threads: 32
    performance.io-cache: off
    performance.read-ahead: off
    performance.quick-read: off
    performance.strict-o-direct: on
    transport.address-family: inet
    nfs.disable: on
    performance.client-io-threads: on

Gluster Volume Status
    Status of volume: ssd_storage
    Gluster process                                                   TCP Port  RDMA Port  Online  Pid
    Brick node01.company.com:/gluster_bricks/ssd_storage/ssd_storage  49152     0          Y       8218
    Brick node02.company.com:/gluster_bricks/ssd_storage/ssd_storage  49152     0          Y       23595
    Brick node03.company.com:/gluster_bricks/ssd_storage/ssd_storage  49152     0          Y       8080
    Self-heal Daemon on localhost                                     N/A       N/A        Y       66028
    Self-heal Daemon on                                               N/A       N/A        Y       52087
    Self-heal Daemon on node03.company.com                            N/A       N/A        Y       8372

    Task Status of Volume ssd_storage
    There are no active volume tasks

The mounted path where the oVirt VM files reside is 100% okay: we copied all the images from there onto standalone hosts, and the images run just fine. There is no obvious data corruption. However, launching any VM from oVirt fails with "IO Storage Error".

This is where everything gets funny.
oVirt uses a vdsm user to access all the files.

 - root can read, edit and write all files inside the oVirt-mounted Gluster path.
 - vdsm user can write to new files regardless of size without any issues; changes get replicated instantly to other nodes.
 - vdsm user can append to existing files regardless of size without any issues; changes get replicated instantly to other nodes.
 - vdsm user can read files if those files are smaller than 64 MB.
 - vdsm user gets permission denied errors if the file to be read is 65 MB or bigger.
 - vdsm user gets permission denied errors if the request crosses a Gluster shard-file boundary.
 - if root does a "dd if=file_larger_than64mb of=/dev/null" on any large file, the file can then be read by the vdsm user on that single node. Changes do not get replicated to other nodes.
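The 64 MB threshold lines up with Gluster sharding: features.shard is on, and since no features.shard-block-size appears under Options Reconfigured, the sketch below assumes the default 64 MiB block (the effective value can be checked with "gluster volume get ssd_storage features.shard-block-size"). The first block lives in the base file; later blocks live under /.shard as <gfid>.<index>. The helper names shard_index and shards_for_read are made up for illustration.

```shell
# Assumption: 64 MiB shard block size (Gluster default; not listed in the volume info).
SHARD_BLOCK_BYTES=67108864

# shard_index OFFSET [BLOCK_BYTES]
# Print the index of the shard that holds the byte at OFFSET.
# Index 0 is the base file itself; indexes >= 1 are files under /.shard.
shard_index() {
    local offset=$1
    local block=${2:-$SHARD_BLOCK_BYTES}
    echo $(( offset / block ))
}

# shards_for_read OFFSET LENGTH [BLOCK_BYTES]
# Print "first last": the range of shard indexes a read of LENGTH bytes
# starting at OFFSET has to touch.
shards_for_read() {
    local offset=$1
    local length=$2
    local block=${3:-$SHARD_BLOCK_BYTES}
    local last_byte=$(( offset + length - 1 ))
    echo "$(shard_index "$offset" "$block") $(shard_index "$last_byte" "$block")"
}
```

Under that assumption, "shards_for_read 0 $((64 * 1024 * 1024))" prints "0 0" (base file only), while "shards_for_read 0 $((65 * 1024 * 1024))" prints "0 1", so the 65 MB read is the first one that needs a lookup under /.shard.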


 id of the vdsm user (after sudo to it) on each node:

[vdsm@node01:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] $ id
uid=36(vdsm) gid=36(kvm) groups=36(kvm),107(qemu),179(sanlock) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

[vdsm@node02:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] $ id
uid=36(vdsm) gid=36(kvm) groups=36(kvm),107(qemu),179(sanlock) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

[vdsm@node03:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] $ id
uid=36(vdsm) gid=36(kvm) groups=36(kvm),107(qemu),179(sanlock) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Create a file larger than 64 MB on one node:

[vdsm@node03:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] $ base64 /dev/urandom | head -c 200000000 > file.txt
[vdsm@node03:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] $ ls -lha
total 191M
drwxr-xr-x. 2 vdsm kvm   30 Feb  4 13:10 .
drwxr-xr-x. 6 vdsm kvm   80 Jan  1  1970 ..
-rw-r--r--. 1 vdsm kvm 191M Feb  4 13:10 file.txt

File is instantly available on another node:

[vdsm@node01:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] $ ls -lha
total 191M
drwxr-xr-x. 2 vdsm kvm   30 Feb  4 13:10 .
drwxr-xr-x. 6 vdsm kvm   80 Jan  1  1970 ..
-rw-r--r--. 1 vdsm kvm 191M Feb  4 13:10 file.txt

Accessing the whole file fails:

[vdsm@node01:] $ dd if=file.txt of=/dev/null
dd: error reading ‘file.txt’: Permission denied
131072+0 records in
131072+0 records out
67108864 bytes (67 MB) copied, 0.0651919 s, 1.0 GB/s

Reading the first 64 MB works; reading 65 MB (crossing the shard boundary) does not:

[vdsm@node01:] $ dd if=file.txt bs=1M count=64  of=/dev/null
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.00801663 s, 8.4 GB/s

[vdsm@node01:] $ dd if=file.txt bs=1M count=65  of=/dev/null
dd: error reading ‘file.txt’: Permission denied
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.00908712 s, 7.4 GB/s
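To see exactly which 64 MB block a given user can or cannot read, the same dd test can be run one block at a time. This is only a sketch: probe_blocks is a hypothetical helper name, the 64 MiB default is an assumption about the shard block size, and it relies on GNU stat and dd as found on the oVirt nodes.

```shell
# probe_blocks FILE [BLOCK_BYTES]
# Read FILE one block at a time and report whether each block is readable
# by the current user. BLOCK_BYTES defaults to 64 MiB, the assumed shard size.
probe_blocks() {
    local file=$1
    local block=${2:-67108864}
    local size
    local blocks
    local i=0

    # File size in bytes (GNU stat); bail out if the file cannot be stat'ed.
    size=$(stat -c %s "$file") || return 1

    # Number of blocks, rounding up so a trailing partial block is included.
    blocks=$(( (size + block - 1) / block ))

    while [ "$i" -lt "$blocks" ]; do
        # skip=N seeks N blocks into the file; count=1 reads a single block.
        if dd if="$file" of=/dev/null bs="$block" skip="$i" count=1 2>/dev/null; then
            echo "block $i: OK"
        else
            echo "block $i: FAILED"
        fi
        i=$(( i + 1 ))
    done
}
```

Running "probe_blocks file.txt" as vdsm and again as root on the same node should show whether only the blocks beyond the first are affected, which is what the dd runs above suggest.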

Appending to the file works (not crossing the boundary):

[vdsm@node01:] $ date >> file.txt
[vdsm@node01:] $

[vdsm@node02:] $ tail -n2 file.txt
E16ACZaLqLhx2oUUUov5JHvQcVFohn6HH+eog6XZCiTaG0Tue  4 Feb 13:18:37 CET 2020

Reading the beginning and end of the file works; reading across the boundary does not:

[vdsm@node02:] $ head file.txt

[vdsm@node02:] $ tail file.txt
E16ACZaLqLhx2oUUUov5JHvQcVFohn6HH+eog6XZCiTaG0Tue  4 Feb 13:18:37 CET 2020

[vdsm@node02:] $ dd if=file.txt of=/dev/null
dd: error reading ‘file.txt’: Permission denied
131072+0 records in
131072+0 records out
67108864 bytes (67 MB) copied, 0.106097 s, 633 MB/s

If root runs dd first, all is peachy:

[root@node02] # dd if=file.txt of=/dev/null
390625+1 records in
390625+1 records out
200000058 bytes (200 MB) copied, 0.345906 s, 578 MB/s

[vdsm@node02] $ dd if=file.txt of=/dev/null
390625+1 records in
390625+1 records out
200000058 bytes (200 MB) copied, 0.188451 s, 1.1 GB/s

Error in the gluster.log:

[2020-02-04 12:27:57.915356] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-1: remote operation failed. Path: /.shard/57200f4f-537d-4e56-9258-38fe6ac64c4e.2 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2020-02-04 12:27:57.915404] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-0: remote operation failed. Path: /.shard/57200f4f-537d-4e56-9258-38fe6ac64c4e.2 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2020-02-04 12:27:57.915472] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-2: remote operation failed. Path: /.shard/57200f4f-537d-4e56-9258-38fe6ac64c4e.2 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2020-02-04 12:27:57.915490] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-ssd_storage-shard: Lookup on shard 2 failed. Base file gfid = 57200f4f-537d-4e56-9258-38fe6ac64c4e [Permission denied]
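Since all three clients report the same failed lookup under /.shard, it may help to look at those shard files directly on the bricks. The sketch below turns the "Lookup on shard N failed" lines into brick-side paths. The function name failed_shard_paths is hypothetical, and the default brick path is taken from the volume info above; it reads the log on standard input, so any log file can be piped in.

```shell
# failed_shard_paths [BRICK_PATH]
# Read Gluster client log lines on stdin and print, once each, the
# brick-side path of every shard whose lookup failed.
# BRICK_PATH defaults to the brick directory shown in the volume info.
failed_shard_paths() {
    local brick=${1:-/gluster_bricks/ssd_storage/ssd_storage}

    # Pull out "<gfid>.<shard number>" from lines such as:
    #   ... Lookup on shard 2 failed. Base file gfid = 57200f4f-... [Permission denied]
    sed -n -E 's/.*Lookup on shard ([0-9]+) failed\. Base file gfid = ([0-9a-f-]+).*/\2.\1/p' \
        | sort -u \
        | while read -r shard; do
            echo "$brick/.shard/$shard"
        done
}
```

The resulting paths can then be inspected on each node as root, for example with ls -l or getfacl, and compared with the owner, mode and ACLs of the .shard directory itself.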

What we tried:

 - restarting single hosts,
 - restarting the entire cluster,
 - doing stuff like find /rhev .. -exec stat {} \;
 - dd'ing (read) all of the mount dir...

We are out of ideas, and it seems we are not experts on either Gluster or oVirt. And this is supposed to be a production HA environment. Any help would be appreciated.
I hope I have included all the relevant data and logs.

With kind regards,

Christian Reiss
Users mailing list -- users@ovirt.org