On Tue, Jan 12, 2021, 2:04 AM Charles Lam <clam2...@gmail.com> wrote:
> Dear Strahil and Ritesh,
>
> Thank you both. I am back where I started with:
>
> "One or more bricks could be down. Please execute the command again after
> bringing all bricks online and finishing any pending heals\nVolume heal
> failed.", "stdout_lines": ["One or more bricks could be down. Please
> execute the command again after bringing all bricks online and finishing
> any pending heals", "Volume heal failed."]
>
> Regarding my most recent issue:
>
> "vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe:
> FATAL: Module kvdo not found in directory
> /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n"
>
> Per Strahil's note, I checked for kvdo:
>
> [r...@host1.tld.com conf.d]# rpm -qa | grep vdo
> libblockdev-vdo-2.24-1.el8.x86_64
> vdo-6.2.3.114-14.el8.x86_64
> kmod-kvdo-6.2.2.117-65.el8.x86_64
>
> [r...@host2.tld.com conf.d]# rpm -qa | grep vdo
> libblockdev-vdo-2.24-1.el8.x86_64
> vdo-6.2.3.114-14.el8.x86_64
> kmod-kvdo-6.2.2.117-65.el8.x86_64
>
> [r...@host3.tld.com ~]# rpm -qa | grep vdo
> libblockdev-vdo-2.24-1.el8.x86_64
> vdo-6.2.3.114-14.el8.x86_64
> kmod-kvdo-6.2.2.117-65.el8.x86_64
>
> I found
> https://unix.stackexchange.com/questions/624011/problem-on-centos-8-with-creating-vdo-kernel-module-kvdo-not-installed
> which pointed to https://bugs.centos.org/view.php?id=17928. As suggested
> on the CentOS bug tracker, I attempted to manually install
>
> vdo-support-6.2.4.14-14.el8.x86_64
> vdo-6.2.4.14-14.el8.x86_64
> kmod-kvdo-6.2.3.91-73.el8.x86_64
>
> but there was a dependency requiring a newer kernel-core than I had
> installed, so I manually upgraded kernel-core to
> kernel-core-4.18.0-259.el8.x86_64.rpm, then upgraded vdo and kmod-kvdo to
>
> vdo-6.2.4.14-14.el8.x86_64.rpm
> kmod-kvdo-6.2.4.26-76.el8.x86_64.rpm
>
> and installed vdo-support-6.2.4.14-14.el8.x86_64.rpm.
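
One quick sanity check on the kvdo side: it may be worth confirming, on each
host, that the kvdo module actually matches the kernel that host is running.
A minimal sketch (assuming the packages listed above; nothing here is specific
to oVirt) would be something like:

  # kernel each host is actually running
  uname -r

  # kernel module files shipped by the kmod-kvdo package
  rpm -ql kmod-kvdo | grep '\.ko'

  # try loading the module and confirm it registered
  modprobe kvdo && lsmod | grep kvdo

If uname -r still reports 4.18.0-240.1.1.el8_3 while the new kmod-kvdo was
built against the 4.18.0-259 kernel, a reboot into the newly installed kernel
may be needed before modprobe can find kvdo.
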
> Upon clean-up and redeploy I am now back at the Gluster deploy failing at
>
> TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] **********
> task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick':
> '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item",
> "changed": true, "cmd": ["gluster", "volume", "heal", "engine",
> "granular-entry-heal", "enable"], "delta": "0:00:10.098573", "end":
> "2021-01-11 19:27:05.333720", "item": {"arbiter": 0, "brick":
> "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero
> return code", "rc": 107, "start": "2021-01-11 19:26:55.235147",
> "stderr": "", "stderr_lines": [], "stdout": "One or more bricks could be
> down. Please execute the command again after bringing all bricks online
> and finishing any pending heals\nVolume heal failed.", "stdout_lines":
> ["One or more bricks could be down. Please execute the command again after
> bringing all bricks online and finishing any pending heals",
> "Volume heal failed."]}
> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick':
> '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item",
> "changed": true, "cmd": ["gluster", "volume", "heal", "data",
> "granular-entry-heal", "enable"], "delta": "0:00:10.099670", "end":
> "2021-01-11 19:27:20.564554", "item": {"arbiter": 0, "brick":
> "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return
> code", "rc": 107, "start": "2021-01-11 19:27:10.464884", "stderr": "",
> "stderr_lines": [], "stdout": "One or more bricks could be down. Please
> execute the command again after bringing all bricks online and finishing
> any pending heals\nVolume heal failed.", "stdout_lines": ["One or more
> bricks could be down. Please execute the command again after bringing all
> bricks online and finishing any pending heals", "Volume heal failed."]}
> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick':
> '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var":
> "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore",
> "granular-entry-heal", "enable"], "delta": "0:00:10.104624", "end":
> "2021-01-11 19:27:35.774230", "item": {"arbiter": 0, "brick":
> "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero
> return code", "rc": 107, "start": "2021-01-11 19:27:25.669606", "stderr":
> "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please
> execute the command again after bringing all bricks online and finishing
> any pending heals\nVolume heal failed.", "stdout_lines": ["One or more
> bricks could be down. Please execute the command again after bringing all
> bricks online and finishing any pending heals", "Volume heal failed."]}
>
> NO MORE HOSTS LEFT *************************************************************
>
> NO MORE HOSTS LEFT *************************************************************
>
> PLAY RECAP *********************************************************************
> fmov1n1.sn.dtcorp.com : ok=70  changed=29  unreachable=0  failed=1  skipped=188  rescued=0  ignored=1
> fmov1n2.sn.dtcorp.com : ok=68  changed=27  unreachable=0  failed=0  skipped=163  rescued=0  ignored=1
> fmov1n3.sn.dtcorp.com : ok=68  changed=27  unreachable=0  failed=0  skipped=163  rescued=0  ignored=1
>
> Please check /var/log/cockpit/ovirt-dashboard/gluster-deployment.log for
> more informations.
>
> I doubled back to Strahil's recommendation to restart Gluster and enable
> granular-entry-heal. This fails; for example:
>
> [root@host1 ~]# gluster volume heal data granular-entry-heal enable
> One or more bricks could be down. Please execute the command again after
> bringing all bricks online and finishing any pending heals
> Volume heal failed.
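
On the rc=107 / "Volume heal failed." part: the CLI prints that message
whenever it believes a brick or a self-heal daemon is not up, so it is worth
re-checking both right before the command is retried. A minimal set of checks
(a sketch, assuming the volume names engine/data/vmstore and that glusterd
runs on all three hosts) would be:

  # every brick and every Self-heal Daemon should show Online = Y
  for v in engine data vmstore; do gluster volume status "$v"; done

  # glusterd itself should be active on each of the three hosts
  systemctl status glusterd

  # no pending heal entries should remain before enabling granular-entry-heal
  gluster volume heal engine info summary

  # then retry
  gluster volume heal engine granular-entry-heal enable
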
>
> I have followed Ritesh's suggestion:
>
> [root@host1 ~]# ansible-playbook
> /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_cleanup.yml
> -i /etc/ansible/hc_wizard_inventory.yml
>
> which appeared to execute successfully:
>
> PLAY RECAP **********************************************************************************************************
> fmov1n1.sn.dtcorp.com : ok=11  changed=2  unreachable=0  failed=0  skipped=2  rescued=0  ignored=0
> fmov1n2.sn.dtcorp.com : ok=9   changed=1  unreachable=0  failed=0  skipped=1  rescued=0  ignored=0
> fmov1n3.sn.dtcorp.com : ok=9   changed=1  unreachable=0  failed=0  skipped=1  rescued=0  ignored=0

So after this, have you tried the gluster deployment again?

> Here is the info Strahil requested when I first reported this issue on
> December 18th, re-run today, January 11:
>
> [root@host1 ~]# gluster pool list
> UUID                                  Hostname          State
> 4964020a-9632-43eb-9468-798920e98559  host2.domain.com  Connected
> f0718e4f-1ac6-4b82-a8d7-a4d31cd0f38b  host3.domain.com  Connected
> 6ba94e82-579c-4ae2-b3c5-bef339c6f795  localhost         Connected
> [root@host1 ~]# gluster volume list
> data
> engine
> vmstore
> [root@host1 ~]# for i in $(gluster volume list); do gluster volume status $i; gluster volume info $i; echo "###########################################################################################################"; done
> Status of volume: data
> Gluster process                                   TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick host1.domain.com:/gluster_bricks/data/data  49153     0          Y       406272
> Brick host2.domain.com:/gluster_bricks/data/data  49153     0          Y       360300
> Brick host3.domain.com:/gluster_bricks/data/data  49153     0          Y       360082
> Self-heal Daemon on localhost                     N/A       N/A        Y       413227
> Self-heal Daemon on host2.domain.com              N/A       N/A        Y       360223
> Self-heal Daemon on host3.domain.com              N/A       N/A        Y       360003
>
> Task Status of Volume data
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Volume Name: data
> Type: Replicate
> Volume ID: ed65a922-bd85-4574-ba21-25b3755acbce
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: host1.domain.com:/gluster_bricks/data/data
> Brick2: host2.domain.com:/gluster_bricks/data/data
> Brick3: host3.domain.com:/gluster_bricks/data/data
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> cluster.choose-local: off
> client.event-threads: 4
> server.event-threads: 4
> storage.owner-uid: 36
> storage.owner-gid: 36
> network.ping-timeout: 30
> performance.strict-o-direct: on
> ###########################################################################################################
> Status of volume: engine
> Gluster process                                       TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick host1.domain.com:/gluster_bricks/engine/engine  49152     0          Y       404563
> Brick host2.domain.com:/gluster_bricks/engine/engine  49152     0          Y       360202
> Brick host3.domain.com:/gluster_bricks/engine/engine  49152     0          Y       359982
> Self-heal Daemon on localhost                         N/A       N/A        Y       413227
> Self-heal Daemon on host3.domain.com                  N/A       N/A        Y       360003
> Self-heal Daemon on host2.domain.com                  N/A       N/A        Y       360223
>
> Task Status of Volume engine
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Volume Name: engine
> Type: Replicate
> Volume ID: 45d4ec84-38a1-41ff-b8ec-8b00eb658908
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: host1.domain.com:/gluster_bricks/engine/engine
> Brick2: host2.domain.com:/gluster_bricks/engine/engine
> Brick3: host3.domain.com:/gluster_bricks/engine/engine
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> cluster.choose-local: off
> client.event-threads: 4
> server.event-threads: 4
> storage.owner-uid: 36
> storage.owner-gid: 36
> network.ping-timeout: 30
> performance.strict-o-direct: on
> ###########################################################################################################
> Status of volume: vmstore
> Gluster process                                         TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick host1.domain.com:/gluster_bricks/vmstore/vmstore  49154     0          Y       407952
> Brick host2.domain.com:/gluster_bricks/vmstore/vmstore  49154     0          Y       360389
> Brick host3.domain.com:/gluster_bricks/vmstore/vmstore  49154     0          Y       360176
> Self-heal Daemon on localhost                           N/A       N/A        Y       413227
> Self-heal Daemon on host2.domain.com                    N/A       N/A        Y       360223
> Self-heal Daemon on host3.domain.com                    N/A       N/A        Y       360003
>
> Task Status of Volume vmstore
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Volume Name: vmstore
> Type: Replicate
> Volume ID: 27c8346c-0374-4108-a33a-0024007a9527
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: host1.domain.com:/gluster_bricks/vmstore/vmstore
> Brick2: host2.domain.com:/gluster_bricks/vmstore/vmstore
> Brick3: host3.domain.com:/gluster_bricks/vmstore/vmstore
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> cluster.choose-local: off
> client.event-threads: 4
> server.event-threads: 4
> storage.owner-uid: 36
> storage.owner-gid: 36
> network.ping-timeout: 30
> performance.strict-o-direct: on
> ###########################################################################################################
> [root@host1 ~]#
>
> Again, further suggestions for troubleshooting are VERY much appreciated!
>
> Respectfully,
> Charles
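
Since the status output above shows every brick and self-heal daemon online,
it may also help to look at what the option is currently set to and what
glusterd logged around 19:26-19:27 when the task failed. A short sketch
(assuming the default glusterd log location):

  # current value of the option the failing task tries to enable
  gluster volume get engine cluster.granular-entry-heal

  # glusterd's view of why the heal command returned 107
  grep -i 'granular\|heal' /var/log/glusterfs/glusterd.log | tail -50
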
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/BUISYMDHEFNM5KSVTQKCY5QIO4ZS5N7J/