You can use that one, or this 'simplified' short version: https://access.redhat.com/solutions/3227681
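
If you would rather not download anything, the manual procedure quoted further down (find the jboss PID, send kill -3, repeat a few times at 30-second intervals) could be scripted roughly as follows. This is only a sketch under a few assumptions, not the script from either Red Hat link: the function name take_thread_dumps and its defaults are made up for illustration, and the PID is expected to be the engine's java process. Sending kill -3 to a process that is not a JVM will normally terminate it, so please double-check the PID first.

```shell
#!/usr/bin/env bash
# Sketch: request several JVM thread dumps at a fixed interval.
# Assumption: the function name and its defaults are hypothetical helpers,
# not part of oVirt or of the Red Hat scripts mentioned in this thread.

# take_thread_dumps PID [COUNT] [INTERVAL_SECONDS]
# Sends SIGQUIT (kill -3) to PID, COUNT times, sleeping INTERVAL_SECONDS
# between requests. A JVM answers SIGQUIT by printing a thread dump to its
# console output; it keeps running afterwards.
take_thread_dumps() {
    local pid=$1 count=${2:-3} interval=${3:-30} i
    for ((i = 1; i <= count; i++)); do
        # Stop early if the signal cannot be delivered (wrong PID, no permission).
        kill -3 "$pid" || return 1
        echo "thread dump $i/$count requested from PID $pid"
        # No need to wait after the last request.
        if ((i < count)); then
            sleep "$interval"
        fi
    done
}

# Usage, as root on the engine machine:
#   ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'   # find the PID
#   take_thread_dumps <jboss-pid> 3 30
#   less /var/log/ovirt-engine/console.log                    # dumps end up here
```

Three dumps taken 30 seconds apart make it easier to tell a thread that is really stuck from one that just happened to be busy at the moment of a single dump.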
Artur

On Mon, Aug 9, 2021 at 5:01 PM Andrei Verovski <andre...@starlett.lv> wrote:

> Hi
>
> Should I use threaddump_linux.sh.tar.gz ?
> from:
> https://access.redhat.com/solutions/18178
>
> > On 9 Aug 2021, at 17:56, Artur Socha <aso...@redhat.com> wrote:
> >
> > Actually you could even make 3 thread dumps in 30-second intervals.
> > Artur
> >
> > On Mon, Aug 9, 2021 at 4:53 PM Artur Socha <aso...@redhat.com> wrote:
> > Unfortunately I don't see anything wrong in either the engine or the vdsm logs.
> > There is one last thing that comes to my mind that you could try - restart the engine service. That is exactly the case I have been investigating.
> > But before restarting I would like to ask you, if possible, for a java (jvm) thread dump.
> > The procedure is as follows:
> > 1) find jboss pid ie.
> > $ ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'
> > 2) trigger thread dump
> > $ kill -3 <jboss-pid>
> > 3) thread dump logs can be found at /var/log/ovirt-engine/console.log
> >
> > And then restart engine service to check if that helps.
> >
> > Artur
> >
> > On Mon, Aug 9, 2021 at 2:19 PM Andrei Verovski <andre...@starlett.lv> wrote:
> > Hi, Artur,
> >
> > Small update with vdsm status, forgot to include in previous post.
> >
> > I partially fixed the problem with VDSM start.
> >
> > Bug "Failed to create session: Start job for unit user-0.slice failed with 'canceled'"
> > is being described here
> > https://bugzilla.redhat.com/show_bug.cgi?id=1967962
> > and a fix seems to be available here, so I have downgraded systemd with the backport fix:
> > http://people.redhat.com/dtardon/systemd/bz1642460-backport-UserStopDelaySec=/
> >
> > Now vdsmd service starts successfully, but node14 still cannot be activated because of the same error. This is quite strange, before restart on Friday the node just worked. There were no upgrades, nothing, just restart.
> >
> > [root@node14 ~]# service vdsmd status
> > Redirecting to /bin/systemctl status vdsmd.service
> > ● vdsmd.service - Virtual Desktop Server Manager
> >    Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
> >    Active: active (running) since Mon 2021-08-09 15:12:59 EEST; 4min 20s ago
> >   Process: 4066 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
> >  Main PID: 4130 (vdsmd)
> >     Tasks: 41 (limit: 615525)
> >    Memory: 59.5M
> >    CGroup: /system.slice/vdsmd.service
> >            └─4130 /usr/bin/python3 /usr/share/vdsm/vdsmd
> >
> > Aug 09 15:12:55 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running prepare_transient_repository
> > Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running syslog_available
> > Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running nwfilter
> > Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running dummybr
> > Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running tune_system
> > Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_space
> > Aug 09 15:12:59 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_lo
> > Aug 09 15:12:59 node14.***.lv systemd[1]: Started Virtual Desktop Server Manager.
> > Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available. Error: [Errno 111] Connection refused
> > Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available, KSM stats will be missing. Error:
> >
> > [root@node14]# firewall-cmd --list-all
> > public (active)
> >   target: default
> >   icmp-block-inversion: no
> >   interfaces: DMZ_node14 eno1 eno2 ovirtmgmt
> >   sources:
> >   services: cockpit dhcpv6-client libvirt-tls mountd nfs ovirt-imageio ovirt-vmconsole rpc-bind snmp ssh vdsm
> >   ports: 2301/tcp 2381/tcp 22/tcp 6081/udp
> >   protocols:
> >   forward: no
> >   masquerade: no
> >   forward-ports:
> >   source-ports:
> >   icmp-blocks:
> >   rich rules:
> > [root@node14 andrei]#
> >
> > vdsm-client Host getStats and vdsm-client Host getCapabilities attached.
> >
> >> On 9 Aug 2021, at 13:18, Artur Socha <aso...@redhat.com> wrote:
> >>
> >> Thanks for the logs. I am checking them at the moment. I have noticed so far that node14 is serving an NFS share which had been marked as problematic (probably because of the downtime during the migration) but it has recovered.
> >>
> >> In the meantime, is it possible to get some meaningful results when calling:
> >> $ vdsm-client Host getStats
> >> and
> >> $ vdsm-client Host getCapabilities
> >> on node14?
> >>
> >> What is the state of the vdsmd service when running systemctl status vdsmd? One other thing to rule out is the networking/firewall. Here is the list of the ports to be open for the host (the documentation is for hosted engine, but it applies to a standalone setup as well):
> >> https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_using_the_command_line/index.html#host-firewall-requirements_SHE_cli_deploy
> >>
> >> btw. I have been hunting for a rare and hard to recreate bug for quite a long time (without success yet) so any reported connectivity issues between the manager and hosts are super interesting to me.
> >>
> >> Artur
> >>
> >> On Mon, Aug 9, 2021 at 11:44 AM Andrei Verovski <andreil1@***.lv> wrote:
> >> Hi, Artur,
> >>
> >> Thanks for assistance. Zipped engine starting from the day of upgrade attached.
> >> Restart via SSH from oVirt Web GUI works.
> >> oVirt engine runs on dedicated server, not hosted engine.
> >>
> >>> On 9 Aug 2021, at 11:24, Artur Socha <aso...@redhat.com> wrote:
> >>>
> >>> Hi Andrei,
> >>> Could you also post a relevant piece of engine.log? I don't have high expectations to find the answer there but I just want to be sure of it.
> >>> VDSM.log does not show any trace of error from the vdsm point of view. For example it looks like it started correctly and subscribed to receiving commands from the engine (yet that does not mean I connected to it - only in listening mode).
> >>>
> >>> Can you confirm that 'SSH restart' from UI works - by 'works' I mean the host is actually restarted after a few minutes and there are no ssh related (public key etc) errors in engine.log?
> >>>
> >>> Artur
> >>>
> >>> On Mon, Aug 9, 2021 at 9:55 AM Andrei Verovski <andreil1@***.lv> wrote:
> >>> Hi,
> >>>
> >>> I have oVirt 4.4.7.6-1.el8 and one problematic node (HP ProLiant with CentOS 8 stream).
> >>> After replacing server rack router switch and restart got this error I can’t recover from:
> >>>
> >>> VDSM node14 command Get Host Capabilities failed: Message timeout which can be caused by communication issues
> >>>
> >>> vdsm-network running fine, but vdsmd can’t start on node14 for whatever reason. All other nodes running fine.
> >>>
> >>> Aug 09 10:24:12 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running dummybr
> >>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running tune_system
> >>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_space
> >>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_lo
> >>> Aug 09 10:24:13 node14.mydomain.lv systemd[1]: Started Virtual Desktop Server Manager.
> >>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
> >>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session opened for user root by (uid=0)
> >>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session closed for user root
> >>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available. Error: [Errno 2] No such file or directory
> >>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available, KSM stats will be missing. Error:
> >>>
> >>> In web gui -> Management I can’t do anything with the host except restart. Stop aborts with error, all other commands are gray-ed out.
> >>> Status is “Unassigned”. Host is answering to pings as usual.
> >>> vdsm.log (from node14) attached.
> >>>
> >>> Thanks in advance for any help.
> >>>
> >>> _______________________________________________
> >>> Users mailing list -- users@ovirt.org
> >>> To unsubscribe send an email to users-le...@ovirt.org
> >>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> >>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/55M65W57Z43ZVPOARDTK7HKHCAMAUGO5/
> >>>
> >>> --
> >>> Artur Socha
> >>> Senior Software Engineer, RHV
> >>> Red Hat
> >>
> >> --
> >> Artur Socha
> >> Senior Software Engineer, RHV
> >> Red Hat
> >
> > --
> > Artur Socha
> > Senior Software Engineer, RHV
> > Red Hat
> >
> > --
> > Artur Socha
> > Senior Software Engineer, RHV
> > Red Hat

--
Artur Socha
Senior Software Engineer, RHV
Red Hat
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/2KJQ3YWYBHUVARXAW2S7L6WZG7PWJ5OZ/