Hi

Should I use threaddump_linux.sh.tar.gz? From: https://access.redhat.com/solutions/18178

> On 9 Aug 2021, at 17:56, Artur Socha <aso...@redhat.com> wrote:
>
> Actually, you could even make 3 thread dumps at 30-second intervals.
> Artur
>
> On Mon, Aug 9, 2021 at 4:53 PM Artur Socha <aso...@redhat.com> wrote:
> Unfortunately, I don't see anything wrong in either the engine or the vdsm logs.
> There is one last thing that comes to mind that you could try: restarting the engine
> service. That is exactly the case I have been investigating.
> But before restarting, I would like to ask you, if possible, for a Java (JVM)
> thread dump.
> The procedure is as follows:
> 1) find the jboss PID, e.g.
> $ ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'
> 2) trigger a thread dump
> $ kill -3 <jboss-pid>
> 3) the thread dump output can be found in /var/log/ovirt-engine/console.log
>
> And then restart the engine service to check if that helps.
>
> Artur
>
>
> On Mon, Aug 9, 2021 at 2:19 PM Andrei Verovski <andre...@starlett.lv> wrote:
> Hi, Artur,
>
> A small update with the vdsm status, which I forgot to include in my previous post.
>
> I have partially fixed the problem with VDSM start.
>
> The bug "Failed to create session: Start job for unit user-0.slice failed with
> 'canceled'"
> is described here
> https://bugzilla.redhat.com/show_bug.cgi?id=1967962
> and a fix seems to be available here, so I have downgraded systemd to the build with
> the backported fix:
> http://people.redhat.com/dtardon/systemd/bz1642460-backport-UserStopDelaySec=/
>
> Now the vdsmd service starts successfully, but node14 still cannot be activated
> because of the same error. This is quite strange: before the restart on Friday, the
> node just worked. There were no upgrades, nothing, just a restart.
>
> [root@node14 ~]# service vdsmd status
> Redirecting to /bin/systemctl status vdsmd.service
> ● vdsmd.service - Virtual Desktop Server Manager
>    Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
>    Active: active (running) since Mon 2021-08-09 15:12:59 EEST; 4min 20s ago
>   Process: 4066 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
>  Main PID: 4130 (vdsmd)
>     Tasks: 41 (limit: 615525)
>    Memory: 59.5M
>    CGroup: /system.slice/vdsmd.service
>            └─4130 /usr/bin/python3 /usr/share/vdsm/vdsmd
>
> Aug 09 15:12:55 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running prepare_transient_repository
> Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running syslog_available
> Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running nwfilter
> Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running dummybr
> Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running tune_system
> Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_space
> Aug 09 15:12:59 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_lo
> Aug 09 15:12:59 node14.***.lv systemd[1]: Started Virtual Desktop Server Manager.
> Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available. Error: [Errno 111] Connection refused
> Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available, KSM stats will be missing. Error:
>
>
> [root@node14]# firewall-cmd --list-all
> public (active)
>   target: default
>   icmp-block-inversion: no
>   interfaces: DMZ_node14 eno1 eno2 ovirtmgmt
>   sources:
>   services: cockpit dhcpv6-client libvirt-tls mountd nfs ovirt-imageio ovirt-vmconsole rpc-bind snmp ssh vdsm
>   ports: 2301/tcp 2381/tcp 22/tcp 6081/udp
>   protocols:
>   forward: no
>   masquerade: no
>   forward-ports:
>   source-ports:
>   icmp-blocks:
>   rich rules:
> [root@node14 andrei]#
>
>
> The outputs of vdsm-client Host getStats and vdsm-client Host getCapabilities are attached.
>
>
>
>
>> On 9 Aug 2021, at 13:18, Artur Socha <aso...@redhat.com> wrote:
>>
>> Thanks for the logs. I am checking them at the moment. So far I have noticed
>> that node14 is serving an NFS share which had been marked as problematic
>> (probably because of the downtime during the migration), but it has
>> recovered.
>>
>> In the meantime, is it possible to get some meaningful results when calling:
>> $ vdsm-client Host getStats
>> and
>> $ vdsm-client Host getCapabilities
>> on node14?
>>
>> What is the state of the vdsmd service when running systemctl status vdsmd?
>> One other thing to rule out is networking/firewall. Here is the list of the
>> ports that must be open on the host (the documentation is for the hosted engine,
>> but it applies to a standalone setup as well):
>> https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_using_the_command_line/index.html#host-firewall-requirements_SHE_cli_deploy
>>
>> By the way, I have been hunting for a rare, hard-to-reproduce bug for quite a
>> long time (without success yet), so any reported connectivity issues between
>> the manager and hosts are super interesting to me.
>>
>> Artur
>>
>> On Mon, Aug 9, 2021 at 11:44 AM Andrei Verovski <andreil1@***.lv> wrote:
>> Hi, Artur,
>>
>>
>> Thanks for the assistance. The zipped engine log, starting from the day of the
>> upgrade, is attached.
>> Restart via SSH from the oVirt Web GUI works.
>> The oVirt engine runs on a dedicated server, not as a hosted engine.
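Artur's suggestion to rule out networking and the firewall can be partly checked from the engine machine with a plain TCP reachability probe against node14. This is only an illustrative sketch, assuming bash built with /dev/tcp support and coreutils timeout; port_open and check_host are hypothetical helper names, and the ports in the usage line (22 for SSH, 54321 for VDSM) are assumed common defaults, so the host firewall requirements page linked in Artur's message remains the authoritative list.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: probe TCP ports on an oVirt host from the engine.
# Assumes bash was built with /dev/tcp support and coreutils `timeout` exists.

# port_open HOST PORT [TIMEOUT]
# Succeeds if a TCP connection to HOST:PORT opens within TIMEOUT seconds
# (default 5).
port_open() {
    local host=$1 port=$2 secs=${3:-5}
    # The child bash fails if the /dev/tcp redirection cannot connect;
    # `timeout` returns 124 if the attempt hangs (e.g. packets silently dropped).
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# check_host HOST PORT...
# Prints one line per port and returns 1 if any port was unreachable.
check_host() {
    local host=$1 port rc=0
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, on the engine: check_host node14.mydomain.lv 22 54321. A successful connect only shows that the port is reachable through the new switch and the firewalls; it does not prove that VDSM accepts the engine's TLS connection or answers requests.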
>>
>>
>>
>>
>>> On 9 Aug 2021, at 11:24, Artur Socha <aso...@redhat.com> wrote:
>>>
>>> Hi Andrei,
>>> Could you also post a relevant piece of engine.log? I don't have high
>>> expectations of finding the answer there, but I just want to be sure.
>>> vdsm.log does not show any trace of an error from the vdsm point of view. For
>>> example, it looks like it started correctly and subscribed to receive
>>> commands from the engine (yet that does not mean anything has connected to it;
>>> it is only in listening mode).
>>>
>>> Can you confirm that 'SSH restart' from the UI works? By 'works' I mean the
>>> host is actually restarted after a few minutes and there are no SSH-related
>>> (public key etc.) errors in engine.log.
>>>
>>> Artur
>>>
>>> On Mon, Aug 9, 2021 at 9:55 AM Andrei Verovski <andreil1@***.lv> wrote:
>>> Hi,
>>>
>>> I have oVirt 4.4.7.6-1.el8 and one problematic node (HP ProLiant with
>>> CentOS 8 Stream).
>>> After replacing the server rack's router switch and restarting, I got this
>>> error that I can't recover from:
>>>
>>> VDSM node14 command Get Host Capabilities failed: Message timeout which can
>>> be caused by communication issues
>>>
>>> vdsm-network is running fine, but vdsmd can't start on node14 for whatever
>>> reason. All other nodes are running fine.
>>>
>>> Aug 09 10:24:12 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running dummybr
>>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running tune_system
>>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_space
>>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_lo
>>> Aug 09 10:24:13 node14.mydomain.lv systemd[1]: Started Virtual Desktop Server Manager.
>>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
>>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session opened for user root by (uid=0)
>>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session closed for user root
>>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available. Error: [Errno 2] No such file or directory
>>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available, KSM stats will be missing. Error:
>>>
>>>
>>> In the web GUI -> Management I can't do anything with the host except restart.
>>> Stop aborts with an error; all other commands are grayed out.
>>> Status is "Unassigned". The host answers pings as usual.
>>> vdsm.log (from node14) is attached.
>>>
>>> Thanks in advance for any help.
>>>
>>>
>>> _______________________________________________
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/55M65W57Z43ZVPOARDTK7HKHCAMAUGO5/
>>>
>>>
>>> --
>>> Artur Socha
>>> Senior Software Engineer, RHV
>>> Red Hat
>>
>>
>>
>> --
>> Artur Socha
>> Senior Software Engineer, RHV
>> Red Hat
>
>
>
> --
> Artur Socha
> Senior Software Engineer, RHV
> Red Hat
>
>
> --
> Artur Socha
> Senior Software Engineer, RHV
> Red Hat
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/U5VAABEJQDFHNXBPUUZSJREYBMRWAEY4/
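Artur's thread-dump procedure (find the JBoss PID, send kill -3, read the output in /var/log/ovirt-engine/console.log), repeated three times at 30-second intervals as he suggests, could be scripted roughly as follows. This is a minimal sketch, not the threaddump_linux.sh tool from the Red Hat solution Andrei asks about; jboss_pid and thread_dumps are hypothetical helper names, and it assumes exactly one process command line contains "jboss" and that it runs as root or as the user owning the engine JVM. A HotSpot JVM normally answers SIGQUIT by printing a thread dump rather than exiting.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for taking repeated JVM thread dumps of the engine.
# kill -3 (SIGQUIT) makes the JVM print all thread stacks to its stdout,
# which on the engine host is assumed to end up in
# /var/log/ovirt-engine/console.log (per the procedure in the thread).

# Print the PID(s) whose full command line contains "jboss".
# pgrep never reports itself, so no `grep -v grep` is needed.
jboss_pid() {
    pgrep -f jboss
}

# thread_dumps PID [COUNT] [INTERVAL_SECONDS]
# Sends SIGQUIT to PID COUNT times (default 3), sleeping INTERVAL seconds
# (default 30) between signals. Returns 1 if PID cannot be signalled.
thread_dumps() {
    local pid=$1 count=${2:-3} interval=${3:-30} i
    # kill -0 sends no signal; it only checks that PID exists and that we
    # may signal it (an empty or multi-PID argument also fails here).
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "cannot signal PID '$pid'" >&2
        return 1
    fi
    for ((i = 1; i <= count; i++)); do
        echo "thread dump $i/$count for PID $pid" >&2
        kill -3 "$pid" || return 1
        # No sleep after the last dump.
        if ((i < count)); then
            sleep "$interval"
        fi
    done
}
```

For example, on the engine machine: thread_dumps "$(jboss_pid)" 3 30. If jboss_pid prints more than one PID, thread_dumps rejects the combined value, so pass the Java process's PID by hand instead.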