You can use that one, or this 'simplified' short version:
https://access.redhat.com/solutions/3227681
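
For anyone who prefers not to download a script, the manual procedure quoted
further down (find the JBoss PID, send kill -3, read
/var/log/ovirt-engine/console.log), combined with the suggestion of 3 dumps at
30-second intervals, can be sketched roughly as below. This is only a sketch
under assumptions: the function names are made up for illustration, pgrep is
assumed to be installed on the engine machine, and the JVM is assumed to react
to SIGQUIT by printing a thread dump and continuing to run.

```shell
#!/bin/sh
# Sketch only: function names and defaults are illustrative, not part of oVirt.

# Print the PID(s) of processes whose full command line mentions "jboss".
# pgrep -f plays the role of the quoted "ps -ef | grep jboss" pipeline.
find_jboss_pids() {
    pgrep -f jboss
}

# take_thread_dumps PID [COUNT] [INTERVAL_SECONDS]
# Sends SIGQUIT (kill -3) to PID COUNT times (default 3), sleeping
# INTERVAL_SECONDS (default 30) between signals. Returns non-zero as soon
# as a signal cannot be delivered.
take_thread_dumps() {
    td_pid=$1
    td_count=${2:-3}
    td_interval=${3:-30}
    td_i=1
    while [ "$td_i" -le "$td_count" ]; do
        # Assumption: the target is a JVM that answers SIGQUIT with a thread dump.
        kill -3 "$td_pid" || return 1
        echo "thread dump $td_i of $td_count requested for PID $td_pid"
        # Pause between dumps, but not after the last one.
        if [ "$td_i" -lt "$td_count" ]; then
            sleep "$td_interval"
        fi
        td_i=$((td_i + 1))
    done
}
```

A possible invocation, if exactly one matching process is expected, would be
take_thread_dumps "$(find_jboss_pids | head -n 1)" 3 30, after which the dumps
should show up in /var/log/ovirt-engine/console.log. Check the PID first, since
a pattern match on "jboss" can also pick up unrelated processes.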

Artur

On Mon, Aug 9, 2021 at 5:01 PM Andrei Verovski <andre...@starlett.lv> wrote:

> Hi
>
>
> Should I use threaddump_linux.sh.tar.gz from the following page?
>
> https://access.redhat.com/solutions/18178
>
>
> > On 9 Aug 2021, at 17:56, Artur Socha <aso...@redhat.com> wrote:
> >
> > Actually, you could even take 3 thread dumps at 30-second intervals.
> > Artur
> >
> > On Mon, Aug 9, 2021 at 4:53 PM Artur Socha <aso...@redhat.com> wrote:
> > Unfortunately, I don't see anything wrong in either the engine or the vdsm logs.
> > There is one last thing that comes to mind that you could try: restarting
> > the engine service. That is exactly the case I have been investigating.
> > But before restarting, I would like to ask you, if possible, for a Java
> > (JVM) thread dump.
> > The procedure is as follows:
> > 1) Find the JBoss PID, e.g.:
> > $ ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'
> > 2) Trigger a thread dump:
> > $ kill -3 <jboss-pid>
> > 3) The thread dump output can be found in /var/log/ovirt-engine/console.log
> >
> > Then restart the engine service to check whether that helps.
> >
> > Artur
> >
> >
> > On Mon, Aug 9, 2021 at 2:19 PM Andrei Verovski <andre...@starlett.lv> wrote:
> > Hi, Artur,
> >
> > A small update on the vdsm status, which I forgot to include in my previous post.
> >
> > I have partially fixed the problem with VDSM startup.
> >
> > The bug "Failed to create session: Start job for unit user-0.slice failed
> > with 'canceled'" is described here:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1967962
> > A fix seems to be available at the link below, so I have downgraded systemd
> > to a build with the backported fix:
> > http://people.redhat.com/dtardon/systemd/bz1642460-backport-UserStopDelaySec=/
> >
> > Now the vdsmd service starts successfully, but node14 still cannot be
> > activated because of the same error. This is quite strange: before the
> > restart on Friday the node just worked. There were no upgrades, nothing,
> > just a restart.
> >
> > [root@node14 ~]# service vdsmd status
> > Redirecting to /bin/systemctl status vdsmd.service
> > ● vdsmd.service - Virtual Desktop Server Manager
> >    Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
> >    Active: active (running) since Mon 2021-08-09 15:12:59 EEST; 4min 20s ago
> >   Process: 4066 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
> >  Main PID: 4130 (vdsmd)
> >     Tasks: 41 (limit: 615525)
> >    Memory: 59.5M
> >    CGroup: /system.slice/vdsmd.service
> >            └─4130 /usr/bin/python3 /usr/share/vdsm/vdsmd
> >
> > Aug 09 15:12:55 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running prepare_transient_repository
> > Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running syslog_available
> > Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running nwfilter
> > Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running dummybr
> > Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running tune_system
> > Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_space
> > Aug 09 15:12:59 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_lo
> > Aug 09 15:12:59 node14.***.lv systemd[1]: Started Virtual Desktop Server Manager.
> > Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available. Error: [Errno 111] Connection refused
> > Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available, KSM stats will be missing. Error:
> >
> >
> > [root@node14]# firewall-cmd --list-all
> > public (active)
> >   target: default
> >   icmp-block-inversion: no
> >   interfaces: DMZ_node14 eno1 eno2 ovirtmgmt
> >   sources:
> >   services: cockpit dhcpv6-client libvirt-tls mountd nfs ovirt-imageio ovirt-vmconsole rpc-bind snmp ssh vdsm
> >   ports: 2301/tcp 2381/tcp 22/tcp 6081/udp
> >   protocols:
> >   forward: no
> >   masquerade: no
> >   forward-ports:
> >   source-ports:
> >   icmp-blocks:
> >   rich rules:
> > [root@node14 andrei]#
> >
> >
> > vdsm-client Host getStats and vdsm-client Host getCapabilities attached.
> >
> >
> >
> >
> >> On 9 Aug 2021, at 13:18, Artur Socha <aso...@redhat.com> wrote:
> >>
> >> Thanks for the logs. I am checking them at the moment. So far I have
> >> noticed that node14 is serving an NFS share which had been marked as
> >> problematic (probably because of the downtime during the migration), but
> >> it has recovered.
> >>
> >> In the meantime, is it possible to get some meaningful results when calling:
> >> $ vdsm-client Host getStats
> >> and
> >> $ vdsm-client Host getCapabilities
> >> on node14?
> >>
> >> What is the state of the vdsmd service when running systemctl status vdsmd?
> >> One other thing to rule out is the networking/firewall. Here is the list
> >> of ports that need to be open on the host (the documentation is for hosted
> >> engine, but it applies to a standalone setup as well):
> >> https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_using_the_command_line/index.html#host-firewall-requirements_SHE_cli_deploy
> >>
> >> By the way, I have been hunting for a rare and hard-to-reproduce bug for
> >> quite a long time (without success so far), so any reported connectivity
> >> issues between the manager and hosts are super interesting to me.
> >>
> >> Artur
> >>
> >> On Mon, Aug 9, 2021 at 11:44 AM Andrei Verovski <andreil1@***.lv> wrote:
> >> Hi, Artur,
> >>
> >>
> >> Thanks for the assistance. The zipped engine log, starting from the day of
> >> the upgrade, is attached.
> >> Restart via SSH from the oVirt web GUI works.
> >> The oVirt engine runs on a dedicated server; it is not a hosted engine.
> >>
> >>
> >>
> >>
> >>> On 9 Aug 2021, at 11:24, Artur Socha <aso...@redhat.com> wrote:
> >>>
> >>> Hi Andrei,
> >>> Could you also post the relevant piece of engine.log? I don't have high
> >>> expectations of finding the answer there, but I just want to be sure.
> >>> vdsm.log does not show any trace of an error from the vdsm point of view.
> >>> For example, it looks like vdsm started correctly and subscribed to
> >>> receive commands from the engine (yet that does not mean it is connected
> >>> to the engine; it is only in listening mode).
> >>>
> >>> Can you confirm that 'SSH restart' from the UI works? By 'works' I mean
> >>> that the host is actually restarted after a few minutes and there are no
> >>> SSH-related (public key etc.) errors in engine.log.
> >>>
> >>> Artur
> >>>
> >>> On Mon, Aug 9, 2021 at 9:55 AM Andrei Verovski <andreil1@***.lv> wrote:
> >>> Hi,
> >>>
> >>> I have oVirt 4.4.7.6-1.el8 and one problematic node (HP ProLiant with
> >>> CentOS 8 Stream).
> >>> After replacing the server rack router switch and restarting, I got this
> >>> error, which I can’t recover from:
> >>>
> >>> VDSM node14 command Get Host Capabilities failed: Message timeout which can be caused by communication issues
> >>>
> >>> vdsm-network is running fine, but vdsmd can’t start on node14 for
> >>> whatever reason. All other nodes are running fine.
> >>>
> >>> Aug 09 10:24:12 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running dummybr
> >>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running tune_system
> >>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_space
> >>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_lo
> >>> Aug 09 10:24:13 node14.mydomain.lv systemd[1]: Started Virtual Desktop Server Manager.
> >>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
> >>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session opened for user root by (uid=0)
> >>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session closed for user root
> >>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available. Error: [Errno 2] No such file or directory
> >>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available, KSM stats will be missing. Error:
> >>>
> >>>
> >>> In the web GUI -> Management I can’t do anything with the host except
> >>> restart. Stop aborts with an error, and all other commands are grayed out.
> >>> The status is “Unassigned”. The host answers pings as usual.
> >>> vdsm.log (from node14) attached.
> >>>
> >>> Thanks in advance for any help.
> >>>
> >>>
> >>> _______________________________________________
> >>> Users mailing list -- users@ovirt.org
> >>> To unsubscribe send an email to users-le...@ovirt.org
> >>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> >>> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/55M65W57Z43ZVPOARDTK7HKHCAMAUGO5/
> >>>
> >>>
> >>> --
> >>> Artur Socha
> >>> Senior Software Engineer, RHV
> >>> Red Hat

-- 
Artur Socha
Senior Software Engineer, RHV
Red Hat
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2KJQ3YWYBHUVARXAW2S7L6WZG7PWJ5OZ/
