Do you have the option to use 'Install' -> enroll certificate (or whatever the entry in the UI is called)?

Best Regards,
Strahil Nikolov

On Sun, Feb 20, 2022 at 8:05, Joseph Gelinas <jos...@gelinas.cc> wrote:
Both, I guess. The host certificates expired on the 15th; the console certificate expires on the 23rd. Right now, since the engine sees the hosts as Unassigned, I don't get the option to set hosts to maintenance mode, and if I try to set Enable Global Maintenance I get the message: "Cannot edit VM Cluster. Operation can be performed only when Host status is Up."
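Before re-enrolling, it may help to confirm exactly which certificates have expired and when. A minimal sketch; the paths in the comments are the usual oVirt locations but are an assumption here, so adjust them for your deployment:

```shell
# check_expiry: print the notAfter (expiry) date of each certificate
# file passed as an argument, reporting paths that do not exist.
check_expiry() {
    for cert in "$@"; do
        if [ -f "$cert" ]; then
            printf '%s: ' "$cert"
            openssl x509 -noout -enddate -in "$cert"
        else
            echo "missing: $cert" >&2
        fi
    done
}

# Usual locations (assumption -- verify on your install):
#   host:      /etc/pki/vdsm/certs/vdsmcert.pem
#   engine VM: /etc/pki/ovirt-engine/certs/engine.cer
check_expiry /etc/pki/vdsm/certs/vdsmcert.pem /etc/pki/ovirt-engine/certs/engine.cer
```

Comparing the printed notAfter dates against the two expiry dates mentioned above (the 15th and the 23rd) should show which side, host or console, each failure belongs to.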
> On Feb 19, 2022, at 14:55, Strahil Nikolov <hunter86...@yahoo.com> wrote:
>
> Is your issue with the host certificates or the engine?
>
> You can try to set a node in maintenance (or at least try that) and then try to reenroll the certificate from the UI.
>
> Best Regards,
> Strahil Nikolov
>
> On Sat, Feb 19, 2022 at 9:48, Joseph Gelinas <jos...@gelinas.cc> wrote:
> I believe I ran `hosted-engine --deploy` on ovirt-1 to see if there was an option to reenroll that way, but when it prompted and asked if that was really what I wanted to do, I pressed Ctrl-D or said no and it ran something anyway, so I pressed Ctrl-C to get out of it; maybe that is what messed up vdsm on that node. Not sure about ovirt-3. Is there a way to fix that?
>
> > On Feb 18, 2022, at 17:21, Joseph Gelinas <jos...@gelinas.cc> wrote:
> >
> > Unfortunately ovirt-ha-broker & ovirt-ha-agent are just in continual restart loops on ovirt-1 & ovirt-3 (ovirt-engine is currently on ovirt-3).
> >
> > The output for broker.log:
> >
> > MainThread::ERROR::2022-02-18 22:08:58,101::broker::72::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Trying to restart the broker
> > MainThread::INFO::2022-02-18 22:08:58,453::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.4.5 started
> > MainThread::INFO::2022-02-18 22:09:00,456::monitor::45::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> > MainThread::INFO::2022-02-18 22:09:00,456::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
> > MainThread::INFO::2022-02-18 22:09:00,457::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
> > MainThread::INFO::2022-02-18 22:09:00,459::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
> > MainThread::INFO::2022-02-18 22:09:00,459::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
> > MainThread::INFO::2022-02-18 22:09:00,459::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
> > MainThread::INFO::2022-02-18 22:09:00,460::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
> > MainThread::INFO::2022-02-18 22:09:00,460::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
> > MainThread::INFO::2022-02-18 22:09:00,460::monitor::63::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
> > MainThread::WARNING::2022-02-18 22:10:00,788::storage_broker::100::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Couldn't connect to VDSM within 60 seconds
> > MainThread::ERROR::2022-02-18 22:10:00,788::broker::69::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Failed initializing the broker: Couldn't connect to VDSM within 60 seconds
> > MainThread::ERROR::2022-02-18 22:10:00,789::broker::71::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Traceback (most recent call last):
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 64, in run
> >     self._storage_broker_instance = self._get_storage_broker()
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 143, in _get_storage_broker
> >     return storage_broker.StorageBroker()
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 97, in __init__
> >     self._backend.connect()
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 370, in connect
> >     connection = util.connect_vdsm_json_rpc(logger=self._logger)
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 472, in connect_vdsm_json_rpc
> >     __vdsm_json_rpc_connect(logger, timeout)
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 415, in __vdsm_json_rpc_connect
> >     timeout=VDSM_MAX_RETRY * VDSM_DELAY
> > RuntimeError: Couldn't connect to VDSM within 60 seconds
> >
> > vdsm.log:
> >
> > 2022-02-18 22:14:43,939+0000 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:726)
> > 2022-02-18 22:14:44,071+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48832 (protocoldetector:61)
> > 2022-02-18 22:14:44,074+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:44,442+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48836 (protocoldetector:61)
> > 2022-02-18 22:14:44,445+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:45,077+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48838 (protocoldetector:61)
> > 2022-02-18 22:14:45,435+0000 INFO (periodic/2) [vdsm.api] START repoStats(domains=()) from=internal, task_id=2dd417e7-0f4f-4a09-a1af-725f267af135 (api:48)
> > 2022-02-18 22:14:45,435+0000 INFO (periodic/2) [vdsm.api] FINISH repoStats return={} from=internal, task_id=2dd417e7-0f4f-4a09-a1af-725f267af135 (api:54)
> > 2022-02-18 22:14:45,438+0000 WARN (periodic/2) [root] Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished? (api:194)
> > 2022-02-18 22:14:45,447+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48840 (protocoldetector:61)
> > 2022-02-18 22:14:45,449+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:46,082+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48842 (protocoldetector:61)
> > 2022-02-18 22:14:46,084+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:46,452+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48844 (protocoldetector:61)
> > 2022-02-18 22:14:46,455+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:47,087+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48846 (protocoldetector:61)
> > 2022-02-18 22:14:47,089+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:47,457+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48848 (protocoldetector:61)
> > 2022-02-18 22:14:47,459+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:48,092+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48850 (protocoldetector:61)
> > 2022-02-18 22:14:48,094+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:48,461+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection
> > from ::1:48852 (protocoldetector:61)
> > 2022-02-18 22:14:48,464+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:48,941+0000 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=75ef5d5f-c56b-4595-95c8-3dc64caa3a83 (api:48)
> > 2022-02-18 22:14:48,942+0000 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=75ef5d5f-c56b-4595-95c8-3dc64caa3a83 (api:54)
> >
> >> On Feb 18, 2022, at 16:35, Strahil Nikolov via Users <users@ovirt.org> wrote:
> >>
> >> ovirt-2 is 'state=GlobalMaintenance', but the other 2 nodes are unknown.
> >> Try to start ovirt-ha-broker & ovirt-ha-agent.
> >>
> >> Also, you may try to move the hosted-engine to ovirt-2 and try again.
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
> >> On Fri, Feb 18, 2022 at 21:48, Joseph Gelinas <jos...@gelinas.cc> wrote:
> >> It may be in maintenance mode; I did try to set it at the beginning of this, but engine-setup doesn't see it. At this point my nodes say they can't connect to the HA daemon, or have stale data.
> >>
> >> [root@ovirt-1 ~]# hosted-engine --set-maintenance --mode=global
> >> Cannot connect to the HA daemon, please check the logs.
> >>
> >> [root@ovirt-3 ~]# hosted-engine --set-maintenance --mode=global
> >> Cannot connect to the HA daemon, please check the logs.
> >>
> >> [root@ovirt-2 ~]# hosted-engine --set-maintenance --mode=global
> >> [root@ovirt-2 ~]# hosted-engine --vm-status
> >>
> >> !! Cluster is in GLOBAL MAINTENANCE mode !!
> >>
> >> --== Host ovirt-1.xxxxxx.com (id: 1) status ==--
> >>
> >> Host ID                        : 1
> >> Host timestamp                 : 6750990
> >> Score                          : 0
> >> Engine status                  : unknown stale-data
> >> Hostname                       : ovirt-1.xxxxxx.com
> >> Local maintenance              : False
> >> stopped                        : True
> >> crc32                          : 5290657b
> >> conf_on_shared_storage         : True
> >> local_conf_timestamp           : 6750950
> >> Status up-to-date              : False
> >> Extra metadata (valid at timestamp):
> >>     metadata_parse_version=1
> >>     metadata_feature_version=1
> >>     timestamp=6750990 (Thu Feb 17 22:17:53 2022)
> >>     host-id=1
> >>     score=0
> >>     vm_conf_refresh_time=6750950 (Thu Feb 17 22:17:12 2022)
> >>     conf_on_shared_storage=True
> >>     maintenance=False
> >>     state=AgentStopped
> >>     stopped=True
> >>
> >> --== Host ovirt-3.xxxxxx.com (id: 2) status ==--
> >>
> >> Host ID                        : 2
> >> Host timestamp                 : 6731526
> >> Score                          : 0
> >> Engine status                  : unknown stale-data
> >> Hostname                       : ovirt-3.xxxxxx.com
> >> Local maintenance              : False
> >> stopped                        : True
> >> crc32                          : 12c6b5c9
> >> conf_on_shared_storage         : True
> >> local_conf_timestamp           : 6731486
> >> Status up-to-date              : False
> >> Extra metadata (valid at timestamp):
> >>     metadata_parse_version=1
> >>     metadata_feature_version=1
> >>     timestamp=6731526 (Thu Feb 17 15:29:37 2022)
> >>     host-id=2
> >>     score=0
> >>     vm_conf_refresh_time=6731486 (Thu Feb 17 15:28:57 2022)
> >>     conf_on_shared_storage=True
> >>     maintenance=False
> >>     state=AgentStopped
> >>     stopped=True
> >>
> >> --== Host ovirt-2.xxxxxx.com (id: 3) status ==--
> >>
> >> Host ID                        : 3
> >> Host timestamp                 : 6829853
> >> Score                          : 3400
> >> Engine status                  : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
> >> Hostname                       : ovirt-2.xxxxxx.com
> >> Local maintenance              : False
> >> stopped                        : False
> >> crc32                          : 0779c0b8
> >> conf_on_shared_storage         : True
> >> local_conf_timestamp           : 6829853
> >> Status up-to-date              : True
> >> Extra metadata (valid at timestamp):
> >>     metadata_parse_version=1
> >>     metadata_feature_version=1
> >>     timestamp=6829853 (Fri Feb 18 19:25:17 2022)
> >>     host-id=3
> >>     score=3400
> >>     vm_conf_refresh_time=6829853 (Fri Feb 18 19:25:17 2022)
> >>     conf_on_shared_storage=True
> >>     maintenance=False
> >>     state=GlobalMaintenance
> >>     stopped=False
> >>
> >> !! Cluster is in GLOBAL MAINTENANCE mode !!
> >>
> >> Ovirt-ha-agent on 1 & 3 just keeps trying to restart:
> >>
> >> MainThread::ERROR::2022-02-18 19:34:36,910::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
> >> MainThread::INFO::2022-02-18 19:34:36,910::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
> >> MainThread::INFO::2022-02-18 19:34:47,268::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.4.5 started
> >> MainThread::INFO::2022-02-18 19:34:47,280::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate common name not found, using hostname to identify host
> >> MainThread::ERROR::2022-02-18 19:35:47,629::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
> >>     return action(he)
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
> >>     return he.start_monitoring()
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 436, in start_monitoring
> >>     self._initialize_vdsm()
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 595, in _initialize_vdsm
> >>     logger=self._log
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 472, in connect_vdsm_json_rpc
> >>     __vdsm_json_rpc_connect(logger, timeout)
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 415, in __vdsm_json_rpc_connect
> >>     timeout=VDSM_MAX_RETRY * VDSM_DELAY
> >> RuntimeError: Couldn't connect to VDSM within 60 seconds
> >>
> >> Ovirt-2's ovirt-hosted-engine-ha/agent.log has entries detecting global maintenance, though `systemctl status ovirt-ha-agent` shows Python exception errors from yesterday.
> >>
> >> MainThread::INFO::2022-02-18 19:39:10,452::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Global maintenance detected
> >> MainThread::INFO::2022-02-18 19:39:10,524::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state GlobalMaintenance (score: 3400)
> >>
> >> Feb 17 18:49:12 ovirt-2.us1.vricon.com python3[1324125]: detected unhandled Python exception in '/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py'
> >>
> >>> On Feb 18, 2022, at 14:20, Strahil Nikolov <hunter86...@yahoo.com> wrote:
> >>>
> >>> To set the engine into maintenance mode, you can ssh to any hypervisor and run:
> >>> 'hosted-engine --set-maintenance --mode=global'
> >>> Wait 1 minute and run 'hosted-engine --vm-status' to validate.
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>> On Fri, Feb 18, 2022 at 19:03, Joseph Gelinas <jos...@gelinas.cc> wrote:
> >>> Hi,
> >>>
> >>> The certificates on our oVirt stack recently expired. While all the VMs are still up, I can't put the cluster into global maintenance via ovirt-engine, or do anything via ovirt-engine for that matter; I just get event logs about cert validity.
> >>> VDSM ovirt-1.xxxxx.com command Get Host Capabilities failed: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
> >>> VDSM ovirt-2.xxxxx.com command Get Host Capabilities failed: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
> >>> VDSM ovirt-3.xxxxx.com command Get Host Capabilities failed: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
> >>>
> >>> Under Compute -> Hosts, all are status Unassigned. The Default data center is status Non Responsive.
> >>>
> >>> I have tried a couple of solutions to regenerate the certificates without much luck, and have copied the originals back in place:
> >>>
> >>> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/html/upgrade_guide/replacing_sha-1_certificates_with_sha-256_certificates_4-1_local_db#Replacing_All_Signed_Certificates_with_SHA-256_4-1_local_db
> >>>
> >>> https://access.redhat.com/solutions/2409751
> >>>
> >>> I have seen things saying that running engine-setup will generate new certs; however, the engine doesn't think the cluster is in global maintenance, so it won't run. I believe I can get around the check with `engine-setup --otopi-environment=OVESETUP_CONFIG/continueSetupOnHEVM=bool:True`, but is that the right thing to do? Will it deploy the certs onto the hosts as well so things communicate properly? It looks like one is supposed to put a node into maintenance and reenroll it after running engine-setup, but will it even be able to put the nodes into maintenance given I can't do anything with them now?
> >>>
> >>> Appreciate any ideas.
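One way to make the engine-setup decision less blind is to script the check described later in the thread: set global maintenance from a host whose HA agent still responds, then only proceed once `hosted-engine --vm-status` confirms it. A sketch; the helper merely scans the status text, and the commented command sequence assumes the hosted-engine CLI quoted above:

```shell
# in_global_maintenance: succeed if `hosted-engine --vm-status` output
# (read from stdin) reports global maintenance mode.
in_global_maintenance() {
    grep -q 'GLOBAL MAINTENANCE'
}

# Sketch of the sequence (run on a host with a healthy HA agent,
# e.g. ovirt-2 in this thread):
#   hosted-engine --set-maintenance --mode=global
#   sleep 60
#   hosted-engine --vm-status | in_global_maintenance \
#       && echo "global maintenance confirmed; consider engine-setup"
```

Gating on the vm-status output rather than on the `--set-maintenance` exit code matters here, because the thread shows that command failing outright ("Cannot connect to the HA daemon") on two of the three hosts.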
> >>> _______________________________________________
> >>> Users mailing list -- users@ovirt.org
> >>> To unsubscribe send an email to users-le...@ovirt.org
> >>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> >>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/QCFPKQ3OKPOUV266MFJUMVTNG2OHLJVW/
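As a footnote on the repeated `SSLHandshakeDispatcher` errors in vdsm.log above: whether an expired certificate is what kills the handshake can be probed directly by pointing `openssl s_client` at vdsm. A sketch; the helper only scans the diagnostic text, and port 54321 is assumed to be vdsm's default listener:

```shell
# handshake_expired: read `openssl s_client` diagnostics from stdin and
# succeed if verification failed because a certificate has expired.
handshake_expired() {
    grep -q 'certificate has expired'
}

# Usage sketch against a host (port 54321 assumed for vdsm):
#   echo | openssl s_client -connect ovirt-1.example.com:54321 2>&1 \
#       | handshake_expired && echo "expired certificate on the wire"
```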