On Thu, Jan 25, 2018 at 5:12 PM, Christopher Cox <c...@endlessnow.com> wrote:
> On 01/25/2018 02:25 PM, Douglas Landgraf wrote:
>>
>> On Wed, Jan 24, 2018 at 10:18 AM, Christopher Cox <c...@endlessnow.com>
>> wrote:
>>>
>>> Would restarting vdsm on the node in question help fix this? Again, all
>>> the VMs are up on the node. Prior attempts to fix this problem have left
>>> the node in a state where I can't issue the "has been rebooted" command
>>> to it; it's confused.
>>>
>>> So... the node is up. All VMs are up. Can't issue "has been rebooted" to
>>> the node; all VMs show Unknown and not responding, but they are up.
>>>
>>> Changing the status in the ovirt db to 0 works for a second and then it
>>> goes immediately back to 8 (which is why I'm wondering if I should
>>> restart vdsm on the node).
>>
>> It's not recommended to change the db manually.
>>
>>> Oddly enough, we're running all of this in production. So, watching it
>>> all go down isn't the best option for us.
>>>
>>> Any advice is welcome.
>>
>> We would need to see the node/engine logs. Have you found any error in
>> the vdsm.log (from the nodes) or engine.log? Could you please share the
>> error?
>
> In short, the error is that our ovirt manager lost network (our problem)
> and crashed hard (hardware issue on the server). On bring-up, we had some
> network changes (that caused the lost-network problem), so our LACP bond
> was down for a bit while we were trying to bring it up (noting the ovirt
> manager is up while we're reestablishing the network on the switch side).
>
> In other words, that's the "error", so to speak, that got us to where we
> are.
>
> Full DEBUG enabled on the logs... The error messages seem obvious to me.
> It starts like this (noting the ISO DOMAIN was coming off an NFS mount off
> the ovirt management server... yes... we know... we do have plans to move
> that).
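On the restart question: in my experience restarting vdsmd does not touch the running qemu-kvm processes, so the VMs should stay up while vdsm reconnects to the engine. Treat the sketch below as a cautious suggestion to verify against your own 3.6 version first, not a guaranteed procedure:

```shell
# On the affected node. Restarting vdsmd only restarts the management
# daemon; the qemu-kvm processes backing your VMs keep running.
systemctl status vdsmd                       # current state of the daemon
journalctl -u vdsmd --since "-1h" | tail -50 # recent errors, if any
systemctl restart vdsmd
# Afterwards, watch engine.log: the host should reconnect on its own and
# the VM states should recover without any manual DB edits.
```

This is usually preferable to flipping status values in the engine database, since (as you saw) the monitoring loop just writes the observed state right back.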
>
> So on the hypervisor node itself, from the vdsm.log (vdsm.log.33.xz):
>
> (hopefully no surprise here)
>
> Thread-2426633::WARNING::2018-01-23 13:50:56,672::fileSD::749::Storage.scanDomains::(collectMetaFiles) Could not collect metadata file for domain path /rhev/data-center/mnt/d0lppc129.skopos.me:_var_lib_exports_iso-20160408002844
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/fileSD.py", line 735, in collectMetaFiles
>     sd.DOMAIN_META_DATA))
>   File "/usr/share/vdsm/storage/outOfProcess.py", line 121, in glob
>     return self._iop.glob(pattern)
>   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 536, in glob
>     return self._sendCommand("glob", {"pattern": pattern}, self.timeout)
>   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 421, in _sendCommand
>     raise Timeout(os.strerror(errno.ETIMEDOUT))
> Timeout: Connection timed out
> Thread-27::ERROR::2018-01-23 13:50:56,672::sdc::145::Storage.StorageDomainCache::(_findDomain) domain e5ecae2f-5a06-4743-9a43-e74d83992c35 not found
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
>     dom = findMethod(sdUUID)
>   File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
>     return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
>   File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
>     raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist: (u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)
> Thread-27::ERROR::2018-01-23 13:50:56,673::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring domain e5ecae2f-5a06-4743-9a43-e74d83992c35
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/monitor.py", line 272, in _monitorDomain
>     self._performDomainSelftest()
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 769, in wrapper
>     value = meth(self, *a, **kw)
>   File "/usr/share/vdsm/storage/monitor.py", line 339, in _performDomainSelftest
>     self.domain.selftest()
>   File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
>     return getattr(self.getRealDomain(), attrName)
>   File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
>     return self._cache._realProduce(self._sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 124, in _realProduce
>     domain = self._findDomain(sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
>     dom = findMethod(sdUUID)
>   File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
>     return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
>   File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
>     raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist: (u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)
>
> Again, all the hypervisor nodes will complain about the NFS area for the
> ISO DOMAIN now being gone. Remember, the ovirt manager node held this, and
> its network has now gone out and the node crashed (note: the ovirt node
> (the actual server box) shouldn't crash due to the network outage, but it
> did).
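As a side note for anyone following along: the mount directory name in that traceback encodes the original NFS export. Assuming vdsm's usual encoding (each '/' in the remote path becomes '_'), you can recover the server and export path and then verify from a node that the ISO domain is reachable again:

```shell
# Decode the mount directory name seen in the vdsm.log traceback above.
# Assumption: vdsm's usual encoding, where '/' in the remote path maps to '_'.
d='d0lppc129.skopos.me:_var_lib_exports_iso-20160408002844'
srv=${d%%:*}                                # the NFS server
exp=$(printf '%s' "${d#*:}" | tr '_' '/')   # the original export path
echo "$srv:$exp"
```

With the server and export in hand, `showmount -e "$srv"` from a hypervisor node (or a manual `mount -t nfs` to a scratch directory) is a reasonable way to confirm the export is actually served again before poking at vdsm.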
I have added the VDSM people to this thread to review it. I am assuming the network changes (during the crash) still leave the storage domain available to the nodes.
>
> So here is the engine collapse as it lost network connectivity (before the
> server actually crashed hard).
>
> 2018-01-23 13:45:33,666 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-87) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM d0lppn067 command failed: Heartbeat exeeded
> 2018-01-23 13:45:33,666 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-10) [21574461] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM d0lppn072 command failed: Heartbeat exeeded
> 2018-01-23 13:45:33,666 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-37) [4e8ec41d] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM d0lppn066 command failed: Heartbeat exeeded
> 2018-01-23 13:45:33,667 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler_Worker-87) [] Command 'GetStatsVDSCommand(HostName = d0lppn067, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='f99c68c8-b0e8-437b-8cd9-ebaddaaede96', vds='Host[d0lppn067,f99c68c8-b0e8-437b-8cd9-ebaddaaede96]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,667 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler_Worker-10) [21574461] Command 'GetStatsVDSCommand(HostName = d0lppn072, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='fdc00296-973d-4268-bd79-6dac535974e0', vds='Host[d0lppn072,fdc00296-973d-4268-bd79-6dac535974e0]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,667 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler_Worker-37) [4e8ec41d] Command 'GetStatsVDSCommand(HostName = d0lppn066, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='14abf559-4b62-4ebd-a345-77fa9e1fa3ae', vds='Host[d0lppn066,14abf559-4b62-4ebd-a345-77fa9e1fa3ae]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,669 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-87) [] Failed getting vds stats, vds='d0lppn067'(f99c68c8-b0e8-437b-8cd9-ebaddaaede96): org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,669 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-10) [21574461] Failed getting vds stats, vds='d0lppn072'(fdc00296-973d-4268-bd79-6dac535974e0): org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,669 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-37) [4e8ec41d] Failed getting vds stats, vds='d0lppn066'(14abf559-4b62-4ebd-a345-77fa9e1fa3ae): org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-10) [21574461] Failure to refresh Vds runtime info: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-37) [4e8ec41d] Failure to refresh Vds runtime info: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-87) [] Failure to refresh Vds runtime info: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-37) [4e8ec41d] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand.executeVdsBrokerCommand(GetStatsVDSCommand.java:21) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:]
>         at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
>         at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsStats(HostMonitoring.java:472) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:114) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:227) [vdsbroker.jar:]
>         at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source) [:1.8.0_102]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102]
>         at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102]
>         at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
>         at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
>         at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
>         at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
>
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-10) [21574461] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand.executeVdsBrokerCommand(GetStatsVDSCommand.java:21) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:]
>         at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
>         at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsStats(HostMonitoring.java:472) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:114) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:227) [vdsbroker.jar:]
>         at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source) [:1.8.0_102]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102]
>         at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102]
>         at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
>         at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
>         at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
>         at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
>
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-87) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand.executeVdsBrokerCommand(GetStatsVDSCommand.java:21) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:]
>         at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
>         at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
>         at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsStats(HostMonitoring.java:472) [vdsbroker.jar:]
>
> Here the engine logs show the problem with node d0lppn065; the VMs first go
> to "Unknown", and then to "Unknown" plus "not responding":
>
> 2018-01-23 14:48:00,712 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-28) [] Correlation ID: null, Call Stack: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection failed
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:157)
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:120)
>         at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65)
>         at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33)
>         at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467)
>         at org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher.fetch(VmsStatisticsFetcher.java:27)
>         at org.ovirt.engine.core.vdsbroker.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:35)
>         at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81)
>         at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52)
>         at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
>         at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
> Caused by: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection failed
>         at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.connect(ReactorClient.java:155)
>         at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.getClient(JsonRpcClient.java:134)
>         at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:81)
>         at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:70)
>         at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.getAllVmStats(JsonRpcVdsServer.java:331)
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand.executeVdsBrokerCommand(GetAllVmStatsVDSCommand.java:20)
>         at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110)
>         ... 12 more
> , Custom Event ID: -1, Message: Host d0lppn065 is non responsive.
> 2018-01-23 14:48:00,713 INFO [org.ovirt.engine.core.bll.VdsEventListener] (org.ovirt.thread.pool-8-thread-1) [] ResourceManager::vdsNotResponding entered for Host '2797cae7-6886-4898-a5e4-23361ce03a90', '10.32.0.65'
> 2018-01-23 14:48:00,713 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-36) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vtop3 was set to the Unknown status.
>
> ...etc...
>
> 2018-01-23 14:59:07,817 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '30f7af86-c2b9-41c3-b2c5-49f5bbdd0e27'(d0lpvd070) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:07,819 INFO [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-74) [] Fetched 15 VMs from VDS '8cb119c5-b7f0-48a3-970a-205d96b2e940'
> 2018-01-23 14:59:07,936 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvd070 is not responding.
> 2018-01-23 14:59:07,939 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM 'ebc5bb82-b985-451b-8313-827b5f40eaf3'(d0lpvd039) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,032 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvd039 is not responding.
> 2018-01-23 14:59:08,038 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '494c4f9e-1616-476a-8f66-a26a96b76e56'(vtop3) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,134 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vtop3 is not responding.
> 2018-01-23 14:59:08,136 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM 'eaeaf73c-d9e2-426e-a2f2-7fcf085137b0'(d0lpvw059) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,237 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvw059 is not responding.
> 2018-01-23 14:59:08,239 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '8308a547-37a1-4163-8170-f89b6dc85ba8'(d0lpvm058) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,326 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvm058 is not responding.
> 2018-01-23 14:59:08,328 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '3d544926-3326-44e1-8b2a-ec632f51112a'(d0lqva056) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,400 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lqva056 is not responding.
> 2018-01-23 14:59:08,402 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '989e5a17-789d-4eba-8a5e-f74846128842'(d0lpva078) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,472 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpva078 is not responding.
> 2018-01-23 14:59:08,474 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '050a71c1-9e65-43c6-bdb2-18eba571e2eb'(d0lpvw077) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,545 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvw077 is not responding.
> 2018-01-23 14:59:08,547 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM 'c3b497fd-6181-4dd1-9acf-8e32f981f769'(d0lpva079) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,621 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpva079 is not responding.
> 2018-01-23 14:59:08,623 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '7cd22b39-feb1-4c6e-8643-ac8fb0578842'(d0lqva034) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,690 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lqva034 is not responding.
> 2018-01-23 14:59:08,692 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '2ab9b1d8-d1e8-4071-a47c-294e586d2fb6'(d0lpvd038) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,763 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvd038 is not responding.
> 2018-01-23 14:59:08,768 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM 'ecb4e795-9eeb-4cdc-a356-c1b9b32af5aa'(d0lqva031) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,836 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lqva031 is not responding.
> 2018-01-23 14:59:08,838 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '1a361727-1607-43d9-bd22-34d45b386d3e'(d0lqva033) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,911 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lqva033 is not responding.
> 2018-01-23 14:59:08,913 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '0cd65f90-719e-429e-a845-f425612d7b14'(vtop4) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,984 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vtop4 is not responding.
>
>> Probably it's time to think about upgrading your environment from 3.6.
>
> I know. But from a production standpoint, mid-2016 wasn't that long ago.
> And 4 was just coming out of beta at the time.
>
> We were upgrading from 3.4 to 3.6. And it took a long time (again, because
> it's all "live"). Trust me, the move to 4.0 was discussed; it was just a
> timing thing.
>
> With that said, I do "hear you"... and certainly it's being discussed. We
> just don't see a "good" migration path... we see a slow path (moving nodes
> out, upgrading, etc.), and we know that, as with all things, nobody can
> guarantee "success", which would be a very bad thing. So going from a
> working 3.6 to a totally (potentially) broken 4.2 isn't going to impress
> anyone here, you know? If all goes according to our best guesses, then
> great, but when things go bad, and the chance is not insignificant, well...
> I'm just not quite prepared with my résumé, if you know what I mean.
>
> Don't get me wrong, our move from 3.4 to 3.6 had some similar risks, but we
> also migrated to whole new infrastructure, a luxury we will not have this
> time. And somehow 3.4 to 3.6 doesn't sound as risky as 3.6 to 4.2.

I see your concern. However, keeping your system updated with recent software is something I would recommend. You could set up a parallel 4.2 environment and move the VMs slowly off 3.6.

> Is there a path from oVirt to RHEV? Every bit of help we get helps us in
> making that decision as well, which I think would be a very good thing for
> both of us. (I inherited all this oVirt, and I was the "guy" doing the 3.4
> to 3.6 with the all-new infrastructure.)

Yes, you can import your setup into RHEV.

--
Cheers
Douglas

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users