Can you take the LB out as a variable - and try directly doing things inside the network (behind VPN, via internal IPs)
The exception thrown in the SSVM is strange though.... Best, On Sat, 10 Jul 2021 at 11:35, Joshua Schaeffer <jschaef...@harmonywave.com> wrote: > ACS: 4.15.0.0 > Hypervisor: KVM > OS: Ubuntu 20.04 > > In troubleshooting my last secondary storage issue I came across a > repeating warning and error on the secondary storage VM and the compute > host respectively. This is causing some issues and I'd like to know if > anybody else has come across this or how to resolve it. > > Exactly every 60 seconds I receive a lost connection info message followed > by a reconnection attempt (which is always successful) which is then > followed by a warning in the SSVM's case and an error in the compute host's > case. The warning is an exception that is caught by > java.lang.NumberFormatException and the error is a libvirt error. This also > triggers an alert in the CloudStack UI. Here are the details. > > First the logs from /var/log/cloud.log inside the SSVM: > > 2021-07-10 08:48:57,919 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Lost connection to host: bllcloudlb01.harmonywave.cloud. Attempting > reconnection while we still have 0 commands in progress. > 2021-07-10 08:48:57,945 INFO [utils.nio.NioClient] (Agent-Handler-2:null) > NioClient connection closed > 2021-07-10 08:48:57,947 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Reconnecting to host:bllcloudlb01.harmonywave.cloud > 2021-07-10 08:48:57,948 INFO [utils.nio.NioClient] (Agent-Handler-2:null) > Connecting to bllcloudlb01.harmonywave.cloud:8250 > 2021-07-10 08:48:57,954 INFO [utils.nio.Link] (Agent-Handler-2:null) Conf > file found: /usr/local/cloud/systemvm/conf/agent.properties > 2021-07-10 08:48:58,664 INFO [utils.nio.NioClient] (Agent-Handler-2:null) > SSL: Handshake done > 2021-07-10 08:48:58,665 INFO [utils.nio.NioClient] (Agent-Handler-2:null) > Connected to bllcloudlb01.harmonywave.cloud:8250 > 2021-07-10 08:48:58,729 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Proccess agent startup answer, agent id = 0 > 2021-07-10 08:48:58,731 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Set agent id 0 > 2021-07-10 08:48:58,736 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Startup Response Received: agent id = 0 > 2021-07-10 08:48:59,128 INFO > [storage.resource.NfsSecondaryStorageResource] > (agentRequest-Handler-2:null) Determined host > bllcloudstg02-storage.harmonywave.cloud corresponds to IP 10.2.96.6 > 2021-07-10 08:48:59,132 INFO > [storage.resource.NfsSecondaryStorageResource] > (agentRequest-Handler-2:null) Determined host > bllcloudstg02-storage.harmonywave.cloud corresponds to IP 10.2.96.6 > 2021-07-10 08:49:02,799 INFO > [storage.resource.NfsSecondaryStorageResource] > (agentRequest-Handler-2:null) created a thread to start post upload server > 2021-07-10 08:49:02,807 INFO [handler.logging.LoggingHandler] > (nioEventLoopGroup-310-1:null) [id: 0xab44bf4e] REGISTERED > 2021-07-10 08:49:02,808 INFO [handler.logging.LoggingHandler] > (nioEventLoopGroup-310-1:null) [id: 0xab44bf4e] BIND(0.0.0.0/0.0.0.0:8210) > 2021-07-10 08:49:02,808 INFO [handler.logging.LoggingHandler] > (nioEventLoopGroup-310-1:null) [id: 0xab44bf4e] CLOSE() > 2021-07-10 08:49:02,809 INFO [handler.logging.LoggingHandler] > (nioEventLoopGroup-310-1:null) [id: 0xab44bf4e] UNREGISTERED > 2021-07-10 08:49:02,816 INFO > [storage.resource.NfsSecondaryStorageResource] (Thread-157:null) shutting > down post upload server > 2021-07-10 08:49:03,021 INFO > [storage.resource.NfsSecondaryStorageResource] > (agentRequest-Handler-1:null) Determined host > bllcloudstg02-storage.harmonywave.cloud corresponds to IP 10.2.96.6 > 2021-07-10 08:49:03,072 INFO [storage.template.DownloadManagerImpl] > (agentRequest-Handler-1:null) found 0 volumes[] > 2021-07-10 08:49:03,127 INFO > [storage.resource.NfsSecondaryStorageResource] > (agentRequest-Handler-5:null) Determined host > bllcloudstg02-storage.harmonywave.cloud corresponds to IP 10.2.96.6 > 2021-07-10 08:49:03,257 INFO [storage.template.DownloadManagerImpl] > (agentRequest-Handler-5:null) found 3 > templates[/mnt/SecStorage/df2ca46d-aee0-302e-9ad0-2e94252341e4/template/tmpl/1/3/21b64e43-bd2e-447e-a488-14b67205e289.qcow2, > /mnt/SecStorage/df2ca46d-aee0-302e-9ad0-2e94252341e4/template/tmpl/1/10/746d45d3-426c-4d86-b82c-1d4cd5a25a8c.qcow2, > /mnt/SecStorage/df2ca46d-aee0-302e-9ad0-2e94252341e4/template/tmpl/2/230/50035329-d6d9-491d-9aef-e2756bdd02ec.iso] > 2021-07-10 08:49:03,269 WARN [storage.template.TemplateLocation] > (agentRequest-Handler-5:null) Cleaning up inconsistent information for QCOW2 > 2021-07-10 08:49:03,271 WARN [cloud.agent.Agent] > (agentRequest-Handler-5:null) Caught: > java.lang.NumberFormatException: For input string: "" > at > java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.base/java.lang.Long.parseLong(Long.java:702) > at java.base/java.lang.Long.parseLong(Long.java:817) > at > com.cloud.storage.template.TemplateLocation.getTemplateInfo(TemplateLocation.java:180) > at > org.apache.cloudstack.storage.template.DownloadManagerImpl.gatherTemplateInfo(DownloadManagerImpl.java:905) > at > org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.execute(NfsSecondaryStorageResource.java:2145) > at > org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.executeRequest(NfsSecondaryStorageResource.java:298) > at > com.cloud.storage.resource.PremiumSecondaryStorageResource.defaultAction(PremiumSecondaryStorageResource.java:64) > at > com.cloud.storage.resource.PremiumSecondaryStorageResource.executeRequest(PremiumSecondaryStorageResource.java:60) > at com.cloud.agent.Agent.processRequest(Agent.java:661) > at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1079) > at com.cloud.utils.nio.Task.call(Task.java:83) > at com.cloud.utils.nio.Task.call(Task.java:29) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2021-07-10 08:49:03,316 INFO [cloud.agent.Agent] > (agentRequest-Handler-2:null) Processing agent ready command, agent id = 16 > 2021-07-10 08:49:03,317 INFO [cloud.agent.Agent] > (agentRequest-Handler-2:null) Set agent id 16 > 2021-07-10 08:49:03,318 INFO [cloud.agent.Agent] > (agentRequest-Handler-2:null) Ready command is processed for agent id = 16 > 2021-07-10 08:49:03,482 INFO [cloud.agent.Agent] > (agentRequest-Handler-4:null) Processing agent ready command, agent id = 16 > 2021-07-10 08:49:03,482 INFO [cloud.agent.Agent] > (agentRequest-Handler-4:null) Set agent id 16 > 2021-07-10 08:49:03,506 INFO [cloud.agent.Agent] > (agentRequest-Handler-4:null) Processed new management server list: > bllcloudlb01.harmonywave.cloud@static > 2021-07-10 08:49:03,506 INFO [cloud.agent.Agent] > (agentRequest-Handler-4:null) Ready command is processed for agent id = 16 > 2021-07-10 08:49:03,670 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Connected to the host: bllcloudlb01.harmonywave.cloud > > I also see an alert in the CloudStack UI that seems to correspond to the > warning: > > Name: ALERT.COMPUTE.HOST > ID: ea6acb78-3635-4e7f-acb9-ea061154f112 > Type: 7 > Date: 10 Jul 2021 08:48:57 > Description: Host in ALERT state, name: s-41-VM (id:16), availability > zone: BL1, pod: POD1 > > Secondly, the logs from /var/log/cloudstack/agent/agent.log: > > 2021-07-10 09:04:00,735 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > (logid:29f535a2) Lost connection to host: bllcloudlb01.harmonywave.cloud. > Attempting reconnection while we still have 0 commands in progress. > 2021-07-10 09:04:00,739 INFO [utils.nio.NioClient] (Agent-Handler-2:null) > (logid:29f535a2) NioClient connection closed > 2021-07-10 09:04:00,739 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > (logid:29f535a2) Reconnecting to host:bllcloudlb01.harmonywave.cloud > 2021-07-10 09:04:00,739 INFO [utils.nio.NioClient] (Agent-Handler-2:null) > (logid:29f535a2) Connecting to bllcloudlb01.harmonywave.cloud:8250 > 2021-07-10 09:04:00,743 INFO [utils.nio.Link] (Agent-Handler-2:null) > (logid:29f535a2) Conf file found: /etc/cloudstack/agent/agent.properties > 2021-07-10 09:04:01,249 INFO [utils.nio.NioClient] (Agent-Handler-2:null) > (logid:29f535a2) SSL: Handshake done > 2021-07-10 09:04:01,250 INFO [utils.nio.NioClient] (Agent-Handler-2:null) > (logid:29f535a2) Connected to bllcloudlb01.harmonywave.cloud:8250 > 2021-07-10 09:04:01,256 INFO [utils.linux.KVMHostInfo] > (Agent-Handler-1:null) (logid:29f535a2) Could not read cpuinfo_max_freq, > falling back on libvirt > 2021-07-10 09:04:01,360 INFO [kvm.storage.LibvirtStorageAdaptor] > (Agent-Handler-1:null) (logid:29f535a2) Attempting to create storage pool > 6b8fc73b-c066-45ad-bc88-7bb4018bf32f (Filesystem) in libvirt > 2021-07-10 09:04:01,361 ERROR [kvm.resource.LibvirtConnection] > (Agent-Handler-1:null) (logid:29f535a2) Connection with libvirtd is broken: > invalid connection pointer in virConnectGetVersion > 2021-07-10 09:04:01,364 INFO [kvm.storage.LibvirtStorageAdaptor] > (Agent-Handler-1:null) (logid:29f535a2) Found existing defined storage pool > 6b8fc73b-c066-45ad-bc88-7bb4018bf32f, using it. > 2021-07-10 09:04:01,365 INFO [kvm.storage.LibvirtStorageAdaptor] > (Agent-Handler-1:null) (logid:29f535a2) Trying to fetch storage pool > 6b8fc73b-c066-45ad-bc88-7bb4018bf32f from libvirt > 2021-07-10 09:04:01,396 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > (logid:29f535a2) Proccess agent startup answer, agent id = 0 > 2021-07-10 09:04:01,396 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > (logid:29f535a2) Set agent id 0 > 2021-07-10 09:04:01,397 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > (logid:29f535a2) Startup Response Received: agent id = 0 > 2021-07-10 09:04:01,749 INFO [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-3:null) (logid:629af4a9) Attempting to create storage > pool 341bae0e-fd14-39f6-80a7-e964b5afb28c (NetworkFilesystem) in libvirt > 2021-07-10 09:04:01,752 INFO [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-3:null) (logid:629af4a9) Found existing defined > storage pool 341bae0e-fd14-39f6-80a7-e964b5afb28c, using it. > 2021-07-10 09:04:01,752 INFO [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-3:null) (logid:629af4a9) Trying to fetch storage pool > 341bae0e-fd14-39f6-80a7-e964b5afb28c from libvirt > 2021-07-10 09:04:02,446 INFO [cloud.agent.Agent] > (agentRequest-Handler-3:null) (logid:629af4a9) Processing agent ready > command, agent id = 13 > 2021-07-10 09:04:02,447 INFO [cloud.agent.Agent] > (agentRequest-Handler-3:null) (logid:629af4a9) Set agent id 13 > 2021-07-10 09:04:02,447 INFO [cloud.agent.Agent] > (agentRequest-Handler-3:null) (logid:629af4a9) Ready command is processed > for agent id = 13 > 2021-07-10 09:04:02,569 INFO [cloud.agent.Agent] > (agentRequest-Handler-4:null) (logid:629af4a9) Processing agent ready > command, agent id = 13 > 2021-07-10 09:04:02,570 INFO [cloud.agent.Agent] > (agentRequest-Handler-4:null) (logid:629af4a9) Set agent id 13 > 2021-07-10 09:04:02,571 INFO [cloud.agent.Agent] > (agentRequest-Handler-4:null) (logid:629af4a9) Processed new management > server list: bllcloudlb01.harmonywave.cloud@static > 2021-07-10 09:04:02,571 INFO [cloud.agent.Agent] > (agentRequest-Handler-4:null) (logid:629af4a9) Ready command is processed > for agent id = 13 > 2021-07-10 09:04:06,251 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > (logid:29f535a2) Connected to the host: bllcloudlb01.harmonywave.cloud > > It also throws an alert in the UI: > > Name: ALERT.COMPUTE.HOST > ID: 7c83e1cc-aefa-4698-90c8-7c616fb3628a > Type: 7 > Date: 10 Jul 2021 09:04:00 > Description: Host disconnected, name: bllcloudcmp02 (id:13), availability > zone: BL1, pod: POD1 > > All of the logs above repeat every 60 seconds. When I was trying to upload > my ISO the upload process would fail if it ever coincided with the warning > and/or error message. So if the ISO upload took less than 1 minute and was > not occurring at the top of the minute then it would successfully upload. > However, if it was trying to upload when this warning/error is thrown the > progress bar would just stop and a network error message was thrown in the > UI. > > I should note that it is losing connection to my load balancer which is in > front of two management servers. Could this be a setting at the load > balancer level that is closing the connections? > > I found a couple links related to the libvirt error [1] [2] but for the > most part they seem to say that the message can be ignored or the > discussions are not actually centered around the libvirt error. Libvirt is > in fact running on the compute host and the systemvm's are running on top > of it. They both show a "state" of "running" and the "agent status" of "up" > in the UI. Does anybody know why the SSVM and the compute host would > constantly disconnect and reconnect? I'm happy to provide any additional > information. Any help is appreciated. > > [1] (https://github.com/apache/cloudstack/issues/4700) > [2] (https://github.com/apache/cloudstack/issues/3509) > > -- > Thanks, > Joshua Schaeffer > > -- Andrija Panić