Can you take the LB out as a variable - and try directly doing things
inside the network (behind VPN, via internal IPs)

The exception thrown in the SSVM is strange though....

Best,

On Sat, 10 Jul 2021 at 11:35, Joshua Schaeffer <jschaef...@harmonywave.com>
wrote:

> ACS: 4.15.0.0
> Hypervisor: KVM
> OS: Ubuntu 20.04
>
> In troubleshooting my last secondary storage issue I came across a
> repeating warning and error on the secondary storage VM and the compute
> host respectively. This is causing some issues and I'd like to know if
> anybody else has come across this or how to resolve it.
>
> Exactly every 60 seconds I receive a lost connection info message followed
> by a reconnection attempt (which is always successful) which is then
> followed by a warning in the SSVM's case and an error in the compute host's
> case. The warning is an exception that is caught by
> java.lang.NumberFormatException and the error is a libvirt error. This also
> triggers an alert in the CloudStack UI. Here are the details.
>
> First the logs from /var/log/cloud.log inside the SSVM:
>
> 2021-07-10 08:48:57,919 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> Lost connection to host: bllcloudlb01.harmonywave.cloud. Attempting
> reconnection while we still have 0 commands in progress.
> 2021-07-10 08:48:57,945 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
> NioClient connection closed
> 2021-07-10 08:48:57,947 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> Reconnecting to host:bllcloudlb01.harmonywave.cloud
> 2021-07-10 08:48:57,948 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
> Connecting to bllcloudlb01.harmonywave.cloud:8250
> 2021-07-10 08:48:57,954 INFO  [utils.nio.Link] (Agent-Handler-2:null) Conf
> file found: /usr/local/cloud/systemvm/conf/agent.properties
> 2021-07-10 08:48:58,664 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
> SSL: Handshake done
> 2021-07-10 08:48:58,665 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
> Connected to bllcloudlb01.harmonywave.cloud:8250
> 2021-07-10 08:48:58,729 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> Proccess agent startup answer, agent id = 0
> 2021-07-10 08:48:58,731 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> Set agent id 0
> 2021-07-10 08:48:58,736 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> Startup Response Received: agent id = 0
> 2021-07-10 08:48:59,128 INFO
> [storage.resource.NfsSecondaryStorageResource]
> (agentRequest-Handler-2:null) Determined host
> bllcloudstg02-storage.harmonywave.cloud corresponds to IP 10.2.96.6
> 2021-07-10 08:48:59,132 INFO
> [storage.resource.NfsSecondaryStorageResource]
> (agentRequest-Handler-2:null) Determined host
> bllcloudstg02-storage.harmonywave.cloud corresponds to IP 10.2.96.6
> 2021-07-10 08:49:02,799 INFO
> [storage.resource.NfsSecondaryStorageResource]
> (agentRequest-Handler-2:null) created a thread to start post upload server
> 2021-07-10 08:49:02,807 INFO  [handler.logging.LoggingHandler]
> (nioEventLoopGroup-310-1:null) [id: 0xab44bf4e] REGISTERED
> 2021-07-10 08:49:02,808 INFO  [handler.logging.LoggingHandler]
> (nioEventLoopGroup-310-1:null) [id: 0xab44bf4e] BIND(0.0.0.0/0.0.0.0:8210)
> 2021-07-10 08:49:02,808 INFO  [handler.logging.LoggingHandler]
> (nioEventLoopGroup-310-1:null) [id: 0xab44bf4e] CLOSE()
> 2021-07-10 08:49:02,809 INFO  [handler.logging.LoggingHandler]
> (nioEventLoopGroup-310-1:null) [id: 0xab44bf4e] UNREGISTERED
> 2021-07-10 08:49:02,816 INFO
> [storage.resource.NfsSecondaryStorageResource] (Thread-157:null) shutting
> down post upload server
> 2021-07-10 08:49:03,021 INFO
> [storage.resource.NfsSecondaryStorageResource]
> (agentRequest-Handler-1:null) Determined host
> bllcloudstg02-storage.harmonywave.cloud corresponds to IP 10.2.96.6
> 2021-07-10 08:49:03,072 INFO  [storage.template.DownloadManagerImpl]
> (agentRequest-Handler-1:null) found 0 volumes[]
> 2021-07-10 08:49:03,127 INFO
> [storage.resource.NfsSecondaryStorageResource]
> (agentRequest-Handler-5:null) Determined host
> bllcloudstg02-storage.harmonywave.cloud corresponds to IP 10.2.96.6
> 2021-07-10 08:49:03,257 INFO  [storage.template.DownloadManagerImpl]
> (agentRequest-Handler-5:null) found 3
> templates[/mnt/SecStorage/df2ca46d-aee0-302e-9ad0-2e94252341e4/template/tmpl/1/3/21b64e43-bd2e-447e-a488-14b67205e289.qcow2,
> /mnt/SecStorage/df2ca46d-aee0-302e-9ad0-2e94252341e4/template/tmpl/1/10/746d45d3-426c-4d86-b82c-1d4cd5a25a8c.qcow2,
> /mnt/SecStorage/df2ca46d-aee0-302e-9ad0-2e94252341e4/template/tmpl/2/230/50035329-d6d9-491d-9aef-e2756bdd02ec.iso]
> 2021-07-10 08:49:03,269 WARN  [storage.template.TemplateLocation]
> (agentRequest-Handler-5:null) Cleaning up inconsistent information for QCOW2
> 2021-07-10 08:49:03,271 WARN  [cloud.agent.Agent]
> (agentRequest-Handler-5:null) Caught:
> java.lang.NumberFormatException: For input string: ""
>     at
> java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.base/java.lang.Long.parseLong(Long.java:702)
>     at java.base/java.lang.Long.parseLong(Long.java:817)
>     at
> com.cloud.storage.template.TemplateLocation.getTemplateInfo(TemplateLocation.java:180)
>     at
> org.apache.cloudstack.storage.template.DownloadManagerImpl.gatherTemplateInfo(DownloadManagerImpl.java:905)
>     at
> org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.execute(NfsSecondaryStorageResource.java:2145)
>     at
> org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.executeRequest(NfsSecondaryStorageResource.java:298)
>     at
> com.cloud.storage.resource.PremiumSecondaryStorageResource.defaultAction(PremiumSecondaryStorageResource.java:64)
>     at
> com.cloud.storage.resource.PremiumSecondaryStorageResource.executeRequest(PremiumSecondaryStorageResource.java:60)
>     at com.cloud.agent.Agent.processRequest(Agent.java:661)
>     at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1079)
>     at com.cloud.utils.nio.Task.call(Task.java:83)
>     at com.cloud.utils.nio.Task.call(Task.java:29)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> 2021-07-10 08:49:03,316 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-2:null) Processing agent ready command, agent id = 16
> 2021-07-10 08:49:03,317 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-2:null) Set agent id 16
> 2021-07-10 08:49:03,318 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-2:null) Ready command is processed for agent id = 16
> 2021-07-10 08:49:03,482 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-4:null) Processing agent ready command, agent id = 16
> 2021-07-10 08:49:03,482 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-4:null) Set agent id 16
> 2021-07-10 08:49:03,506 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-4:null) Processed new management server list:
> bllcloudlb01.harmonywave.cloud@static
> 2021-07-10 08:49:03,506 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-4:null) Ready command is processed for agent id = 16
> 2021-07-10 08:49:03,670 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> Connected to the host: bllcloudlb01.harmonywave.cloud
>
> I also see an alert in the CloudStack UI that seems to correspond to the
> warning:
>
> Name: ALERT.COMPUTE.HOST
> ID: ea6acb78-3635-4e7f-acb9-ea061154f112
> Type: 7
> Date: 10 Jul 2021 08:48:57
> Description: Host in ALERT state, name: s-41-VM (id:16), availability
> zone: BL1, pod: POD1
>
> Secondly, the logs from /var/log/cloudstack/agent/agent.log:
>
> 2021-07-10 09:04:00,735 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> (logid:29f535a2) Lost connection to host: bllcloudlb01.harmonywave.cloud.
> Attempting reconnection while we still have 0 commands in progress.
> 2021-07-10 09:04:00,739 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
> (logid:29f535a2) NioClient connection closed
> 2021-07-10 09:04:00,739 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> (logid:29f535a2) Reconnecting to host:bllcloudlb01.harmonywave.cloud
> 2021-07-10 09:04:00,739 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
> (logid:29f535a2) Connecting to bllcloudlb01.harmonywave.cloud:8250
> 2021-07-10 09:04:00,743 INFO  [utils.nio.Link] (Agent-Handler-2:null)
> (logid:29f535a2) Conf file found: /etc/cloudstack/agent/agent.properties
> 2021-07-10 09:04:01,249 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
> (logid:29f535a2) SSL: Handshake done
> 2021-07-10 09:04:01,250 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
> (logid:29f535a2) Connected to bllcloudlb01.harmonywave.cloud:8250
> 2021-07-10 09:04:01,256 INFO  [utils.linux.KVMHostInfo]
> (Agent-Handler-1:null) (logid:29f535a2) Could not read cpuinfo_max_freq,
> falling back on libvirt
> 2021-07-10 09:04:01,360 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (Agent-Handler-1:null) (logid:29f535a2) Attempting to create storage pool
> 6b8fc73b-c066-45ad-bc88-7bb4018bf32f (Filesystem) in libvirt
> 2021-07-10 09:04:01,361 ERROR [kvm.resource.LibvirtConnection]
> (Agent-Handler-1:null) (logid:29f535a2) Connection with libvirtd is broken:
> invalid connection pointer in virConnectGetVersion
> 2021-07-10 09:04:01,364 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (Agent-Handler-1:null) (logid:29f535a2) Found existing defined storage pool
> 6b8fc73b-c066-45ad-bc88-7bb4018bf32f, using it.
> 2021-07-10 09:04:01,365 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (Agent-Handler-1:null) (logid:29f535a2) Trying to fetch storage pool
> 6b8fc73b-c066-45ad-bc88-7bb4018bf32f from libvirt
> 2021-07-10 09:04:01,396 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> (logid:29f535a2) Proccess agent startup answer, agent id = 0
> 2021-07-10 09:04:01,396 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> (logid:29f535a2) Set agent id 0
> 2021-07-10 09:04:01,397 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> (logid:29f535a2) Startup Response Received: agent id = 0
> 2021-07-10 09:04:01,749 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-3:null) (logid:629af4a9) Attempting to create storage
> pool 341bae0e-fd14-39f6-80a7-e964b5afb28c (NetworkFilesystem) in libvirt
> 2021-07-10 09:04:01,752 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-3:null) (logid:629af4a9) Found existing defined
> storage pool 341bae0e-fd14-39f6-80a7-e964b5afb28c, using it.
> 2021-07-10 09:04:01,752 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-3:null) (logid:629af4a9) Trying to fetch storage pool
> 341bae0e-fd14-39f6-80a7-e964b5afb28c from libvirt
> 2021-07-10 09:04:02,446 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-3:null) (logid:629af4a9) Processing agent ready
> command, agent id = 13
> 2021-07-10 09:04:02,447 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-3:null) (logid:629af4a9) Set agent id 13
> 2021-07-10 09:04:02,447 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-3:null) (logid:629af4a9) Ready command is processed
> for agent id = 13
> 2021-07-10 09:04:02,569 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-4:null) (logid:629af4a9) Processing agent ready
> command, agent id = 13
> 2021-07-10 09:04:02,570 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-4:null) (logid:629af4a9) Set agent id 13
> 2021-07-10 09:04:02,571 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-4:null) (logid:629af4a9) Processed new management
> server list: bllcloudlb01.harmonywave.cloud@static
> 2021-07-10 09:04:02,571 INFO  [cloud.agent.Agent]
> (agentRequest-Handler-4:null) (logid:629af4a9) Ready command is processed
> for agent id = 13
> 2021-07-10 09:04:06,251 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
> (logid:29f535a2) Connected to the host: bllcloudlb01.harmonywave.cloud
>
> It also throws an alert in the UI:
>
> Name: ALERT.COMPUTE.HOST
> ID: 7c83e1cc-aefa-4698-90c8-7c616fb3628a
> Type: 7
> Date: 10 Jul 2021 09:04:00
> Description: Host disconnected, name: bllcloudcmp02 (id:13), availability
> zone: BL1, pod: POD1
>
> All of the logs above repeat every 60 seconds. When I was trying to upload
> my ISO the upload process would fail if it ever coincided with the warning
> and/or error message. So if the ISO upload took less than 1 minute and was
> not occurring at the top of the minute then it would successfully upload.
> However, if it was trying to upload when this warning/error is thrown the
> progress bar would just stop and a network error message was thrown in the
> UI.
>
> I should note that it is losing connection to my load balancer which is in
> front of two management servers. Could this be a setting at the load
> balancer level that is closing the connections?
>
> I found a couple links related to the libvirt error [1] [2] but for the
> most part they seem to say that the message can be ignored or the
> discussions are not actually centered around the libvirt error. Libvirt is
> in fact running on the compute host and the systemvm's are running on top
> of it. They both show a "state" of "running" and the "agent status" of "up"
> in the UI. Does anybody know why the SSVM and the compute host would
> constantly disconnect and reconnect? I'm happy to provide any additional
> information. Any help is appreciated.
>
> [1] (https://github.com/apache/cloudstack/issues/4700)
> [2] (https://github.com/apache/cloudstack/issues/3509)
>
> --
> Thanks,
> Joshua Schaeffer
>
>

-- 

Andrija Panić

Reply via email to