GitHub user adietrich-ussignal closed a discussion: Agent and management server 
communication broken after KVM host reboot

<!--
Verify first that your issue/request is not already reported on GitHub.
Also test if the latest release and main branch are affected too.
Always add information AFTER of these HTML comments, but no need to delete the 
comments.
-->

##### ISSUE TYPE
<!-- Pick one below and delete the rest -->
 * Bug Report

##### COMPONENT NAME
<!--
Categorize the issue, e.g. API, VR, VPN, UI, etc.
-->
~~~
Agent
~~~

##### CLOUDSTACK VERSION
<!--
New line separated list of affected versions, commit ID for issues on main 
branch.
-->

~~~
4.19.1.1
~~~

##### CONFIGURATION
<!--
Information about the configuration if relevant, e.g. basic network, advanced 
networking, etc.  N/A otherwise
-->
Advanced Networking 
KVM 
Consolidated Cluster: 

-  2 KVM Hosts: Management Server, Agent, & Database 
- 1 KVM Host: Agent

##### OS / ENVIRONMENT
<!--
Information about the environment if relevant, N/A otherwise
-->
Ubuntu 22.04 

##### SUMMARY
<!-- Explain the problem/feature briefly -->
When performing a host reboot after updating general packages (e.g., kernel 
updates), the agent fails to return to an "Up" state. 

The error observed when attempting to disable maintnenance mode: 
<img width="285" alt="Screenshot 2024-08-13 at 2 19 52 PM" 
src="https://github.com/user-attachments/assets/d02f7e50-729a-4d26-b739-0c47258fe877";>

The agent logs: 
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  [utils.linux.KVMHostInfo] 
(Agent-Handler-1:) (logid:) Fetching CPU speed from command "lscpu".
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: ERROR [utils.linux.KVMHostInfo] 
(Agent-Handler-1:) (logid:) Unable to retrieve the CPU speed from lscpu.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: java.lang.NullPointerException
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/jdk.internal.math.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.lang.Float.parseFloat(Float.java:455)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromCommandLscpu(KVMHostInfo.java:134)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:109)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:189)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
org.apache.cloudstack.utils.linux.KVMHostInfo.<init>(KVMHostInfo.java:66)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3627)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
com.cloud.agent.Agent.sendStartup(Agent.java:460)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1125)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
com.cloud.utils.nio.Task.call(Task.java:83)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
com.cloud.utils.nio.Task.call(Task.java:29)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  [utils.linux.KVMHostInfo] 
(Agent-Handler-1:) (logid:) Fetching CPU speed from file 
[/sys/devices/system/cpu/cpu0/cpufreq/base_frequency].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: ERROR [utils.linux.KVMHostInfo] 
(Agent-Handler-1:) (logid:) Unable to retrieve the CPU speed from file 
[/sys/devices/system/cpu/cpu0/cpufreq/base_frequency]
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: java.io.FileNotFoundException: 
/sys/devices/system/cpu/cpu0/cpufreq/base_frequency (No such file or directory)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.io.FileInputStream.open0(Native Method)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.io.FileInputStream.open(FileInputStream.java:219)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.io.FileReader.<init>(FileReader.java:60)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromFile(KVMHostInfo.java:145)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:114)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:189)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
org.apache.cloudstack.utils.linux.KVMHostInfo.<init>(KVMHostInfo.java:66)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3627)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
com.cloud.agent.Agent.sendStartup(Agent.java:460)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1125)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
com.cloud.utils.nio.Task.call(Task.java:83)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
com.cloud.utils.nio.Task.call(Task.java:29)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]:         at 
java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  [utils.linux.KVMHostInfo] 
(Agent-Handler-1:) (logid:) Fetching CPU speed from "host capabilities"
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  [utils.linux.KVMHostInfo] 
(Agent-Handler-1:) (logid:) Retrieved value [2100000000] from "host 
capabilities". This corresponds to a CPU speed of [2100] MHz.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  
[kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Host uses 
control group [cgroup2fs].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  
[kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Calculating 
the max shares of the host.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  
[kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) The max 
shares of the host is [268800].
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]:     root : PWD=/ ; USER=root ; 
COMMAND=/usr/bin/grep InitiatorName= /etc/iscsi/initiatorname.iscsi
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: pam_unix(sudo:session): session 
opened for user root(uid=0) by (uid=0)
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: pam_unix(sudo:session): session 
closed for user root
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: WARN  [utils.script.Script] 
(Agent-Handler-1:) (logid:) Execution of process [7107] for command [/bin/bash 
-c dpkg -l nbdkit ] failed.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: WARN  [utils.script.Script] 
(Agent-Handler-1:) (logid:) Process [7107] for command [/bin/bash -c dpkg -l 
nbdkit ] encountered the error: [1].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  
[kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Attempting to 
create storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9 (Filesystem) in libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  
[kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Found existing 
defined storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9, using it.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  
[kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Trying to fetch 
storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9 from libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  [cloud.agent.Agent] 
(Agent-Handler-2:) (logid:) Process agent startup answer, agent id = 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  [cloud.agent.Agent] 
(Agent-Handler-2:) (logid:) Set agent id 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  [cloud.agent.Agent] 
(Agent-Handler-2:) (logid:) Startup Response Received: agent id = 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  
[kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) 
Attempting to create storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6 
(NetworkFilesystem) in libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  
[kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) 
Found existing defined storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6, using 
it.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO  
[kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) 
Trying to fetch storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6 from libvirt

The management server log: 
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: 
com.cloud.utils.exception.CloudRuntimeException: Could not restart agent on 
Host 
{"id":17,"name":"dtw-cm-kvm01-03","type":"Routing","uuid":"2549c995-78ca-411f-91f8-7a4863fe3d15"}
 due to: Failed to restart cloudstack-agent.service: Interactive authentication 
required.
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: See system logs and 'systemctl 
status cloudstack-agent.service' for details.
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
com.cloud.resource.ResourceManagerImpl.connectAndRestartAgentOnHost(ResourceManagerImpl.java:2958)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
com.cloud.resource.ResourceManagerImpl.handleAgentIfNotConnected(ResourceManagerImpl.java:2923)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2893)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2977)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:1294)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
com.sun.proxy.$Proxy200.cancelMaintenance(Unknown Source)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd.execute(CancelMaintenanceCmd.java:96)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:172)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:112)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:654)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:602)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:         at 
java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: INFO  
[o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-12:ctx-d6de7f78 job-720) 
(logid:cb386b6d) Remove job-720 from job monitoring

##### STEPS TO REPRODUCE
<!--
For bugs, show exactly how to reproduce the problem, using a minimal test-case. 
Use Screenshots if accurate.

For new features, show how the feature would be used.
-->

<!-- Paste example playbooks or commands between quotes below -->
~~~
1. Place KVM host into maintenance mode 
2. Perform package updates on the KVM host 
3. Reboot the KVM host 
~~~

<!-- You can also paste gist.github.com links for larger files -->

##### EXPECTED RESULTS
<!-- What did you expect to happen when running the steps above? -->

~~~
The agent status changes to "Up" after the reboot has been completed and the 
agent is started.
~~~

##### ACTUAL RESULTS
<!-- What actually happened? -->

<!-- Paste verbatim command output between quotes below -->
~~~
The agent status is "Alert".
~~~


GitHub link: https://github.com/apache/cloudstack/discussions/9907

----
This is an automatically sent email for users@cloudstack.apache.org.
To unsubscribe, please send an email to: users-unsubscr...@cloudstack.apache.org

Reply via email to