GitHub user adietrich-ussignal closed a discussion: Agent and management server communication broken after KVM host reboot
<!-- Verify first that your issue/request is not already reported on GitHub. Also test if the latest release and main branch are affected too. Always add information AFTER of these HTML comments, but no need to delete the comments. --> ##### ISSUE TYPE <!-- Pick one below and delete the rest --> * Bug Report ##### COMPONENT NAME <!-- Categorize the issue, e.g. API, VR, VPN, UI, etc. --> ~~~ Agent ~~~ ##### CLOUDSTACK VERSION <!-- New line separated list of affected versions, commit ID for issues on main branch. --> ~~~ 4.19.1.1 ~~~ ##### CONFIGURATION <!-- Information about the configuration if relevant, e.g. basic network, advanced networking, etc. N/A otherwise --> Advanced Networking KVM Consolidated Cluster: - 2 KVM Hosts: Management Server, Agent, & Database - 1 KVM Host: Agent ##### OS / ENVIRONMENT <!-- Information about the environment if relevant, N/A otherwise --> Ubuntu 22.04 ##### SUMMARY <!-- Explain the problem/feature briefly --> When performing a host reboot after updating general packages (e.g., kernel updates), the agent fails to return to an "Up" state. The error observed when attempting to disable maintnenance mode: <img width="285" alt="Screenshot 2024-08-13 at 2 19 52 PM" src="https://github.com/user-attachments/assets/d02f7e50-729a-4d26-b739-0c47258fe877"> The agent logs: Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Fetching CPU speed from command "lscpu". Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: ERROR [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Unable to retrieve the CPU speed from lscpu. Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: java.lang.NullPointerException Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/jdk.internal.math.FloatingDecimal.parseFloat(FloatingDecimal.java:122) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.lang.Float.parseFloat(Float.java:455) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromCommandLscpu(KVMHostInfo.java:134) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:109) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:189) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.<init>(KVMHostInfo.java:66) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3627) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent.sendStartup(Agent.java:460) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1125) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:83) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:29) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.lang.Thread.run(Thread.java:829) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Fetching CPU speed from file [/sys/devices/system/cpu/cpu0/cpufreq/base_frequency]. Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: ERROR [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Unable to retrieve the CPU speed from file [/sys/devices/system/cpu/cpu0/cpufreq/base_frequency] Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: java.io.FileNotFoundException: /sys/devices/system/cpu/cpu0/cpufreq/base_frequency (No such file or directory) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.open0(Native Method) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.open(FileInputStream.java:219) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileReader.<init>(FileReader.java:60) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromFile(KVMHostInfo.java:145) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:114) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:189) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.<init>(KVMHostInfo.java:66) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3627) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent.sendStartup(Agent.java:460) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1125) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:83) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:29) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.lang.Thread.run(Thread.java:829) Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Fetching CPU speed from "host capabilities" Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Retrieved value [2100000000] from "host capabilities". This corresponds to a CPU speed of [2100] MHz. Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Host uses control group [cgroup2fs]. Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Calculating the max shares of the host. Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) The max shares of the host is [268800]. Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: root : PWD=/ ; USER=root ; COMMAND=/usr/bin/grep InitiatorName= /etc/iscsi/initiatorname.iscsi Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0) Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: pam_unix(sudo:session): session closed for user root Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: WARN [utils.script.Script] (Agent-Handler-1:) (logid:) Execution of process [7107] for command [/bin/bash -c dpkg -l nbdkit ] failed. Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: WARN [utils.script.Script] (Agent-Handler-1:) (logid:) Process [7107] for command [/bin/bash -c dpkg -l nbdkit ] encountered the error: [1]. Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Attempting to create storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9 (Filesystem) in libvirt Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Found existing defined storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9, using it. Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Trying to fetch storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9 from libvirt Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Process agent startup answer, agent id = 0 Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Set agent id 0 Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Startup Response Received: agent id = 0 Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) Attempting to create storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6 (NetworkFilesystem) in libvirt Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) Found existing defined storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6, using it. Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) Trying to fetch storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6 from libvirt The management server log: Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: com.cloud.utils.exception.CloudRuntimeException: Could not restart agent on Host {"id":17,"name":"dtw-cm-kvm01-03","type":"Routing","uuid":"2549c995-78ca-411f-91f8-7a4863fe3d15"} due to: Failed to restart cloudstack-agent.service: Interactive authentication required. Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: See system logs and 'systemctl status cloudstack-agent.service' for details. Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.connectAndRestartAgentOnHost(ResourceManagerImpl.java:2958) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.handleAgentIfNotConnected(ResourceManagerImpl.java:2923) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2893) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2977) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:1294) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.lang.reflect.Method.invoke(Method.java:566) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.sun.proxy.$Proxy200.cancelMaintenance(Unknown Source) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd.execute(CancelMaintenanceCmd.java:96) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:172) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:112) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:654) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:602) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.lang.Thread.run(Thread.java:829) Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-12:ctx-d6de7f78 job-720) (logid:cb386b6d) Remove job-720 from job monitoring ##### STEPS TO REPRODUCE <!-- For bugs, show exactly how to reproduce the problem, using a minimal test-case. Use Screenshots if accurate. For new features, show how the feature would be used. --> <!-- Paste example playbooks or commands between quotes below --> ~~~ 1. Place KVM host into maintenance mode 2. Perform package updates on the KVM host 3. Reboot the KVM host ~~~ <!-- You can also paste gist.github.com links for larger files --> ##### EXPECTED RESULTS <!-- What did you expect to happen when running the steps above? --> ~~~ The agent status changes to "Up" after the reboot has been completed and the agent is started. ~~~ ##### ACTUAL RESULTS <!-- What actually happened? --> <!-- Paste verbatim command output between quotes below --> ~~~ The agent status is "Alert". ~~~ GitHub link: https://github.com/apache/cloudstack/discussions/9907 ---- This is an automatically sent email for users@cloudstack.apache.org. To unsubscribe, please send an email to: users-unsubscr...@cloudstack.apache.org