I am currently dealing with an unexplained monitoring question in OpenNebula 4.6 on my development cloud.
I frequently see OpenNebula report the status of a host as "ON" even when the system is misconfigured such that, given the credentials, it is impossible for OpenNebula to even ssh into the node as oneadmin. I have fixed all of those instances and restarted OpenNebula, but it still reports a number of VMs in state "running" even though the node they are supposed to be running on was rebooted three days ago and is running no virtual machines whatsoever.

I think I could be dealing with database corruption of some type (introduced during the upgrade from OpenNebula 4.4 to 4.6), or there could be some problem with the remote scripts on the nodes. I saw, and I think I fixed, the problems with the database corruption: one of the hosts and one of the datastores got knocked out of the database for reasons unknown, and I re-inserted them. In any case, some error handling in the monitoring is not working, and something is exiting with status 0 that shouldn't be.

Any ideas? Has anyone else seen something like this?

Steve Timm

------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
t...@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
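
P.S. One way to cross-check the reported host state independently of the monitoring drivers is sketched below. It is only a rough sketch under stated assumptions, not how OpenNebula itself probes hosts: the function names (ssh_reachable, find_suspect_hosts) and the host names are hypothetical, and the reported states are typed in by hand rather than read from OpenNebula. It attempts a non-interactive ssh login as oneadmin and flags every host that is listed as "ON" but cannot actually be reached.

```python
import subprocess


def ssh_reachable(host, user="oneadmin", timeout=10, run=subprocess.run):
    """Return True only if a non-interactive ssh login to host succeeds.

    BatchMode=yes makes ssh fail instead of prompting for a password, so a
    broken key setup shows up as a non-zero exit status.
    """
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=5",
        f"{user}@{host}",
        "true",
    ]
    try:
        result = run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung connection or a missing ssh binary counts as unreachable.
        return False
    return result.returncode == 0


def find_suspect_hosts(reported_states, probe=ssh_reachable):
    """Return hosts reported as "ON" that the probe cannot reach.

    reported_states is a dict mapping host name to the state string shown
    by OpenNebula (entered by hand in this sketch).
    """
    return sorted(
        host
        for host, state in reported_states.items()
        if state.upper() == "ON" and not probe(host)
    )


if __name__ == "__main__":
    # Hypothetical host names and states, for illustration only.
    reported = {"node01": "ON", "node02": "ON", "node03": "OFF"}
    for host in find_suspect_hosts(reported):
        print(f"{host}: reported ON but ssh as oneadmin failed")
```

Any host this prints is one where the monitoring result and a direct login disagree, which would narrow down where a zero exit status is being returned when it should not be.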