Hi, I'm running a ONE 4.2 pool, and had some issues with it earlier today.
I had some VM hosts lock up due to networking issues: the hosts could see the rest of the world, but the ONE server could not reach them. As a result, the ONE server called a hook script:

VM_HOOK = [
    name      = "on_crash_boot",
    on        = "UNKNOWN",
    command   = "/usr/bin/env onevm boot",
    arguments = "$ID" ]

This resulted in an attempted cleanup (which appears to have failed because of the ongoing network problems) followed by a restart elsewhere. However, the failed cleanup meant that I then had two instances of the same guest running on two VM hosts, which led to MAC address conflicts on the network.

Is this a bug in ONE's handling of cleanup failure, or is there something else I should be doing in my hook script to ensure that it is safe to call onevm boot?

Any advice appreciated! (other than to take better care of the network :) )

thanks,
Matthew

oned.log starts as follows:

Thu Jan 9 08:13:07 2014 [InM][I]: Command execution fail: 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm 2 vmhost3; else exit 42; fi'
Thu Jan 9 08:13:07 2014 [InM][I]: Connection closed by 192.168.12.16
Thu Jan 9 08:13:07 2014 [InM][I]: ExitCode: 255
Thu Jan 9 08:13:07 2014 [ONE][E]: Error monitoring Host vmhost3 (2): -
Thu Jan 9 08:13:07 2014 [ReM][D]: Req:3296 UID:0 VirtualMachineAction invoked, "boot", 14
Thu Jan 9 08:13:07 2014 [DiM][D]: Restarting VM 14
Thu Jan 9 08:13:07 2014 [ReM][D]: Req:3296 UID:0 VirtualMachineAction result SUCCESS, 14
Thu Jan 9 08:13:07 2014 [HKM][D]: Message received: EXECUTE SUCCESS 14 on_crash_boot:
Thu Jan 9 08:13:08 2014 [ReM][D]: Req:3328 UID:0 VirtualMachineInfo invoked, 14
Thu Jan 9 08:13:08 2014 [ReM][D]: Req:3328 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>14</ID><UID>..."
Thu Jan 9 08:13:08 2014 [ReM][D]: Req:9328 UID:0 VirtualMachineAction invoked, "delete-recreate", 14
Thu Jan 9 08:13:08 2014 [ReM][D]: Req:9328 UID:0 VirtualMachineAction result SUCCESS, 14
Thu Jan 9 08:13:08 2014 [VMM][D]: Message received: LOG I 14 Driver command for 14 cancelled

The (slightly redacted) guest log (14.log) is as follows:

Thu Jan 9 07:44:53 2014 [LCM][I]: New VM state is RUNNING
Thu Jan 9 08:13:07 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 9 08:13:07 2014 [LCM][I]: New VM state is BOOT_UNKNOWN
Thu Jan 9 08:13:07 2014 [HKM][I]: Success executing Hook: on_crash_boot: .
Thu Jan 9 08:13:07 2014 [VMM][I]: Generating deployment file: /var/lib/one/vms/14/deployment.4917
Thu Jan 9 08:13:08 2014 [LCM][I]: New VM state is CLEANUP.
Thu Jan 9 08:13:08 2014 [VMM][I]: Driver command for 14 cancelled
Thu Jan 9 08:18:52 2014 [VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/cancel 'one-14' 'vmhost3' 14 vmhost3
Thu Jan 9 08:18:52 2014 [VMM][I]: Connection closed by 192.168.12.16
Thu Jan 9 08:18:52 2014 [VMM][I]: ExitSSHCode: 255
Thu Jan 9 08:18:52 2014 [VMM][E]: Error connecting to vmhost3
Thu Jan 9 08:18:52 2014 [VMM][I]: Failed to execute virtualization driver operation: cancel.
Thu Jan 9 08:18:52 2014 [VMM][I]: Command execution fail: /var/tmp/one/vnm/dummy/clean <...snip...>
Thu Jan 9 08:18:52 2014 [VMM][I]: Connection closed by 192.168.12.16
Thu Jan 9 08:18:52 2014 [VMM][I]: ExitSSHCode: 255
Thu Jan 9 08:18:52 2014 [VMM][E]: Error connecting to vmhost3
Thu Jan 9 08:18:52 2014 [VMM][I]: Failed to execute network driver operation: clean.
Thu Jan 9 08:19:01 2014 [VMM][I]: Successfully execute transfer manager driver operation: tm_delete.
Thu Jan 9 08:19:02 2014 [VMM][I]: Successfully execute transfer manager driver operation: tm_delete.
Thu Jan 9 08:19:02 2014 [VMM][I]: Host successfully cleaned.
Thu Jan 9 08:19:03 2014 [DiM][I]: New VM state is PENDING
Thu Jan 9 08:20:54 2014 [DiM][I]: New VM state is ACTIVE.
Thu Jan 9 08:20:54 2014 [LCM][I]: New VM state is PROLOG.
Thu Jan 9 08:20:54 2014 [VM][I]: Virtual Machine has no context
Thu Jan 9 08:20:54 2014 [LCM][I]: New VM state is BOOT
Thu Jan 9 08:20:54 2014 [VMM][I]: Generating deployment file: /var/lib/one/vms/14/deployment.4918
Thu Jan 9 08:20:56 2014 [VMM][I]: ExitCode: 0
Thu Jan 9 08:20:56 2014 [VMM][I]: Successfully execute network driver operation: pre.
Thu Jan 9 08:20:56 2014 [VMM][I]: ExitCode: 0
Thu Jan 9 08:20:56 2014 [VMM][I]: Successfully execute virtualization driver operation: deploy.
Thu Jan 9 08:20:56 2014 [VMM][I]: ExitCode: 0
Thu Jan 9 08:20:56 2014 [VMM][I]: Successfully execute network driver operation: post.
Thu Jan 9 08:20:56 2014 [LCM][I]: New VM state is RUNNING

-- 
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
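
P.S. To make the hook-script question concrete, here is a rough sketch of one possible guard: only call onevm boot when the host that last ran the VM can still be reached over SSH, so a later cleanup on that host has a chance of succeeding. This is not ONE's own mechanism. The function names (last_hostname, host_reachable, guarded_boot) are hypothetical, and it assumes that `onevm show <id> --xml` prints VM XML containing HISTORY records with a HOSTNAME element, oldest first. It does no fencing; when the host is unreachable it simply leaves the VM in UNKNOWN.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Optional


def last_hostname(vm_xml: str) -> Optional[str]:
    """Return the HOSTNAME of the last HISTORY record in the VM XML, or None.

    Assumption: HISTORY records appear in chronological order, so the last
    one names the host the VM was most recently deployed on.
    """
    root = ET.fromstring(vm_xml)
    names = [h.findtext("HOSTNAME") for h in root.iter("HISTORY")]
    names = [n for n in names if n]
    return names[-1] if names else None


def host_reachable(host: str, timeout: int = 10) -> bool:
    """Return True if a non-interactive SSH login to the host succeeds."""
    cmd = ["ssh", "-o", "BatchMode=yes",
           "-o", f"ConnectTimeout={timeout}", host, "true"]
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout + 5)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def guarded_boot(vm_id: str, reachable=host_reachable,
                 run=subprocess.run) -> bool:
    """Call 'onevm boot' only if the VM's last host is reachable.

    Returns True if the boot command was issued, False if it was skipped.
    The reachable and run parameters exist so the logic can be exercised
    without a real ONE installation.
    """
    shown = run(["onevm", "show", vm_id, "--xml"],
                capture_output=True, text=True, check=True)
    host = last_hostname(shown.stdout)
    if host is None or not reachable(host):
        # Host unknown or unreachable: skip the restart rather than risk
        # a second copy of the guest running elsewhere.
        return False
    run(["onevm", "boot", vm_id], check=True)
    return True
```

The hook's command would then point at a small wrapper that calls guarded_boot with the $ID argument instead of running onevm boot directly. The trade-off is availability: a guest on a host that has genuinely died stays down until someone intervenes.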