Hi,

Sounds like libvirt/qemu has a problem restarting the VM. Are the config files for those daemons under /etc/libvirt set up identically on all your hosts? Did you make sure that the user that starts the kvm process is able to read and write the files? (In my install I added the user "qemu" to the "oneadmin" group to make that work.)
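
If you want to check the permission question without guessing, something like the rough Python sketch below may help. It only looks at the classic owner/group/other bits; ACLs, SELinux and the permissions on the parent directories are ignored, so treat a "True" as a hint and not as proof. The function names are mine, and the user name "qemu" in the usage note is simply what I use on my install.

```python
import grp
import os
import pwd
import stat


def can_read_write(mode, file_uid, file_gid, user_uid, user_gids):
    """Return True if the plain Unix permission bits grant read and write.

    Only owner/group/other bits are considered; ACLs, SELinux and
    parent-directory permissions are NOT checked.
    """
    if user_uid == 0:
        return True  # root bypasses ordinary permission bits
    if user_uid == file_uid:
        need = stat.S_IRUSR | stat.S_IWUSR
    elif file_gid in user_gids:
        need = stat.S_IRGRP | stat.S_IWGRP
    else:
        need = stat.S_IROTH | stat.S_IWOTH
    return (mode & need) == need


def user_can_read_write(path, username):
    """Look up the user and the file on this host and apply can_read_write."""
    user = pwd.getpwnam(username)
    # Primary group plus every group that lists the user as a member.
    gids = {user.pw_gid} | {
        g.gr_gid for g in grp.getgrall() if username in g.gr_mem
    }
    st = os.stat(path)
    return can_read_write(st.st_mode, st.st_uid, st.st_gid, user.pw_uid, gids)
```

On the destination host you could then call, for example, user_can_read_write("/one_images/383/images/disk.0", "qemu") right after a failed migration, while the files are still in the state that caused the error.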

You should have a log file for this particular VM in /var/log/libvirt/qemu/one-[vm-id].log. What does it say?
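
If that log is long, a small filter can pull out the disk images qemu could not open. This is only a sketch: the pattern is copied from the error text in your OpenNebula log, and I am assuming the libvirt log for the VM uses the same wording. The function name is made up.

```python
import re

# Wording taken from the error in the OpenNebula log; assumed to appear
# in the same form in the per-VM libvirt/qemu log.
DENIED = re.compile(r"could not open disk image (\S+): Permission denied")


def denied_disk_images(log_text):
    """Return the disk image paths reported as 'Permission denied'.

    Paths are returned in order of first appearance, without duplicates.
    """
    seen = []
    for match in DENIED.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen
```

Read the log file into a string and pass it to denied_disk_images(); each path that comes back is a file whose ownership and mode are worth checking on the host where the restore failed.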

You say it happens occasionally, so not always? Does it always fail when you migrate to this particular host, or only occasionally? If occasionally, you have to look for things that change from time to time. Do you have config files maintained by chef/puppet/cfengine? Do you have user accounts maintained by ldap/nis? If this host always fails, it must be a bad config on this host; compare it to the other hosts that do work.
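
One way to do that comparison is to fingerprint the config directory on each host and diff the results. Below is a minimal Python sketch of the idea; the function names are my own invention, and you will probably need to run it as root to read everything under /etc/libvirt.

```python
import hashlib
from pathlib import Path


def config_fingerprints(root):
    """Map each regular file under root (relative path) to its SHA-256 digest."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[str(path.relative_to(root))] = digest
    return result


def differing_files(a, b):
    """Return sorted relative paths that are missing on one side or differ."""
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Run config_fingerprints("/etc/libvirt") on a working host and on the failing host, save both dictionaries (as JSON, for instance), and feed them to differing_files(). Files that legitimately differ per host will show up too, so the list is a starting point for a manual look.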

Where does the "unable to read from monitor" come from? Is it OpenNebula? That is kind of normal: it tries to read the status of the VM, but since the VM is not up, it fails. But actually, that should not give you a "connection reset"...


What you can do to test your setup when it fails:

Go to the directory with the files and, before you change anything, do a "virsh create deployment.X" where X is the highest number you can find in that dir. (In your example, it would be 2.) (You will need to be root for this or you will not be able to use the "system" libvirt space.)

If that "just works" then you have a really strange problem. If it gives you errors, try to solve them. :)
(If you do not know "virsh", you should read up a bit on it.)
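
If the directory has many deployment files, picking the highest number by eye is error-prone (deployment.10 sorts before deployment.2 in a plain listing). A small Python sketch for that step, with a function name of my own choosing:

```python
import re


def latest_deployment(filenames):
    """Return the 'deployment.N' name with the largest N, or None if absent."""
    best_name, best_num = None, -1
    for name in filenames:
        match = re.fullmatch(r"deployment\.(\d+)", name)
        if match and int(match.group(1)) > best_num:
            best_name, best_num = name, int(match.group(1))
    return best_name
```

Pass it the output of os.listdir() for the VM directory and then hand the returned file name to "virsh create" as root, as described above.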

Hope this is a bit helpful.

Jhon

On 07/04/2012 12:59 PM, David wrote:

Hi, All
     I am using OpenNebula 3.2.1.
     When I migrate a VM, the migration occasionally fails
     with the following log:
Thu Jun 28 15:16:24 2012 [LCM][I]: New VM state is RUNNING
Thu Jun 28 15:17:09 2012 [LCM][I]: New VM state is SAVE_MIGRATE
Thu Jun 28 15:17:42 2012 [VMM][I]: save: Executed "virsh --connect qemu:///system save one-383 /one_images/383/images/checkpoint".
Thu Jun 28 15:17:42 2012 [VMM][I]: ExitCode: 0
Thu Jun 28 15:17:42 2012 [VMM][I]: Successfully execute virtualization driver operation: save.
Thu Jun 28 15:17:43 2012 [VMM][I]: ExitCode: 0
Thu Jun 28 15:17:43 2012 [VMM][I]: Successfully execute network driver operation: clean.
Thu Jun 28 15:17:43 2012 [LCM][I]: New VM state is PROLOG_MIGRATE
Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Moving /one_images/383/images
Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Executed "ssh compute-56-5.local mkdir -p /one_images/383".
Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Executed "scp -r compute-56-4.local:/one_images/383/images compute-56-5.local:/one_images/383/images".
Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Executed "ssh compute-56-4.local rm -rf /one_images/383/images".
Thu Jun 28 15:56:03 2012 [TM][I]: ExitCode: 0
Thu Jun 28 15:56:03 2012 [LCM][I]: New VM state is BOOT
Thu Jun 28 15:56:05 2012 [VMM][I]: ExitCode: 0
Thu Jun 28 15:56:05 2012 [VMM][I]: Successfully execute network driver operation: pre.
Thu Jun 28 15:56:06 2012 [VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/restore /one_images/383/images/checkpoint compute-56-5.local 383 compute-56-5.local
Thu Jun 28 15:56:06 2012 [VMM][E]: restore: Command "virsh --connect qemu:///system restore /one_images/383/images/checkpoint" failed.
Thu Jun 28 15:56:06 2012 [VMM][E]: restore: error: Failed to restore domain from /one_images/383/images/checkpoint
Thu Jun 28 15:56:06 2012 [VMM][I]: error: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/one_images/383/images/disk.0,if=none,id=drive-virtio-disk0,format=raw: could not open disk image /one_images/383/images/disk.0: Permission denied
Thu Jun 28 15:56:06 2012 [VMM][E]: Could not restore from /one_images/383/images/checkpoint
Thu Jun 28 15:56:06 2012 [VMM][I]: ExitCode: 1
Thu Jun 28 15:56:06 2012 [VMM][I]: Failed to execute virtualization driver operation: restore.
Thu Jun 28 15:56:06 2012 [VMM][E]: Error restoring VM: Could not restore from /one_images/383/images/checkpoint
Thu Jun 28 15:56:06 2012 [DiM][I]: New VM state is FAILED

I then ran the command: chmod +x *
but received the following error message:
error: Unable to read from monitor: Connection reset by peer

Is this what causes the problem?
Thanks! Hope after

Regards!


_______________________________________________
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org

-- 
Jhon Masschelein
Senior Systeemprogrammeur
SARA - HPCV

Science Park 140
1098 XG Amsterdam
T +31 (0)20 592 8099
F +31 (0)20 668 3167
M +31 (0)6 4748 9328
E jhon.masschel...@sara.nl
http://www.sara.nl 

