Are you using NFS? Yes, we implemented locking because of that problem:
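A minimal sketch of that change - it appends one setting to qemu.conf. The snippet below writes to a temp copy so it is safe to run as-is; on a real host you would append to /etc/libvirt/qemu.conf itself and then restart libvirtd:

```shell
# Sketch: enable libvirt's lockd lock manager in qemu.conf.
# A temp copy is used here for illustration; on a real hypervisor the
# target file is /etc/libvirt/qemu.conf, followed by a libvirtd restart.
conf=$(mktemp)
# Append the setting only if no lock_manager line is present yet.
grep -q '^lock_manager' "$conf" || echo 'lock_manager = "lockd"' >> "$conf"
cat "$conf"
```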
https://libvirt.org/locking-lockd.html

echo 'lock_manager = "lockd"' >> /etc/libvirt/qemu.conf

-----Original Message-----
From: Andrija Panic <andrija.pa...@gmail.com>
Sent: Wednesday, October 30, 2019 6:55 AM
To: dev <d...@cloudstack.apache.org>
Cc: users <users@cloudstack.apache.org>
Subject: Re: Virtual machines volume lock manager

I would advise trying to reproduce. Start a migration, then either:
- configure the timeout so low that the migration fails due to timeouts, or
- restart the mgmt server in the middle of the migration.

This should cause the migration to fail, and you can observe whether you have reproduced the problem. Keep in mind that there might be some garbage left behind, due to the failed migration not being handled properly. But from the QEMU point of view, if migration fails, the new VM should by all means be destroyed...

On Wed, 30 Oct 2019 at 11:31, Rakesh Venkatesh wrote:

> Hi Andrija
>
> Sorry for the late reply.
>
> I'm using ACS version 4.7, QEMU version 1:2.5+dfsg-5ubuntu10.40.
>
> I'm not sure whether the ACS job or the libvirt job failed, as I didn't
> look into the logs. Yes, the VM will be in paused state during migration,
> but after the failed migration the same VM was in "running" state on two
> different hypervisors. We wrote a script to find out how many duplicated
> VMs were running, and found that more than 5 VMs had this issue.
>
> On Mon, Oct 28, 2019 at 2:42 PM Andrija Panic <andrija.pa...@gmail.com> wrote:
>
> > I've been running a KVM public cloud until recently and have never
> > seen such behaviour.
> >
> > What versions (ACS, qemu, libvirt) are you running?
> >
> > How does the migration fail - the ACS job, or the libvirt job?
> > The destination VM is by default always in PAUSED state until the
> > migration is finished - only then does the destination VM (on the new
> > host) go RUNNING, after the original VM (on the old host) has been
> > paused.
> >
> > i.e.
> > phase 1: source VM RUNNING, destination VM PAUSED (RAM content being
> > copied over... takes time...)
> > phase 2: source VM PAUSED, destination VM PAUSED (last bits of RAM
> > content are migrated)
> > phase 3: source VM destroyed, destination VM RUNNING.
> >
> > Andrija
> >
> > On Mon, 28 Oct 2019 at 14:26, Rakesh Venkatesh wrote:
> >
> > > Hello Users
> > >
> > > Recently we have seen cases where, when VM migration fails, CloudStack
> > > ends up running two instances of the same VM on different hypervisors.
> > > The state will be "running" and not any other transition state. This
> > > will of course lead to corruption of the disk. Does CloudStack have
> > > any option of volume locking so that two instances of the same VM
> > > won't be running? Has anyone else faced this issue and found a
> > > solution to fix it?
> > >
> > > We are thinking of using "virtlockd" from libvirt or implementing a
> > > custom lock mechanism. There are pros and cons to both solutions, and
> > > I want your feedback before proceeding further.
> > >
> > > --
> > > Thanks and regards
> > > Rakesh Venkatesh
> >
> > --
> > Andrija Panić
>
> --
> Thanks and regards
> Rakesh Venkatesh

--
Andrija Panić
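The duplicate-VM check Rakesh mentions could be sketched roughly like this. The host and domain names below are hypothetical, and the sample data stands in for what you would normally collect per host with something like `ssh $h virsh list --state-running --name`:

```shell
# Sketch: flag domains reported "running" on more than one KVM host.
# Input is "host domain" pairs; here a hardcoded sample, in practice
# gathered from each hypervisor via virsh. Names are illustrative.
printf '%s\n' \
  'kvm01 i-2-101-VM' \
  'kvm01 i-2-102-VM' \
  'kvm02 i-2-101-VM' |
awk '{hosts[$2]=hosts[$2]" "$1; cnt[$2]++}
     END {for (d in cnt) if (cnt[d]>1) print d" running on:"hosts[d]}'
# prints: i-2-101-VM running on: kvm01 kvm02
```

Run from cron against all hosts in a cluster, this catches the split-brain state before the second writer corrupts the disk, which is the same failure mode virtlockd/lockd prevents at open time.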