Are you using NFS?

Yeah, we implemented locking because of that problem:

https://libvirt.org/locking-lockd.html

echo lock_manager = \"lockd\" >> /etc/libvirt/qemu.conf
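
For a fuller picture, a rough sketch of the remaining steps (the option
names come from the libvirt lockd page linked above; the lockspace path
is only an example - adjust for your distro):

# optional: keep leases in a shared lockspace instead of locking the
# image files directly - these settings live in /etc/libvirt/qemu-lockd.conf
#   auto_disk_leases = 1
#   file_lockspace_dir = "/var/lib/libvirt/lockd/files"
#   require_lease_for_disks = 1

# restart libvirtd so the lockd plugin (virtlockd) is used for newly
# started domains
systemctl restart libvirtd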

-----Original Message-----
From: Andrija Panic <andrija.pa...@gmail.com> 
Sent: Wednesday, October 30, 2019 6:55 AM
To: dev <d...@cloudstack.apache.org>
Cc: users <users@cloudstack.apache.org>
Subject: Re: Virtual machines volume lock manager

I would advise trying to reproduce it.

Start a migration, then either:
- configure the migration timeout so that it's way too low, so the
  migration fails due to a timeout, or
- restart the management server in the middle of the migration.

This should cause the migration to fail, and you can then observe
whether you have reproduced the problem. Keep in mind that there might
be some garbage left over, because the failed migration is not handled
properly. But from QEMU's point of view, if the migration fails, the
new VM should by all means be destroyed...
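
A quick way to check whether you actually ended up with a duplicate
(the host names and the instance name i-2-1234-VM below are just
placeholders) - the same instance should never be reported as running
on more than one hypervisor:

for h in src-host dst-host; do
    echo "== $h =="
    ssh "$h" virsh domstate i-2-1234-VM
done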



On Wed, 30 Oct 2019 at 11:31, Rakesh Venkatesh <...@gmail.com> wrote:

> Hi Andrija
>
>
> Sorry for the late reply.
>
> I'm using ACS version 4.7, with QEMU version 1:2.5+dfsg-5ubuntu10.40.
>
> I'm not sure whether the ACS job or the libvirt job failed, as I
> didn't look into the logs. Yes, the VM will be in the paused state
> during migration, but after the failed migration the same VM was in
> the "running" state on two different hypervisors.
> We wrote a script to find out how many duplicated VMs are running and
> found out that more than 5 VMs had this issue.
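>
> (Roughly along these lines - just a sketch with a placeholder host
> list, not the actual script:
>
> for h in $(cat hypervisors.txt); do
>     ssh "$h" 'virsh list --name'
> done | grep . | sort | uniq -d
>
> anything printed by uniq -d is an instance name reported as running
> on more than one host.)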
>
>
> On Mon, Oct 28, 2019 at 2:42 PM Andrija Panic 
> <andrija.pa...@gmail.com>
> wrote:
>
> > I've been running a KVM public cloud until recently and have never
> > seen such behaviour.
> >
> > What versions (ACS, qemu, libvirt) are you running?
> >
> > How does the migration fail - the ACS job or the libvirt job?
> > The destination VM is by default always in the PAUSED state until
> > the migration is finished - only then does the destination VM (on
> > the new host) become RUNNING, with the original VM (on the old
> > host) having been paused just beforehand.
> >
> > i.e.
> > phase 1:  source VM RUNNING, destination VM PAUSED (RAM content
> >           being copied over... takes time...)
> > phase 2:  source VM PAUSED, destination VM PAUSED (last bits of
> >           RAM content are migrated)
> > phase 3:  source VM destroyed, destination VM RUNNING
> >
> > Andrija
> >
> > On Mon, 28 Oct 2019 at 14:26, Rakesh Venkatesh <...@gmail.com>
> > wrote:
> >
> > > Hello Users
> > >
> > >
> > > Recently we have seen cases where, when a VM migration fails,
> > > CloudStack ends up running two instances of the same VM on
> > > different hypervisors. The state will be "running" and not any
> > > other transition state. This will of course lead to corruption of
> > > the disk. Does CloudStack have any option for volume locking so
> > > that two instances of the same VM won't be running?
> > > Has anyone else faced this issue and found a solution to fix it?
> > >
> > > We are thinking of using libvirt's "virtlockd" or implementing a
> > > custom lock mechanism. There are some pros and cons to both
> > > solutions, and I want your feedback before proceeding further.
> > >
> > > --
> > > Thanks and regards
> > > Rakesh venkatesh
> > >
> >
> >
> > --
> >
> > Andrija Panić
> >
>
>
> --
> Thanks and regards
> Rakesh venkatesh
>


-- 

Andrija Panić
