Dan Yasny has posted comments on this change.

Change subject: Cisco VM-FEX support vdsm hooks
......................................................................


Patch Set 13: (2 inline comments)

....................................................
File vdsm_hooks/vmfex/before_vm_migrate_destination.py
Line 71: def createDirectPool(conn):
Line 72:     with open('/var/run/vdsm/hook-vmfex.lock', 'w') as f:
Line 73:         fcntl.flock(f.fileno(), fcntl.LOCK_EX)
Line 74:         try:
Line 75:             #singleton - doublecheck after taking lock
There seemed to be a race between two threads where:
-thread A would run removeDirectPool() and release lock
- thread B would come in, check for direct-pool existence and since it doesn't 
exist, try to create it
- thread A comes back, takes lock on createDirectPool, creates the network, 
releases lock
- thread B come into createDirectPool() and crashes on libvirtError - network 
already exists.

When bulk starting/migrating VMs this was causing a small percentage of VMs not 
to start/migrate. The solution, as suggested by Livnat, was to 
1 check for direct-pool existence
2 if not exist, take lock
3 check again for existence
4 if not exist - create

With this solution, thread B would fail on second check and will skip creating 
direct-pool, just starting the VM. Apparently this is a classic CS solution 
called "singleton" (according to Livnat).

Indeed, with this implementation, I successfully migrate/start 50 VMs with 
VMFEX NICs, and with the direct-pool network in various different shapes to 
begin with.


I also had a long chat with apuimedo regarding this, and he confirmed this is a 
valid solution. If you have a better one, please let me know (I'd rather 
discuss this live, and asap)
Line 76:             #to avoid a race between two threads
Line 77:             if 'direct-pool' in conn.listNetworks():
Line 78:                 dpool = conn.networkLookupByName('direct-pool')
Line 79:                 #destroy and undefine direct-pool


Line 132:         return
Line 133:     #Now we know for sure that the pool exists...
Line 134:     #are there VMs running, and are they using 802.1Qbh?
Line 135:     #if yes, can't touch the existing pool
Line 136:     if not qbhInUse(conn):
going to fly out on libvirt error again, removeNetwork won't work if the 
network is in use, and create will fail if the network already exists. Ugly, 
but seems safe to me.
Line 137:     #no VMs use the existing pool, it can be checked and updated
Line 138:         #is the pool the same as list of available dNICs?
Line 139:         if not compareDPoolToUsableNics(conn):
Line 140:         #not the same, something changed, recreating


--
To view, visit http://gerrit.ovirt.org/7547
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I45a7fa46919bb39a648dff190c40618395990e91
Gerrit-PatchSet: 13
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Dan Yasny <[email protected]>
Gerrit-Reviewer: Antoni Segura Puimedon <[email protected]>
Gerrit-Reviewer: Dan Kenigsberg <[email protected]>
Gerrit-Reviewer: Dan Yasny <[email protected]>
Gerrit-Reviewer: Federico Simoncelli <[email protected]>
Gerrit-Reviewer: Igor Lvovsky <[email protected]>
Gerrit-Reviewer: oVirt Jenkins CI Server
_______________________________________________
vdsm-patches mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches

Reply via email to