https://review.openstack.org/#/c/28424/ landed in Havana. Is this still valid? I know some fixes for Ceph shared storage and resize were made in Kilo, which we also backported to stable/juno. I'm not sure whether those also resolve the NFS issues, but I'd think they are related, so I'm marking this invalid at this point. Please re-open if this is still an issue.

** Changed in: nova
       Status: Confirmed => Invalid

** Tags added: nfs resize

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1218372

Title:
  [libvirt] resize fails when using NFS shared storage

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  The setup is two hosts installed using devstack in a multi-node
  configuration, with the directory /opt/stack/data/nova/instances/
  shared using NFS.

  When performing a resize, I get the following error (complete
  traceback in http://paste.openstack.org/show/45368/):

  "qemu-img: Could not open
  '/opt/stack/data/nova/instances/7dbeb7f2-39e2-4f1d-8228-0b7a84d27745/disk':
  Permission denied\n"

  This problem was introduced with patch
  https://review.openstack.org/28424, which modified the behaviour of
  migrate/resize when using shared storage. Before that patch, the disk
  was moved to the new host using ssh even when using shared storage
  (which could cause some data loss when an error happened). Now, if
  we're using shared storage, the disk is not sent to the other host;
  it is simply assumed to be accessible from there. In the end, both
  hosts are using the same storage, so why should this be a problem?

  After doing some research on how NFS handles its shares on the client
  side, I realized that the NFS client keeps a cache of file names and
  their inodes which, if no process asks for it sooner, is refreshed at
  intervals of between 3 and 60 seconds (see the options
  ac[dir|reg][min|max] in the nfs manpage). So, if a process tries to
  access a file which has been renamed on the remote server, it will be
  accessing the old version, because the name still points to the old
  inode (the cache isn't updated when accessing a file, but only when
  asking for the file attributes, e.g. ls -lh).

  In the resize case, the origin compute node renamed the instance
  directory to "$INSTANCE_DIR/<instance_uuid>_resize" (owned by root
  after qemu stops) and created the new instance disk from it under the
  new directory "$INSTANCE_DIR/<instance_uuid>". On the destination
  host, even though we were trying to access the new disk file at
  "$INSTANCE_DIR/<instance_uuid>/disk", we were still holding the old
  inode for that path, which pointed to
  "$INSTANCE_DIR/<instance_uuid>_resize/disk" (owned by root,
  inaccessible, the wrong image, etc.).

  The NFS share can be mounted with the option "noac", which (from the
  manpage) "forces application writes to become synchronous so that
  local changes to a file become visible on the server immediately".
  This prevents the files from getting out of sync, but it comes with
  the drawback of issuing a network call for every file operation,
  which may cause performance issues.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1218372/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
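
The rename sequence from the bug description can be illustrated locally. What follows is only a rough sketch on an ordinary local directory: it is not Nova's code and it does not involve NFS or its attribute cache. The function name simulate_resize_rename and the "demo-instance" directory name are hypothetical, chosen for illustration. It shows that after the rename, the inode that used to sit behind "<instance_uuid>/disk" now sits behind "<instance_uuid>_resize/disk", which is the file a client with a stale name-to-inode mapping would end up opening.

```python
import os
import tempfile


def simulate_resize_rename(instances_dir, instance_uuid="demo-instance"):
    """Mimic the rename/recreate sequence from the bug on a local directory.

    Returns the inode seen for '<uuid>/disk' before and after the sequence,
    plus the inode of '<uuid>_resize/disk'.
    """
    inst_dir = os.path.join(instances_dir, instance_uuid)
    resize_dir = inst_dir + "_resize"
    disk = os.path.join(inst_dir, "disk")

    # Original instance disk.
    os.makedirs(inst_dir)
    with open(disk, "w") as f:
        f.write("old image")
    inode_before = os.stat(disk).st_ino

    # Source host: move the instance directory aside ...
    os.rename(inst_dir, resize_dir)
    # ... and create the new disk under the original path.
    os.makedirs(inst_dir)
    with open(disk, "w") as f:
        f.write("new image")

    return {
        "path_before": inode_before,
        "path_after": os.stat(disk).st_ino,
        "resize_copy": os.stat(os.path.join(resize_dir, "disk")).st_ino,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        inodes = simulate_resize_rename(base)
        # Does the old inode now live under '<uuid>_resize/disk'?
        print(inodes["path_before"] == inodes["resize_copy"])
        # Does '<uuid>/disk' now refer to a different inode?
        print(inodes["path_before"] != inodes["path_after"])
```

On a local filesystem the path lookup is always fresh, so the new disk is found. The bug arises because the destination host's NFS client may keep resolving the same path to the old inode until its cached attributes expire.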