Re: [Xen-devel] [Xen-users] Xen shutdown fails to release DRBD device
On Fri, Aug 24, 2018 at 06:22:32PM +0200, Valentin Vidic wrote:
> Managed to reproduce this and xen_blkif_disconnect is always returning 0
> like you expected. So this is some other issue, and from what I can tell
> blkdev_put of the underlying drbd device gets called some time after
> xenbus_switch_state(dev, XenbusStateClosed). Any idea how to make sure
> it happens in the opposite order: blkdev_put before XenbusStateClosed?

Moving the XenbusStateClosed call to xen_blkif_free seems to help. Let me
know if you think there is a better solution for this.

static void xen_blkif_free(struct xen_blkif *blkif)
{
        WARN_ON(xen_blkif_disconnect(blkif));
        xen_vbd_free(&blkif->vbd);
        xenbus_switch_state(blkif->be->dev, XenbusStateClosed);
        kfree(blkif->be->mode);
        kfree(blkif->be);

        /* Make sure everything is drained before shutting down */
        kmem_cache_free(xen_blkif_cachep, blkif);
}

--
Valentin

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
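
To illustrate why the ordering matters, here is a toy user-space sketch, not kernel code. Every name in it (toy_backend, toy_put, toy_announce_closed and the two teardown helpers) is hypothetical: toy_put stands in for blkdev_put, and toy_announce_closed stands in for the hotplug script that runs once XenbusStateClosed is visible. It only records how many openers the "script" would see under each ordering.

```c
#include <assert.h>

/* Hypothetical toy backend: open_cnt stands in for the reference that
 * blkdev_put() drops on the underlying device. */
struct toy_backend {
    int open_cnt;
    int seen_open;   /* open_cnt observed when Closed was announced; -1 = not yet */
};

/* Stand-in for the script triggered by the switch to XenbusStateClosed. */
static void toy_announce_closed(struct toy_backend *b)
{
    b->seen_open = b->open_cnt;
}

/* Stand-in for blkdev_put() on the underlying device. */
static void toy_put(struct toy_backend *b)
{
    assert(b->open_cnt > 0);
    b->open_cnt--;
}

/* Order reported in the thread: Closed is announced, the put comes later. */
static int toy_teardown_closed_first(struct toy_backend *b)
{
    toy_announce_closed(b);
    toy_put(b);
    return b->seen_open;
}

/* Order proposed in the thread: put first, then announce Closed. */
static int toy_teardown_put_first(struct toy_backend *b)
{
    toy_put(b);
    toy_announce_closed(b);
    return b->seen_open;
}
```

Starting from open_cnt == 1, the first helper reports one remaining opener at announcement time and the second reports none, which is the difference the proposed change to xen_blkif_free is aiming for.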
Re: [Xen-devel] [Xen-users] Xen shutdown fails to release DRBD device
On Wed, Aug 22, 2018 at 06:23:01PM +0200, Valentin Vidic wrote:
> On Wed, Aug 22, 2018 at 05:51:54PM +0200, Roger Pau Monné wrote:
> > Can you add some debug prints to check if xen_blkif_disconnect is
> > indeed returning EBUSY (or some error) and that's preventing the
> > device from closing correctly?
>
> These are production nodes, but I'll try that on some test machines...

Managed to reproduce this and xen_blkif_disconnect is always returning 0
like you expected. So this is some other issue, and from what I can tell
blkdev_put of the underlying drbd device gets called some time after
xenbus_switch_state(dev, XenbusStateClosed). Any idea how to make sure
it happens in the opposite order: blkdev_put before XenbusStateClosed?

--
Valentin
Re: [Xen-devel] [Xen-users] Xen shutdown fails to release DRBD device
On Wed, Aug 22, 2018 at 05:39:28PM +0200, Valentin Vidic wrote:
> On Wed, Aug 22, 2018 at 03:55:46PM +0200, Valentin Vidic wrote:
> > DRBD end for this seems rather simple, it only checks if the
> > device->open_cnt is zero. So it would seem like drbd_release
> > was not called yet when the block-drbd script is run?
> >
> >
> > static enum drbd_state_rv
> > is_valid_state(struct drbd_device *device, union drbd_state ns)
> > {
> >         ...
> >         else if (ns.role == R_SECONDARY && device->open_cnt)
> >                 rv = SS_DEVICE_IN_USE;
> >         ...
> > }
> >
> > static void drbd_release(struct gendisk *gd, fmode_t mode)
> > {
> >         struct drbd_device *device = gd->private_data;
> >         mutex_lock(&drbd_main_mutex);
> >         device->open_cnt--;
> >         mutex_unlock(&drbd_main_mutex);
> > }
>
> On the Xen side it seems that XenbusStateClosed event is sent
> to xenbus to run the block-drbd script. However the call to
> xen_blkif_disconnect in the line before that can fail with -EBUSY
> if there is still some in-flight IO for the device. Could it be
> that a lot of IO during shutdown is holding the DRBD device open
> while the block-drbd script has already started running?
>
> /*
>  * Callback received when the frontend's state changes.
>  */
> static void frontend_changed(struct xenbus_device *dev,
>                              enum xenbus_state frontend_state)
> {
>         ...
>         case XenbusStateClosed:
>                 xen_blkif_disconnect(be->blkif);
>                 xenbus_switch_state(dev, XenbusStateClosed);
>                 if (xenbus_dev_is_online(dev))
>                         break;
>                 /* fall through if not online */
>         ...
> }
>
> Maybe the XenbusStateClosed event should only be sent when the
> device is closed in xen_blkif_free or some other place?

Can you add some debug prints to check if xen_blkif_disconnect is
indeed returning EBUSY (or some error) and that's preventing the
device from closing correctly?

Thanks, Roger.
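
The open_cnt check quoted above can be modelled in user space roughly as follows. This is a minimal sketch under stated assumptions, not DRBD code: toy_device, toy_open, toy_release, toy_make_secondary and the TOY_* return values are all hypothetical stand-ins, and locking is left out entirely. It only shows that a request to become secondary is refused until the last opener has released the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct drbd_device; only the open counter is modelled. */
struct toy_device {
    int open_cnt;
    bool primary;
};

/* Hypothetical return values, loosely mirroring SS_DEVICE_IN_USE. */
enum toy_rv { TOY_SUCCESS = 0, TOY_DEVICE_IN_USE = -1 };

static void toy_open(struct toy_device *d)
{
    d->open_cnt++;
}

/* Counterpart of the quoted drbd_release(): drop one opener. */
static void toy_release(struct toy_device *d)
{
    assert(d->open_cnt > 0);
    d->open_cnt--;
}

/* Mirrors the quoted is_valid_state() check: switching to secondary
 * is refused while any opener remains. */
static enum toy_rv toy_make_secondary(struct toy_device *d)
{
    if (d->open_cnt)
        return TOY_DEVICE_IN_USE;
    d->primary = false;
    return TOY_SUCCESS;
}
```

In this model, calling toy_make_secondary between toy_open and toy_release fails, and the same call after toy_release succeeds, which matches the suspicion that the block-drbd script runs before drbd_release has been called.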