On 12/20/2012 05:41 PM, Alexander Graf wrote:
> 
> 
> On 20.12.2012, at 09:46, Asias He <as...@redhat.com> wrote:
> 
>> On 12/20/2012 12:15 PM, Rusty Russell wrote:
>>> Alexander Graf <ag...@suse.de> writes:
>>>
>>>> When a file system is mounted on a virtio-blk disk, we then remove it
>>>> and then reattach it, the reattached disk gets the same disk name and
>>>> ids as the hot removed one.
>>>>
>>>> This leads to very nasty effects - mostly rendering the newly attached
>>>> device completely unusable.
>>>>
>>>> Trying what happens when I do the same thing with a USB device, I saw
>>>> that the sd node simply doesn't get free'd when a device gets forcefully
>>>> removed.
>>>>
>>>> Imitate the same behavior for vd devices. This way broken vd devices
>>>> simply are never free'd and newly attached ones keep working just fine.
>>>>
>>>> Signed-off-by: Alexander Graf <ag...@suse.de>
>>>
>>> I think deserves a CC:stable, no?
> 
> I wanted to make sure people can comment on it first. But yes, it should go 
> into stable when accepted.
> 
>>>
>>> I've put it in my pending queue for *next* merge window for now...
>>
>> Thanks for looking into this, Alexander!
>>
>> I also noticed this problem. The problem is that, if you hot-unplug a
>> mounted or opened disk, the disk is in opened state. Next time, when you
>> hotplug the same disk. The kernel thought it was opened already. The
>> driver will use the wrong gendisk data structure in bdev.
> 
> Exactly :)
> 
>>
>> blkdev_open
>>   blkdev_get
>>      __blkdev_get
>>         if (!bdev->bd_openers) {    <-- Here, bd_disk not got updated
>>                                         still points to old one
>>           bdev->bd_disk = disk;
>>           bdev->bd_queue = disk->queue;
>>           ...
>>
>> I tried something like this:
>>
>> @@ -854,6 +862,19 @@ static int __devinit virtblk_probe(struct
>> virtio_device *vdev)
>>                blk_queue_io_opt(q, blk_size * opt_io_size);
>>
>>        add_disk(vblk->disk);
>> +
>> +       for (i = 0; i < 1 << PART_BITS; i++) {
>> +               bdev = bdget_disk(vblk->disk, i);
>> +               if (bdev) {
>> +                       bdev->bd_disk = vblk->disk;
>> +                       bdev->bd_queue = q;
>> +                       bdput(bdev);
> 
> Does this stack? What if the device was mounted twice? Or mounted and dd'ed 
> at the same time? Then your refcount here is wrong.

This is in the probe path. And there is no extra refcount here.
bdev->bd_openers stays what it was before the remove.

> Also this means that you're free'ing a bdev even though there's still a 
> reference held to it. That means whoever holds it could later use that 
> pointer and access bogus memory.

The bdev is not freed.

> Alex
> 
>> +               }
>> +       }
>>
>>
>> 1) Before:
>> ---> hot-plug
>> [   35.730183] virtio_blk: virtblk_probe: vblk=ffff880078b0e000,
>> disk=ffff880078d54c00, q=ffff88007f88b3d8
>> [   35.735352] virtio_blk:    virtblk_ioctl: vblk=ffff880078b0e000,
>> disk=ffff880078d54c00, bdev=ffff88007c45cc00, q=ffff88007f88b3d8
>> ---> hot-unplug
>>
>> ---> hot-plug
>> [   83.570480] virtio_blk: virtblk_probe: vblk=ffff880078b0e000,
>> disk=ffff880078d55800, q=ffff88007f88bb40
>> [   83.575614] virtio_blk:   virtblk_ioctl: vblk=ffff880078b0e000,
>> disk=ffff880078d54c00, bdev=ffff88007c45cc00, q=ffff88007f88b3d8
>>
>> The disk points to old one ffff880078d54c00. The queue also points to
>> old one ffff88007f88b3d8.
>>
>> 2) After:
>> ---> hot-plug
>> [   68.035063] virtio_blk: virtblk_probe: vblk=ffff880079b20000,
>> disk=ffff88007f9ebc00, q=ffff8800784d8ed0
>> [   68.041140] virtio_blk:    virtblk_ioctl: vblk=ffff880079b20000,
>> disk=ffff88007f9ebc00, bdev=ffff88007ab2c000, q=ffff8800784d8ed0
>> ---> hot-unplug
>>
>> ---> hot-plug
>> [   86.317706] virtio_blk: virtblk_probe: vblk=ffff880079b20000,
>> disk=ffff88007f9eb000, q=ffff8800784d9638
>> [   86.322535] virtio_blk:   virtblk_ioctl: vblk=ffff880079b20000,
>> disk=ffff88007f9eb000, bdev=ffff88007ab2c000, q=ffff8800784d9638
>>
>> The disk and queue are updated correctly. The attached disk works and
>> still uses the old name.
>>
>> -- 
>> Asias


-- 
Asias
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Reply via email to