On 2016-02-14 12:16, Arik Hadas wrote:

----- Original Message -----
On 11 Feb 2016, at 17:02, Johannes Tiefenbacher <j...@linbit.com> wrote:

Hi,
finally I am posting something to this list :) I have been reading it for quite
some time now and have been an oVirt user since 3.0.
Hi,
welcome:)


I updated an engine installation from 3.2 to 3.6 (stepwise of course, and
yes I know that's pretty outdated ;-). Then I updated vdsm on the associated
CentOS 6 hosts as well, from 3.10.x to 3.16.30. I also set my cluster
compatibility level to 3.5 (the 3.6 compatibility level is only possible with
EL7 hosts, if I understood correctly).

After my first failover test a VM could not be restarted, although the host
where it was running could be fenced correctly.

The reason according to engine's log was this:

VM xxxxxxxx is down with error. Exit message: internal error process exited
while connecting to monitor: qemu-kvm: -device
virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4:
Duplicate ID 'virtio-serial0' for device


I then realized that I was not able to run this VM on any host. I checked
the virtual hardware in the engine database and could confirm that ALL my
VMs had this problem: 2 devices with alias='virtio-serial0'
It may very well be a bug, but it would be quite difficult to say unless it
is reproducible. It may have been broken since earlier releases.
Arik/Shmuel, maybe it rings a bell?
In 3.6 we changed virtio-serial to be a managed device.
The script named 03_06_0310_change_virtio_serial_to_managed_device.sql
changes unmanaged virtio-serial devices (that were all unmanaged before) to
be managed.
A potential flow that will cause this duplication I can think of is:
1. Have a running VM in a pre-3.6 engine - it has unmanaged virtio-serial
2. Upgrade to 3.6 while the VM is running - the unmanaged virtio-serial
becomes managed
3. Do something that will change the hash of the devices
=> the engine will add an additional unmanaged virtio-serial device

Why didn't it happen before? Because the handling of unmanaged devices was:
1. Upon change in the VM devices (their hash), ask for all the devices
(full-list)
2. Remove all previous unmanaged devices
3. Add every device that does not exist in the database
When we add an unmanaged device we generate a new ID (!) - therefore we had
to remove all the previous unmanaged devices before adding the new ones.
If the previous unmanaged virtio-serial became managed, it is not removed and
we will end up having two virtio-serial devices.
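
The flow described above can be reproduced in a toy model. This is only a sketch in Python, not the engine's actual code: the names `sync_unmanaged` and `upgrade_to_36` are hypothetical, and it assumes that devices are matched by ID and that the running VM's virtio-serial is reported without an ID the engine knows about.

```python
import uuid

def sync_unmanaged(db, reported):
    """Toy version of the old handling: drop stored unmanaged devices, then
    add every reported device whose ID is not in the database (assumption:
    matching is done by device ID)."""
    # Step 2: remove all previously stored unmanaged devices.
    kept = [d for d in db if d["is_managed"]]
    known_ids = {d["device_id"] for d in kept}
    # Step 3: add each unknown reported device under a freshly generated ID.
    for dev in reported:
        if dev.get("device_id") not in known_ids:
            kept.append({"device_id": str(uuid.uuid4()),
                         "device": dev["device"],
                         "alias": dev["alias"],
                         "is_managed": False})
    return kept

def upgrade_to_36(db):
    """Mimic the effect of the upgrade script: virtio-serial becomes managed."""
    return [dict(d, is_managed=True) if d["device"] == "virtio-serial" else d
            for d in db]

# The running VM keeps reporting the same virtio-serial, without an engine ID.
reported = [{"device": "virtio-serial", "alias": "virtio-serial0"}]

before_upgrade = sync_unmanaged(sync_unmanaged([], reported), reported)
after_upgrade = sync_unmanaged(upgrade_to_36(before_upgrade), reported)

print(len(before_upgrade), len(after_upgrade))
```

Before the upgrade, repeated syncs leave a single unmanaged row because the old one is always removed first. After the upgrade the managed row survives step 2, so the next sync leaves two rows with the same alias.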

@Johannes - is it true that the VM was running before the engine got updated
to 3.6 and wasn't powered-off since then?
yes that's true

I managed to simulate this.
We probably need to prevent the addition of unmanaged virtio-serial in 3.6
engine but IMO we should also use the ID reported by VDSM instead of
generating a new one to eliminate similar issues in the future.
@Eli, Omer - can you recall why we can't use the ID we get from VDSM for the
unmanaged devices?
(we can continue this discussion in devel-list or in bugzilla..)

e.g.:

----
engine=# SELECT * FROM vm_device WHERE vm_device.device = 'virtio-serial'
AND vm_id = 'cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec' ORDER BY vm_id;
-[ RECORD 1 ]-------------+-------------------------------------------------------------
device_id                 | 2821d03c-ce88-4613-9095-e88eadcd3792
vm_id                     | cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec
type                      | controller
device                    | virtio-serial
address                   |
boot_order                | 0
spec_params               | { }
is_managed                | t
is_plugged                | f
is_readonly               | f
_create_date              | 2016-01-14 08:30:43.797161+01
_update_date              | 2016-02-10 10:04:56.228724+01
alias                     | virtio-serial0
custom_properties         | { }
snapshot_id               |
logical_name              |
is_using_scsi_reservation | f
-[ RECORD 2 ]-------------+-------------------------------------------------------------
device_id                 | 29e0805f-d836-451a-9ec3-9031baa995e6
vm_id                     | cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec
type                      | controller
device                    | virtio-serial
address                   | {bus=0x00, domain=0x0000, type=pci, slot=0x04, function=0x0}
boot_order                | 0
spec_params               | { }
is_managed                | f
is_plugged                | t
is_readonly               | f
_create_date              | 2016-02-11 13:47:02.69992+01
_update_date              |
alias                     | virtio-serial0
custom_properties         |
snapshot_id               |
logical_name              |
is_using_scsi_reservation | f

----

My solution was this:

DELETE FROM vm_device WHERE vm_id='cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec'
AND vm_device.device = 'virtio-serial' AND address = '';

(just renaming one of the aliases to "virtio-serial1" did not help)
I believe it is not the right solution; it is better to remove the unmanaged
device:
1. For consistency
2. We changed the virtio-serial device to be managed in order to prevent a
problem with VM-pools where in some cases Windows OS detects an existing
virtio-serial device as a new device (and therefore pops-up a dialog for
searching for an appropriate driver). By having the virtio-serial device
managed we preserve its address and eliminate this problem.
And then restart the VM of course, otherwise it will be added again the next
time the devices change.
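
Removing the unmanaged duplicate instead of the managed row could look roughly like the sketch below. It runs against an in-memory SQLite stand-in with only a few of the vm_device columns, not against a real engine database; the second VM ID is made up for illustration, and the statements are an assumption about how one might express this, so test on a backup first.

```python
import sqlite3

VM = "cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec"          # VM from the output above
OTHER_VM = "00000000-0000-0000-0000-000000000001"    # hypothetical healthy VM

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vm_device (device_id TEXT, vm_id TEXT, "
             "device TEXT, is_managed INTEGER, alias TEXT)")
conn.executemany("INSERT INTO vm_device VALUES (?, ?, ?, ?, ?)", [
    ("2821d03c-ce88-4613-9095-e88eadcd3792", VM, "virtio-serial", 1, "virtio-serial0"),
    ("29e0805f-d836-451a-9ec3-9031baa995e6", VM, "virtio-serial", 0, "virtio-serial0"),
    ("11111111-1111-1111-1111-111111111111", OTHER_VM, "virtio-serial", 1, "virtio-serial0"),
])

# VMs that have more than one virtio-serial device.
DUPLICATES = """
SELECT vm_id FROM vm_device
WHERE device = 'virtio-serial'
GROUP BY vm_id HAVING COUNT(*) > 1
"""

# Delete only the unmanaged row, and only for VMs that have a duplicate.
DELETE_UNMANAGED = """
DELETE FROM vm_device
WHERE device = 'virtio-serial' AND NOT is_managed
  AND vm_id IN (SELECT vm_id FROM vm_device
                WHERE device = 'virtio-serial'
                GROUP BY vm_id HAVING COUNT(*) > 1)
"""

dupes = [row[0] for row in conn.execute(DUPLICATES)]
deleted = conn.execute(DELETE_UNMANAGED).rowcount
remaining = conn.execute(
    "SELECT vm_id, is_managed FROM vm_device ORDER BY vm_id").fetchall()
print(dupes, deleted, remaining)
```

Filtering on `is_managed` rather than on an empty address keeps the managed row, and the duplicate check leaves VMs with a single virtio-serial device untouched.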

alright, so i just deleted the wrong one. was a 50:50 chance ;-)

I have some test vms that can be rebooted and that i can experiment with.

though for my production vms i will delete the unmanaged one as well.
it shouldn't hurt if there is no virtio-serial device at all, right? i am pretty sure we are not using these devices. does a vm care if these devices are gone?

i also recently found the checkbox in the vm settings where i can activate or deactivate a virtio-serial device. this is unchecked in all my vms. just in case this was not obvious for you, and to be complete with my information.

I think I should understand the difference between managed and unmanaged devices first.... this should help i guess: http://www.ovirt.org/Features/Design/StableDeviceAddresses

Is this a known issue? Couldn't find anything so far.

Should I also post this to the developer list? I am not subscribed there
yet, wanted to check out here first.
I think it would be best to track and have it documented in bugzilla.
Please open a bug (https://bugzilla.redhat.com)

alright i'll open a bug.
and just for my future behaviour on this list: is it good practice to post stuff like that to this list first, before bothering the devel list or opening a bugzilla instantly without knowing if it's actually a bug?
or should i have posted this to the devel list in the first place?

thank you all for your replies
best
Jojo

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users