Public bug reported:

Occasionally VM live-migrations fail in post-migration because the request to activate the port binding on the new host fails with a 500 Internal Server Error. It appears that nova-compute may send two activation requests in parallel. One of them succeeds, the other one returns the error.
Neutron version: yoga, 20.1.0

How to reproduce:
- create a port for a compute instance, with a binding to host host1
- create an additional port binding for host2, i.e. POST /v2.0/ports/{port_id}/bindings
- that will create the new binding with status=INACTIVE
- activate the port binding with 2 requests in parallel (2 times PUT /v2.0/ports/{port_id}/bindings/host2/activate)

Actual result:
- one PUT request returns 200
- other PUT request returns 500

In the neutron-server log the failed request logs an exception:
"sqlalchemy.orm.exc.UnmappedInstanceError: Class 'builtins.NoneType' is not mapped."
See https://paste.opendev.org/show/bFICeriQTlkmVwYQ5nzo/

Expected result:
- one PUT request returns 200
- other PUT request returns 409 (port binding already active)

Background:
Nova live-migrations may trigger such concurrent activate requests. In preparation for the live-migration, nova creates a new port binding for the destination host. When the migration completes, nova activates that binding. At least in our setup that activation may be triggered from two places: (a) when the lifecycle event about the completed migration is handled and (b) when the migration job monitor actively detects that the migration completed. If the second request fails, the post-live-migration step breaks, the whole migration goes into error state, and it may not finish all its work.

** Affects: neutron
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1986003

Title:
  Exception in concurrent port binding activation

Status in neutron:
  New
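The reproduction steps above can be sketched as a small script. This is only an illustration, not the reporter's actual tooling: the helper names (activate_binding, race), the endpoint base URL, and the token handling are assumptions, and it presumes the INACTIVE binding for host2 has already been created. Only the PUT path is taken from the report.

```python
import threading
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def activate_binding(base_url, token, port_id, host):
    """Send PUT /v2.0/ports/{port_id}/bindings/{host}/activate.

    Returns the HTTP status code. base_url and token are assumed to come
    from the deployment (Neutron endpoint and a valid Keystone token).
    """
    url = f"{base_url}/v2.0/ports/{port_id}/bindings/{host}/activate"
    req = urllib.request.Request(
        url, method="PUT", headers={"X-Auth-Token": token}
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Error responses (e.g. 409 or 500) arrive as exceptions.
        return exc.code


def race(activate, n=2):
    """Call `activate` from n threads released together; return sorted results."""
    barrier = threading.Barrier(n)

    def worker():
        barrier.wait()  # hold every thread until all n are ready
        return activate()

    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(worker) for _ in range(n)]
        return sorted(f.result() for f in futures)


# Hypothetical usage against a test deployment (values are placeholders):
# statuses = race(lambda: activate_binding(NEUTRON_URL, TOKEN, PORT_ID, "host2"))
# print(statuses)
```

With the behaviour described in this report, such a run would be expected to show one 200 and one 500 when the race is hit; the requested behaviour is one 200 and one 409. The race is timing dependent, so several attempts (each with a fresh INACTIVE binding) may be needed.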