Public bug reported:

VM live migrations occasionally fail in the post-migration phase because the 
request to activate the port binding on the new host fails with a 500 Internal 
Server Error.
It appears that nova-compute may send two activate requests in parallel. One of 
them succeeds, the other one returns the error.

Neutron version: yoga, 20.1.0

How to reproduce:

- create a port for a compute instance, with a binding to host host1
- create an additional port binding for host2, i.e. POST 
/v2.0/ports/{port_id}/bindings; the new binding is created with 
status=INACTIVE
- activate the port binding with two requests in parallel (send PUT 
/v2.0/ports/{port_id}/bindings/host2/activate twice)
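
The last step above can be scripted. The following is a minimal sketch, not a tested reproducer: base_url, token, and port_id are placeholders to be supplied by the caller, and authentication with a Keystone token in the X-Auth-Token header is assumed. The helper names activate_binding and activate_concurrently are invented for this illustration.

```python
import concurrent.futures
import threading
import urllib.error
import urllib.request


def activate_binding(base_url, token, port_id, host):
    """Send one PUT .../bindings/{host}/activate and return the HTTP status."""
    url = f"{base_url}/v2.0/ports/{port_id}/bindings/{host}/activate"
    req = urllib.request.Request(
        url, method="PUT", headers={"X-Auth-Token": token}
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised by urllib; report the status code.
        return exc.code


def activate_concurrently(send, n=2):
    """Call send() from n threads released together; return sorted statuses."""
    barrier = threading.Barrier(n)

    def worker():
        barrier.wait()  # hold every thread until all n are ready
        return send()

    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(worker) for _ in range(n)]
        return sorted(f.result() for f in futures)
```

A call such as activate_concurrently(lambda: activate_binding(base_url, token, port_id, "host2")) returns the two status codes. Because the failure depends on timing, several attempts with a fresh INACTIVE binding may be needed before the 500 shows up.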

Actual result:

- one PUT request returns 200
- other PUT request returns 500

In the neutron-server log, the failed request produces an exception: 
"sqlalchemy.orm.exc.UnmappedInstanceError: Class 'builtins.NoneType' is not 
mapped."
See https://paste.opendev.org/show/bFICeriQTlkmVwYQ5nzo/

Expected result:

- one PUT request returns 200
- other PUT request returns 409 (port binding already active)
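
The expected behaviour can be illustrated with a toy in-memory model. This is not Neutron code and makes no claim about how the ML2 plugin is implemented; BindingStore and BindingConflict are hypothetical names. It only shows the idea that the check of the binding status and the status change happen as one step, so the losing request sees an already active binding and gets a conflict instead of an unhandled exception.

```python
import threading

ACTIVE, INACTIVE = "ACTIVE", "INACTIVE"


class BindingConflict(Exception):
    """Hypothetical error that an API layer would translate to HTTP 409."""


class BindingStore:
    """Toy model of the port bindings of one port, keyed by host."""

    def __init__(self, bindings):
        self._bindings = dict(bindings)  # host -> status
        self._lock = threading.Lock()

    def activate(self, host):
        # Check and update under one lock so two callers cannot both
        # observe the binding as INACTIVE.
        with self._lock:
            status = self._bindings[host]  # KeyError if no such binding
            if status == ACTIVE:
                raise BindingConflict(f"binding for {host} already active")
            self._bindings[host] = ACTIVE
            return 200
```

With a store created as BindingStore({"host1": ACTIVE, "host2": INACTIVE}), the first activate("host2") returns 200 and any later one raises BindingConflict.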

Background:

Nova live migrations may trigger such concurrent activate requests.
In preparation for the live migration, nova creates a new port binding for 
the destination host. When the migration completes, it activates that 
binding. At least in our setup, that activation may be triggered from two 
places: (a) when the lifecycle event about the completed migration is handled 
and (b) when the migration job monitor actively detects that the migration 
has completed. If the second request fails, post-live-migration breaks and the 
whole migration goes into an error state and may not finish all its work.
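
To show why a 409 would be harmless for the caller while a 500 is not, here is a small caller-side sketch. It is purely illustrative and does not reproduce nova's actual code; ensure_binding_active and ActivationFailed are hypothetical names, and send_activate stands for any callable that performs the PUT and returns the HTTP status code.

```python
class ActivationFailed(Exception):
    """Hypothetical error for an activation that cannot be considered done."""


def ensure_binding_active(send_activate):
    """Return True if this call activated the binding, False if it already was.

    Any status other than 200 or 409 is treated as a real failure.
    """
    status = send_activate()
    if status == 200:
        return True
    if status == 409:
        # The other trigger won the race; the binding is active, so the
        # caller can continue its post-migration work.
        return False
    raise ActivationFailed(f"unexpected status {status}")
```

With this handling, both triggers (a) and (b) can call the helper safely once the server answers the losing request with 409. A 500 still has to be treated as an error, which is what breaks the migration today.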

** Affects: neutron
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1986003

Title:
  Exception in concurrent port binding activation

Status in neutron:
  New

