As per sean-k-mooney's advice, I've also filed this against oslo.messaging,
since the problem lies more in oslo.messaging than in Nova.

** Also affects: oslo.messaging
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1917645

Title:
  Nova can't create instances if RabbitMQ notification cluster is down

Status in OpenStack Compute (nova):
  Confirmed
Status in oslo.messaging:
  New

Bug description:
  We use independent RabbitMQ clusters for each OpenStack project, for Nova
  Cells, and for notifications. Recently, I noticed in our test
  infrastructure that if the RabbitMQ cluster for notifications has an
  outage, Nova can't create new instances. Other operations may hang as
  well.

  Failing to send a notification or to connect to the notification
  RabbitMQ cluster shouldn't prevent new instances from being created. (If
  this is actually a use case for some deployments, operators should be
  able to configure it.)
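The behavior requested above, where a notification failure does not block the main operation unless the operator explicitly asks for that, can be sketched in plain Python. This is a minimal illustration under stated assumptions: `emit_notification`, `create_instance`, the `strict` flag and `NotificationError` are all hypothetical names, not Nova or oslo.messaging APIs.

```python
import logging

LOG = logging.getLogger(__name__)


class NotificationError(Exception):
    """Hypothetical error raised when a notification cannot be delivered."""


def emit_notification(send, payload, strict=False):
    """Try to deliver ``payload`` via ``send``; return True on success.

    ``strict`` is a hypothetical operator-controlled switch:
    - False (best effort): log the failure and let the caller carry on.
    - True: re-raise, so the caller's operation fails as well.
    """
    try:
        send(payload)
        return True
    except NotificationError:
        if strict:
            raise
        LOG.warning("Dropping notification %r: broker unavailable", payload)
        return False


def create_instance(name, send, strict=False):
    """Hypothetical instance-creation flow that emits a notification."""
    emit_notification(send, {"event": "instance.create.start", "name": name},
                      strict=strict)
    # ... scheduling and boot would happen here, regardless of the
    # notification outcome when strict=False ...
    return {"name": name, "status": "ACTIVE"}


if __name__ == "__main__":
    def unreachable_broker(payload):
        # Simulates the notification cluster being down.
        raise NotificationError("[Errno 113] EHOSTUNREACH")

    # Best effort: the instance is still created, a warning is logged.
    print(create_instance("vm1", unreachable_broker))
```

With `strict=True` the same call raises `NotificationError`, which would model a deployment that deliberately treats notifications as mandatory.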

  Tested against the master branch.

  If the notification RabbitMQ cluster is stopped, nova-scheduler gets
  stuck while creating an instance, logging:

  ```
  Mar 01 21:16:28 devstack nova-scheduler[18384]: DEBUG nova.scheduler.request_filter [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Request filter 'accelerators_filter' took 0.0 seconds {{(pid=18384) wrapper /opt/stack/nova/nova/scheduler/request_filter.py:46}}
  Mar 01 21:16:32 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 2.0 seconds): OSError: [Errno 113] EHOSTUNREACH
  Mar 01 21:16:35 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 4.0 seconds): OSError: [Errno 113] EHOSTUNREACH
  Mar 01 21:16:42 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 6.0 seconds): OSError: [Errno 113] EHOSTUNREACH
  Mar 01 21:16:51 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 8.0 seconds): OSError: [Errno 113] EHOSTUNREACH
  Mar 01 21:17:02 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 10.0 seconds): OSError: [Errno 113] EHOSTUNREACH
  (...)
  ```
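The retry messages in the log wait 2, 4, 6, 8 and 10 seconds, i.e. the interval grows linearly by 2 seconds per failed attempt. The sketch below reproduces that schedule and the cumulative time the caller spends blocked. The start and step values are read off the log; the 30-second cap and the function names `backoff_schedule` and `cumulative_wait` are assumptions for illustration, not values or helpers taken from oslo.messaging.

```python
from itertools import accumulate


def backoff_schedule(start, step, cap, attempts):
    """Return the first ``attempts`` retry intervals in seconds.

    Linear backoff: start, start + step, start + 2*step, ...
    with each value clipped to ``cap`` (the cap is an assumed value).
    """
    return [min(start + i * step, cap) for i in range(attempts)]


def cumulative_wait(intervals):
    """Running total of sleep time after each failed attempt."""
    return list(accumulate(intervals))


if __name__ == "__main__":
    intervals = backoff_schedule(start=2.0, step=2.0, cap=30.0, attempts=5)
    print(intervals)                   # [2.0, 4.0, 6.0, 8.0, 10.0]
    print(cumulative_wait(intervals))  # [2.0, 6.0, 12.0, 20.0, 30.0]
```

Without a retry limit the interval keeps growing until it reaches the cap, after which the caller sleeps for the capped interval on every attempt indefinitely, so the scheduler request never returns while the cluster is down.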

  Because the notification RabbitMQ cluster is down, Nova gets stuck in:

  https://github.com/openstack/nova/blob/5b66caab870558b8a7f7b662c01587b959ad3d41/nova/scheduler/filter_scheduler.py#L85

  because oslo.messaging keeps retrying the connection and never gives up:

  https://github.com/openstack/oslo.messaging/blob/5aa645b38b4c1cf08b00e687eb6c7c4b8a0211fc/oslo_messaging/_drivers/impl_rabbit.py#L736
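The core of the problem is a connection loop with no upper bound on retries. The sketch below contrasts the two behaviors using a hypothetical `connect_with_retry` helper (not driver code): `max_retries=None` retries forever, matching what the linked code does in this report, while an integer N allows at most N retries and then re-raises, which would let a notification caller fail fast. The `connect` and `sleep` callables are injected so the example runs instantly; `BrokerUnreachable` is a hypothetical stand-in for the EHOSTUNREACH errors in the log. The None/0/N convention is loosely modelled on the documented `retry` argument of `oslo_messaging.Notifier`.

```python
import time


class BrokerUnreachable(ConnectionError):
    """Hypothetical stand-in for the EHOSTUNREACH failures in the log."""


def connect_with_retry(connect, max_retries=None, start=2.0, step=2.0,
                       cap=30.0, sleep=time.sleep):
    """Call ``connect()`` until it succeeds and return its result.

    max_retries=None: retry forever (the behavior reported in this bug).
    max_retries=0:    a single attempt, no retry.
    max_retries=N:    at most N retries, then re-raise the last error.
    Intervals grow linearly (start, start + step, ...) up to ``cap``.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except BrokerUnreachable:
            if max_retries is not None and attempt >= max_retries:
                raise
            sleep(min(start + attempt * step, cap))
            attempt += 1


if __name__ == "__main__":
    def always_down():
        raise BrokerUnreachable("[Errno 113] EHOSTUNREACH")

    try:
        # Bounded: gives up after 3 retries instead of hanging.
        connect_with_retry(always_down, max_retries=3, sleep=lambda s: None)
    except BrokerUnreachable as exc:
        print("gave up:", exc)
```

Calling `connect_with_retry(always_down)` with the default `max_retries=None` would never return, which is the hang seen in nova-scheduler.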

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1917645/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
