OK, upon further investigation I have found some trace of a root cause.
oslo.messaging always uses a timeout of 1 second when polling queues and
connections. This appears to be too short when using SSL and frequently
results in SSLError/timeout errors, which cause all threads to fail,
reconnect, and fail again repeatedly. The number of connections then
rises fast and RPC stops working, which is why compute and conductor
are not able to communicate. I've played around with alternative
timeout values and I get much better results even with a value of 2s
instead of 1s. I'll propose an initial workaround patch shortly so we
can get out of this bind for now, but I think we'll ultimately need a
more intelligent solution than what oslo.messaging supports in this
version.
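
To illustrate the failure mode described above, here is a minimal
sketch of a poll that treats a read timeout as "nothing to read yet"
rather than as a broken connection. It is not the oslo.messaging code
or the proposed patch: the names POLL_TIMEOUT, is_timeout and poll_once
are hypothetical, it uses only the Python standard library, and the 2s
default is simply the value that gave better results in the testing
mentioned above.

```python
import socket
import ssl

# Hypothetical default: 2s behaved better than 1s in the testing above.
POLL_TIMEOUT = 2.0


def is_timeout(exc):
    """Return True if exc looks like a read timeout, not a real failure."""
    if isinstance(exc, socket.timeout):
        return True
    # Some Python versions report an SSL read timeout as a plain SSLError
    # whose message mentions "timed out", so check the text as well.
    return isinstance(exc, ssl.SSLError) and "timed out" in str(exc)


def poll_once(sock, timeout=POLL_TIMEOUT, bufsize=4096):
    """Read once from sock; return None if nothing arrived in time.

    Any error that is not a timeout is re-raised so the caller can
    decide whether to reconnect.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(bufsize)
    except (socket.timeout, ssl.SSLError) as exc:
        if is_timeout(exc):
            return None
        raise
```

With something like this, a caller would only tear down and reconnect
when poll_once raises, not every time a poll comes back empty.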

** Changed in: python-oslo.messaging (Ubuntu)
       Status: Confirmed => In Progress

** Changed in: python-oslo.messaging (Ubuntu)
     Assignee: (unassigned) => Edward Hope-Morley (hopem)

** Changed in: python-oslo.messaging (Ubuntu)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to python-oslo.messaging in Ubuntu.
https://bugs.launchpad.net/bugs/1472712

Title:
  Using SSL with rabbitmq prevents communication between nova-compute
  and conductor after latest nova updates