OK, upon further investigation I have found some trace of a root cause. oslo.messaging always uses a timeout of 1 second when polling queues and connections. This appears to be too short when using SSL and frequently results in SSLError/timeout, which causes all threads to fail, reconnect and fail again repeatedly. As a result the number of connections rises fast and RPC stops working, which is why compute and conductor are not able to communicate. I've played around with alternative timeout values and I get much better results even with a value of 2s instead of 1s. I'll propose an initial workaround patch shortly so we can get out of this bind for now, but I think we'll ultimately need a more intelligent solution than what oslo.messaging supports in this version.
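To illustrate the failure mode described above, here is a minimal Python sketch of a poll step that treats a read timeout as "nothing arrived yet" instead of as a dead connection. This is not the workaround patch and not oslo.messaging code: `POLL_TIMEOUT`, `is_timeout` and `poll` are hypothetical names, and `drain` stands in for whatever call actually reads from the broker connection.

```python
import socket
import ssl

# Assumption: a hypothetical tunable. 1s was reported as too short over SSL,
# and 2s already gave much better results.
POLL_TIMEOUT = 2.0


def is_timeout(exc):
    """Return True if exc only means 'no data arrived within the timeout'."""
    if isinstance(exc, socket.timeout):
        return True
    # Some SSL stacks report a read timeout as an SSLError whose message
    # contains 'timed out' rather than as socket.timeout.
    return isinstance(exc, ssl.SSLError) and "timed out" in str(exc)


def poll(drain, timeout=POLL_TIMEOUT):
    """Call drain(timeout) once.

    Returns 'ok' if drain completed, 'idle' if it merely timed out.
    Any other SSL or socket timeout-class error is re-raised so the caller
    can decide whether to reconnect.
    """
    try:
        drain(timeout)
        return "ok"
    except (socket.timeout, ssl.SSLError) as exc:
        if is_timeout(exc):
            return "idle"  # keep the existing connection, do not reconnect
        raise
```

With something along these lines, a slow SSL read would leave the existing connection in place rather than triggering a reconnect in every thread, while genuine SSL failures would still propagate.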
** Changed in: python-oslo.messaging (Ubuntu)
       Status: Confirmed => In Progress

** Changed in: python-oslo.messaging (Ubuntu)
     Assignee: (unassigned) => Edward Hope-Morley (hopem)

** Changed in: python-oslo.messaging (Ubuntu)
   Importance: Undecided => High

https://bugs.launchpad.net/bugs/1472712

Title:
  Using SSL with rabbitmq prevents communication between nova-compute and conductor after latest nova updates