** Description changed:

  [Impact]

  If there are many exchanges and queues, rabbitmq-server reports after a failover that exchanges cannot be found.

  Affected: Bionic (Queens)
  Not affected: Focal

  [Test Case]

  1. Deploy a simple rabbitmq cluster
     - https://pastebin.ubuntu.com/p/MR76VbMwY5/
  2. juju ssh neutron-gateway/0
     - for i in {1..1000}; do systemctl restart neutron-metering-agent; sleep 2; done
  3. It helps to add more exchanges, queues, and bindings
     - rabbitmq-plugins enable rabbitmq_management
     - rabbitmqctl add_user test password
     - rabbitmqctl set_user_tags test administrator
     - rabbitmqctl set_permissions -p openstack test ".*" ".*" ".*"
     - https://pastebin.ubuntu.com/p/brw7rSXD7q/ (save this as create.sh) [1]
     - for i in {1..2000}; do ./create.sh test_$i; done
  4. Restart the rabbitmq-server service, or shut the machine down and power it back on, several times.
  5. The "exchange not found" error now appears in the log.

  [1] create.sh (pasting here because pastebins don't last forever)

  #!/bin/bash

  rabbitmqadmin declare exchange -V openstack name=$1 type=direct -u test -p password
  rabbitmqadmin declare queue -V openstack name=$1 durable=false -u test -p password 'arguments={"x-expires":1800000}'
  rabbitmqadmin -V openstack declare binding source=$1 destination_type="queue" destination=$1 routing_key="" -u test -p password

  [Where problems could occur]

  1. Every service that uses oslo.messaging needs to be restarted.
  2. Message transfer could be affected.

  [Others]
+
+ Possible Workaround
+
+ 1. For the "exchange not found" issue:
+    - create the exchange, queue, and binding for the problematic name shown in the log
+    - then restart rabbitmq-server one node at a time
+
+ 2. For a queue that crashed and failed to restart:
+    - delete the specific queue named in the log
+
  // original description

  Input:
   - OpenStack Pike cluster with ~500 nodes
   - DVR enabled in neutron
   - Lots of messages

  Scenario: failover of one rabbit node in a cluster

  Issue: after the failed rabbit node comes back online, some RPC communications appear broken

  Logs from rabbit:

  =ERROR REPORT==== 10-Aug-2018::17:24:37 ===
  Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
  operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'

  Investigation:
  After the rabbit node comes back online, it immediately receives many new connections and, for some reason, fails to synchronize exchanges. The cluster had ~1600 exchanges, but on that node the exchange count stays low and does not increase.

  Workaround: let the recovered node synchronize all exchanges by blocking new connections with iptables rules for some time (30 sec) after the failed node comes back online.

  Proposal: do not create new exchanges (use the default exchange) for all direct messages. This also fixes the issue.

  Is there a good reason for creating new exchanges for direct messages?
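
The first possible workaround starts from the problematic exchange names in the log. Below is a minimal sketch of that lookup, assuming the log lines have the same "no exchange '<name>' in vhost" wording as the error quoted above. The function name missing_exchanges and the log path in the usage comment are hypothetical and are not part of the bug report.

```shell
#!/bin/bash
# Sketch: list the exchange names that rabbitmq-server reported as missing.
# Reads log text on stdin and prints each missing exchange name once.
missing_exchanges() {
    # Keep only the "no exchange '<name>' in vhost" fragments,
    # strip everything except the name, then de-duplicate.
    grep -o "no exchange '[^']*' in vhost" \
        | sed "s/^no exchange '//; s/' in vhost\$//" \
        | sort -u
}

# Example usage (the log path is an assumption, adjust it to your node):
#   missing_exchanges < /var/log/rabbitmq/rabbit@hostname.log
```

Each printed name could then be passed to create.sh [1] to recreate the exchange, queue, and binding before restarting rabbitmq-server. Note that create.sh hardcodes the vhost, so it should be checked against the vhost shown in the log first.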
--
https://bugs.launchpad.net/bugs/1789177

Title:
  RabbitMQ fails to synchronize exchanges under high load