Hi all,

We are deploying Kilo on Ubuntu Trusty. We run all services on 5 controller nodes, including RabbitMQ in a cluster with HA queues. We configure "rabbit_hosts" in all services to point to the 5 rabbitmq nodes.
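
To illustrate the setup, the relevant part of each service's configuration file looks roughly like the sketch below. The hostnames are placeholders, not our real nodes. The section name assumes the Kilo layout, where the RabbitMQ options live under [oslo_messaging_rabbit], and 5672 is the standard AMQP port.

```ini
# Sketch only: hostnames are hypothetical placeholders.
[oslo_messaging_rabbit]
# Comma-separated list of all 5 RabbitMQ cluster members (host:port).
rabbit_hosts = controller-01.example.org:5672,controller-02.example.org:5672,controller-03.example.org:5672,controller-04.example.org:5672,controller-05.example.org:5672
```
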
On each controller node the beam.smp process is taking ~150-250% of CPU and around 2GB of resident memory, even when no VM is running. Note that this is all CPU time, not time spent waiting on intensive IO. We can't figure out why.

One (probably unrelated) thing we found is that although the "heartbeat" option for rabbitmq in nova is marked as EXPERIMENTAL, it is enabled by default. Indeed, we found many errors like this in the logs:

    <11>Jul 15 20:19:27 node-k5-01-10 2015-07-15 20:19:27.625 128786 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on cloud-l2-41.os.s3it.uzh.ch:5672 is unreachable: Too many heartbeats missed. Trying again in 1 seconds.

Note that the rabbitmq servers were all up and running. On the rabbitmq server, the error was something like:

    =ERROR REPORT==== 15-Jul-2015::13:13:29 ===
    closing AMQP connection <0.18550.0> (10.129.16.173:55330 -> 10.129.31.229:5672):
    {heartbeat_timeout,running}

We disabled heartbeat for nova, in section [oslo_messaging_rabbit]. We no longer see these errors on the compute node, but the CPU usage of RabbitMQ is still high, so the two are probably unrelated.

I wonder if anyone can answer our questions:

* Is anyone experiencing the same behavior? Do you have a solution?
* Why is the heartbeat option in nova enabled, and can it be safely disabled?
* Is anyone experiencing similar issues with qpid? (We are not especially attached to any amqp implementation.)
* Are the default values for timeout/backoff/retry in nova.conf sane, even in a not-so-small installation?
  (64 compute nodes right now for "testing", 128 soon)

Thank you in advance for your help,
Antonio Messina

Package versions:

rabbitmq-server  3.4.3-2~cloud0
python-amqp      1.4.6-0ubuntu1~cloud0
python-amqplib   1.0.2-1
python-kombu     3.0.24-0ubuntu2~cloud0

--
antonio.s.mess...@gmail.com
antonio.mess...@uzh.ch
+41 (0)44 635 42 22
S3IT: Service and Support for Science IT
http://www.s3it.uzh.ch/
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich Switzerland