[Openstack-operators] [monitoring][messaging][rpc] When your OpenStack app is dead

Bogdan Dobrelya Tue, 11 Aug 2015 07:25:20 -0700

There is a known issue [0] in Oslo messaging, and it seems to be resolved
in Kilo. But the user experience of this one is very poor. For example,
each time your AMQP cluster performs a single-node failover and recovers
cleanly, there is a chance that some OpenStack apps, like Nova Compute,
get stuck in a broken state that only a restart can heal.


The typical log pattern for a service in this broken state is "Timed
out waiting for reply". Hence, it may be a good idea to implement
monitoring filters based on that pattern and automatically set an alert
status for affected OpenStack services.
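
A filter like that could be sketched roughly as follows. This is only a
minimal illustration, not a tested monitoring plugin: the only thing taken
from the issue is the "Timed out waiting for reply" string. The function
names, the service-to-log-lines mapping, the threshold and the sample log
lines are all made up for the example, and real log collection (files,
syslog, a log shipper) is left out.

```python
import re

# The only part taken from the report: the message seen in the broken state.
TIMEOUT_PATTERN = re.compile(r"Timed out waiting for reply")


def count_timeouts(lines):
    """Return how many of the given log lines contain the timeout message."""
    return sum(1 for line in lines if TIMEOUT_PATTERN.search(line))


def services_to_alert(logs_by_service, threshold=1):
    """Return a sorted list of services whose logs reach the threshold.

    logs_by_service: hypothetical mapping of service name -> iterable of
        recent log lines (how they are collected is out of scope here).
    threshold: hypothetical minimum number of matches before alerting.
    """
    return sorted(
        name
        for name, lines in logs_by_service.items()
        if count_timeouts(lines) >= threshold
    )


if __name__ == "__main__":
    # Made-up sample lines, not real service output.
    sample = {
        "nova-compute": [
            "sample line: Timed out waiting for reply",
            "sample line: periodic task finished",
        ],
        "nova-scheduler": ["sample line: nothing unusual"],
    }
    for service in services_to_alert(sample):
        print("ALERT: %s may need a restart" % service)
```

A threshold above 1 may help to avoid alerting on a single transient
timeout, but the right value would depend on the deployment.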

[0] https://bugs.launchpad.net/oslo.messaging/+bug/1338732

-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
