> What we'd like to know is realtime stats about the number of NAT/TCP
> sessions (i.e. conntrack status), CPU load, memory/swap usage, network
> and I/O bandwidth, and possibly some process metrics.

In addition to that, there are more checks that would make sense (and are difficult or impossible to implement on the HV):

- filesystem mounted readonly
- disk usage

The readonly remount can happen if there is an issue with primary storage. It leads to an inaccessible router, no VMs on the attached networks being deployable, and no obvious indication of what's going on. We recently ran into this issue. The same goes for a full disk.

As I mentioned already, we deployed additional monitoring services on each VR so that we could easily integrate them with our existing monitoring system. I would much prefer a fully integrated monitoring solution that can be queried through the CloudStack API instead. This would reduce dependency on the environment and allow router/network health monitoring (and possibly automatic redeploy) at the discretion of CloudStack.

As far as I know, there isn't even a simple health indicator that can be queried through the API. Should I rather submit a feature request for this?
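
To make the two extra checks concrete, here is a minimal sketch of how they might look when run on the VR itself. It is only an illustration, not what we deployed and not anything CloudStack provides: the function names, the ignored filesystem types and the 90% threshold are all assumptions, and it presumes a Linux guest that exposes /proc/mounts.

```python
import shutil

# Hypothetical list of filesystem types that are not worth alerting on.
IGNORED_FSTYPES = ("proc", "sysfs", "tmpfs", "devtmpfs", "squashfs", "iso9660")


def parse_readonly_mounts(mounts_text, ignored_fstypes=IGNORED_FSTYPES):
    """Return mount points flagged 'ro' in /proc/mounts-style text."""
    readonly = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        if fstype in ignored_fstypes:
            continue
        if "ro" in options.split(","):
            readonly.append(mount_point)
    return readonly


def disk_usage_percent(path="/"):
    """Return used space on the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def check_router_health(mounts_path="/proc/mounts", disk_path="/",
                        disk_limit_percent=90.0):
    """Return a list of problem descriptions; an empty list means healthy."""
    problems = []
    with open(mounts_path) as handle:
        readonly = parse_readonly_mounts(handle.read())
    if readonly:
        problems.append("readonly filesystems: " + ", ".join(readonly))
    used = disk_usage_percent(disk_path)
    if used >= disk_limit_percent:
        problems.append("disk usage on %s at %.1f%%" % (disk_path, used))
    return problems


if __name__ == "__main__":
    # Assumed usage: run periodically and feed the result to monitoring.
    for problem in check_router_health():
        print(problem)
```

An empty result would map naturally onto the kind of simple health indicator I am asking about; how such a result would be exposed through the API is exactly the open question.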