> What we'd like to know is realtime stats about the number of NAT/TCP
> sessions (i.e. conntrack status), CPU load, memory/swap usage, network
> and I/O bandwidth, and possibly some process metrics.

In addition to that, there are more checks that would make sense (and are
difficult or impossible to implement on the HV):
- filesystem mounted readonly
- disk usage

The readonly remount can happen if there is an issue with primary
storage. It leaves the router inaccessible, makes it impossible to
deploy VMs on the attached networks, and gives no obvious indication of
what's going on. We recently ran into this issue.

Similarly for a full disk.
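
To illustrate, both checks could be sketched roughly like this in Python,
using nothing but os.statvfs. The function names and the 90% threshold are
my own assumptions for the example, not anything that CloudStack or the VR
ships today:

```python
import os


def is_readonly(path="/"):
    """Return True if the filesystem containing `path` is mounted read-only."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def disk_usage_percent(path="/"):
    """Return used space in percent, computed like df: used / (used + avail)."""
    st = os.statvfs(path)
    used = st.f_blocks - st.f_bfree
    total = used + st.f_bavail
    if total == 0:
        # Pseudo filesystems may report zero blocks.
        return 0.0
    return 100.0 * used / total


def check_filesystem(path="/", max_used_percent=90.0):
    """Return a list of problem descriptions (empty if everything looks fine).

    `max_used_percent` is a hypothetical threshold chosen for this sketch.
    """
    problems = []
    if is_readonly(path):
        problems.append("%s is mounted read-only" % path)
    used = disk_usage_percent(path)
    if used > max_used_percent:
        problems.append("%s is %.1f%% full" % (path, used))
    return problems


if __name__ == "__main__":
    for problem in check_filesystem("/"):
        print("WARNING: " + problem)
```

Something along these lines would have to run inside the VR itself, which
is why it can't really be done from the HV.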

As I mentioned already, we deployed additional monitoring services on
each VR, so we could easily integrate them with our existing monitoring
system.

I would much prefer a fully integrated monitoring solution that can be
queried through the CloudStack API instead.
This would reduce dependency on the environment and allow
router/network health monitoring (and possibly automatic redeploy) at
the discretion of CloudStack.

As far as I know, there isn't even a simple health indicator that can
be queried through the API.

Should I rather submit a feature request for this?
