You need to run Alertmanager instances on different machines and set up
HA as described in the README.md [1].
This way your setup will be resilient to (N-1) instances going down.
If you want to detect a failure in your monitoring pipeline, you need
to set up something like a dead man's snitch integra
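
To illustrate the dead man's switch idea mentioned above, here is a minimal sketch in Python. It is not part of Alertmanager or any snitch service; the class name `DeadMansSwitch`, its methods, and the 60-second timeout are hypothetical and chosen only for illustration. It assumes an always-firing alert is delivered to some external watcher as a heartbeat, and the watcher raises the alarm once heartbeats stop arriving.

```python
import time


class DeadMansSwitch:
    """Hypothetical watcher: trips when the expected heartbeat stops arriving."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        # timeout_seconds: how long to tolerate silence before tripping.
        # clock: injectable time source, which makes the class easy to test.
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_heartbeat = clock()

    def heartbeat(self):
        # Call this each time the always-firing alert notification arrives.
        self.last_heartbeat = self.clock()

    def is_tripped(self):
        # True once more than timeout_seconds have passed without a heartbeat.
        return self.clock() - self.last_heartbeat > self.timeout_seconds


if __name__ == "__main__":
    switch = DeadMansSwitch(timeout_seconds=60)
    switch.heartbeat()
    print("pipeline silent too long:", switch.is_tripped())
```

The watcher has to run outside the monitoring stack it observes, otherwise it goes down together with the pipeline it is meant to check.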
Hi,
We have a setup with multiple Prometheus instances and the same number of
(Alertmanager + webhook) instances.
We have a Docker container that runs both the Alertmanager and webhook processes.
If the webhook process goes down but not the Alertmanager process, how catastrophic is
this event?
What if both go down?