
Re: [prometheus-users] alertmanager instance failure

2020-02-21 Thread Simon Pasquier

You need to run Alertmanager instances on different machines and set up HA as described in the README.md [1]. This way your setup will tolerate up to (N-1) instances going down. If you want to detect a failure in your monitoring pipeline, you need to set up something like a dead man's snitch integra
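
For illustration, the receiving side of a dead man's switch can be sketched roughly as below. This is only a minimal sketch, not how any particular snitch service works: it assumes an always-firing alert is routed to some external receiver, and that the receiver treats the absence of that notification as the failure signal. The names `DeadMansSwitch`, `heartbeat`, `is_tripped` and `timeout_seconds` are hypothetical and chosen for this example.

```python
import time
from typing import Optional


class DeadMansSwitch:
    """Hypothetical receiver-side check: trips when heartbeats stop arriving."""

    def __init__(self, timeout_seconds: float) -> None:
        # Assumed to be longer than the interval at which the always-firing
        # alert is re-sent, so that one delayed notification does not trip it.
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat: Optional[float] = None

    def heartbeat(self, now: Optional[float] = None) -> None:
        # Call this each time the always-firing alert notification arrives.
        self.last_heartbeat = time.monotonic() if now is None else now

    def is_tripped(self, now: Optional[float] = None) -> bool:
        # Tripped if no heartbeat was ever seen, or the last one is too old.
        current = time.monotonic() if now is None else now
        if self.last_heartbeat is None:
            return True
        return current - self.last_heartbeat > self.timeout_seconds


if __name__ == "__main__":
    switch = DeadMansSwitch(timeout_seconds=60)
    switch.heartbeat(now=100.0)
    print(switch.is_tripped(now=150.0))
    print(switch.is_tripped(now=200.0))
```

The point of the design is that the check runs outside the monitoring pipeline: if Prometheus, Alertmanager or the webhook stops delivering, the heartbeat stops and the external side notices.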

[prometheus-users] alertmanager instance failure

2020-02-18 Thread Dhiman Barman

Hi, We have a setup with multiple Prometheus instances and the same number of (Alertmanager + webhook) instances. We have a Docker container that runs both the Alertmanager and webhook processes. If the Alertmanager webhook goes down but not the Alertmanager process, how catastrophic is this event? What if both go down