[prometheus-users] How-To debug prometheus_rule_evaluation_failures_total? Prometheus is failing rule evaluations

2021-04-22 Thread 'Evelyn Pereira Souza' via Prometheus Users
Hi, we constantly get these alerts:
{group="local", instance="localhost:9090", job="prometheus", rule_group="/etc/config/prometheus-rules.yml;node.rules"} 10
{group="local", instance="localhost:9090", job="prometheus", rule_group="/etc/config/prometheus-rules.yml;prometheus"} 10
Source: ht

Re: [prometheus-users] How-To debug prometheus_rule_evaluation_failures_total? Prometheus is failing rule evaluations

2021-04-22 Thread Matthias Rampke
Your best starting point is the rules page of the Prometheus UI (:9090/rules). It will show the error. You can also evaluate the rule expression yourself, using the UI, or maybe using PromLens to help debug expression issues. /MR On Thu, Apr 22, 2021, 19:06 'Evelyn Pereira Souza' via Prometheus U
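The information shown on the rules page is also exposed by the Prometheus HTTP API at `/api/v1/rules`, where each rule carries `health` and `lastError` fields. The following is a minimal sketch, not part of the original reply, of how those fields could be pulled out in a script. The helper names `failing_rules` and `fetch_rules` and the default base URL `http://localhost:9090` are assumptions for illustration.

```python
import json
import urllib.request


def failing_rules(rules_response):
    """Return (group, rule, health, lastError) for every rule reporting an error.

    `rules_response` is the parsed JSON body of GET /api/v1/rules.
    """
    failures = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            # A rule whose last evaluation failed has health "err" and a
            # non-empty lastError string.
            if rule.get("health") == "err" or rule.get("lastError"):
                failures.append(
                    (
                        group.get("name"),
                        rule.get("name"),
                        rule.get("health"),
                        rule.get("lastError", ""),
                    )
                )
    return failures


def fetch_rules(base_url="http://localhost:9090"):
    """Fetch and parse /api/v1/rules (base_url is an assumed default)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=10) as resp:
        return json.load(resp)
```

Calling `failing_rules(fetch_rules())` against a reachable server would list the rules whose most recent evaluation failed, together with the error text. Only the most recent evaluation is reported, so a sporadic failure may already have cleared by the time the endpoint is queried.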

Re: [prometheus-users] How-To debug prometheus_rule_evaluation_failures_total? Prometheus is failing rule evaluations

2021-04-22 Thread 'Evelyn Pereira Souza' via Prometheus Users
On 22.04.21 20:20, Matthias Rampke wrote:
> Your best starting point is the rules page of the Prometheus UI (:9090/rules). It will show the error. You can also evaluate the rule expression yourself, using the UI, or maybe using PromLens to help debug expression issues. /MR

:9090/rules show th

Re: [prometheus-users] How-To debug prometheus_rule_evaluation_failures_total? Prometheus is failing rule evaluations

2021-04-23 Thread Matthias Rampke
It seems like you are federating through an ingress or load balancer that balances over multiple Prometheus server replicas. Either federate from each separately, or make sure that you only get responses from one consistently. As an alternative to the global federation, consider Thanos, it scales

Re: [prometheus-users] How-To debug prometheus_rule_evaluation_failures_total? Prometheus is failing rule evaluations

2021-04-23 Thread 'Evelyn Pereira Souza' via Prometheus Users
On 23.04.21 20:35, Matthias Rampke wrote:
> It seems like you are federating through an ingress or load balancer that balances over multiple Prometheus server replicas. Either federate from each separately, or make sure that you only get responses from one consistently. As an alternative to

Re: [prometheus-users] How-To debug prometheus_rule_evaluation_failures_total? Prometheus is failing rule evaluations

2021-05-01 Thread Matthias Rampke
That looks good. I think the issue is which target(s) you discover for these jobs. If you scrape Prometheus directly, you may have to change the TLS settings depending on your configuration. /MR On Sat, Apr 24, 2021, 08:58 'Evelyn Pereira Souza' via Prometheus Users < prometheus-users@googlegroup

Re: [prometheus-users] How-To debug prometheus_rule_evaluation_failures_total? Prometheus is failing rule evaluations

2021-05-13 Thread 'ping...@hioscar.com' via Prometheus Users
We are facing an issue where rules fail sporadically. Are these errors logged somewhere if they cannot be found in the UI? Thanks

On Saturday, May 1, 2021 at 11:01:49 AM UTC-4 matt...@prometheus.io wrote:
> That looks good, I think the issue is which target(s) you discover for
> these