[prometheus-users] consul discovery

2024-02-28 Thread sri L
Hi all, I am trying to register multiple on-prem nodes in Consul using a single JSON file, but while registering through the API I get a syntax error (*Request decode failed: json: cannot unmarshal array into Go value of type structs.RegisterRequest*). I am using the curl command below for registering
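
The error quoted above ("cannot unmarshal array into Go value of type structs.RegisterRequest") suggests the endpoint decodes one JSON object per request rather than an array. A minimal Python sketch of one way around that, splitting the array and building one PUT per node, is given here. The URL, the sample `Node`/`Address` values, and the function name `build_register_requests` are assumptions for illustration; the poster's actual curl command and file are not shown in this snippet.

```python
import json
import urllib.request

# Assumed endpoint for illustration; the thread does not show the real URL.
CONSUL_REGISTER_URL = "http://localhost:8500/v1/catalog/register"


def build_register_requests(nodes_json, url=CONSUL_REGISTER_URL):
    """Turn a JSON array of node definitions into one PUT request per node."""
    nodes = json.loads(nodes_json)
    if isinstance(nodes, dict):
        # A single object is already in the shape the endpoint expects.
        nodes = [nodes]
    requests = []
    for node in nodes:
        body = json.dumps(node).encode("utf-8")
        requests.append(
            urllib.request.Request(
                url,
                data=body,
                method="PUT",
                headers={"Content-Type": "application/json"},
            )
        )
    return requests


if __name__ == "__main__":
    # Hypothetical node definitions standing in for the poster's file.
    sample = (
        '[{"Node": "node-a", "Address": "10.0.0.1"},'
        ' {"Node": "node-b", "Address": "10.0.0.2"}]'
    )
    for req in build_register_requests(sample):
        # Requests are only built here; sending one would be
        # urllib.request.urlopen(req).
        print(req.get_method(), req.full_url, req.data.decode())
```

The requests are built but not sent, so the sketch can be checked without a running Consul agent.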

Re: [prometheus-users] Alert Query

2024-02-19 Thread sri L
> topk(3,metrics) ? > > sri L wrote on Tue, 20 Feb 2024, 05:19: > >> Hi all, >> >> I am looking for a way to send out an alert with top 3 cpu/memory >> utilization processes when total cpu/memory utilization goes above 80% for >> a node.

[prometheus-users] Alert Query

2024-02-19 Thread sri L
Hi all, I am looking for a way to send out an alert with top 3 cpu/memory utilization processes when total cpu/memory utilization goes above 80% for a node. I can create an alert using node metrics to send notification when cpu utilization is above 80% but unable to find a way to include top 3

Re: [prometheus-users] Re: Alert Query

2024-02-13 Thread sri L
od_status_ready{condition="true"} == 0 and > max_over_time(kube_pod_status_ready{condition="true"}[10m]) == 1 > > This will fire if the pod was ready at any time in the last 10 minutes, > but is not ready now. This does mean that the alert will clear after 10 > mi
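
The condition described in this reply (not ready now, but ready at some point in the last 10 minutes) can be sketched in Python as follows. This only models the logic; it is not PromQL and not the thread's own code. It assumes one readiness sample per minute, and the function name `should_fire` is hypothetical.

```python
def should_fire(history, window=10):
    """Return True if the pod is not ready now but was ready within the window.

    history: list of 0/1 readiness samples, oldest first, assumed to be
    taken once per minute. window: number of most recent samples to look at.
    """
    if not history:
        return False
    recent = history[-window:]
    # "== 0" on the current sample, "max over the window == 1" on the past.
    return history[-1] == 0 and max(recent) == 1


if __name__ == "__main__":
    print(should_fire([1, 1, 0]))        # was ready, now not ready
    print(should_fire([0, 0, 0]))        # new pod that was never ready
    print(should_fire([1] + [0] * 10))   # last ready sample has left the window
```

The third case illustrates the caveat in the reply: once the last ready sample falls out of the window, the condition stops matching and the alert clears.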

[prometheus-users] Alert Query

2024-02-13 Thread sri L
Hi all, I am trying to create an alert rule for pod unreachable condition. Below expression I used but alert was triggering whenever new pod got created, we want alert only when the previous state of a pod was in the ready state and then went to unreachable/terminating/pending states. kube_pod

[prometheus-users] Kubernetes metric expression

2023-11-22 Thread sri L
Hi all, I am looking for metric expression for CPU usage of pod in percentage in kubernetes. I found following expression from the blog but the result is coming in seconds instead of percentage. Kindly suggest how we can get CPU utilization in percentage, Thanks sum(rate(container_cpu_usage_sec
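
The expression in this post is cut off, so what follows is a general sketch and not the thread's answer. The per-second rate of a CPU-seconds counter is CPU time used per second of wall time, which is a number of cores; to read it as a percentage it has to be divided by some capacity and multiplied by 100. The capacity used here (a CPU limit in cores), the function name `cpu_percent`, and all sample numbers are assumptions.

```python
def cpu_percent(counter_start, counter_end, elapsed_seconds, limit_cores):
    """Convert two readings of a CPU-seconds counter into percent of a limit.

    counter_start, counter_end: cumulative CPU seconds at two points in time.
    elapsed_seconds: wall-clock time between the two readings.
    limit_cores: assumed capacity to compare against, in cores.
    """
    if elapsed_seconds <= 0 or limit_cores <= 0:
        raise ValueError("elapsed_seconds and limit_cores must be positive")
    # CPU seconds per wall-clock second, i.e. average cores in use.
    cores_used = (counter_end - counter_start) / elapsed_seconds
    return cores_used / limit_cores * 100


if __name__ == "__main__":
    # Hypothetical readings: 30 CPU seconds consumed over 60 s, limit of 2 cores.
    print(cpu_percent(100.0, 130.0, 60, 2))
```

With `limit_cores=1` the same function gives usage as a percentage of a single core.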

[prometheus-users] Re: Alert Query

2023-10-07 Thread sri L
Prometheus itself, > that can vary depending on the issue but here's a good list of known > alerting conditions that you can use to monitor the state of Prometheus > instances: > > https://samber.github.io/awesome-prometheus-alerts/rules.html#prometheus-self-monitoring

[prometheus-users] Alert Query

2023-10-04 Thread sri L
Hi all, Can anyone please suggest alert expression for configuring alert rule for below condition. "metric data is not being received by Prometheus and to alert that there is an issue with the Prometheus and it is unable to scrape". Thanks -- You received this message because you are subscr

[prometheus-users] Prometheus-operator use case

2023-08-26 Thread sri L
Hi, Can we use Prometheus-Operator to bundle helm chart for kube-state-metrics and cadvisor? Kindly suggest and share if we have any sample configuration Thanks

[prometheus-users] Error while doing remote write

2023-08-07 Thread sri L
Hi all, I am trying to do remote write from Prometheus to Cortex but getting below error. Can anyone please suggest how to solve this, thanks component=remote level=error remote_name=af04b2 url=http://abc/push msg="non-recoverable error while sending metadata" co

[prometheus-users] JMX Exporter

2023-07-09 Thread sri L
Hi All, I am trying to install JMX exporter for monitoring Java processes in Linux and getting the error "no main manifest attribute, in jar" Please let us know if I am missing anything. Thanks in advance

[prometheus-users] Re: Blackbox error

2023-06-25 Thread sri L
was asked and answered recently: > https://groups.google.com/g/prometheus-users/c/1Es9Ok9tK-s/m/tuBnU3OPBAAJ > (you'll need to change the blackbox_exporter config slightly to set a > bearer token header instead of basic auth) > > On Friday, 23 June 2023 at 09:14:55 UTC sri L wrot

[prometheus-users] Blackbox error

2023-06-23 Thread sri L
Hi all, We are trying to use blackbox exporter to monitor one of the URL using http_2xx module and bearer token at job level for authorization, we are receiving following error message and probe failed level=info msg="Address does not match first address, not sending TLS ServerName" first=IP a

[prometheus-users] cloudwatch alert rule issue

2023-06-05 Thread sri L
Hi all, We are using Cloudwatch datasource in Grafana and configured alerts using following expression filter @message like 'error' or message like '400' |stats count(*) as exceptionCount by bin(1h) |sort exceptionCount desc When there is no data, we are getting below error and it is firing no

Re: [prometheus-users] server uptime

2023-02-21 Thread sri L
p{job="node"}[30d]) * 100 > > This is assuming that the scrape interval is not changing over those 30d, > as otherwise you would be weighting some periods (the ones with a higher > scrape frequency) more than others. > > On Fri, Feb 17, 2023 at 5:25 AM sri L wrote:
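
The weighting caveat in this reply can be illustrated with a small Python sketch. It compares a plain average over samples with a time-weighted average when the scrape interval changes partway through. The sample data, the assumption that each sample holds until the next one, and both function names are hypothetical and not taken from the thread.

```python
def sample_average_percent(samples):
    """Plain mean of 0/1 samples, as a percentage. Every sample counts equally."""
    return sum(value for _, value in samples) / len(samples) * 100


def time_weighted_percent(samples, end):
    """Mean weighted by how long each sample is assumed to stay valid.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    end: timestamp at which the observation period stops.
    """
    total = end - samples[0][0]
    up_time = 0
    # Pair each sample with the timestamp of the next one (or the period end).
    following = samples[1:] + [(end, None)]
    for (t, value), (t_next, _) in zip(samples, following):
        up_time += value * (t_next - t)
    return up_time / total * 100


if __name__ == "__main__":
    # First hour: up, scraped every 60 s. Second hour: down, scraped every 600 s.
    dense = [(t, 1) for t in range(0, 3600, 60)]
    sparse = [(t, 0) for t in range(3600, 7200, 600)]
    samples = dense + sparse
    print(sample_average_percent(samples))
    print(time_weighted_percent(samples, 7200))
```

The target is down for half of the two hours, yet the plain average is dominated by the densely scraped first hour, which is the effect the reply warns about.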

[prometheus-users] server uptime

2023-02-16 Thread sri L
Hi all, I am looking for server uptime percentage metrics on a monthly basis Example: If server is down for 60hrs out of 720hrs of a month the uptime has to show 91.66% in dashboard Please suggest if you have a relevant expression to serve this purpose Thanks
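
The arithmetic behind the example in this post, sketched in Python: 60 hours down out of 720 leaves 660 hours up, and 660 / 720 is about 91.67%, which the post writes truncated as 91.66%. The function name `uptime_percent` is hypothetical.

```python
def uptime_percent(total_hours, down_hours):
    """Share of the period during which the server was up, as a percentage."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= down_hours <= total_hours:
        raise ValueError("down_hours must be between 0 and total_hours")
    return (total_hours - down_hours) / total_hours * 100


if __name__ == "__main__":
    # Figures from the post: 60 h of downtime in a 720 h month.
    print(round(uptime_percent(720, 60), 2))
```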

[prometheus-users] Email Subject for alerts

2022-11-23 Thread sri L
Hi all, I want to add a label name in subject header. This is not a common label which we generally defined under target level. The label "name" is coming in metrics only and I want to see that "name" in email subject. Currently I am using this header value [{{ .Status | toUpper }}] {{.CommonLabe

[prometheus-users] Node restart alert rule

2022-11-12 Thread sri L
Hi all, Can anyone help me with node restart alert rule expression Thanks

[prometheus-users] prometheus rule evaluation failure

2022-11-03 Thread sri L
Hi all Can anyone help me here where to check. We are receiving alerts frequently for prometheus rule evaluation failure. Didn't understand why this alert is firing continuously Prometheus encountered 1.33662502 rule evaluation failures, leading to potentially ignored alerts. VALUE = 1.