Re: [prometheus-users] Setting only Alertname and Instance as Subject for Email Notification.

2021-07-13 Thread Yagyansh S. Kumar
Hi Sadhana, You can use the group_by feature in Alertmanager's config file, or you can drop the unwanted labels in the alert rule itself. On Tue, 13 Jul, 2021, 8:38 pm sadhana B, wrote: > Hi, > > Any solution for this? > > On Tue, Dec 22, 2020 at 3:38 AM yagyans...@gmail.com < >
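
As a minimal sketch of the first suggestion, the Alertmanager fragment below groups notifications by alertname and instance and builds the email subject from those two group labels only. The receiver name and address are placeholders, and the global SMTP settings are omitted; none of these values come from the thread.

```yaml
# alertmanager.yml (fragment; global SMTP settings omitted)
route:
  receiver: email-team                     # hypothetical receiver name
  group_by: ['alertname', 'instance']      # one notification per alertname/instance pair

receivers:
  - name: email-team
    email_configs:
      - to: 'oncall@example.com'           # placeholder address
        headers:
          # Only the two grouping labels appear in the subject line.
          Subject: '{{ .GroupLabels.alertname }} on {{ .GroupLabels.instance }}'
```

Because the subject is built from .GroupLabels, it only has values for labels that are listed in group_by.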

Re: [prometheus-users] Quantile in Summary for Python Client.

2021-07-08 Thread Yagyansh S. Kumar
ometheus-users+unsubscr...@googlegroups.com. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/prometheus-users/c05d05ca-18f1-4c50-a62b-a2c499d92b36n%40googlegroups.com >> <https://groups.google.com/d/msgid/prometheus-users/c05d05ca-18f1-4c50-a

Re: [prometheus-users] Re: Working on Blackbox's "fail_if_body_not_matches_regexp".

2021-03-22 Thread Yagyansh S. Kumar
Hi Julius, Using the expression "^OK$" leads to the failure of all the checks for which the response was OK. This seems weird to me; ideally, it should have worked. Any more workarounds or suggestions to achieve this? On Mon, Mar 22, 2021 at 9:38 PM Yagyansh S. Kumar wrote: > T
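
One possible cause, offered here as an assumption and not as the thread's conclusion: if the response body ends with a newline, "^OK$" will not match, because in Go's regexp syntax "$" matches only at the very end of the text unless multi-line mode is enabled. A Blackbox module that tolerates trailing whitespace might look like this (the module name is hypothetical):

```yaml
# blackbox.yml (fragment)
modules:
  http_body_ok:                  # hypothetical module name
    prober: http
    timeout: 5s
    http:
      # The probe fails unless the whole body is "OK", optionally
      # followed by whitespace such as a trailing newline.
      fail_if_body_not_matches_regexp:
        - '^OK\s*$'
```

Probing the target with debug output enabled shows which check failed, which helps confirm or rule out this explanation.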

Re: [prometheus-users] Re: Working on Blackbox's "fail_if_body_not_matches_regexp".

2021-03-22 Thread Yagyansh S. Kumar

Re: [prometheus-users] Sudden & Permanent increase in Memory consumption.

2021-02-08 Thread Yagyansh S. Kumar
Thanks, Ben. I'll upgrade to a newer Prometheus version and check whether the issue persists. But I still have one doubt: I have been running this Prometheus instance for almost a year now, but I have noticed this memory increase only recently, the first time on 31st Jan and the second time on Feb 8. If it

Re: [prometheus-users] Setting custom threshold for Disk Storage Space.

2021-02-02 Thread Yagyansh S. Kumar
Hi Stuart, > > A common way is to use timeseries to set the thresholds: > https://www.robustperception.io/using-time-series-as-alert-thresholds >> I have already referred to the link, and it is exactly what I have used in the example thresholding that I shared. But how do I use this for a

Re: [prometheus-users] Alert goes to Firing --> Resolved --> Firing immediately.

2020-11-25 Thread Yagyansh S. Kumar
ation does not match. [image: image.png] On 25 November 2020 14:58:50 GMT, "Yagyansh S. Kumar" < > yagyanshsku...@gmail.com> wrote: >> >> >> >> On Wed, 25 Nov, 2020, 8:26 pm Stuart Clark, >> wrote: >> >>> How many Alertmanager instances

Re: [prometheus-users] Alert goes to Firing --> Resolved --> Firing immediately.

2020-11-25 Thread Yagyansh S. Kumar
but I am facing a duplicate-alert issue in that setup, which is another issue still pending for me. Hence, currently only a single Alertmanager is receiving alerts from my Prometheus instance. On 25 November 2020 14:07:41 GMT, "Yagyansh S. Kumar" < > yagyanshsku...@gmail.com> wrote: >>

Re: [prometheus-users] Alert goes to Firing --> Resolved --> Firing immediately.

2020-11-25 Thread Yagyansh S. Kumar
Hi Stuart. On Wed, 25 Nov, 2020, 6:56 pm Stuart Clark, wrote: > On 25/11/2020 11:46, yagyans...@gmail.com wrote: > > The alert formation doesn't seem to be a problem here, because it > > happens for different alerts randomly. Below is the alert for Exporter > > being down for which it has

Re: [prometheus-users] Debugging OOM issue.

2020-11-25 Thread Yagyansh S. Kumar
Cool, thanks for the quick help. On Wed, Nov 25, 2020 at 1:18 PM Ben Kochie wrote: > No, concurrency only affects how many queries are running at the same > time. > > On Wed, Nov 25, 2020 at 8:45 AM Yagyansh S. Kumar < > yagyanshsku...@gmail.com> wrote: > >> Than

Re: [prometheus-users] Re: Discrepancy in Alert Rule Evaluation.

2020-11-08 Thread Yagyansh S. Kumar
I'll try to get a backtrace and post it here. But the question still remains: if BBE is returning probe_success 0, why is it doing so only for 2.20.1? On Sat, 7 Nov, 2020, 11:33 pm Brian Candler, wrote: > I don't think it's a false alert. If it's the rule you showed, then the > only way you

Re: [prometheus-users] Re: Discrepancy in Alert Rule Evaluation.

2020-11-07 Thread Yagyansh S. Kumar
Try looking at scrape_duration_seconds{job="Ping-All-Servers"}. Maybe it's borderline to the scrape interval. >> That's interesting. Here are the top 20 scrape_duration_seconds values, maxed over the last hour by instance. They are close to 5 seconds. Can this lead to some issue? But again the thing comes why

Re: [prometheus-users] Re: Discrepancy in Alert Rule Evaluation.

2020-11-07 Thread Yagyansh S. Kumar
Yes, both the Prometheus instances are indeed talking to the same BBE. In fact, both have the exact same configuration file and are scraping the exact same targets. Here is the graph for the modified query. Failures are visible for 2.20.1 but none for 2.12.0. 2.12.0 [image: image.png] 2.20.1 [image:

Re: [prometheus-users] Re: Querying Response Body via Blackbox Exporter.

2020-08-02 Thread Yagyansh S. Kumar
That's a little strange. We can specify that the probe should fail if the response body does not match a given pattern, but we cannot extract the pattern that is returned by the URL. It is quite a basic functionality that would have been nice to have. Anyway, thanks for the help! On Sat, Aug 1, 2020

Re: [prometheus-users] Custom Threshold for a particular instance.

2020-07-02 Thread Yagyansh S. Kumar
to 5 different targets, all of which belong to 5 different components and all the 5 components have more than 1 target but I want the custom threshold to be applied for only a single target from each component. On Fri, Jul 3, 2020 at 12:02 AM Yagyansh S. Kumar wrote: > Hi Christian, > > Ac

Re: [prometheus-users] Custom Threshold for a particular instance.

2020-07-02 Thread Yagyansh S. Kumar
Hi Christian, Actually, I want to know if there is any better way to define the threshold for my 5 new servers that belong to 5 different components. Is writing 5 different recording rules with the same name, and different instance and component labels, the only way to proceed here? Won't that be a

Re: [prometheus-users] Custom Threshold for Particular Instance.

2020-06-24 Thread Yagyansh S. Kumar
Thanks for the solution. But creating 5 different recording rules for the same custom threshold doesn't seem the best idea. It is good as a last resort I guess. Any better suggestion to approach this? On Thursday, June 25, 2020 at 1:06:02 AM UTC+5:30, sayf eddine Hammemi wrote: > > Correct,
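
One way to avoid five near-identical recording rules, sketched here under assumptions and not taken from the thread, is to generate the threshold series for all special targets from a single rule, and to fall back to a default threshold for every other target with "or". The instance names, the job name, the threshold value 20, and the use of node_load15 are all hypothetical.

```yaml
groups:
  - name: custom_thresholds
    rules:
      # One rule, one threshold series per listed target:
      # count(...) * 0 + 20 turns each matching "up" series into the value 20.
      - record: instance:node_load15:threshold
        expr: |
          count by (instance) (
            up{job="node", instance=~"(app1|db1|cache1|web1|mq1):9100"}
          ) * 0 + 20

      - alert: HighCpuLoad
        # Use the custom threshold where one exists; otherwise fall back
        # to the number of cores of that instance.
        expr: |
          node_load15
            > on (instance) group_left()
          (
              instance:node_load15:threshold
            or on (instance)
              count by (instance) (node_cpu_seconds_total{mode="system"})
          )
        for: 5m
```

Adding or removing a special target then means editing one regex instead of adding or deleting a rule.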

[prometheus-users] Using JMX Exporter to export Cassandra's Metrics.

2020-06-17 Thread Yagyansh S. Kumar
Hi. I need to export metrics from Cassandra DB, and there have been mixed suggestions to use either the JMX Exporter or a standalone Cassandra exporter (of which there are many). Which one is the correct way to go? Also, are there any known issues with the JMX Exporter? I mean, does it hamper Cassandra

[prometheus-users] Re: NTP Metrics.

2020-05-20 Thread Yagyansh S. Kumar
Thanks a lot for pointing me in the correct direction. On Wednesday, May 20, 2020 at 12:35:20 PM UTC+5:30, Brian Candler wrote: > > On Wednesday, 20 May 2020 05:48:28 UTC+1, Yagyansh S. Kumar wrote: >> >> Thanks for the response Brian. >> >> I have already enabled t

[prometheus-users] Re: NTP Metrics.

2020-05-19 Thread Yagyansh S. Kumar
Thanks for the response, Brian. I have already enabled the NTP collector on all my servers, but I still cannot see the *node_ntp_drift_seconds* metric giving any output. Apart from that, I have a couple of questions here. Firstly, why are we checking the target clock with Prometheus' server?

[prometheus-users] NTP Metrics.

2020-05-19 Thread Yagyansh S. Kumar
Hi. I have my own NTP server configured at x.x.x.x. Now, I want to check whether my 10 other servers are synchronized with my NTP server. I have gone through a lot of threads and found different opinions with different answers. Also, I guess node_ntp_drift_seconds is an old metric and

[prometheus-users] Re: Wierd Stale Behaviour of Prometheus.

2020-05-04 Thread Yagyansh S. Kumar
But I did that intentionally to check. Before doing this change my job was: - job_name: 'Ping-All-Servers' metrics_path: /probe params: module: [icmp_prober] file_sd_configs: - files: - /etc/blackbox/*Ping_Targets.yml* Servers.yml relabel_configs: -

[prometheus-users] Wierd Stale Behaviour of Prometheus.

2020-05-04 Thread Yagyansh S. Kumar
Hi. I am using Blackbox's ICMP module to check whether my servers are pingable. I have defined all the targets in a separate file. Everything was working fine till yesterday, but since then, even if I remove any target from my target file, Prometheus does not take the updated file and

[prometheus-users] Attaching the hotname in probe_success query.

2020-04-29 Thread Yagyansh S. Kumar
Hi. I am monitoring my Health Check URLs using Blackbox, and I am trying to attach the hostname of the server on which the Health Check is down. My Health Checks are of the form - ServerIP:PORT/my/healthcheck/url (Eg. x.x.x.x:8080/api/a1/healthcheck). I am extracting the ServerIP using regex

[prometheus-users] Re: Recording the hostnames of all targets.

2020-04-24 Thread Yagyansh S. Kumar
Thanks Brian. > > Can you give some more specific examples? What metric are you joining > with - perhaps node_uname_info? >> > - alert: HighCpuLoadCrit expr: (node_load15 > (2 * count without (cpu, mode) (node_cpu_seconds_total{mode="system"}))) ** on(instance)
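
The preview above is cut off, so the following is only a sketch of the common pattern for this kind of join, assuming node_uname_info (the metric mentioned in the quoted question) is the right-hand side. The "for" duration, severity label, and annotation text are hypothetical.

```yaml
groups:
  - name: cpu_load
    rules:
      - alert: HighCpuLoadCrit
        # node_uname_info always has the value 1, so multiplying by it keeps
        # the load value and only copies the nodename label onto the result.
        expr: |
          (
            node_load15
              > 2 * count without (cpu, mode) (node_cpu_seconds_total{mode="system"})
          )
          * on (instance) group_left(nodename) node_uname_info
        for: 5m
        labels:
          severity: CRITICAL
        annotations:
          summary: 'High CPU load on {{ $labels.nodename }} ({{ $labels.instance }})'
```

If node_uname_info is missing for an instance, the join returns nothing for it and the alert cannot fire for that target.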

[prometheus-users] Recording the hostnames of all targets.

2020-04-24 Thread Yagyansh S. Kumar
Hi. So, I am using IP:PORT as targets (I know it's not ideal, I should have only IPs, and I will switch to that soon) for all my node exporter jobs. I am getting the hostnames of my servers in the alerts by using group_left and joining them to my original alert query. Now, the problem is when alert of

Re: [prometheus-users] URL Working, Response Code captured by Blackbox also 200, but still probe_success failing.

2020-04-23 Thread Yagyansh S. Kumar
For example: > http://localhost:9115/probe?module=http_2xx&target=https://nonexistentbogusdomain.io&debug=true > > If that doesn't help, you'll probably need to share more information about > your configuration on the Prometheus and BBE side. > > On Thu, Apr 23, 2020 at 10:40 AM Yagyansh

[prometheus-users] URL Working, Response Code captured by Blackbox also 200, but still probe_success failing.

2020-04-23 Thread Yagyansh S. Kumar
Hi. I have a couple of Service Health Checks on my AWS Instances (e.g. 18.220.x.x:PORT/health). I am using Blackbox to monitor all my Service Health Checks. These AWS Instance Health Checks are being shown as DOWN by Blackbox even though they are working fine. Also, when I check response code

Re: [prometheus-users] Re: startsAt and endsAt clarification.

2020-04-21 Thread Yagyansh S. Kumar
ing sent anymore after a while. > > On Tue, Apr 21, 2020 at 12:08 PM Yagyansh S. Kumar > wrote: > >> I am sorry I forgot to mention that I am using v1 Alertmanager API to >> capture this data. /api/v1/alerts >> >> On Tuesday, April 21, 2020 at 3:38:00 PM UTC+5:30, Ya

Re: [prometheus-users] Slicing the metrics scraped by HAPROXY Exporter.

2020-04-21 Thread Yagyansh S. Kumar
Sorry, I posted it in the wrong thread. Thanks for your help. On Tuesday, April 21, 2020 at 5:02:23 PM UTC+5:30, Brian Candler wrote: > > Not sure what you mean - time difference between what and what? >

Re: [prometheus-users] Slicing the metrics scraped by HAPROXY Exporter.

2020-04-21 Thread Yagyansh S. Kumar
Is there a significant time difference in query evaluation while querying directly through API? On Tuesday, April 21, 2020 at 3:51:56 PM UTC+5:30, Brian Candler wrote: > > On Tuesday, 21 April 2020 10:26:47 UTC+1, Yagyansh S. Kumar wrote: >> >> If I want to see the

Re: [prometheus-users] Slicing the metrics scraped by HAPROXY Exporter.

2020-04-21 Thread Yagyansh S. Kumar
Oh, yes. Thanks a lot, Brian. On Tuesday, April 21, 2020 at 3:51:56 PM UTC+5:30, Brian Candler wrote: > > On Tuesday, 21 April 2020 10:26:47 UTC+1, Yagyansh S. Kumar wrote: >> >> If I want to see the number of requests that increased on let say 17th >>

Re: [prometheus-users] StartsAt time is right but endsAt time in alertmanager API is not matching. What does endsAt time actually stands for ?

2020-04-21 Thread Yagyansh S. Kumar
21 Apr 2020 at 11:12, Yagyansh S. Kumar > wrote: > >> So, we can't rely on either Prometheus' internal ALERTS_FOR_STATE and >> endsAt, StartsAt also. >> >> Then, what should we use to get the age of alerts? >> > > I'd personally look at it on the relevant gra

Re: [prometheus-users] StartsAt time is right but endsAt time in alertmanager API is not matching. What does endsAt time actually stands for ?

2020-04-21 Thread Yagyansh S. Kumar
So, we can rely on neither Prometheus' internal ALERTS_FOR_STATE nor startsAt and endsAt. Then, what should we use to get the age of alerts? On Tuesday, April 21, 2020 at 3:40:02 PM UTC+5:30, Brian Brazil wrote: > > On Tue, 21 Apr 2020 at 11:06, Rahul Hada > > wrote: > >> We have configured

[prometheus-users] Re: startsAt and endsAt clarification.

2020-04-21 Thread Yagyansh S. Kumar
I am sorry I forgot to mention that I am using v1 Alertmanager API to capture this data. /api/v1/alerts On Tuesday, April 21, 2020 at 3:38:00 PM UTC+5:30, Yagyansh S. Kumar wrote: > > Hi. What and how are the timestamps gathered for startsAt and endsAt? Is > there any docu

[prometheus-users] startsAt and endsAt clarification.

2020-04-21 Thread Yagyansh S. Kumar
Hi. How are the timestamps for startsAt and endsAt gathered, and what do they represent? Is there any documentation for this? From what I have observed, startsAt seems to give the correct time at which the alert entered the "firing" state. But the endsAt time seems ambiguous to me, because one of my

Re: [prometheus-users] Slicing the metrics scraped by HAPROXY Exporter.

2020-04-21 Thread Yagyansh S. Kumar
One more query regarding this. If I want to see by how much the number of requests increased on, let's say, 17th April, how should I approach that? On Tuesday, April 21, 2020 at 1:19:43 PM UTC+5:30, Brian Candler wrote: > > Those metrics like haproxy_frontend_http_requests_total are counters. > > If you want
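
One common way to answer this kind of question, sketched here as an assumption and not quoted from the thread, is to record the 24-hour increase of the counter and then read that series at the end of the day of interest (for example, at 00:00 on 18 April to cover the whole of 17 April). The group and rule names are hypothetical.

```yaml
groups:
  - name: haproxy_daily
    rules:
      # Increase of the request counter over the trailing 24 hours,
      # one series per original label set.
      - record: haproxy_frontend_http_requests:increase24h
        expr: increase(haproxy_frontend_http_requests_total[24h])
```

The same expression can also be run ad hoc, without a recording rule, by setting the evaluation time of an instant query (the "time" parameter of the /api/v1/query endpoint) to the end of the day in question, provided the data is still within retention.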

Re: [prometheus-users] Slicing the metrics scraped by HAPROXY Exporter.

2020-04-21 Thread Yagyansh S. Kumar
Thanks a lot Brian, this works. On Tuesday, April 21, 2020 at 1:19:43 PM UTC+5:30, Brian Candler wrote: > > Those metrics like haproxy_frontend_http_requests_total are counters. > > If you want see much they've increased in the last 24 hours, then you need > to use this function >

Re: [prometheus-users] Slicing the metrics scraped by HAPROXY Exporter.

2020-04-21 Thread Yagyansh S. Kumar
> HAProxy does not output historic data about past requests (nor would > Prometheus be able to scrape it), so you'd have to scrape the HAProxy > exporter for 24h, 2 days, etc., to get that amount of history into > Prometheus. >>> > I think I didn't get this part properly. I am already

[prometheus-users] Slicing the metrics scraped by HAPROXY Exporter.

2020-04-21 Thread Yagyansh S. Kumar
Hi. I am using HAPROXY for configuring my LB-Rules and using HAPROXY Exporter to get the metrics. Now, I mainly want to keep track of the requests coming to my LB and the responses served by my LB that is configured on the HAPROXY. Now, haproxy_frontend_http_requests_total

Re: [prometheus-users] Excluding particular network interfaces from monitoring for some servers.

2020-04-20 Thread Yagyansh S. Kumar
> I usually recommend doing this by Julius's other suggestion, the > node_exporter textfile collector. This makes it easy to integrate the > "ignore this interface" metric into your configuration management on the > node. > > On Mon, Apr 20, 2020 at 5:13 PM Yagyansh
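
The textfile-collector approach can be sketched as follows, under stated assumptions. Each node would expose a marker metric through a .prom file in the node_exporter textfile directory, for example a line such as node_ignore_interface_down{device="eth2"} 1. That metric name is hypothetical, and node_network_up is only assumed to be the metric behind the interface alert.

```yaml
groups:
  - name: network
    rules:
      - alert: NetworkInterfaceDown
        # Fire for interfaces that are down, except those the node itself
        # has marked as intentionally down via the textfile collector.
        expr: |
          (node_network_up{device!="lo"} == 0)
          unless on (instance, device)
          (node_ignore_interface_down == 1)
        for: 2m
        labels:
          severity: CRITICAL
```

Since the marker file lives on the node, configuration management can add or remove exceptions without touching the Prometheus rules.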

Re: [prometheus-users] Excluding particular network interfaces from monitoring for some servers.

2020-04-20 Thread Yagyansh S. Kumar
Sorry, I didn't notice that I had forgotten to share the configuration of the recording rule. I thought I had pasted it. Anyway, you could have pointed that out nicely too. On Monday, April 20, 2020 at 3:34:41 PM UTC+5:30, Brian Candler wrote: > > Please think carefully about what you've just

Re: [prometheus-users] Excluding particular network interfaces from monitoring for some servers.

2020-04-20 Thread Yagyansh S. Kumar
on each > machine's Node Exporter, if you have the info there. > > C) Hardcode the selection of those excepted devices directly into your > alerting expression, with multiple "unless"-es. Probably a bit ugly :) > > On Mon, Apr 20, 2020 at 9:45 AM Yagyansh S. Kumar > wrot

[prometheus-users] Excluding particular network interfaces from monitoring for some servers.

2020-04-20 Thread Yagyansh S. Kumar
Hi. I have configured an alert to get notified whenever any of my network interfaces goes down. Now, on some servers we have intentionally brought some interfaces down, and I want to exclude those interfaces on those particular servers from the alert. What is the best possible way to do this? I

Re: [prometheus-users] What is the criteria in Prometheus for Age of alerts in ALERTS_FOR_STATE.

2020-04-19 Thread Yagyansh S. Kumar
Thanks Brian. What other way do you suggest to get the accurate age of alerts?

[prometheus-users] What is the criteria in Prometheus for Age of alerts in ALERTS_FOR_STATE.

2020-04-19 Thread Yagyansh S. Kumar
Hi. I want to get the age of my alerts, and have been playing around with ALERTS_FOR_STATE for this. After converting the epoch time to a normal date-time, the age that I get seems to be incorrect for some of the alerts. Also, I can see a lot of my alerts are having the

[prometheus-users] Re: Getting the error because of which node_filesystem_device_error becomes 1.

2020-04-18 Thread Yagyansh S. Kumar
Adding to this, I noticed something very strange. I am seeing that node_filesystem_device_error gives 1, but when I log in to the servers, everything seems to be fine with the NFS mount. Even the statfs call is successful. Is this a bug? If not, how to know the reason because of which

[prometheus-users] Getting the error because of which node_filesystem_device_error becomes 1.

2020-04-18 Thread Yagyansh S. Kumar
Hi. After a lot of discussion on this forum for monitoring NFS hang issues, I am using node_filesystem_device_error to see if my NFS mount is hanging or not. Now, since node_filesystem_device_error is basically a statfs call, there can be more than one reason for statfs call to fail. So, if

Re: [prometheus-users] Giving different Dynamic Thresholds for the same alert.

2020-04-18 Thread Yagyansh S. Kumar
is in hung state, node_filesystem_device_error should definitely inform you about the same. On Sunday, March 15, 2020 at 3:56:19 AM UTC+5:30, Yagyansh S. Kumar wrote: > > Sure. Will absolutely do. > > On Sunday, March 15, 2020 at 3:30:59 AM UTC+5:30, Christian Hoffmann wrote: >>

Re: [prometheus-users] Discrepancy in Resolved Alerts.

2020-04-18 Thread Yagyansh S. Kumar
I know this cannot be called a bug, but I find it a little odd that you cannot know the value that it dropped to in your alert once it has resolved. On Saturday, April 18, 2020 at 7:56:47 PM UTC+5:30, Yagyansh S. Kumar wrote: > > Thanks a lot for the detailed explanation, Brian. >

Re: [prometheus-users] Discrepancy in Resolved Alerts.

2020-04-18 Thread Yagyansh S. Kumar
Thanks a lot for the detailed explanation, Brian. I guess I need to monitor the resolved alerts a bit more closely and then take a call. On Saturday, April 18, 2020 at 3:16:56 PM UTC+5:30, Brian Candler wrote: > > I can see two possible issues here. > > Firstly, the value of the annotation you

Re: [prometheus-users] Append the value from one metric to the label of another metric.

2020-04-18 Thread Yagyansh S. Kumar
Btw, I am struggling with the resolved alerts. I have posted the details in the thread mentioned below. It would be really helpful if you could suggest something. Thread Link - https://groups.google.com/forum/#!topic/prometheus-users/LLsPBIvLIME

Re: [prometheus-users] Append the value from one metric to the label of another metric.

2020-04-18 Thread Yagyansh S. Kumar
; no HTTP status code at all for a failed check (like if the connection > couldn't be established), but luckily the BBE still exposes the status code > metric in that case, but with a value of 0, so the expression still works > in that case. > > On Sat, Apr 18, 2020 at 9:07 AM Ya
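
The exact expression from the thread is not visible in this preview, so the following is only a sketch of one way to get the HTTP status code into the alert: alert on the status-code series, restricted with "and" to targets whose probe is failing, so that the alert value is the status code itself. The job name is the one used elsewhere in this thread; the annotation text is hypothetical.

```yaml
groups:
  - name: blackbox
    rules:
      - alert: ServiceHTTPChecks
        # The left-hand side supplies the value (the HTTP status code, 0 when
        # no response was received); the right-hand side only filters to
        # targets whose probe failed.
        expr: |
          probe_http_status_code{job="blackbox_Service-HealthChecks"}
          and on (job, instance)
          (probe_success{job="blackbox_Service-HealthChecks"} == 0)
        for: 2m
        labels:
          severity: CRITICAL
        annotations:
          description: 'Health check {{ $labels.instance }} failed with HTTP status {{ $value }}.'
```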

Re: [prometheus-users] Append the value from one metric to the label of another metric.

2020-04-18 Thread Yagyansh S. Kumar
Thanks a lot Julius. Btw., you could change the "=~" to just "=" because that regex is just doing a full string equality match anyway. >> Thanks for pointing it out. I recently removed 2-3 jobs from that alert, so forgot to remove the regex matcher. > >> -- >> You received this message

Re: [prometheus-users] Append the value from one metric to the label of another metric.

2020-04-18 Thread Yagyansh S. Kumar
Thanks, Julius. Btw., you could change the "=~" to just "=" because that regex is just doing a full string equality match anyway. >> Thanks for pointing it out. I recently removed 2-3 jobs from this alert and forgot to remove the regex matcher. > > On Fri, Ap

[prometheus-users] Append the value from one metric to the label of another metric.

2020-04-17 Thread Yagyansh S. Kumar
Hi. I am using Blackbox exporter to monitor my Application's and LB's HealthCheck URLs. My alert for this looks like below: - alert: ServiceHTTPChecks expr: probe_success{job=~"blackbox_Service-HealthChecks"} == 0 for: 2m labels: severity: "CRITICAL" annotations:

Re: [prometheus-users] Discrepancy in Resolved Alerts.

2020-04-14 Thread Yagyansh S. Kumar
$labels.nodename }}* is more than 90%." description: "Current Value = *{{ $value | humanize }}*" identifier: "*Cluster:* `{{ $labels.cluster }}`, *node:* `{{ $labels.node }}` " On Tuesday, April 14, 2020 at 4:41:43 PM UTC+5:30, Stuart Clark wrote: > > On 2020-04-14 12:

[prometheus-users] Discrepancy in Resolved Alerts.

2020-04-14 Thread Yagyansh S. Kumar
Hi. I am using Alertmanager version 0.16.0. The resolved alerts that I am receiving are wrong. Alertmanager sends the resolved alert as soon as the value decreases even slightly, i.e., it does not wait for the value to drop below the threshold. And this is happening for every alert.

[prometheus-users] Re: Monitoring SAN Switches using Prometheus.

2020-04-13 Thread Yagyansh S. Kumar
Fan speed, network interface metrics, etc. are also required.

[prometheus-users] Monitoring SAN Switches using Prometheus.

2020-04-13 Thread Yagyansh S. Kumar
Hi. I want to monitor my SAN Switches and expect metrics such as Port Utilization, Uptime, CPU Usage, Memory Usage etc. I have tried using SNMP, and it gets me the network side of metrics perfectly but I need System Metrics too(CPU, Memory etc.). Is there any exporter for this? If not can

[prometheus-users] Collect Metrics Automatically from Custom Supervisord Port.

2020-04-02 Thread Yagyansh S. Kumar
Hi. On many of my servers, supervisord isn't running on the default port, and the port also varies from cluster to cluster. Is there a way to detect the supervisord port and then collect the metrics from that custom URL:PORT? Thanks!

Re: [prometheus-users] Getting status of services in CentOS 6.

2020-04-01 Thread Yagyansh S. Kumar
Alas, if only the migration from CentOS 6 to 7 were in my hands :P. But thanks for the suggestion, I will give process exporter a try. On Wednesday, April 1, 2020 at 2:50:39 AM UTC+5:30, Christian Hoffmann wrote: > > Hi, > > On 3/31/20 12:22 PM, Yagyansh S. Kumar wrote: > > H

[prometheus-users] Re: Getting status of services in CentOS 6.

2020-04-01 Thread Yagyansh S. Kumar
itelist="(apache2|ssh|rsyslog|nginx).service" > but not sure how to do it? > > Again sorry to hijack your post and sorry can't add anything to help > > On Tuesday, March 31, 2020 at 6:22:17 AM UTC-4, Yagyansh S. Kumar wrote: >> >> Hi. We have systemd collector in

Re: [prometheus-users] Getting the Age of alerts.

2020-03-20 Thread Yagyansh S. Kumar
*1584698633* What does this value represent? On Wednesday, March 18, 2020 at 3:06:54 AM UTC+5:30, Julien Pivotto wrote: > > On 17 Mar 22:33, Christian Hoffmann wrote: > > Hi, > > > > On 3/17/20 10:33 AM, Yagyansh S. Kumar wrote: > > > Hi. I want to extract

[prometheus-users] Push the alert details to a third party API as soon an alert fires.

2020-03-19 Thread Yagyansh S. Kumar
Hi. I want to push my alerts to a ticketing tool. The ticketing tool has an API through which we can push our alerts to it. Now, what I want to do is that as soon as an alert fires, I want to push my alert details to the tool by calling the ticketing tool's API. Can anyone help and give some
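
A minimal sketch of the usual mechanism for this, assuming a small bridge service sits between Alertmanager and the ticketing tool: Alertmanager's webhook receiver POSTs a JSON payload in its own fixed format, so something generally has to translate that payload into the ticketing tool's API calls. The receiver name and URL below are hypothetical.

```yaml
# alertmanager.yml (fragment)
route:
  receiver: ticketing                 # hypothetical receiver name

receivers:
  - name: ticketing
    webhook_configs:
      # Hypothetical bridge that converts the Alertmanager webhook payload
      # into requests against the ticketing tool's API.
      - url: 'http://ticket-bridge.example.internal:8080/alerts'
        send_resolved: true           # also notify when the alert resolves
```

How soon the first notification goes out after an alert fires is governed by the route's group_wait setting.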

[prometheus-users] Getting the Age of alerts.

2020-03-17 Thread Yagyansh S. Kumar
Hi. I want to extract the age of the alerts(i.e from when the alert is Critical or Warning or even Resolved). Is this possible?

[prometheus-users] Re: Probing Endpoints with different authentication strings.

2020-03-17 Thread Yagyansh S. Kumar
Thanks a lot. On Tuesday, March 17, 2020 at 1:46:42 PM UTC+5:30, Brian Candler wrote: > > On Tuesday, 17 March 2020 03:33:40 UTC, Yagyansh S. Kumar wrote: >> >> And that would mean making 10 different jobs to probe those 10 URLs. >> Right? >> >> > No:

[prometheus-users] Re: Probing Endpoints with different authentication strings.

2020-03-16 Thread Yagyansh S. Kumar
And that would mean making 10 different jobs to probe those 10 URLs. Right? On Monday, March 16, 2020 at 8:29:24 PM UTC+5:30, Brian Candler wrote: > > You'll need to create 10 different modules - there's no way to expand > parameters into the module settings. > > You can use a script to generate
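
Ten modules do not have to mean ten jobs. As a sketch under assumptions (target URLs, module names, and the exporter address are all hypothetical), each target can carry its module name as a label, and relabeling can copy that label into the module URL parameter of the probe request:

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: blackbox_http_auth
    metrics_path: /probe
    static_configs:
      - targets: ['https://app1.example.com/health']
        labels:
          module: http_auth_app1      # must match a module name in blackbox.yml
      - targets: ['https://app2.example.com/health']
        labels:
          module: http_auth_app2
    relabel_configs:
      # Pick the Blackbox module per target from its "module" label.
      - source_labels: [module]
        target_label: __param_module
      # Standard Blackbox relabeling: probe the target via the exporter.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115   # assumed Blackbox exporter address
```

Each module in blackbox.yml would then hold its own basic_auth credentials under the http section.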

[prometheus-users] Re: Probing Endpoints with different authentication strings.

2020-03-16 Thread Yagyansh S. Kumar
fig of prom and blackbox on > how you have done basic auth? > > Thanks > Eswar > > On Monday, March 16, 2020 at 3:37:59 PM UTC+1, Yagyansh S. Kumar wrote: >> >> Hi. I have around 10 URLs that I want to monitor. All are under Basic >> Auth and have different a

[prometheus-users] Probing Endpoints with different authentication strings.

2020-03-16 Thread Yagyansh S. Kumar
Hi. I have around 10 URLs that I want to monitor. All are under Basic Auth and have different auth credentials. Is it at all possible to probe them using a single blackbox module? Or will I have to create 10 different modules, which seems a very tedious task? Thanks!

Re: [prometheus-users] Re: Using Regex in the Annotations of Alert.

2020-03-15 Thread Yagyansh S. Kumar
Thanks a lot, Christian. Will try them out and report back. Also, in your opinion, will Step 3 add any significant overhead? I mean, will it cause any kind of slowness? On Sunday, March 15, 2020 at 5:26:47 PM UTC+5:30, Christian Hoffmann wrote: > > Hi, > > On 3/15/20 10:07 AM

[prometheus-users] Using Regex in the Annotations of Alert.

2020-03-15 Thread Yagyansh S. Kumar
Hi. I want to add the dashboard link to the alert for that particular service. That dashboard takes the server IP and hostname as input. From the instance label, I want to remove the port number and pass the result as input to the dashboard. Configured Alert: - alert: OutOfDiskSpace-Crit expr:
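
One way to strip the port inside an annotation, sketched here with a hypothetical alert expression and dashboard URL, is the reReplaceAll template function, which takes a pattern, a replacement, and the input text:

```yaml
groups:
  - name: disk
    rules:
      - alert: OutOfDiskSpaceCrit
        # Hypothetical expression; the one from the thread is not shown above.
        expr: |
          (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10
        for: 5m
        annotations:
          # ":[0-9]+$" removes a trailing ":port" from the instance label.
          dashboard: 'https://grafana.example.com/d/abc123/node?var-ip={{ reReplaceAll ":[0-9]+$" "" $labels.instance }}'
```

The substitution happens when the annotation is rendered, so the instance label itself is left unchanged.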

Re: [prometheus-users] Giving different Dynamic Thresholds for the same alert.

2020-03-14 Thread Yagyansh S. Kumar
Sure. Will absolutely do. On Sunday, March 15, 2020 at 3:30:59 AM UTC+5:30, Christian Hoffmann wrote: > > Hi, > > On 3/14/20 10:35 PM, Yagyansh S. Kumar wrote: > > Yes, I did experiment with node_filesystem_device_error earlier based on > > Ben's suggest

Re: [prometheus-users] Giving different Dynamic Thresholds for the same alert.

2020-03-14 Thread Yagyansh S. Kumar
, I'll definitely give node_filesystem_device_error another try and see if I can come up with something interesting. Thanks a lot for your help. Cheers! On Sunday, March 15, 2020 at 2:49:01 AM UTC+5:30, Christian Hoffmann wrote: > > On 3/14/20 10:01 PM, Yagyansh S. Kumar wrote: > > Als

Re: [prometheus-users] Giving different Dynamic Thresholds for the same alert.

2020-03-14 Thread Yagyansh S. Kumar
? On Saturday, March 14, 2020 at 10:06:38 PM UTC+5:30, Christian Hoffmann wrote: > > On 3/14/20 5:06 PM, Yagyansh S. Kumar wrote: > > Can you explain in a little detail please? > I'll try to walk through your example in several steps: > > ## Step 1 > Your initial expression was

Re: [prometheus-users] Giving different Dynamic Thresholds for the same alert.

2020-03-14 Thread Yagyansh S. Kumar
Awesome explanation. This helps a lot. Thanks, I appreciate it. On Saturday, March 14, 2020 at 10:06:38 PM UTC+5:30, Christian Hoffmann wrote: > > On 3/14/20 5:06 PM, Yagyansh S. Kumar wrote: > > Can you explain in a little detail please? > I'll try to walk through your exa

Re: [prometheus-users] Giving different Dynamic Thresholds for the same alert.

2020-03-14 Thread Yagyansh S. Kumar
Can you explain in a little detail please? On Saturday, March 14, 2020 at 9:26:39 PM UTC+5:30, Christian Hoffmann wrote: > > Hi, > > On 3/14/20 4:32 PM, Yagyansh S. Kumar wrote: > > Hi. In my prometheus.yml file all the targets necessarily have 2 labels > >

Re: [prometheus-users] "All" value problem across dashboards in Grafana while using Prometheus as Datasource.

2020-03-14 Thread Yagyansh S. Kumar
No, it does not work on reload either. I have to manually select the "All" option in the cluster in the target dashboard. This problem remains if I swap the source and target dashboards. Ideally, "All" is the value that should be passed to the target dashboard and not {value1,value2,...}. One more

[prometheus-users] Giving different Dynamic Thresholds for the same alert.

2020-03-14 Thread Yagyansh S. Kumar
Hi. In my prometheus.yml file all the targets necessarily have 2 labels viz "cluster" and "node". I have configured an alert for CPU Load with a dynamic threshold of "Number of Cores of the server". Configured alert: - alert: HighCpuLoad expr: (node_load15 > count without (cpu, mode)

Re: [prometheus-users] Getting the dynamic threshold to print in the alert.

2020-03-14 Thread Yagyansh S. Kumar
Thanks for the help. Will try this. On Saturday, March 14, 2020 at 5:07:19 PM UTC+5:30, Christian Hoffmann wrote: > > Hi, > > On 3/13/20 1:55 PM, Yagyansh S. Kumar wrote: > > Hi. I am using Number of Cores of a server as the threshold for CPU > > Load. It is working

Re: [prometheus-users] "All" value problem across dashboards in Grafana while using Prometheus as Datasource.

2020-03-14 Thread Yagyansh S. Kumar
-> Um, how exactly do I conclude this? On Saturday, March 14, 2020 at 5:21:20 PM UTC+5:30, Christian Hoffmann wrote: > > On 3/13/20 9:32 PM, Yagyansh S. Kumar wrote: > > Hi. In one of the dashboards(Say D-1), I have created a variable called > > cluster and it has "

[prometheus-users] "All" value problem across dashboards in Grafana while using Prometheus as Datasource.

2020-03-13 Thread Yagyansh S. Kumar
Hi. In one of the dashboards(Say D-1), I have created a variable called cluster and it has "All" option enabled. I have the same variable in another dashboard(Say D-2) and there are links of D-2 in D-1 at different places according to the value of cluster variable. When the value of cluster is

[prometheus-users] Getting the dynamic threshold to print in the alert.

2020-03-13 Thread Yagyansh S. Kumar
Hi. I am using Number of Cores of a server as the threshold for CPU Load. It is working fine but I want to print the number of cores also in the alert. Configured Alert: - alert: HighCpuLoad expr: (node_load15 > count without (cpu, mode) (node_cpu_seconds_total{mode="system"})) *

[prometheus-users] Getting the list and number of Silenced Alerts in Grafana using Alertmanager.

2020-03-13 Thread Yagyansh S. Kumar
Is there any way to query the Alertmanager and get the silenced alerts? If there isn't a direct method, is there any workaround for this? Thanks in advance.
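One possible approach to the question above, sketched in Python. It assumes the Alertmanager v2 HTTP API, where `GET /api/v2/alerts` accepts the boolean query parameters `active`, `silenced`, `inhibited` and `unprocessed`, and where each returned alert carries a `status.silencedBy` list. The host name `alertmanager.example` and the sample payload are hypothetical; the sketch makes no network call and is not taken from the thread.

```python
from urllib.parse import urlencode


def silenced_alerts_url(base_url):
    """Build a URL that asks Alertmanager for silenced alerts only (assumed v2 API)."""
    params = {
        "active": "false",       # exclude alerts that are currently firing unsuppressed
        "silenced": "true",      # include alerts matched by a silence
        "inhibited": "false",    # exclude alerts suppressed by inhibition rules
        "unprocessed": "false",  # exclude alerts not yet processed
    }
    return f"{base_url.rstrip('/')}/api/v2/alerts?{urlencode(params)}"


def only_silenced(alerts):
    """Keep alerts whose status lists at least one silence ID."""
    return [a for a in alerts if a.get("status", {}).get("silencedBy")]


# Hypothetical payload shaped like a parsed /api/v2/alerts response.
sample = [
    {"labels": {"alertname": "CPULOAD"}, "status": {"state": "suppressed", "silencedBy": ["abc"]}},
    {"labels": {"alertname": "DISK"}, "status": {"state": "active", "silencedBy": []}},
]
silenced = only_silenced(sample)
```

The length of `silenced` gives the count, and the `labels` of each entry give the list, which could then be fed to a dashboard.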

[prometheus-users] Re: "All" value in any variable is creating issue while using Alertmanager as a datasource in Grafana.

2020-03-12 Thread Yagyansh S. Kumar
Oh man! Can't believe I missed the tilde. Thanks, Brian. On Thursday, March 12, 2020 at 2:25:29 PM UTC+5:30, Brian Candler wrote: > > If you are using multi-select in Grafana, you have to write your label > match as *cluster=~"$cluster"* >
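A small Python sketch of why the tilde matters. It assumes that a multi-select variable is interpolated as a regex alternation such as `(ADS|TV)`, and it models the `=` and `=~` matchers with plain string equality and a fully anchored regex match. The helper names are hypothetical and the exact interpolation depends on the datasource.

```python
import re


def expand_multi_value(selected):
    """Join selected values into a regex alternation (assumed interpolation format)."""
    return "(" + "|".join(re.escape(v) for v in selected) + ")"


def label_matches(op, pattern, label_value):
    """Model the '=' (exact) and '=~' (anchored regex) label matchers."""
    if op == "=":
        return label_value == pattern
    if op == "=~":
        return re.fullmatch(pattern, label_value) is not None
    raise ValueError(f"unsupported matcher: {op}")


pattern = expand_multi_value(["ADS", "TV"])
exact_hit = label_matches("=", pattern, "ADS")   # compares against the literal "(ADS|TV)"
regex_hit = label_matches("=~", pattern, "ADS")  # treats the value as a regex
```

With `=` the label would have to equal the literal string `(ADS|TV)`, so nothing matches once more than one value is selected; with `=~` each selected cluster matches.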

[prometheus-users] "All" value in any variable is creating issue while using Alertmanager as a datasource in Grafana.

2020-03-12 Thread Yagyansh S. Kumar
Hi. I have made a variable "cluster" in Grafana that contains all my clusters(ADS,TV etc.) present in my Infra. Now, I want the critical alert count for every service(CPU, Memory, Disk) clusterwise which I am getting successfully using the alertmanager query - *alertname="CPULOAD",

Re: [prometheus-users] Giving dynamic thresholds in alertmanager.

2020-03-11 Thread Yagyansh S. Kumar
wrote: > > On 11 Mar 10:59, Yagyansh S. Kumar wrote: > > I mean it doesn't work in giving me the actual Load value. > > The expression you mentioned will be perfect in defining the > threshold > > but the value that this expression will give will be (Actual

Re: [prometheus-users] Giving dynamic thresholds in alertmanager.

2020-03-11 Thread Yagyansh S. Kumar
Oh sorry, my bad! Yes, it does. Was comparing wrong things. Thanks a lot! On Wednesday, March 11, 2020 at 11:31:49 PM UTC+5:30, Julien Pivotto wrote: > > On 11 Mar 10:59, Yagyansh S. Kumar wrote: > > I mean it doesn't work in giving me the actual Load value. > > The expres

Re: [prometheus-users] Giving dynamic thresholds in alertmanager.

2020-03-11 Thread Yagyansh S. Kumar
with threshold still being the Number of Cores. On Wednesday, March 11, 2020 at 11:23:09 PM UTC+5:30, Yagyansh S. Kumar wrote: > > Thanks for the response Julien, but I have already tried the query that > you have mentioned, but it doesn't work. > > On Wednesday, March 11, 2020 at

Re: [prometheus-users] Giving dynamic thresholds in alertmanager.

2020-03-11 Thread Yagyansh S. Kumar
Thanks for the response Julien, but I have already tried the query that you have mentioned, but it doesn't work. On Wednesday, March 11, 2020 at 11:21:35 PM UTC+5:30, Julien Pivotto wrote: > > On 11 Mar 10:49, Yagyansh S. Kumar wrote: > > I have one more small query.

Re: [prometheus-users] Giving dynamic thresholds in alertmanager.

2020-03-11 Thread Yagyansh S. Kumar
:30, Yagyansh S. Kumar wrote: > > Maybe I'll refine the threshold even further but for now this works. > Thanks a lot for help. > > On Wednesday, March 11, 2020 at 10:47:14 PM UTC+5:30, Harald Koch wrote: >> >> >> >> On Wed, Mar 11, 2020, at 13:00, Ya

Re: [prometheus-users] Giving dynamic thresholds in alertmanager.

2020-03-11 Thread Yagyansh S. Kumar
Maybe I'll refine the threshold even further but for now this works. Thanks a lot for help. On Wednesday, March 11, 2020 at 10:47:14 PM UTC+5:30, Harald Koch wrote: > > > > On Wed, Mar 11, 2020, at 13:00, Yagyansh S. Kumar wrote: > > Hi. I have configured alert for CPU Load for

[prometheus-users] Giving dynamic thresholds in alertmanager.

2020-03-11 Thread Yagyansh S. Kumar
Hi. I have configured an alert for CPU Load for my servers and my current threshold is 8 for warning and 10 for critical. I want to make this threshold dynamic, i.e., I want the critical alert when the CPU Load becomes greater than the number of CPU cores of the machine. E.g., for a server with 8 CPU
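The idea of a per-server threshold can be sketched in Python. The sketch mirrors the comparison that appears elsewhere in this archive, `node_load15 > count without (cpu, mode) (node_cpu_seconds_total{mode="system"})`, by counting one `mode="system"` series per core and comparing the load against that count. The sample data and function names are hypothetical.

```python
from collections import Counter


def core_counts(cpu_samples):
    """Count cores per instance: one mode="system" series exists per CPU."""
    return Counter(s["instance"] for s in cpu_samples if s["mode"] == "system")


def high_load_instances(load15, cpu_samples):
    """Return instances whose 15-minute load exceeds their own core count."""
    cores = core_counts(cpu_samples)
    return {
        inst: load
        for inst, load in load15.items()
        if inst in cores and load > cores[inst]
    }


# Hypothetical samples: host-a has 2 cores, host-b has 4 cores.
cpu_samples = (
    [{"instance": "host-a", "cpu": str(i), "mode": m} for i in range(2) for m in ("system", "idle")]
    + [{"instance": "host-b", "cpu": str(i), "mode": m} for i in range(4) for m in ("system", "idle")]
)
load15 = {"host-a": 3.5, "host-b": 3.5}
alerts = high_load_instances(load15, cpu_samples)
```

The same load of 3.5 trips the threshold on the 2-core host but not on the 4-core one, which is the behaviour a fixed threshold of 8 or 10 cannot give.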

[prometheus-users] Getting all the checks for a server in the same dashboard.

2020-03-10 Thread Yagyansh S. Kumar
Hi. I am monitoring around 2500+ servers. I am using node_exporter to collect system metrics and blackbox for Service Health Checks that are configured on those servers. Now, all the servers do not have the same health check and port. Some don't even have any health check or port configured. I

[prometheus-users] Getting I/O of NFS mount by mountpoint.

2020-03-10 Thread Yagyansh S. Kumar
Hi. I have a system where multiple NFS shares are mounted across servers. I want the total input and output operations of an NFS mount based on the mountpoint (e.g., I have an NFS share mounted at /data on 100 servers, and I want the I/O of mountpoint /data). I have checked the stats scraped by

Re: [prometheus-users] Working of Textfile Collector.

2020-03-07 Thread Yagyansh S. Kumar
uments on your node exporter. > > https://prometheus.io/docs/concepts/data_model/ > > https://github.com/prometheus/node_exporter/blob/master/README.md > > On Sat, Mar 7, 2020, 11:01 AM Yagyansh S. Kumar > wrote: > >> Hi. I am just now able to get my head around t

[prometheus-users] Working of Textfile Collector.

2020-03-07 Thread Yagyansh S. Kumar
Hi. I am just now able to get my head around the working of textfile collector. Can someone explain or provide a good link to understand its working? Thanks!
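As a rough illustration of how the textfile collector is typically fed: node_exporter reads files ending in `.prom` from the directory passed via `--collector.textfile.directory` and exposes their contents, which must be in the Prometheus text exposition format. The Python sketch below writes such a file atomically (temporary file, then rename) so a scrape never sees a half-written file. The metric name, file name and helper function are hypothetical.

```python
import os
import tempfile


def write_prom_file(directory, name, metric, value, help_text):
    """Write a single gauge to <directory>/<name>.prom via an atomic rename."""
    body = (
        f"# HELP {metric} {help_text}\n"
        f"# TYPE {metric} gauge\n"
        f"{metric} {value}\n"
    )
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(body)
    os.chmod(tmp_path, 0o644)  # make the file readable by the exporter's user
    final_path = os.path.join(directory, name + ".prom")
    os.replace(tmp_path, final_path)
    return final_path
```

A cron job or batch script would call something like `write_prom_file(textfile_dir, "backup", "backup_last_success_timestamp_seconds", 1583560000, "Hypothetical example metric.")` after each run, and the value would show up on the next scrape of the node exporter.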

[prometheus-users] Re: Getting hostnames along with IPs in Alerts.

2020-03-05 Thread Yagyansh S. Kumar
Cool. Thank you. Just started getting comfortable with Prometheus, so getting a lot of questions in mind. Thanks for the quick responses! :) On Friday, March 6, 2020 at 1:29:05 AM UTC+5:30, Brian Candler wrote: > > On Thursday, 5 March 2020 19:23:51 UTC, Yagyansh S. Kumar wrote: >>

[prometheus-users] Re: Getting hostnames along with IPs in Alerts.

2020-03-05 Thread Yagyansh S. Kumar
but still is there something that can be done? On Friday, March 6, 2020 at 12:10:57 AM UTC+5:30, Yagyansh S. Kumar wrote: > > Cool, thanks a ton. > > Just one doubt. I have tried and have been using the join method that you > mentioned, in grafana for getting the hostnames as legen

[prometheus-users] Re: Getting hostnames along with IPs in Alerts.

2020-03-05 Thread Yagyansh S. Kumar
Cool, thanks a ton. Just one doubt. I have tried and have been using the join method that you mentioned, in grafana for getting the hostnames as legends but how do I extract the hostname from the expression in alertmanager? I mean we have the expression but I am stuck at getting the nodename
