Hi Sadhana,
You can use the group_by feature in Alertmanager's config file, or you can
drop the unwanted labels in the alert rule itself.
On Tue, 13 Jul, 2021, 8:38 pm sadhana B, wrote:
> Hi,
>
> Any solution for this?
>
> On Tue, Dec 22, 2020 at 3:38 AM yagyans...@gmail.com wrote:
>> email to prometheus-users+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/prometheus-users/c05d05ca-18f1-4c50-a62b-a2c499d92b36n%40googlegroups.com
Hi Julius,
Using the expression "^OK$" causes all the checks whose response was OK to
fail. This seems odd to me; ideally it should have worked.
Any more workarounds or suggestions to achieve this?
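One thing worth checking (an assumption on my part, not taken from your config): blackbox_exporter uses Go's regexp engine, where `$` matches only the very end of the text unless the multiline flag is set, so a trailing newline after `OK` in the response body makes `^OK$` fail. A module sketch that tolerates this (the module name is made up):

```yaml
modules:
  http_ok_body:                  # hypothetical module name
    prober: http
    http:
      fail_if_body_not_matches_regexp:
        - "(?m)^OK$"             # multiline flag; "^OK\\s*$" also works
```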
On Mon, Mar 22, 2021 at 9:38 PM Yagyansh S. Kumar
wrote:
Thanks, Ben. I'll upgrade to a newer Prometheus version and check if the
issue still persists.
But I still have one doubt here: I have been running this Prometheus instance
for almost a year now, but I have noticed this memory increase only recently.
The first time was on 31st Jan and the second on 8th Feb. If it
Hi Stuart,
>
> A common way is to use timeseries to set the thresholds:
> https://www.robustperception.io/using-time-series-as-alert-thresholds
>> I have already referred to the link, and it is exactly what I have used in
> the example thresholding that I have shared. But how do I use this for a
ation does not match.
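To make the suggestion above concrete (using time series as alert thresholds), a sketch; the instances and values here are placeholders:

```yaml
groups:
  - name: custom-thresholds
    rules:
      # one threshold series per target that needs a custom value
      - record: instance:load_threshold
        expr: 12
        labels:
          instance: "10.0.0.1:9100"   # placeholder target
      - record: instance:load_threshold
        expr: 20
        labels:
          instance: "10.0.0.2:9100"   # placeholder target
```

The alert can then join against it, e.g. `node_load15 > on(instance) group_left instance:load_threshold`, with a separate default branch (via `unless on(instance) instance:load_threshold`) for targets that have no custom series.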
On 25 November 2020 14:58:50 GMT, "Yagyansh S. Kumar" <
> yagyanshsku...@gmail.com> wrote:
>>
>>
>>
>> On Wed, 25 Nov, 2020, 8:26 pm Stuart Clark,
>> wrote:
>>
>>> How many Alertmanager instances
but I am facing a duplicate-alert issue in that setup, another
issue that is pending for me. Hence, currently only a single Alertmanager
is receiving alerts from my Prometheus instance.
On 25 November 2020 14:07:41 GMT, "Yagyansh S. Kumar" <
> yagyanshsku...@gmail.com> wrote:
>>
Hi Stuart.
On Wed, 25 Nov, 2020, 6:56 pm Stuart Clark,
wrote:
> On 25/11/2020 11:46, yagyans...@gmail.com wrote:
> > The alert formation doesn't seem to be a problem here, because it
> > happens for different alerts randomly. Below is the alert for Exporter
> > being down for which it has
Cool, thanks for the quick help.
On Wed, Nov 25, 2020 at 1:18 PM Ben Kochie wrote:
> No, concurrency only affects how many queries are running at the same
> time.
>
> On Wed, Nov 25, 2020 at 8:45 AM Yagyansh S. Kumar <
> yagyanshsku...@gmail.com> wrote:
>
>> Than
I'll try and get a backtrace and post it here.
But still the question remains: if BBE is returning probe_success 0, why is
it doing so only for 2.20.1?
On Sat, 7 Nov, 2020, 11:33 pm Brian Candler, wrote:
> I don't think it's a false alert. If it's the rule you showed, then the
> only way you
Try looking at scrape_duration_seconds{job="Ping-All-Servers"}. Maybe it's
borderline to the scrape interval.
>> That's interesting. Here are the top 20 scrape_duration_seconds, maxed
over the last 1 hour by instance. Close to 5 seconds. Can this lead to some
issue? But again the question remains why
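For reference, a query sketch for the check described above (the job label is taken from this thread):

```promql
topk(20, max_over_time(scrape_duration_seconds{job="Ping-All-Servers"}[1h]))
```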
Yes, both the Prometheus instances are talking to the same BBE indeed.
In fact, both have the exact same configuration file and are scraping the
exact same targets.
Here is the graph for the modified query. Failures are visible for 2.20.1 but
none for 2.12.0.
2.12.0: [graph omitted]
2.20.1: [graph omitted]
That's a little strange. We can specify that the probe should fail if the
response body does not match a given pattern, but we cannot extract the
value that the URL returns. Quite basic functionality that would have been
nice to have.
Anyways, thanks for the help!
On Sat, Aug 1, 2020
to 5 different
targets, all of which belong to 5 different components and all the 5
components have more than 1 target but I want the custom threshold to be
applied for only a single target from each component.
On Fri, Jul 3, 2020 at 12:02 AM Yagyansh S. Kumar
wrote:
> Hi Christian,
>
> Ac
Hi Christian,
Actually, I want to know if there is any better way to define the
threshold for my 5 new servers that belong to 5 different components. Is
writing 5 different recording rules with the same name, and different
instance and component labels, the only way to proceed here? Won't that be a
Thanks for the solution.
But creating 5 different recording rules for the same custom threshold
doesn't seem like the best idea. It is good as a last resort, I guess.
Any better suggestion to approach this?
On Thursday, June 25, 2020 at 1:06:02 AM UTC+5:30, sayf eddine Hammemi
wrote:
>
> Correct,
Hi. I need to export metrics from Cassandra, and there have been mixed
suggestions between the JMX Exporter and the (many) standalone Cassandra
exporters. Which one is the correct way to go? Also, are there any known
issues with the JMX Exporter? I mean, does it hamper Cassandra
Thanks a lot for pointing me in the correct direction.
On Wednesday, May 20, 2020 at 12:35:20 PM UTC+5:30, Brian Candler wrote:
>
> On Wednesday, 20 May 2020 05:48:28 UTC+1, Yagyansh S. Kumar wrote:
>>
>> Thanks for the response Brian.
>>
>> I have already enabled t
Thanks for the response Brian.
I have already enabled the NTP collector on all my servers, but still
cannot see the *node_ntp_drift_seconds* metric giving any output. Apart
from that, I have a couple of questions here.
Firstly, why are we checking the target's clock against the Prometheus server's?
Hi. I have my own NTP server configured at x.x.x.x . Now, I want to check
if my 10 other servers are synchronized with my NTP server or not. I have
gone through a lot of threads and found different opinions with different
answers. Also, I guess node_ntp_drift_seconds is an old metric and
But I did that intentionally to check. Before doing this change my job was:
- job_name: 'Ping-All-Servers'
  metrics_path: /probe
  params:
    module: [icmp_prober]
  file_sd_configs:
    - files:
        - /etc/blackbox/Ping_Targets.yml
Servers.yml
relabel_configs:
-
Hi. I am using Blackbox's ICMP module to check for whether my server is
Pingable or not. I have defined all the targets in a separate file.
Everything was working fine till yesterday, but since then, even if I remove
a target from my targets file, Prometheus does not pick up the updated file
and
Hi. I am monitoring my Health Check URLs using Blackbox, and I am trying to
attach the hostname of the server on which the Health Check is down. My
Health Checks are of the form - ServerIP:PORT/my/healthcheck/url (Eg.
x.x.x.x:8080/api/a1/healthcheck). I am extracting the ServerIP using regex
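A sketch of the usual blackbox relabelling, extended with a regex capture for the IP (the `server_ip` label name is made up; the last rule's address is a placeholder for wherever blackbox_exporter runs):

```yaml
relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - source_labels: [__param_target]
    regex: '([^:/]+):\d+.*'          # capture the IP ahead of ":PORT/..."
    target_label: server_ip          # hypothetical label name
  - target_label: __address__
    replacement: 127.0.0.1:9115      # blackbox exporter address (placeholder)
```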
Thanks, Brian.
>
> Can you give some more specific examples? What metric are you joining
> with - perhaps node_uname_info? >>
>
- alert: HighCpuLoadCrit
  expr: (node_load15 > (2 * count without (cpu, mode)
    (node_cpu_seconds_total{mode="system"}))) * on(instance)
Hi. So, I am using IP:PORT as targets (I know it's not ideal, I should have
only IPs; will switch to that soon) for all my node exporter jobs. I am
getting the hostnames of my servers in the alerts using group_left and
joining them onto my original alert query. Now, the problem is when the alert of
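The join mentioned above can be sketched as follows (the `job` selector is an assumption; `node_uname_info` carries the `nodename` label):

```promql
(up{job="node"} == 0)
  * on(instance) group_left(nodename)
node_uname_info
```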
For example:
> http://localhost:9115/probe?module=http_2xx&target=https://nonexistentbogusdomain.io&debug=true
>
> If that doesn't help, you'll probably need to share more information about
> your configuration on the Prometheus and BBE side.
>
> On Thu, Apr 23, 2020 at 10:40 AM Yagyansh
Hi. I have a couple of service health checks on my AWS instances (e.g.
18.220.x.x:PORT/health). I am using Blackbox to monitor all my service
health checks.
These AWS instance health checks are being shown as DOWN by Blackbox even
though they are working fine. Also, when I check the response code
ing sent anymore after a while.
>
> On Tue, Apr 21, 2020 at 12:08 PM Yagyansh S. Kumar > wrote:
>
>> I am sorry I forgot to mention that I am using v1 Alertmanager API to
>> capture this data. /api/v1/alerts
>>
>> On Tuesday, April 21, 2020 at 3:38:00 PM UTC+5:30, Ya
Sorry posted it in the wrong thread.
Thanks for your help.
On Tuesday, April 21, 2020 at 5:02:23 PM UTC+5:30, Brian Candler wrote:
>
> Not sure what you mean - time difference between what and what?
>
Is there a significant time difference in query evaluation while querying
directly through API?
On Tuesday, April 21, 2020 at 3:51:56 PM UTC+5:30, Brian Candler wrote:
>
> On Tuesday, 21 April 2020 10:26:47 UTC+1, Yagyansh S. Kumar wrote:
>>
>> If I want to see the
Oh, yes.
Thanks a lot, Brian.
On Tuesday, April 21, 2020 at 3:51:56 PM UTC+5:30, Brian Candler wrote:
>
> On Tuesday, 21 April 2020 10:26:47 UTC+1, Yagyansh S. Kumar wrote:
>>
>> If I want to see the number of requests that increased on let say 17th
>>
21 Apr 2020 at 11:12, Yagyansh S. Kumar > wrote:
>
>> So, we can't rely on either Prometheus' internal ALERTS_FOR_STATE and
>> endsAt, StartsAt also.
>>
>> Then, what should we use to get the age of alerts?
>>
>
> I'd personally look at it on the relevant gra
So, we can't rely on either Prometheus' internal ALERTS_FOR_STATE or on
endsAt/startsAt.
Then, what should we use to get the age of alerts?
On Tuesday, April 21, 2020 at 3:40:02 PM UTC+5:30, Brian Brazil wrote:
>
> On Tue, 21 Apr 2020 at 11:06, Rahul Hada >
> wrote:
>
>> We have configured
I am sorry I forgot to mention that I am using v1 Alertmanager API to
capture this data. /api/v1/alerts
On Tuesday, April 21, 2020 at 3:38:00 PM UTC+5:30, Yagyansh S. Kumar wrote:
>
> Hi. What and how are the timestamps gathered for startsAt and endsAt? Is
> there any docu
Hi. What and how are the timestamps gathered for startsAt and endsAt? Is
there any documentation for this?
From what I have observed, startsAt seems to give the correct time of
when the alert entered the "firing" state. But the endsAt time seems
ambiguous to me, because one of my
One more query regarding this.
If I want to see the number of requests that increased on, let's say, 17th
April, how do I approach that?
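For what it's worth, a sketch of one way this could be scoped to a single past day (the metric name is the one from this thread): evaluate the query below with the HTTP query API's `time` parameter pinned to the end of that day, e.g. `time=2020-04-18T00:00:00Z`.

```promql
increase(haproxy_frontend_http_requests_total[24h])
```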
On Tuesday, April 21, 2020 at 1:19:43 PM UTC+5:30, Brian Candler wrote:
>
> Those metrics like haproxy_frontend_http_requests_total are counters.
>
> If you want
Thanks a lot Brian, this works.
On Tuesday, April 21, 2020 at 1:19:43 PM UTC+5:30, Brian Candler wrote:
>
> Those metrics like haproxy_frontend_http_requests_total are counters.
>
> If you want to see how much they've increased in the last 24 hours, then you need
> to use this function
>
> HAProxy does not output historic data about past requests (nor would
> Prometheus be able to scrape it), so you'd have to scrape the HAProxy
> exporter for 24h, 2 days, etc., to get that amount of history into
> Prometheus. >>>
>
I think I didn't get this part properly. I am already
Hi. I am using HAProxy to configure my LB rules and the HAProxy
Exporter to get the metrics. Now, mainly I want to keep track of the
requests coming to my LB and the responses served by the LB that is
configured on HAProxy.
Now,
haproxy_frontend_http_requests_total
> I usually recommend doing this by Julius's other suggestion, the
> node_exporter textfile collector. This makes it easy to integrate the
> "ignore this interface" metric into your configuration management on the
> node.
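A sketch of the textfile-collector approach suggested above: configuration management drops a small metrics file on each node, and node_exporter picks it up. The metric name here is made up, and `TEXTFILE_DIR` should point at node_exporter's `--collector.textfile.directory` in production (the mktemp fallback is just for demonstration).

```shell
# publish an "ignore this interface" metric via the textfile collector
TEXTFILE_DIR="${TEXTFILE_DIR:-$(mktemp -d)}"   # real dir: node_exporter's textfile directory
cat > "$TEXTFILE_DIR/ignored_ifaces.prom" <<'EOF'
# HELP node_interface_ignored 1 for interfaces that are down on purpose.
# TYPE node_interface_ignored gauge
node_interface_ignored{device="eth1"} 1
EOF
echo "wrote $TEXTFILE_DIR/ignored_ifaces.prom"
```

The alert can then exclude these devices, e.g. with `... unless on(instance, device) node_interface_ignored == 1`.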
>
> On Mon, Apr 20, 2020 at 5:13 PM Yagyansh
Sorry, I didn't notice that I had forgotten to share the configuration of
the recording rule. I thought I had pasted it.
Anyway, you could have pointed that out nicely too.
On Monday, April 20, 2020 at 3:34:41 PM UTC+5:30, Brian Candler wrote:
>
> Please think carefully about what you've just
on each
> machine's Node Exporter, if you have the info there.
>
> C) Hardcode the selection of those excepted devices directly into your
> alerting expression, with multiple "unless"-es. Probably a bit ugly :)
>
> On Mon, Apr 20, 2020 at 9:45 AM Yagyansh S. Kumar > wrot
Hi. I have configured an alert to get notified whenever any of my network
interfaces goes down. Now, on some servers we have taken some interfaces
down intentionally, and I want to exclude those interfaces on those
particular servers from the alert.
What is the best possible way to do this? I
Thanks, Brian.
What other way do you suggest to get the accurate age of alerts?
Hi. I want to get the age of my alerts, and have been playing around with
ALERTS_FOR_STATE for the same. After converting the Epoch Time to Normal
Date-Time, the age that I am getting for the alerts seems to be incorrect
for some of the alerts. Also, I can see a lot of my alerts are having the
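For reference, the derivation being attempted here can be written directly as a query; ALERTS_FOR_STATE's value is the timestamp at which the alert became active, so the age in seconds (with the caveats discussed in this thread) is:

```promql
time() - ALERTS_FOR_STATE
```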
Adding to this, I noticed something very strange. I am seeing that
node_filesystem_device_error gives 1, but when I log in to the servers,
everything seems to be fine with the NFS mount. Even the statfs call is
successful.
Is this a bug? If not, how to know the reason because of which
Hi. After a lot of discussion on this forum about monitoring NFS hang
issues, I am using node_filesystem_device_error to see whether my NFS mount
is hanging or not.
Now, since node_filesystem_device_error is basically a statfs call, there
can be more than one reason for the statfs call to fail. So, if
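A sketch of how such a check might be alerted on (the alert name and the fstype selector are assumptions, not from your config):

```yaml
- alert: FilesystemDeviceError        # hypothetical name
  expr: node_filesystem_device_error{fstype=~"nfs.*"} == 1
  for: 5m
  labels:
    severity: warning
```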
is in hung state, node_filesystem_device_error
should definitely inform you about the same.
On Sunday, March 15, 2020 at 3:56:19 AM UTC+5:30, Yagyansh S. Kumar wrote:
>
> Sure. Will absolutely do.
>
> On Sunday, March 15, 2020 at 3:30:59 AM UTC+5:30, Christian Hoffmann wrote:
>>
I know this cannot be called a bug, but I find it a little odd that you
cannot know, in your alert, the value it dropped to once it has resolved.
On Saturday, April 18, 2020 at 7:56:47 PM UTC+5:30, Yagyansh S. Kumar wrote:
>
> Thanks a lot for the detailed explanation, Brian.
>
Thanks a lot for the detailed explanation, Brian.
I guess I need to monitor the resolved alerts a bit more closely and then
take a call.
On Saturday, April 18, 2020 at 3:16:56 PM UTC+5:30, Brian Candler wrote:
>
> I can see two possible issues here.
>
> Firstly, the value of the annotation you
Btw I am struggling with the resolved alerts. I have posted the details in
below mentioned thread. Would be really helpful if you can suggest
something.
Thread Link -
https://groups.google.com/forum/#!topic/prometheus-users/LLsPBIvLIME
; no HTTP status code at all for a failed check (like if the connection
> couldn't be established), but luckily the BBE still exposes the status code
> metric in that case, but with a value of 0, so the expression still works
> in that case.
>
> On Sat, Apr 18, 2020 at 9:07 AM Ya
Thanks a lot Julius.
Btw., you could change the "=~" to just "=" because that regex is just
doing a full string equality match anyway. >>
Thanks for pointing it out. I recently removed 2-3 jobs from that alert, so
forgot to remove the regex matcher.
Hi. I am using Blackbox exporter to monitor my Application's and LB's
HealthCheck URLs.
My alert for this looks like below:
- alert: ServiceHTTPChecks
  expr: probe_success{job=~"blackbox_Service-HealthChecks"} == 0
  for: 2m
  labels:
    severity: "CRITICAL"
  annotations:
$labels.nodename }}* is more than 90%."
description: "Current Value = *{{ $value | humanize }}*"
identifier: "*Cluster:* `{{ $labels.cluster }}`, *node:* `{{
$labels.node }}` "
On Tuesday, April 14, 2020 at 4:41:43 PM UTC+5:30, Stuart Clark wrote:
>
> On 2020-04-14 12:
Hi. I am using Alertmanager version 0.16.0. The resolved alerts that I am
receiving are wrong. Alertmanager fires the resolved alert as soon as the
value decreases even slightly, i.e. it does not wait for the value to drop
below the threshold. And this is happening for every alert.
FAN Speed, Network Interfaces metrics etc. also are required.
Hi. I want to monitor my SAN switches and expect metrics such as port
utilization, uptime, CPU usage, memory usage, etc. I have tried SNMP,
which gets me the network-side metrics perfectly, but I need system
metrics too (CPU, memory, etc.). Is there any exporter for this? If not, can
Hi. In many of my servers, supervisord isn't running on the default port
and the port also varies from cluster to cluster. Is there a way we can
detect the port of the supervisor and then collect the metrics from that
custom URL:PORT?
Thanks!
Alas, if only the migration from CentOS 6 to 7 were in my hands :P.
But thanks for the suggestion; I will give process-exporter a try.
On Wednesday, April 1, 2020 at 2:50:39 AM UTC+5:30, Christian Hoffmann
wrote:
>
> Hi,
>
> On 3/31/20 12:22 PM, Yagyansh S. Kumar wrote:
> > H
itelist="(apache2|ssh|rsyslog|nginx).service"
> but not sure how to do it?
>
> Again sorry to hijack your post and sorry can't add anything to help
>
> On Tuesday, March 31, 2020 at 6:22:17 AM UTC-4, Yagyansh S. Kumar wrote:
>>
>> Hi. We have systemd collector in
*1584698633*
What does this value represent?
On Wednesday, March 18, 2020 at 3:06:54 AM UTC+5:30, Julien Pivotto wrote:
>
> On 17 Mar 22:33, Christian Hoffmann wrote:
> > Hi,
> >
> > On 3/17/20 10:33 AM, Yagyansh S. Kumar wrote:
> > > Hi. I want to extract
Hi. I want to push my alerts to a ticketing tool. The ticketing tool
has an API through which we can push alerts to it.
Now, what I want is that as soon as an alert fires, I push the
alert details to the tool by calling the ticketing tool's API.
Can anyone help and give some
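One common route (a sketch; the receiver name and URL are placeholders) is Alertmanager's webhook receiver, with a small shim service that translates the webhook payload into calls against the ticketing tool's API:

```yaml
# alertmanager.yml (fragment)
route:
  receiver: ticketing
receivers:
  - name: ticketing
    webhook_configs:
      - url: "http://localhost:8080/alerts-to-tickets"  # placeholder shim endpoint
        send_resolved: true
```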
Hi. I want to extract the age of the alerts (i.e. since when the alert has
been Critical, Warning, or even Resolved). Is this possible?
Thanks a lot.
On Tuesday, March 17, 2020 at 1:46:42 PM UTC+5:30, Brian Candler wrote:
>
> On Tuesday, 17 March 2020 03:33:40 UTC, Yagyansh S. Kumar wrote:
>>
>> And that would mean making 10 different jobs to probe those 10 URLs.
>> Right?
>>
>>
> No:
And that would mean making 10 different jobs to probe those 10 URLs. Right?
On Monday, March 16, 2020 at 8:29:24 PM UTC+5:30, Brian Candler wrote:
>
> You'll need to create 10 different modules - there's no way to expand
> parameters into the module settings.
>
> You can use a script to generate
fig of prom and blackbox on
> how you have done basic auth?
>
> Thanks
> Eswar
>
> On Monday, March 16, 2020 at 3:37:59 PM UTC+1, Yagyansh S. Kumar wrote:
>>
>> Hi. I have around 10 URLs that I want to monitor. All are under Basic
>> Auth and have different a
Hi. I have around 10 URLs that I want to monitor. All are behind Basic Auth
and have different auth credentials. Is it at all possible to probe them
using a single Blackbox module? Or will I have to create 10 different
modules, which seems a very tedious task?
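If it does come down to one module per credential pair, a sketch of what each would look like (module name and credentials are placeholders); the `modules:` section can also be generated by a script, as suggested elsewhere in this thread:

```yaml
modules:
  http_2xx_site1:                # one module per URL's credentials
    prober: http
    http:
      basic_auth:
        username: user1          # placeholder
        password: pass1          # placeholder
```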
Thanks!
Thanks a lot, Christian. Will try them out and report back.
Also, in your view, will Step 3 add any significant overhead? I mean,
will it cause any kind of slowness?
On Sunday, March 15, 2020 at 5:26:47 PM UTC+5:30, Christian Hoffmann wrote:
>
> Hi,
>
> On 3/15/20 10:07 AM
Hi. I want to add, to the alert for a particular service, a link to that
service's dashboard. The dashboard takes the server IP and hostname as
input. I want to remove the port number from the instance label and pass
the result as input to the dashboard.
Configured Alert:
- alert: OutOfDiskSpace-Crit
expr:
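One way to strip the port inside the annotation itself is Prometheus' template function `reReplaceAll` (the dashboard URL here is a placeholder):

```yaml
annotations:
  # removes a trailing ":PORT" from the instance label
  dashboard: 'https://grafana.example.com/d/abc?var-ip={{ reReplaceAll ":[0-9]+" "" $labels.instance }}'
```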
Sure. Will absolutely do.
On Sunday, March 15, 2020 at 3:30:59 AM UTC+5:30, Christian Hoffmann wrote:
>
> Hi,
>
> On 3/14/20 10:35 PM, Yagyansh S. Kumar wrote:
> > Yes, I did experiment with node_filesystem_device_error earlier based on
> > Ben's suggest
,
I'll definitely give node_filesystem_device_error another try and see if I
can come up with something interesting.
Thanks a lot for your help. Cheers!
On Sunday, March 15, 2020 at 2:49:01 AM UTC+5:30, Christian Hoffmann wrote:
>
> On 3/14/20 10:01 PM, Yagyansh S. Kumar wrote:
> > Als
?
On Saturday, March 14, 2020 at 10:06:38 PM UTC+5:30, Christian Hoffmann
wrote:
>
> On 3/14/20 5:06 PM, Yagyansh S. Kumar wrote:
> > Can you explain in a little detail please?
> I'll try to walk through your example in several steps:
>
> ## Step 1
> Your initial expression was
Awesome explanation. This helps a lot. Thanks, I appreciate it.
On Saturday, March 14, 2020 at 10:06:38 PM UTC+5:30, Christian Hoffmann
wrote:
>
> On 3/14/20 5:06 PM, Yagyansh S. Kumar wrote:
> > Can you explain in a little detail please?
> I'll try to walk through your exa
Can you explain in a little detail please?
On Saturday, March 14, 2020 at 9:26:39 PM UTC+5:30, Christian Hoffmann
wrote:
>
> Hi,
>
> On 3/14/20 4:32 PM, Yagyansh S. Kumar wrote:
> > Hi. In my prometheus.yml file all the targets necessarily have 2 labels
> >
No, it does not work on reload either. I have to manually select the "All"
option for cluster in the target dashboard.
The problem remains if I swap the source and target dashboards.
Ideally, "All" is the value that should be passed to the target dashboard,
not {value1,value2,...}. One more
Hi. In my prometheus.yml file all the targets necessarily have 2 labels viz
"cluster" and "node".
I have configured an alert for CPU Load with a dynamic threshold of "Number
of Cores of the server".
Configured alert:
- alert: HighCpuLoad
expr: (node_load15 > count without (cpu, mode)
Thanks for the help. Will try this.
On Saturday, March 14, 2020 at 5:07:19 PM UTC+5:30, Christian Hoffmann
wrote:
>
> Hi,
>
> On 3/13/20 1:55 PM, Yagyansh S. Kumar wrote:
> > Hi. I am using Number of Cores of a server as the threshold for CPU
> > Load. It is working
-> Um, how
exactly do I conclude this?
On Saturday, March 14, 2020 at 5:21:20 PM UTC+5:30, Christian Hoffmann
wrote:
>
> On 3/13/20 9:32 PM, Yagyansh S. Kumar wrote:
> > Hi. In one of the dashboards(Say D-1), I have created a variable called
> > cluster and it has "
Hi. In one of the dashboards(Say D-1), I have created a variable called
cluster and it has "All" option enabled. I have the same variable in
another dashboard(Say D-2) and there are links of D-2 in D-1 at different
places according to the value of cluster variable. When the value of
cluster is
Hi. I am using Number of Cores of a server as the threshold for CPU Load.
It is working fine but I want to print the number of cores also in the
alert.
Configured Alert:
- alert: HighCpuLoad
expr: (node_load15 > count without (cpu, mode)
(node_cpu_seconds_total{mode="system"})) *
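One variation (an assumption on my part, not your current rule) is to divide instead of compare, so that `$value` in the alert carries the load-per-core ratio; alternatively, record the core count as its own series and reference it in the annotation:

```promql
node_load15
  / count without (cpu, mode) (node_cpu_seconds_total{mode="system"})
> 1
```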
Is there any way to query Alertmanager and get the silenced alerts?
If there isn't any direct method, is there any workaround for this?
Thanks in advance.
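For reference, two sketches of how this could be queried, assuming a reachable Alertmanager (the URL is a placeholder; the boolean filters are from the v2 API):

```shell
# only silenced alerts via the v2 API
curl -s 'http://localhost:9093/api/v2/alerts?silenced=true&active=false&inhibited=false'

# or list the silences themselves with amtool
amtool silence query --alertmanager.url=http://localhost:9093
```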
Oh man! Can't believe I missed the tilde.
Thanks, Brian.
On Thursday, March 12, 2020 at 2:25:29 PM UTC+5:30, Brian Candler wrote:
>
> If you are using multi-select in Grafana, you have to write your label
> match as *cluster=~"$cluster"*
>
Hi. I have made a variable "cluster" in Grafana that contains all the
clusters (ADS, TV, etc.) present in my infra.
Now, I want the critical alert count for every service (CPU, Memory, Disk)
cluster-wise, which I am getting successfully using the Alertmanager query -
*alertname="CPULOAD",
wrote:
>
> On 11 Mar 10:59, Yagyansh S. Kumar wrote:
> > I mean it doesn't work in giving me the actual Load value.
> > The expression you mentioned will give be perfect in defining the
> threshold
> > but the value that this expression will give will be (Actual
Oh sorry, my bad! Yes, it does. I was comparing the wrong things.
Thanks a lot!
On Wednesday, March 11, 2020 at 11:31:49 PM UTC+5:30, Julien Pivotto wrote:
>
> On 11 Mar 10:59, Yagyansh S. Kumar wrote:
> > I mean it doesn't work in giving me the actual Load value.
> > The expres
with
threshold still being the Number of Cores.
On Wednesday, March 11, 2020 at 11:23:09 PM UTC+5:30, Yagyansh S. Kumar
wrote:
>
> Thanks for the response Julien, but I have already tried the query that
> you have mentioned, but it doesn't work.
>
> On Wednesday, March 11, 2020 at
Thanks for the response Julien, but I have already tried the query that you
have mentioned, but it doesn't work.
On Wednesday, March 11, 2020 at 11:21:35 PM UTC+5:30, Julien Pivotto wrote:
>
> On 11 Mar 10:49, Yagyansh S. Kumar wrote:
> > I have one more small query.
:30, Yagyansh S. Kumar
wrote:
>
> Maybe I'll refine the threshold even further but for now this works.
> Thanks a lot for help.
>
> On Wednesday, March 11, 2020 at 10:47:14 PM UTC+5:30, Harald Koch wrote:
>>
>>
>>
>> On Wed, Mar 11, 2020, at 13:00, Ya
Maybe I'll refine the threshold even further but for now this works. Thanks
a lot for help.
On Wednesday, March 11, 2020 at 10:47:14 PM UTC+5:30, Harald Koch wrote:
>
>
>
> On Wed, Mar 11, 2020, at 13:00, Yagyansh S. Kumar wrote:
>
> Hi. I have configured alert for CPU Load for
Hi. I have configured an alert for CPU Load on my servers; my current
thresholds are 8 for warning and 10 for critical.
I want to make this threshold dynamic, i.e. I want the critical alert when
the CPU Load exceeds the number of CPU cores of the machine.
Eg. For a server with 8 CPU
Hi. I am monitoring around 2500+ servers. I am using node_exporter to collect
system metrics and blackbox for Service Health Checks that are configured
on those servers. Now, not all the servers have the same health check
and port. Some don't even have any health check or port configured.
I
Hi. I have a system where multiple NFS shares are mounted across servers. I
want the total input and output operations of an NFS mount based on its
mountpoint (e.g. I have an NFS share at /data which is mounted on 100
servers; I want the I/O of mountpoint /data). I have checked the stats
scraped by
uments on your node exporter.
>
> https://prometheus.io/docs/concepts/data_model/
>
> https://github.com/prometheus/node_exporter/blob/master/README.md
>
> On Sat, Mar 7, 2020, 11:01 AM Yagyansh S. Kumar > wrote:
>
>> Hi. I am just now able to get my head around t
Hi. I am just now able to get my head around how the textfile
collector works. Can someone explain it or provide a good link to
understand it better?
Thanks!
Cool. Thank you.
Just started getting comfortable with Prometheus, so a lot of questions
come to mind. Thanks for the quick responses! :)
On Friday, March 6, 2020 at 1:29:05 AM UTC+5:30, Brian Candler wrote:
>
> On Thursday, 5 March 2020 19:23:51 UTC, Yagyansh S. Kumar wrote:
>>
but is there still something that can be done?
On Friday, March 6, 2020 at 12:10:57 AM UTC+5:30, Yagyansh S. Kumar wrote:
>
> Cool, thanks a ton.
>
> Just one doubt. I have tried and have been using the join method that you
> mentioned, in grafana for getting the hostnames as legen
Cool, thanks a ton.
Just one doubt: I have tried, and have been using, the join method you
mentioned in Grafana for getting the hostnames as legends, but how do I
extract the hostname from the expression in Alertmanager? I mean, we have
the expression, but I am stuck at getting the nodename