On Sunday, 8 November 2020 12:10:54 UTC, Yagyansh S. Kumar wrote:
>
> I'll try and get a backtrace and post it here.
>
> But still the question remains: if BBE is returning probe_success 0, why
> is it doing so only for 2.20.1 🙄.
>
>
It could be that 2.12 is missing the data point (scrape) entirely.
I'll try and get a backtrace and post it here.
But still the question remains: if BBE is returning probe_success 0, why is
it doing so only for 2.20.1 🙄.
On Sat, 7 Nov, 2020, 11:33 pm Brian Candler, wrote:
> I don't think it's a false alert. If it's the rule you showed, then the
> only way you
I don't think it's a false alert. If it's the rule you showed, then the
only way you can get an alert is if the metric probe_success has value
zero. You should try to understand *why* BBE is returning zero; if
necessary use tcpdump or wireshark to capture the HTTP traffic to and from
it.
But
On Saturday, 7 November 2020 13:35:47 UTC, Yagyansh S. Kumar wrote:
>
> Try looking at scrape_duration_seconds{job="Ping-All-Servers"}. Maybe
> it's borderline to the scrape interval.
> >> That's interesting. Here are the top 20 scrape_duration_seconds maxed
> for last 1 hour by instance. Close
Try looking at scrape_duration_seconds{job="Ping-All-Servers"}. Maybe it's
borderline to the scrape interval.
>> That's interesting. Here are the top 20 scrape_duration_seconds maxed
for last 1 hour by instance. Close to 5 seconds. Can this lead to some
issue? But again the thing comes why no
Try looking at scrape_duration_seconds{job="Ping-All-Servers"}. Maybe it's
borderline to the scrape interval.
What does min_over_time(up{job="Ping-All-Servers"}[5m]) show? In other
words, is it the scrape to BBE which is failing, or the BBE probe? (I think
the latter).
Is there a different n
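The distinction drawn above (scrape of BBE failing versus the BBE probe failing) can be sketched in Python. This is only a toy model, not Prometheus code: the function name `classify` and the sample values are hypothetical. It relies on the fact that `up` describes the scrape itself, while `probe_success` comes from the exporter's response and so is absent when the scrape fails.

```python
from typing import Optional

def classify(up: Optional[int], probe_success: Optional[int]) -> str:
    """Hypothetical helper: decide which layer failed for one scrape."""
    if up is None:
        return "no scrape recorded"      # Prometheus has no sample at all
    if up == 0:
        return "scrape of BBE failed"    # Prometheus could not reach the exporter
    if probe_success == 0:
        return "probe failed"            # exporter answered, but the target did not
    return "ok"

# Made-up samples for illustration only.
print(classify(up=1, probe_success=1))     # ok
print(classify(up=1, probe_success=0))     # probe failed
print(classify(up=0, probe_success=None))  # scrape of BBE failed
```

If `min_over_time(up{job="Ping-All-Servers"}[5m])` stays at 1 while `probe_success` dips to 0, the second case is the one to investigate.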
Yes, both Prometheus instances are indeed talking to the same BBE.
In fact, both have the exact same configuration file and are scraping the
exact same targets.
Here is the graph for the modified query. Failures are visible for 2.20.1
but none for 2.12.0.
2.12.0
[image: image.png]
2.20.1
[image: imag
You won't necessarily see all the failures on that graph. With a 5-second
scrape interval, a 6 hour window contains 4,320 scrapes - more than the
number of points fetched. Many of the points will be skipped over.
I suggest you graph this instead:
min_over_time(probe_success[5m])
(Otherwise,
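To illustrate the point about skipped samples, here is a minimal Python sketch. It is a simplification, not how Prometheus is implemented: the graph resolution of 250 points is an assumption (the real value depends on the UI and the chosen step), and the series with a single failed probe is made up.

```python
SCRAPE_INTERVAL_S = 5
WINDOW_S = 6 * 3600
GRAPH_POINTS = 250   # assumed graph resolution, not a Prometheus constant

# One sample per scrape: 4,320 samples in a 6 hour window.
samples = [1] * (WINDOW_S // SCRAPE_INTERVAL_S)
samples[1001] = 0    # a single failed probe (made-up position)

step = len(samples) // GRAPH_POINTS   # only every 17th sample is plotted

def plain_graph(series, step):
    """Plot the raw metric: keep one sample per graph step."""
    return series[::step]

def min_over_time_graph(series, step, lookback):
    """Plot the minimum of the last `lookback` samples at each graph step."""
    return [
        min(series[max(0, i - lookback + 1): i + 1])
        for i in range(0, len(series), step)
    ]

lookback = 5 * 60 // SCRAPE_INTERVAL_S   # a 5m range holds 60 samples

print(len(samples))                                      # 4320
print(min(plain_graph(samples, step)))                   # 1, the failure is skipped
print(min(min_over_time_graph(samples, step, lookback))) # 0, the failure is visible
```

Because the 5-minute window is wider than the graph step, every sample falls inside at least one plotted window, so no failure can be skipped.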
On Saturday, 7 November 2020 08:49:15 UTC, yagyans...@gmail.com wrote:
>
> My Blackbox exporter is already running with Debug Log Mode and still, I
> don't see any probe failed logs for that period.
>
But is this the same blackbox exporter which is also showing panics in its
logs?
https://groups
The PromQL query
probe_success{job=~"Ping-All-Servers"} == 0
is a filter. It returns the set of timeseries where the job label matches
"Ping-All-Servers" *and* the value is zero. It cannot return a non-empty
set of results unless those conditions are met.
What's your rule evaluation interv
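The filter behaviour described above can be modelled in a few lines of Python. This is a toy model only: the instant vector is represented as a dict from label sets to values, the instances and the "Other" job are made up, and `filter_eq_zero` is a hypothetical name. The regex is matched with `fullmatch` because PromQL regex matchers are fully anchored.

```python
import re

def filter_eq_zero(vector, job_regex):
    """Keep only series whose job label matches the regex AND whose value is 0."""
    pattern = re.compile(job_regex)
    return {
        labels: value
        for labels, value in vector.items()
        if pattern.fullmatch(dict(labels).get("job", "")) and value == 0
    }

# Made-up instant vector: label set -> current value of probe_success.
vector = {
    (("instance", "10.0.0.1"), ("job", "Ping-All-Servers")): 1,
    (("instance", "10.0.0.2"), ("job", "Ping-All-Servers")): 0,
    (("instance", "10.0.0.3"), ("job", "Other")): 0,
}

firing = filter_eq_zero(vector, "Ping-All-Servers")
print(firing)  # only the 10.0.0.2 series remains
```

An empty result means no alert; any series that survives the filter had the value zero at evaluation time.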
Hi Brian,
My Blackbox exporter is already running with Debug Log Mode and still, I
don't see any probe failed logs for that period.
Also, I have run the query for some of the instances that I saw in PENDING
state, but I do not see any failures there either; probe_success is 1 for
them constantly
Go into the Prometheus query browser (front page in the web interface,
normally port 9090), and enter the query:
probe_success{job=~"Ping-All-Servers"}
and switch to graph mode. Is the line going up and down? Then probes are
failing.
If you want to see logs of these failures, then on the bla
Prometheus Version - 2.20.1
On Saturday, November 7, 2020 at 1:46:31 PM UTC+5:30 yagyans...@gmail.com
wrote:
>
> Hi. I am using Blackbox Exporter v 0.18.0 for generating Host Down Alerts.
> Below is the configured rule.
> - alert: HostDown
> expr: probe_success{job=~"Ping-All-Servers"} ==