Re: [prometheus-users] Re: Alert manager looping in firing -> resolved -> firing

2020-05-08 Thread Brian Candler
(Aside: this thread has already taken a large chunk of this group's bandwidth, so this will be my last post on it) I don't see missing data points - every point has a value. I do see dips in the graphs. You have chosen to graph: sum by (foo) (X) / sum by (foo) (Y) You haven't described

2020-05-07 Thread Brian Candler
On Thursday, 7 May 2020 17:54:47 UTC+1, piyush sharma wrote:
> Hey
> I have a query like this
> if sum by (locale) ( expression1)/sum by (locale) ( expression2) >=0
> This gives me no data points and hence a broken graph
Looks like a reasonable expression (without the "if" on the

2020-05-07 Thread piyush sharma
Hey, I have a query like this: if sum by (locale) ( expression1)/sum by (locale) ( expression2) >=0. This gives me no data points and hence a broken graph. sum_over_time did not work in this case as it does not take the "by" option. Any way I can rewrite this ... can I make it avg of sum by

2020-05-07 Thread Brian Candler
The first matching route wins. If you want a matched route to continue onto subsequent matches, add "continue: true". You would need to set this on both of your first two routes, if you want those alerts to go to alertnow as well as to slack. The documentation you need is here:
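A minimal sketch of what "continue: true" looks like in an Alertmanager routing tree. The severity label, its values, and the slack-staging receiver are hypothetical; slack-production and alertnow are the names mentioned in the thread. Receiver details (slack_configs and so on) are left out, and newer Alertmanager versions prefer "matchers" over "match".

```yaml
route:
  receiver: slack-production          # default receiver if nothing below matches
  routes:
    - match:
        severity: critical            # hypothetical label and value
      receiver: slack-production
      continue: true                  # keep evaluating the sibling routes below
    - match:
        severity: warning             # hypothetical label and value
      receiver: slack-staging         # hypothetical receiver
      continue: true
    - receiver: alertnow              # no matchers: catches everything that reaches it
receivers:
  - name: slack-production
  - name: slack-staging
  - name: alertnow
```

Without "continue: true" on the first two routes, an alert matching either of them would stop there and never reach the alertnow route.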

2020-05-07 Thread piyush sharma
Hello, I am having an issue, but nothing related to it comes up in the logs. Below is my alertmanager config:
alertmanager.yml:
  global:
    resolve_timeout: 12h
  receivers:
  - name: slack-production
    slack_configs:
    - api_url:

2020-05-07 Thread Brian Candler
I don't know an easy way. I guess you can do it using custom templates, since they are passed the list of firing and resolved alerts: https://prometheus.io/docs/alerting/notifications/
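One way the custom-template idea could look, as a rough sketch rather than a tested config. It assumes each alert carries a "description" annotation that includes the value and a "summary" annotation that does not; the channel name and the api_url are placeholders. It relies on the .Alerts.Firing and .Alerts.Resolved lists described in the notification template reference linked above.

```yaml
receivers:
  - name: slack-production
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE_ME   # placeholder
        channel: '#alerts'                                     # hypothetical channel
        send_resolved: true
        text: |-
          {{ range .Alerts.Firing }}FIRING: {{ .Annotations.description }}
          {{ end }}{{ range .Alerts.Resolved }}RESOLVED: {{ .Annotations.summary }}
          {{ end }}
```

Firing alerts print the annotation that contains the value, while resolved alerts print only the value-free summary.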

2020-05-07 Thread piyush sharma
So is there any way to get the value while alerting, but only the message and not the value when I get the resolved message?
On Thu, May 7, 2020 at 6:18 PM Brian Candler wrote:
> P.S. If you change your annotation from "Current value is X" to "Most
> recent triggering value

2020-05-07 Thread Brian Candler
P.S. If you change your annotation from "Current value is X" to "Most recent triggering value is X", then the resolved message may make more sense.
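As an illustration of the annotation wording suggested here, a rule could look roughly like this. The alert name, threshold, "for" duration and group name are hypothetical; health_value is the metric name used elsewhere in the thread.

```yaml
groups:
  - name: example.rules                 # hypothetical group name
    rules:
      - alert: HealthDown               # hypothetical alert name
        expr: health_value < 80         # hypothetical threshold
        for: 5m
        annotations:
          # $value is the value of the expression at the last evaluation where
          # the alert was firing, so this wording stays accurate in the
          # resolved notification too.
          description: 'Most recent triggering value is {{ $value }}'
```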

2020-05-07 Thread Brian Candler
On Thursday, 7 May 2020 13:37:43 UTC+1, piyush sharma wrote:
> Actually when there is an alerting situation, I get an alert on slack
> with the alerting value.
> But after sometime when alert is resolved, I still get the old value
> which is below the threshold ( ideally should be in

2020-05-07 Thread piyush sharma
Thanks for such a comprehensive answer to the query :) I have one more doubt. When there is an alerting situation, I get an alert on slack with the alerting value. But after some time, when the alert is resolved, I still get the old value which is below the threshold ( ideally should be in

2020-05-07 Thread Brian Candler
P.S. If you want to get "an average over 5 minutes" you need to use a range vector, which is a collection of metrics with all their values over a range of time. Then you can do avg_over_time( ... range vector ...). You can get a range vector directly from an individual metric: foo[5m] Or you
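A sketch of both cases as alerting rules. The first rule averages a plain metric over a range selector, as described above. The second uses PromQL subquery syntax ([5m:1m]) to build a range vector from an aggregated expression; the subquery form is an addition here and not taken from the post. The alert names, group name and thresholds are hypothetical, and expression1 / expression2 stand in for the real expressions.

```yaml
groups:
  - name: averages.rules                # hypothetical group name
    rules:
      # Range vector taken directly from a metric: foo[5m]
      - alert: HighFiveMinuteAverage
        expr: avg_over_time(foo[5m]) > 80
      # Range vector built from an expression with a subquery:
      # evaluate the ratio every 1m over the last 5m, then average it.
      - alert: HighRatioAverage
        expr: >-
          avg_over_time(
            (sum by (locale) (expression1) / sum by (locale) (expression2))[5m:1m]
          ) > 80
```

Since the "by" aggregation happens inside the subquery, the outer avg_over_time does not need a "by" clause of its own.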

2020-05-07 Thread Brian Candler
Firstly, comparison operators don't work the way you imagine. They are more like filters. The expression "foo" is a vector of zero or more timeseries all with the metric name "foo". So for
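To make the "filter" behaviour concrete, here is a small rule sketch, assuming a metric called health_value (the name used elsewhere in the thread) and hypothetical alert and group names. Each comparison drops the series that fail it and passes the rest through with their values unchanged, so comparisons can be chained.

```yaml
groups:
  - name: filters.rules                 # hypothetical group name
    rules:
      - alert: HealthInWarningBand      # hypothetical alert name
        # Step 1: "health_value > 0" keeps only series whose value is above 0.
        # Step 2: "< 80" then keeps only those that are also below 80.
        # Whatever series remain become firing alerts; an empty result
        # means nothing fires.
        expr: health_value > 0 < 80
```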

2020-05-07 Thread piyush sharma
Hey guys, I have a doubt about how the result of an alert condition is evaluated. Below is my configuration for prometheus: evaluation_interval: 1m, scrape_interval: 1m. Now my query is as below: avg(metric_first_asr{locale=~"en-gb"}) by (locale) >= 80 AND

2020-05-02 Thread Brian Candler
Why not try it in the PromQL expression browser built in to prometheus (in the prometheus web interface at port 9090)?

2020-05-02 Thread piyush sharma
Hello, I want to set an alert within a given range of values. Just wanted to check whether "critical if health_value > 0 < 80" is a valid statement.
On Fri, 1 May, 2020, 8:15 pm Stuart Clark wrote:
> On 2020-05-01 14:57, piyush sharma wrote:
> > Hello All,
> > Is there any way we can send SMS

2020-05-01 Thread Stuart Clark
On 2020-05-01 14:57, piyush sharma wrote:
> Hello All,
> Is there any way we can send SMS notification but free of cost for alerts using alertmanager?
The cost depends on whatever service/system you are using for SMS messages. Alertmanager does not send SMS messages, it just sends requests

2020-05-01 Thread piyush sharma
Hello All, Is there any way we can send SMS notifications free of cost for alerts using alertmanager?
On Fri, 1 May, 2020, 12:37 am Brian Candler wrote:
> That's the maximum.
> Prometheus web interface (9090) shows them firing? What about alertmanager
> web interface (9093)?

2020-04-30 Thread Brian Candler
That's the maximum. Prometheus web interface (9090) shows them firing? What about alertmanager web interface (9093)? Otherwise check your routing rules. Maybe run tcpdump to see if it's trying to connect to slack.

2020-04-30 Thread piyush sharma
Hello. I am not receiving any alerts in the slack channel, nor am I getting anything related to slack in the alertmanager logs, though alerts are firing. What level of error logging is required for this? My current log level is debug. Regards Piyush On Thu, 30 Apr, 2020, 12:53 pm Brian Candler,

2020-04-30 Thread Brian Candler
You put a reverse proxy in front, like apache or nginx. Same for prometheus itself. Same for adding HTTPS. If you want to proxy them on a particular path, like /alertmanager or /prometheus, then there are command-line flags you can set: --web.external-url=https://mon.example.net/prometheus

2020-04-30 Thread piyush sharma
Thanks a lot for your guidance. Is there any way we can create an admin user and password for alertmanager?
On Wed, Apr 29, 2020 at 9:06 PM Brian Candler wrote:
> Sorry, I got this wrong initially and corrected it in point (4) in a reply
> to myself.

2020-04-29 Thread Brian Candler
Sorry, I got this wrong initially and corrected it in point (4) in a reply to myself.

2020-04-29 Thread piyush sharma
Now I am getting the error below:
level=error ts=2020-04-29T14:23:14.189Z caller=main.go:740 err="error loading config from \"/etc/config-shared/prometheus.yaml\": couldn't load configuration (--config.file=\"/etc/config-shared/prometheus.yaml\"): parsing YAML file

2020-04-29 Thread Brian Candler
(4) With the labeldrop action, the regex matches against each of the label names (not values). So I think what you want is (untested):
  - action: labeldrop
    regex: replica
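Putting points (1) to (4) together, the relevant part of prometheus.yml might look roughly like the sketch below. Like the original suggestion, it is untested; the Alertmanager target addresses are hypothetical, and the external labels are the ones posted earlier in the thread.

```yaml
global:
  evaluation_interval: 1m
  external_labels:
    region: EastUS
    replica: $(POD_NAME)      # still attached to stored and remote-written series
    tier: stg

alerting:
  # Applied only to alerts on their way to Alertmanager, so alerts from both
  # replicas end up with identical label sets and can be deduplicated.
  alert_relabel_configs:
    - action: labeldrop
      regex: replica          # matched against label names, not values
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager-0:9093   # hypothetical addresses
            - alertmanager-1:9093
```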

2020-04-29 Thread Brian Candler
(1) labeldrop, not drop.
(2) source_labels is a list: source_labels: [replica]
(3) As per the documentation, alert_relabel_configs goes under the "alerting" section, as a sibling to "alertmanagers":
alerting:

2020-04-29 Thread piyush sharma
Hi, I just want you to verify my configuration. These are the external labels that are defined:
alerts: |
  {}
prometheus.yaml.tmpl: |
  global:
    evaluation_interval: 1m
    external_labels:
      region: EastUS
      replica: $(POD_NAME)
      tier: stg
I get alerts like

2020-04-29 Thread Brian Candler
You remove it only from alerts, using alert_relabel_configs. The forum link I posted before has a working example config.

2020-04-29 Thread piyush sharma
Dear Brian, Thanks for your valuable advice. I have one doubt about the functionality of labeldrop. "replica" is my global label and is required to go with each and every metric (it's a prerequisite for Thanos to work). Now my question to you is: when labeldrop is applied, will it stop

2020-04-29 Thread Brian Candler
I don't use alertmanager clustering myself, but if you search this group for "alertmanager duplicates" or "alertmanager alert_relabel_configs" you'll find the answer. Example: https://groups.google.com/d/topic/prometheus-users/S9Xmg8209xE/discussion As I understand it, it's your responsibility

2020-04-29 Thread piyush sharma
Hello again, Sorry for so many queries but I am a newbie to Prometheus. I am running alertmanager in a cluster. The problem is that both alertmanagers are sending alerts, which causes duplicates. Below is my configuration: Cluster-peer-timeout = 25s global: resolve_timeout: 12h

2020-04-28 Thread Brian Candler
You haven't shown real examples of what these metrics look like. I'm guessing you're not talking about metric names, but label values. If you want to filter out metrics which have particular labels or label patterns, then you need to use metric_relabeling
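A rough sketch of dropping series by label pattern at scrape time with metric_relabel_configs. It assumes the duplicates differ only in a label named "locale" and that the all-lowercase variants (such as ru-ru) are the ones to discard; both are guesses, since the real metrics were not shown. The job name and target are hypothetical.

```yaml
scrape_configs:
  - job_name: app                       # hypothetical job name
    static_configs:
      - targets: ['app.example.net:8080']   # hypothetical target
    metric_relabel_configs:
      # Drop any scraped series whose locale label is entirely lowercase,
      # e.g. "ru-ru", keeping "ru-RU". Prometheus anchors the regex at both
      # ends, so partial matches do not count.
      - source_labels: [locale]
        regex: '[a-z]+-[a-z]+'
        action: drop
```

To keep the lowercase form instead, the regex could be swapped for one that matches the mixed-case form, for example '[a-z]+-[A-Z]+'.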

2020-04-28 Thread piyush sharma
Thanks for all your help. I really appreciate it. One more doubt: my application metrics are based on locales, as in ru-RU, en-GB, es-US. There is some issue with the application, and metrics are coming in twice, like ru-RU = 10 and ru-ru = 10. I want to filter out values only having all alphabets in

2020-04-28 Thread Brian Candler
On Tuesday, 28 April 2020 11:03:45 UTC+1, piyush sharma wrote:
> Is there any way. I can set no data condition as alerting .
You can use "or" to give a default value, e.g. ( expr ) or (up * 99) (assuming that 'expr' and 'up' have the same set of labels - if not, then you can use
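A different technique from the "or" default described above, shown only as an alternative sketch: PromQL's absent() function returns a single series with value 1 when the selector matches nothing, so it can drive a dedicated "no data" alert. The metric name health_value comes from elsewhere in the thread; the alert name, group name and "for" duration are hypothetical.

```yaml
groups:
  - name: nodata.rules                  # hypothetical group name
    rules:
      - alert: HealthMetricMissing      # hypothetical alert name
        # Fires when no health_value series exist at all for 5 minutes.
        expr: absent(health_value)
        for: 5m
        annotations:
          description: 'No health_value samples received for 5 minutes'
```

absent() only reports that the whole selector is empty; it does not tell you which individual label combination went missing, which is where the "or" approach is more useful.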

2020-04-28 Thread piyush sharma
Hey, you are truly a rockstar. Yeah, in fact the data was not coming. Is there any way I can set a no-data condition as alerting?
On Tue, 28 Apr, 2020, 3:22 pm Brian Candler wrote:
> That's a complicated expression.
> I suggest you paste the whole expression into the promql browser (i.e.

2020-04-28 Thread Brian Candler
That's a complicated expression. I suggest you paste the whole expression into the promql browser (i.e. prometheus port 9090) and look at the graph. If you see gaps in the graph, that's where the expression does not have any value, and that's where the alert is getting resolved. Note that

2020-04-28 Thread piyush sharma
Dear Brian, Thanks for the response. Here is one of the alerts that is causing this behaviour:
apiVersion: v1
data:
  alerting_rules.yml: |
    groups:
    - name: k8s.rules
      rules:
      - alert: Health down Alert
        annotations:
          description: Attention !!! Health of

[prometheus-users] Re: Alert manager looping in firing -> resolved -> firing

2020-04-28 Thread Brian Candler
On Tuesday, 28 April 2020 08:56:21 UTC+1, piyush sharma wrote:
> I am badly stuck in a problem.
> One main thing is that alert manager sends resolve notification on its
> own but the alert is still active.
> I want to disable this feature. I want "resolved" alert to be sent only
> when