Re: [prometheus-users] Unable to access localhost from pushgateway pod on openshift

2020-04-28 Thread Stuart Clark
When you are running an application inside a container, "localhost" refers to the container itself, not the underlying host. You will need to use a different method to reference other pods, such as a service DNS name. On 29 April 2020 07:09:56 BST, Nidhi Sharma wrote: >Hi, >Do i need to make any confi
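
A minimal sketch of what a service DNS name looks like, assuming the usual Kubernetes/OpenShift convention of `<service>.<namespace>.svc.<cluster-domain>` with the default `cluster.local` domain. The service name, namespace, and port below are hypothetical placeholders, not values from this thread.

```python
def service_url(service: str, namespace: str, port: int,
                cluster_domain: str = "cluster.local") -> str:
    """Build an in-cluster URL for a service instead of using localhost."""
    # Convention assumed here: <service>.<namespace>.svc.<cluster-domain>
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}"


# Hypothetical service "my-app" in namespace "monitoring", listening on 8080.
print(service_url("my-app", "monitoring", 8080))
```

From inside the pushgateway pod, a URL of this shape reaches the other pod's service, whereas `localhost` only reaches processes in the same container.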

[prometheus-users] Unable to access localhost from pushgateway pod on openshift

2020-04-28 Thread Nidhi Sharma
Hi, do I need to make any config changes so that I can access localhost? Please help.

Re: [prometheus-users] Re: rate() doesn't work

2020-04-28 Thread Isabel Noronha
sudo sysctl fs.inotify.max_user_watches=1048576
I ran the above command and it started working fine. My only remaining concern is that I'll have 2k (containers on each host) * 20 (hosts). I have already done relabeling. Would this problem recur if I increase the number of hosts? Currently it is 1 host with 2

Re: [prometheus-users] Re: rate() doesn't work

2020-04-28 Thread Isabel Noronha
Yes, I changed it to 40s. I had increased it because 2k containers were running on one host and I was getting a "context deadline exceeded" error, so I thought the scrape_interval wasn't long enough to scrape all the metrics. On Wednesday, April 29, 2020 at 2:38:58 AM UTC+5:30, Stuart Clark wro

[prometheus-users] How to write not condition in alert manager route matches?

2020-04-28 Thread Radha R4
How do I write a "not" condition in Alertmanager route matchers? I want this route to apply only when severity is not critical:
    routes:
    - match_re:
        severity: ^(?!critical)
      receiver: teams-channel-webhook-high-priority
I tried the above regex condition, but it didn't work.
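
Alertmanager is written in Go, whose regexp package (RE2 syntax) does not support lookaheads such as `(?!critical)`, which would explain why the pattern above does not work. One possible workaround, sketched here in Python, is to list the non-critical severities explicitly. The set of severities is an assumption for illustration; the real label values would have to be substituted.

```python
import re

# Assumption: the complete set of severity values used by the alerts.
KNOWN_SEVERITIES = ["critical", "warning", "info"]


def non_critical_pattern(severities, excluded="critical"):
    """Build a lookahead-free alternation of every severity except `excluded`."""
    allowed = [re.escape(s) for s in severities if s != excluded]
    return "^(?:" + "|".join(allowed) + ")$"


pattern = non_critical_pattern(KNOWN_SEVERITIES)
print(pattern)  # candidate value for match_re.severity

for severity in KNOWN_SEVERITIES:
    print(severity, bool(re.match(pattern, severity)))
```

The generated pattern uses only alternation and a non-capturing group, both of which RE2 accepts. Another option is to put a route matching `severity: critical` first, so that later routes only see the remaining alerts.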

Re: [prometheus-users] Re: rate() doesn't work

2020-04-28 Thread Stuart Clark
On 28/04/2020 20:31, Isabel Noronha wrote:
> scrape_interval is 5m I changed the scrape_interval today forgot to change it in grafana variable. Thank you.
You are likely to have issues with a scrape interval that long. Due to staleness the maximum scrape interval is about 2 minutes, so you'd
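
The staleness point can be illustrated with a simplified model, assuming Prometheus's default 5-minute lookback window: an instant query only sees the newest sample inside that window. The timestamps and values are made up, and the exact boundary handling is a simplification.

```python
LOOKBACK_SECONDS = 300  # assumed default lookback window (5 minutes)


def instant_value(samples, t, lookback=LOOKBACK_SECONDS):
    """Return the newest sample value with t - lookback < ts <= t, else None."""
    candidates = [(ts, v) for ts, v in samples if t - lookback < ts <= t]
    return max(candidates)[1] if candidates else None


# Scraping every 120s, with the scrape at t=240 missed.
every_2m = [(0, 1.0), (120, 2.0), (360, 4.0)]
# Scraping every 300s, with the scrape at t=300 missed.
every_5m = [(0, 1.0), (600, 3.0)]

print(instant_value(every_2m, 350))  # still sees the sample from t=120
print(instant_value(every_5m, 350))  # None: the series looks stale
```

With a 2-minute interval one failed scrape is tolerated, while with a 5-minute interval a single failure leaves a gap.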

[prometheus-users] Re: rate() doesn't work

2020-04-28 Thread Isabel Noronha
scrape_interval is 5m. I changed the scrape_interval today and forgot to change it in the Grafana variable. Thank you. On Wednesday, April 29, 2020 at 12:54:15 AM UTC+5:30, Brian Candler wrote: > > You haven't given enough details about what your metrics look like or how > often you are scraping. > > How

Re: [prometheus-users] Re: Alert manager looping in firing -> resolved -> firing

2020-04-28 Thread Brian Candler
You haven't shown real examples of what these metrics look like. I'm guessing you're not talking about metric names, but label values. If you want to filter out metrics which have particular labels or label patterns, then you need to use metric_relabeling

[prometheus-users] Re: rate() doesn't work

2020-04-28 Thread Brian Candler
You haven't given enough details about what your metrics look like or how often you are scraping. However, I can tell you that rate(foo[1m]) only looks at a 1 minute window of the input. If you are only scraping at 1 minute intervals, that means there will only be one data point in that window
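
A rough sketch of why one data point in the window is not enough: a rate needs at least two samples to compute a difference. This simplified function ignores counter resets and the extrapolation Prometheus performs, and the sample data is invented.

```python
def simple_rate(samples, t, window):
    """Per-second increase between the first and last sample in [t - window, t].

    `samples` is a time-ordered list of (timestamp_seconds, counter_value).
    Returns None when fewer than two samples fall inside the window.
    """
    in_window = [(ts, v) for ts, v in samples if t - window <= ts <= t]
    if len(in_window) < 2:
        return None
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


# A counter scraped every 60 seconds, increasing by 600 per scrape.
scrapes = [(0, 0), (60, 600), (120, 1200), (180, 1800)]

print(simple_rate(scrapes, t=170, window=60))   # None: only one sample in range
print(simple_rate(scrapes, t=170, window=300))  # 10.0
```

A window several times larger than the scrape interval keeps at least two samples in range even if a scrape is missed.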

[prometheus-users] Re: Dashboard: abnormal display of quantity

2020-04-28 Thread Brian Candler
You still haven't shown your query. I will guess it's something like: irate(ifHCInOctets[5m]) If you view this in PromQL browser, you should get a value in bytes per second. If you want bits per second, then: irate(ifHCInOctets[5m])*8 To visualize this correctly in Grafana, all you have to d
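
For illustration, the arithmetic behind the two queries can be written out as follows. `irate` is modelled here as the per-second increase between the last two samples, ignoring counter resets; the counter values are hypothetical.

```python
def irate(samples):
    """Per-second increase between the last two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical ifHCInOctets readings taken 60 seconds apart.
octets = [(0, 1_000_000), (60, 1_750_000)]

bytes_per_second = irate(octets)
bits_per_second = bytes_per_second * 8  # 8 bits per octet

print(bytes_per_second)  # 12500.0
print(bits_per_second)   # 100000.0
```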

[prometheus-users] Python Client library: Metric with same name but different label

2020-04-28 Thread Vishnu B
I am writing a node_exporter textfile export using the Python client library. I have metrics with the same metric name but different label values, like below:
TPL{dc="tlv1",instance="core2.tlv1",job="cisco_ipsla",link_id="PL2_31.0",region="sjc2"} 19.0
TPL{dc="ash1",instance="core1.ash1",job="cisco_
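
As a sketch of the target output, the following stdlib-only code renders one metric family with several label sets in the text exposition format. It is a hand-rolled stand-in, not the client library's API; with the Python client the usual approach is to declare the metric once with its label names and set each series through `labels(...)`. Only a subset of the labels from the post is used, label values are not escaped, and the second sample's values are hypothetical.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge family with many label sets as exposition text."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort label names so the output is stable between runs.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


samples = [
    ({"dc": "tlv1", "link_id": "PL2_31.0", "region": "sjc2"}, 19.0),
    # Hypothetical second series: same metric name, different label values.
    ({"dc": "ash1", "link_id": "EXAMPLE_LINK", "region": "sjc2"}, 7.5),
]

text = render_gauge("TPL", "Example gauge (help text is a placeholder)", samples)
print(text)
```

The key point is that the HELP and TYPE lines appear once per metric name, followed by one line per label combination.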

Re: [prometheus-users] Re: Alert manager looping in firing -> resolved -> firing

2020-04-28 Thread piyush sharma
Thanks for all your help, I really appreciate it. One more doubt: my application metrics are based on locales, such as ru-RU, en-GB, es-US. There is an issue with the application whereby metrics are coming through twice, like ru-RU = 10 and ru-ru = 10. I want to filter out values only having all alphabets in low
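
A candidate regex for spotting the all-lowercase variants can be prototyped in Python before it goes into a relabeling rule. Prometheus relabeling regexes are fully anchored, which `fullmatch` mimics here. The pattern assumes locales always have the two-letter, hyphen, two-letter shape shown in the post.

```python
import re

# Assumption: locale values look like "xx-YY" (two letters, hyphen, two letters).
LOWERCASE_LOCALE = re.compile(r"[a-z]{2}-[a-z]{2}")


def is_lowercase_locale(value: str) -> bool:
    """True when the whole value is a locale written entirely in lowercase."""
    return LOWERCASE_LOCALE.fullmatch(value) is not None


values = ["ru-RU", "ru-ru", "en-GB", "es-US"]
lowercase_only = [v for v in values if is_lowercase_locale(v)]
print(lowercase_only)  # ['ru-ru']
```

Whether matching series should be kept or dropped depends on which variant is the duplicate.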

[prometheus-users] rate() doesn't work

2020-04-28 Thread Isabel Noronha
Query 1: sum by (container_label_com_docker_swarm_service_name) (container_network_transmit_bytes_total{image!="",container_label_com_docker_swarm_service_name="$service"})
Query 2: sort_desc(sum by (container_label_com_docker_swarm_service_name) (rate(container_network_transmit_bytes_total{image!=""

[prometheus-users] Re: Dashboard: abnormal display of quantity

2020-04-28 Thread Brian Candler
1. You'll need to say what you mean by "abnormal". Those graphs just look like perfectly normal graphs to me, except the values on 2.GIF are very large. The third image is so tiny it makes no sense.
2. You'll need to show what PromQL queries you are sending.
3. I am guessing this is Grafana y

[prometheus-users] Re: Prometheus returns 502 error

2020-04-28 Thread Dmitry
I fixed this problem by bringing the config into the standard format. On Monday, April 27, 2020 at 4:40:53 PM UTC+3, Dmitry wrote: > > Hello everybody! > When I try to use snmp_exporter > with a non-standard > configuration with my standard Prometheus, the config

[prometheus-users] Re: Prometheus returns 502 error

2020-04-28 Thread Dmitry
On Monday, April 27, 2020 at 4:40:53 PM UTC+3, Dmitry wrote: > > Hello everybody! > When I try to use snmp_exporter > with a non-standard > configuration with my standard Prometheus, the config is not loaded and I > get 502 error. > The project de

Re: [prometheus-users] Re: Alert manager looping in firing -> resolved -> firing

2020-04-28 Thread Brian Candler
On Tuesday, 28 April 2020 11:03:45 UTC+1, piyush sharma wrote: > > Is there any way. I can set no data condition as alerting . > You can use "or" to give a default value, e.g. ( expr ) or (up * 99) (assuming that 'expr' and 'up' have the same set of labels - if not, then you can use gro
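
The "or" trick can be modelled with plain dictionaries: every series from the left side is kept, and a right-side series is added only when no left-side series has the same labels. This is a simplified sketch of the operator's semantics; the label sets and values are invented.

```python
def vector_or(left, right):
    """Union of two instant vectors keyed by label set; the left side wins."""
    result = dict(left)
    for labels, value in right.items():
        result.setdefault(labels, value)
    return result


# Label sets are represented as tuples of (name, value) pairs.
a = (("instance", "a"),)
b = (("instance", "b"),)

expr = {a: 5.0}            # the real expression has no data for instance b
up = {a: 1.0, b: 1.0}
default = {labels: v * 99 for labels, v in up.items()}  # models (up * 99)

combined = vector_or(expr, default)
print(combined[a], combined[b])  # 5.0 99.0
```

Instance `b` now carries the sentinel value instead of vanishing, so an alert condition on the combined expression can catch the no-data case.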

Re: [prometheus-users] Re: Help understanding graph of a PromQL query

2020-04-28 Thread Rajat Varyani
Thanks Brian. You are correct. We are using kube-state-metrics to export. The behavior of the exporter is to export unless and until the object is deleted from the cluster. As we were tracking jobs in this case, they are not cleaned up automatically. Hence the exporter exports the metric even aft

Re: [prometheus-users] Re: Alert manager looping in firing -> resolved -> firing

2020-04-28 Thread piyush sharma
Hey, you are truly a rockstar. Yes, in fact the data was not coming in. Is there any way I can set a no-data condition as alerting? On Tue, 28 Apr, 2020, 3:22 pm Brian Candler, wrote: > That's a complicated expression. > > I suggest you paste the whole expression into the promql browser (i.e. >

Re: [prometheus-users] Re: Alert manager looping in firing -> resolved -> firing

2020-04-28 Thread Brian Candler
That's a complicated expression. I suggest you paste the whole expression into the promql browser (i.e. prometheus port 9090) and look at the graph. If you see gaps in the graph, that's where the expression does not have any value, and that's where the alert is getting resolved. Note that whi
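
The effect of gaps can be sketched as follows, assuming a simple threshold alert with no `for` clause. Each entry in the list stands for one rule evaluation, with `None` marking an evaluation where the expression returned nothing; the threshold and values are hypothetical.

```python
def alert_states(values, threshold):
    """Map each evaluation result to 'firing' or 'inactive'.

    A missing value (None) cannot satisfy the condition, so the alert
    drops out of the firing state and a resolved notification follows.
    """
    return [
        "firing" if v is not None and v > threshold else "inactive"
        for v in values
    ]


evaluations = [5, 6, None, 7]  # one gap in the middle
print(alert_states(evaluations, threshold=3))
```

The output alternates firing, inactive, firing around the gap, which matches the firing -> resolved -> firing loop described in this thread.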

[prometheus-users] Re: [Beginner] Build query for the http request.

2020-04-28 Thread Brian Candler
Ah right. Because you had "probe" in the metric name I thought you were using blackbox_exporter. That isn't going to help you here, because each probe only makes one request. To make 500 requests in 5 minutes, you would need to be scraping nearly two times per second! It sounds like rather wh
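
The "nearly two times per second" figure follows from simple arithmetic, written out here for reference:

```python
requests_needed = 500
window_seconds = 5 * 60  # 5 minutes

probes_per_second = requests_needed / window_seconds
seconds_between_probes = window_seconds / requests_needed

print(round(probes_per_second, 2))  # 1.67
print(seconds_between_probes)       # 0.6
```

So one probe would be needed roughly every 0.6 seconds, far more often than a typical scrape interval.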

Re: [prometheus-users] Re: Alert manager looping in firing -> resolved -> firing

2020-04-28 Thread piyush sharma
Dear Brian, thanks for the response. Here is one of the alerts that is causing this behaviour:
    apiVersion: v1
    data:
      alerting_rules.yml: |
        groups:
        - name: k8s.rules
          rules:
          - alert: Health down Alert
            annotations:
              description: Attention !!! Health of dict-servic

[prometheus-users] Re: [Beginner] Build query for the http request.

2020-04-28 Thread Sachin Maharana
Thanks for your response. I gained some idea of how to approach such queries. By 500 I meant the number of requests and not the error code (my poor choice of wording did imply that). So if I want to alert when 1000 requests take more than an average of 3 seconds within a 5 min interval, .If i take h
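
One way to reason about this condition is sketched below, assuming the application itself exposes a request counter and a cumulative duration total (as histograms and summaries do through their count and sum series). The function works on the increase of each over the 5-minute window; the parameter names and numbers are hypothetical.

```python
def should_alert(count_delta, sum_delta, min_requests=1000, max_avg_seconds=3.0):
    """Alert when enough requests were seen and their mean duration is too high.

    count_delta: increase in the request counter over the window
    sum_delta:   increase in total request duration (seconds) over the window
    """
    if count_delta < min_requests:
        return False  # too few requests for the average to be meaningful
    return sum_delta / count_delta > max_avg_seconds


print(should_alert(1200, 4200.0))  # True: mean is 3.5s
print(should_alert(1200, 3000.0))  # False: mean is 2.5s
print(should_alert(400, 4000.0))   # False: fewer than 1000 requests
```

Dividing the two increases gives the mean duration per request in the window, rather than an average of scraped gauge values.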

[prometheus-users] Re: Alert manager looping in firing -> resolved -> firing

2020-04-28 Thread Brian Candler
On Tuesday, 28 April 2020 08:56:21 UTC+1, piyush sharma wrote: > > I am badly stuck in a problem . > One main thing is that .. alert manager sends resolve notification on its > own but the alert is still active. > I want to disable this feature. I want "resolved " alert to be sent only > when ale

[prometheus-users] Re: [Beginner] Build query for the http request.

2020-04-28 Thread Brian Candler
Do you mean requests with result status code 500? This is a bit tricky. First thing you have to be careful of is that "probe_http_duration_seconds" is not the total, it's broken down into phases, as you can see if you try the exporter with curl: $ curl 'localhost:9115/probe?module=http_2xx_

[prometheus-users] Alert manager looping in firing -> resolved -> firing

2020-04-28 Thread piyush sharma
Hi all, I am badly stuck on a problem. The main thing is that Alertmanager sends a resolved notification on its own while the alert is still active. I want to disable this behaviour: I want the "resolved" notification to be sent only when the alert is really resolved. Below is my Alertmanager configuration

[prometheus-users] [Beginner] Build query for the http request.

2020-04-28 Thread Sachin Maharana
I am instrumenting my project for Prometheus and want to query whether 500 requests take more than an average of 3 seconds within a 5m interval. I have this: avg_over_time(probe_http_duration_seconds[5m]) > 3 but I am not sure how to query for 500 requests. Any hint would be helpful.