Re: [prometheus-users] prometheus-snmp-generator error (Debian 10)

2020-09-01 Thread Mario Pranjic
Hi, To inform about the Synology support case I opened and the MIB status: they are now aware of some issues in the MIB definitions. I did mention a patch to fix one of those MIB files but, frankly, I didn't get the impression that it will drive them to get more involved in the matter. From their

[prometheus-users] Re: How to do relabelling in Prometheus operator?

2020-09-01 Thread radhamani...@gmail.com
Should I add it under additionalScrapeConfigs: in the values file, as given here: https://github.com/helm/charts/blob/master/stable/prometheus-operator/values.yaml

[prometheus-users] How to do relabelling in Prometheus operator?

2020-09-01 Thread radhamani...@gmail.com
I want to do relabelling of some pod labels, but I am not sure which configuration I need to update in Prometheus Operator. Should I update the below config? alertmanager: config: Please advise. -- You received this message because you are subscribed to the Google Groups "Prometheus

Re: [prometheus-users] Include hostname in the alert summary message

2020-09-01 Thread Zhang Zhao
Thank you for the explanation. > On Sep 1, 2020, at 1:50 PM, Brian Candler wrote: > > On Tuesday, 1 September 2020 21:42:39 UTC+1, Zhang Zhao wrote: > What I needed is to display the “hostname” in the summary so that I can > extract the hostname on ServiceNow side. Is that possible? > > >

Re: [prometheus-users] compression in prometheus

2020-09-01 Thread Christian Hoffmann
Hi, On 9/1/20 2:50 PM, Rodolphe Ghio wrote: > I am currently doing an internship and my tutor asked me to do an > algorithm to compress prometheus data, what do you think about that is > it possible ? I think there are lots of resources regarding Prometheus' encoding which is already supposed to

Re: [prometheus-users] Include hostname in the alert summary message

2020-09-01 Thread Brian Candler
On Tuesday, 1 September 2020 21:42:39 UTC+1, Zhang Zhao wrote: > > What I needed is to display the “hostname” in the summary so that I can > extract the hostname on ServiceNow side. Is that possible? > > The group_left(x,y,z) means that the result gains labels x,y,z from the RHS expression. So

Re: [prometheus-users] Include hostname in the alert summary message

2020-09-01 Thread Zhang Zhao
What I needed is to display the “hostname” in the summary so that I can extract the hostname on ServiceNow side. Is that possible? - alert: HostOutOfMemory annotations: message: | Node memory is filling up (< 10% left) VALUE = {{ $value }} summary:

Re: [prometheus-users] Re: Efficient way to query non-active time series's last value

2020-09-01 Thread Ben Kochie
Prometheus attempts to use gzip http compression by default, but as you say, your exporter is local. Your 400k samples per scrape is pretty far out of bounds for a normal setup. Prometheus scales by scraping many small requests in parallel. Typically I recommend 50k samples per scrape is an

Re: [prometheus-users] Include hostname in the alert summary message

2020-09-01 Thread Brian Candler
On Tuesday, 1 September 2020 21:02:26 UTC+1, Zhang Zhao wrote: > > I added the group_left below as highlighted. However, it didn’t work as > expected in the output. Any advice where was wrong? > summary: Host out of memory (instance {{ $labels.instance }} > *group_left(nodename)

Re: [prometheus-users] Include hostname in the alert summary message

2020-09-01 Thread Zhang Zhao
Brian, I added the group_left below as highlighted. However, it didn’t work as expected in the output. Any advice where was wrong? summary: Host out of memory (instance {{ $labels.instance }} group_left(nodename) node_uname_info{job="node-exporter-vm”}) Output:

Re: [prometheus-users] Sum over time without instance

2020-09-01 Thread Brian Candler
max() sounds like a reasonable approach. If the values are the same, you get the value. If one is higher than the other, choose the more pessimistic one.

Re: [prometheus-users] Sum over time without instance

2020-09-01 Thread Gergely Brautigam
They are basically duplicates. Two instances of the same service report the same metric twice, each from a different pod with a different id. But it's literally the same metric. One of those should be ignored. I don't necessarily know how, and I can't find any good write-ups to follow of a

[prometheus-users] Avg of count of a metric.

2020-09-01 Thread vineka s
Hi, Is there any way we can get the avg of the count of a metric for a period of time? Thanks!

[prometheus-users] Re: Prometheus not getting metrics from cadvisor

2020-09-01 Thread Max Furman
Interesting! This comment helped me a lot. Under the new assumption that the address was wrong, I searched where cadvisor metrics could be scraped for prometheus in GKE. I found a blog post with an example kubernetes-cadvisor config that worked for me.

[prometheus-users] Re: Metrics reporting scraping timeout and failures

2020-09-01 Thread Brian Candler
https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series The metric "up" reports on scrape failures (including timeouts). The metric "scrape_duration_seconds" lets you monitor the scrape duration, so you can see if it's getting close to the time limit.

[prometheus-users] Re: Prometheus not getting metrics from cadvisor

2020-09-01 Thread Brian Candler
:10255 is on a test server running microk8s snap: root@nuc1:~# netstat -natp | grep kubelet tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 4129/kubelet tcp 0 0 127.0.0.1:37888 127.0.0.1:16443 ESTABLISHED 4129/kubelet tcp6 0

Re: [prometheus-users] Re: Efficient way to query non-active time series's last value

2020-09-01 Thread Brian Candler
On Tuesday, 1 September 2020 16:37:49 UTC+1, Peter S wrote: > > TSDB is fine. Sorry I wasn't being clear. Network traffic has become the > bottleneck. Even though the exporter and prometheus are collocated on the > same machine, scrapes have begun timing out more and more often. Next we > think

Re: [prometheus-users] Sum over time without instance

2020-09-01 Thread Brian Candler
It depends what these numbers mean individually and what is a meaningful way to combine them. You have avg(), max() etc.

Re: [prometheus-users] Sum over time without instance

2020-09-01 Thread Gergely Brautigam
Uh. I see what you mean. Also, yes, after some fiddling, this isn't actually a good query. Also, also, I have some further problems with instances and pods. The same number is coming from different pods in the cluster. Prometheus nicely shows that separately but I can't "without" it. Somehow

[prometheus-users] Metrics reporting scraping timeout and failures

2020-09-01 Thread Peter S
Hi, I wonder if there is a metric to check that. We love that `scrape_samples_scraped` reports the number of metrics scraped for each target, but we haven't found a way to monitor and report on scrape timeouts and failures. Thanks, Peter

[prometheus-users] Re: Monitoring Palo Alto

2020-09-01 Thread sadhan...@gmail.com
Hi Brian, The exact command line is curl "http://10.100.49.10:9116/snmp?target=10.100.48.70&module=paloalto_fw" After executing this, I am getting the error below: curl "http://10.100.49.10:9116/snmp?target=10.100.48.70&module=paloalto_fw" An error has occurred while serving metrics: 14 error(s)

Re: [prometheus-users] Re: Efficient way to query non-active time series's last value

2020-09-01 Thread Peter S
We measured by `curl | wc`. Also, `scrape_samples_scraped` reports that 400k metrics are exported and scraped. TSDB is fine. Sorry I wasn't being clear. Network traffic has become the bottleneck. Even though the exporter and prometheus are collocated on the same machine, scrapes have begun

[prometheus-users] Re: Monitoring Palo Alto

2020-09-01 Thread Brian Candler
The exact curl command line? Does it include query parameter "module=paloalto_fw" ?

Re: [prometheus-users] Re: A query on increase function

2020-09-01 Thread Brian Candler
Counters should be monotonically increasing. A "reset" just means that the exporter caused the counter to decrease for some reason (typically a process restart). If a counter goes 3 ... 7 ... 10 ... 2 ... 5 ... then you have no idea what happened between the "10" and the "2". Therefore,
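To make the reset handling above concrete, here is a minimal Python sketch, not Prometheus's actual implementation. It assumes the usual convention that any drop in a counter is treated as a restart from zero, so the first value after the drop counts entirely as new increase. The function name `increase_with_resets` is hypothetical, and extrapolation to the window edges is deliberately ignored.

```python
def increase_with_resets(samples):
    """Total increase over counter samples, treating any drop as a reset to zero."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            # Normal case: the counter only moved forward.
            total += cur - prev
        else:
            # The counter went down: assume it restarted from 0, so everything
            # up to `cur` is new. Whatever was counted between `prev` and the
            # restart is unknown and therefore lost.
            total += cur
    return total


# The sequence from the message: 3 ... 7 ... 10 ... 2 ... 5
print(increase_with_resets([3, 7, 10, 2, 5]))  # 4 + 3 + 2 + 3 = 12.0
```

The result is a lower bound: increments that happened after the "10" but before the restart never reach the scraper.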

Re: [prometheus-users] Re: A query on increase function

2020-09-01 Thread Manish G
Got it. Thanks a lot. BTW, can a reset happen on a metric even if it's alive? If that's not the case, then we may not have much issue as we handle all individual metrics independently for increase calculation. With regards On Tue, Sep 1, 2020 at 7:42 PM Brian Candler wrote: > On Tuesday, 1

Re: [prometheus-users] Re: A query on increase function

2020-09-01 Thread Brian Candler
On Tuesday, 1 September 2020 14:55:25 UTC+1, Manish G wrote: > > Going by (m1.2-m1.0)/(t2-t0) * 1h, even though m1 played out only for > period t2-t0, we still multiply by 1h, and same for m3. > So while increase() is expected to give absolute delta(m1.2-m1.0), it > seems that's not the case. >

Re: [prometheus-users] Sum over time without instance

2020-09-01 Thread Brian Candler
If it works for you, then you get to own the query and the results :-) But "garbage in, garbage out" applies here. What you've written doesn't "count increases". It calculates average rates of increase and scales them across the whole period, whilst skipping steps where the counter appears

Re: [prometheus-users] Re: A query on increase function

2020-09-01 Thread Manish G
Thanks for the detailed response. Going by (m1.2-m1.0)/(t2-t0) * 1h, even though m1 played out only for period t2-t0, we still multiply by 1h, and same for m3. So while increase() is expected to give absolute delta(m1.2-m1.0), it seems that's not the case. Another concern: when we apply increase

Re: [prometheus-users] Sum over time without instance

2020-09-01 Thread Gergely Brautigam
I agree that it should be a counter. And to be fair, I did create a ticket to add / make it a counter. :) So I agree with you on that. For now, however, I need to work with this. And you see it right, yes... It will report the same count and that's why it's probably so high. The thing about

[prometheus-users] Re: A query on increase function

2020-09-01 Thread Brian Candler
On Tuesday, 1 September 2020 14:03:34 UTC+1, Manish G wrote: > > I have a query regarding working of increase function. > > Suppose for a query I get multiple metrics in return(a use-case can be > like an application running in multiple pods in kubernetes, so multiple > sources for same

Re: [prometheus-users] Sum over time without instance

2020-09-01 Thread Brian Candler
On Tuesday, 1 September 2020 13:24:33 UTC+1, Gergely Brautigam wrote: > > The metric IS a gauge. It goes up on server restarts and then goes down > again as the reconciliation does not find any more servers which have > restarted. It's part of a statistic for all servers. So, for now, I have to

[prometheus-users] Monitoring Palo Alto

2020-09-01 Thread Sadhana Kumari
Hi Team, We are trying to configure Palo Alto FW monitoring through Prometheus. We need to monitor the metrics below: panSysHAState, panSysHAPeerState, panSysHAMode. We are able to see these in the snmp.yml file, but while executing the curl command for the IP we are not seeing these metrics, only Interface metrics

[prometheus-users] Re: Query regarding remote write using different tokens

2020-09-01 Thread Brian Candler
The remote_write section is a list of multiple remote_write endpoints, each of which contains a url and bearer token. Is there a problem?

[prometheus-users] A query on increase function

2020-09-01 Thread Manish G
Hi All, I have a query regarding working of increase function. Suppose for a query I get multiple metrics in return(a use-case can be like an application running in multiple pods in kubernetes, so multiple sources for same metrics). I apply a 1h time window. *mymetrics[1h])* This results in

[prometheus-users] compression in prometheus

2020-09-01 Thread Rodolphe Ghio
Hi, I am currently doing an internship and my tutor asked me to write an algorithm to compress Prometheus data. What do you think about that? Is it possible? Cheers, Ghio Rodolphe.

[prometheus-users] Annotations to scrape multiple containers in same pod?

2020-09-01 Thread klavs....@gmail.com
Hi, I have a pod which already has: prometheus.io/target: 127.0.0.1:1161 But I also want the kubernetes_sd (kubernetes-jobs) scraper to scrape port 3903 on /metrics (a diff. container in the pod exposes that port). I can't find any documentation on kubernetes_sd on how to do that - and my

Re: [prometheus-users] Sum over time without instance

2020-09-01 Thread Gergely Brautigam
Hi Brian! Thanks! :) The metric IS a gauge. It goes up on server restarts and then goes down again as the reconciliation does not find any more servers which have restarted. It's part of a statistic for all servers. So, for now, I have to deal with it being a gauge. resets MIGHT be actually

[prometheus-users] Query regarding remote write using different tokens

2020-09-01 Thread Sushil pratap singh
Hi, I want to use remote_write to Cortex, but our customer wants us to use different tokens for different vendors. For example, there are two vendors for which we want to send data to Cortex using different tokens. We have the same Cortex URL to send to in remote write; can anybody

[prometheus-users] Re: Sum over time without instance

2020-09-01 Thread Brian Candler
First understand what your metric does, and what change in that metric you're looking for. Decide whether it is a counter (it increments for each server restart, resetting to zero only when some collector restarts) or is a gauge (e.g. the number of restarts in a 5 minute period). If it's a

Re: [prometheus-users] Pod level cpu usage metrics

2020-09-01 Thread Matthias Rampke
CPU usage counters represent the CPU time spent in seconds. You can get the CPU usage (time spent in seconds, per second) by using the rate function: rate(container_cpu_usage_seconds_total[1m]) /MR On Tue, Sep 1, 2020 at 11:32 AM Manish G wrote: > Hi All, > > I want to display pod cpu usage
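As a rough illustration of why rate() turns the CPU seconds counter into a usage graph, here is a small Python sketch. It assumes a window with no counter resets and skips the extrapolation that the real rate() performs; the function name `per_second_rate` and the sample values are made up for the example.

```python
def per_second_rate(samples):
    """Average per-second increase over (timestamp_seconds, counter_value) pairs.

    Samples must be ordered oldest first. Assumes no counter reset in the window.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


# Hypothetical scrapes of a CPU-seconds counter, 15 s apart over a 1 m window.
samples = [(0, 120.0), (15, 123.0), (30, 127.5), (45, 130.5), (60, 135.0)]
cores = per_second_rate(samples)
print(f"{cores:.2f} cores, {cores * 100:.0f}% of one core")
```

CPU seconds consumed per second of wall time is a number of cores, so the counter itself never needs to be displayed directly.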

[prometheus-users] Pod level cpu usage metrics

2020-09-01 Thread Manish G
Hi All, I want to display a pod CPU usage graph for my application running in kubernetes. I want to show the graph for CPU usage at a given time, like what we have in VisualVM. The metric I see available is container_cpu_usage_seconds_total. But this is a counter. It doesn't suit my use case as cpu

[prometheus-users] Volume Query Specification

2020-09-01 Thread 'azha...@googlemail.com' via Prometheus Users
Hi All, I was wondering if you can advise? I have the following two Prometheus queries relating to Windows disk space usage. The first focuses specifically on drive C: and the second covers all drives. 100-(100 - 100 * (windows_logical_disk_free_bytes{volume="C:"} /

[prometheus-users] Re: Consul Recording Rules

2020-09-01 Thread Brian Candler
1. I don't know what you mean by "Consul doesn't pick it up". Recording rules are nothing to do with service discovery. Why are you expecting Consul to pick up this file? 2. Where exactly are you putting this JSON, and what is reading the JSON? For example, are you writing it to

Re: [prometheus-users] Network traffic used per day

2020-09-01 Thread Mario Pranjic
Hi, Thanks for the tip. I am playing with sum and increase trying to get a gauge with value for the current day. So, basically, I test: sum(increase(node_network_transmit_bytes_total{instance="proxy.yggdrasil.local:9100", device="ens3"}[*5m*])) I set "Today" on time range which looks ok:

[prometheus-users] Re: Consul Recording Rules

2020-09-01 Thread Jo
Thanks Brian. Taking the above example. If I supply below json: { "groups": [ { "name": "example", "rules": [ { "record": "job:http_inprogress_requests:sum", "expr": "sum by (job) (http_inprogress_requests)" }

[prometheus-users] Re: Prometheus not getting metrics from cadvisor

2020-09-01 Thread Brian Candler
On Tuesday, 1 September 2020 06:51:29 UTC+1, Max Furman wrote: > > It's odd to me that the path I got from the examples is /metrics/advisor, > but the path I queried using `kubectl get` was /proxy/metrics/advisor. > > This is not well documented, but I think it's the difference between scraping

[prometheus-users] Re: Question on Promethues alert rules

2020-09-01 Thread Brian Candler
On Tuesday, 1 September 2020 05:08:27 UTC+1, kiran wrote: > > In Prometheus alert rules file, I am trying to understand what variables > are available to use: > In the screenshot below, I see *$value *, *$labels*, *$labels.instance* > How do I know what variables are available to access within

[prometheus-users] Re: Consul Recording Rules

2020-09-01 Thread Brian Candler
On Tuesday, 1 September 2020 03:20:18 UTC+1, Jo wrote: > > If I supply an array of rules it works but If I supply an array of groups > it doesn't work. > Please show the two configurations so we can compare them. Also explain exactly what you mean by "it doesn't work". Note that in the

Re: [prometheus-users] Re: Efficient way to query non-active time series's last value

2020-09-01 Thread Brian Candler
On Tuesday, 1 September 2020 01:55:15 UTC+1, Peter S wrote: > > Thanks. Unfortunately, exporting and scraping the same values have become > costly for us. We have metrics endpoints of 50MB+, and scraping have begun > to time out more and more often. > > Sorry, can you explain what you mean by

[prometheus-users] Re: Include hostname in the alert summary message

2020-09-01 Thread Brian Candler
You can join your query with node_uname_info, using group_left, so that the query result gains additional labels from node_uname_info. The approach is described here: https://www.robustperception.io/how-to-have-labels-for-machine-roles
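In PromQL such a join is typically written as `<expr> * on(instance) group_left(nodename) node_uname_info`, which works because node_uname_info always has the value 1. The Python sketch below only imitates the label-matching part to show what "gains additional labels" means; the function name `group_left_join`, the label values, and the sample data are all hypothetical.

```python
def group_left_join(lhs, rhs, on, extra):
    """Many-to-one join: copy the `extra` labels from the matching rhs series.

    lhs: list of (labels, value) pairs; rhs: list of label dicts (value 1).
    Series on either side are matched on the label names listed in `on`.
    """
    index = {}
    for labels in rhs:
        key = tuple(labels.get(name) for name in on)
        if key in index:
            # More than one rhs series per key would make the match ambiguous.
            raise ValueError(f"duplicate right-hand series for {key}")
        index[key] = labels

    joined = []
    for labels, value in lhs:
        match = index.get(tuple(labels.get(name) for name in on))
        if match is None:
            continue  # an unmatched series drops out of the result
        merged = dict(labels)
        merged.update({name: match[name] for name in extra if name in match})
        joined.append((merged, value))  # value * 1 leaves the value unchanged
    return joined


# Hypothetical alert expression result and node_uname_info series.
alerts = [({"instance": "10.0.0.1:9100", "job": "node"}, 7.5)]
uname = [{"instance": "10.0.0.1:9100", "nodename": "web01"}]
result = group_left_join(alerts, uname, on=["instance"], extra=["nodename"])
print(result)
```

Once the joined series carries a nodename label, an alert annotation can reference it as `{{ $labels.nodename }}`.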