Looks good!
On Sat, May 1, 2021, 23:43 Ashraf Guitouni wrote:
> Understood. Thank you for the explanation!
>
> I tried the expression and I got the following error:
> Error executing query: multiple matches for labels: grouping labels must
> ensure unique matches
>
> The output of the first
I realized that using the request metrics may not work because they can
only be updated once a request is complete. Ideally you'd have a direct "is
this pod occupied" 1/0 metric from each model pod, but I don't know if
that's possible with the framework.
For the GPU metrics, we need to match the
Thank you for your reply.
There's no GPU sharing for pods at the moment (this is how it is in general
for k8s, except for Nvidia MIGs). The goal is to have HPA
increasing/decreasing the replicas for a deployment, which will call on the
cluster autoscaler to provision a new node if needed.
Hi,
It depends on how the pods on the same node share the GPU, but I think it
is doable if you configure the HPA to spawn new pods and have the pods
`request` GPU resources. This will force the GKE cluster autoscaler to
create new nodes to host the new pods.
Are you using KubeFlow
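A minimal sketch of the GPU `request` idea above, assuming the NVIDIA device plugin is installed so that nodes advertise the `nvidia.com/gpu` extended resource. The deployment name, labels, and image are hypothetical placeholders, not taken from this thread:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server            # hypothetical name
spec:
  replicas: 1                   # the HPA would adjust this value
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
        - name: model
          image: example.com/model-server:latest   # placeholder image
          resources:
            limits:
              # One whole GPU per pod. A new replica that cannot be
              # scheduled on existing nodes stays Pending, which is what
              # prompts the cluster autoscaler to add a GPU node.
              nvidia.com/gpu: 1
```

With one GPU per pod and no GPU sharing, each additional replica needs a free GPU, so scaling the deployment up translates fairly directly into node provisioning.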
That looks good, I think the issue is which target(s) you discover for
these jobs.
If you scrape Prometheus directly you may have to change the TLS settings
depending on your configuration.
/MR
On Sat, Apr 24, 2021, 08:58 'Evelyn Pereira Souza' via Prometheus Users <
Hi all.
I'm trying to implement HPA based on GPU utilization metrics.
My initial approach is to use DCGM Exporter which is a daemonset that runs
a pod on every GPU node and exports GPU metrics.
By setting an additional scrape config when installing
kube-prometheus-community and a custom
Please see the documented list of available service discovery methods:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
On Sat, May 1, 2021 at 9:57 AM nbada...@gmail.com
wrote:
> Hi Guys,
>
> I have process exporter installed on some of the nodes, and i
Hi. I am using Blackbox Exporter version 0.18.0. I want to know which
field takes precedence when fail_if_body_not_matches_regexp and
fail_if_body_matches_regexp contradict each other. For example:
fail_if_body_not_matches_regexp: ['OK']
fail_if_body_matches_regexp: ['NOK']
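For context, here is a sketch of how these two options would sit in a complete module definition. The module name `http_body_check` is hypothetical. As far as I understand the http prober, the two lists are evaluated independently rather than one overriding the other, and the probe fails if either condition triggers; please verify this against the version you run:

```yaml
modules:
  http_body_check:              # hypothetical module name
    prober: http
    http:
      # The probe fails if the body matches ANY of these expressions.
      fail_if_body_matches_regexp:
        - 'NOK'
      # The probe fails if the body does NOT match ALL of these expressions.
      fail_if_body_not_matches_regexp:
        - 'OK'
```

Note that the expressions are unanchored, so a body containing `NOK` also matches `OK`. Under that reading, such a body passes the second check and fails the first, so the probe as a whole fails.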
Hi Guys,
I have process exporter installed on some of the nodes, and I have the
below snippet set up in prometheus.yml:
  - job_name: 'process'
    static_configs:
      - targets: [host1:9256, host2:9256, host3:9256.host10:9256]
The problem with the above setup is that every time I onboard process
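One way to avoid editing prometheus.yml for every new host is file-based service discovery, one of the documented service discovery methods. The sketch below is an assumption about how it could look for this job. The `targets/process-*.yml` path and the refresh interval are hypothetical choices:

```yaml
scrape_configs:
  - job_name: 'process'
    file_sd_configs:
      - files:
          - 'targets/process-*.yml'   # hypothetical path, relative to prometheus.yml
        refresh_interval: 5m          # periodic re-read in addition to file watching

# Contents of a matching target file, e.g. targets/process-nodes.yml:
#
# - targets:
#     - 'host1:9256'
#     - 'host2:9256'
#   labels:
#     exporter: 'process'             # optional, hypothetical label
```

Prometheus picks up changes to the target files without a restart or reload, so onboarding a host becomes a matter of adding a line to a target file.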