Re: [prometheus-users] Horizontal Pod Autoscaling using Nvidia GPU Metrics

2021-05-01 Thread Matthias Rampke
Looks good!

On Sat, May 1, 2021, 23:43 Ashraf Guitouni wrote:
> Understood. Thank you for the explanation!
>
> I tried the expression and I got the following error:
> Error executing query: multiple matches for labels: grouping labels must ensure unique matches
>
> The output of the first
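The error quoted above comes from Prometheus's many-to-one / one-to-many (`group_left` / `group_right`) vector matching. The following is a simplified Python model of that check, written from our reading of how the Prometheus engine behaves rather than taken from its source. `match_group_left` is a hypothetical name; the model assumes an arithmetic operator (so `__name__` is dropped from the result) and ignores sample values. It shows that the error fires when two left-hand series in the same match group end up with identical label sets after the metric name is dropped and the `group_left` labels are copied in.

```python
from typing import Dict, List, Sequence, Tuple

Labels = Dict[str, str]


def match_group_left(lhs: List[Labels], rhs: List[Labels],
                     on: Sequence[str], include: Sequence[str] = ()) -> List[Labels]:
    """Simplified model of `lhs * on(<on>) group_left(<include>) rhs` label matching.

    Only label sets are modelled; sample values are ignored.
    """
    def signature(labels: Labels) -> Tuple[str, ...]:
        # Match-group key: the values of the `on` labels ("" if a label is absent).
        return tuple(labels.get(name, "") for name in on)

    # The right-hand side is the "one" side, so each match group may occur there only once.
    one_side: Dict[Tuple[str, ...], Labels] = {}
    for series in rhs:
        sig = signature(series)
        if sig in one_side:
            raise ValueError("many-to-many matching not allowed: "
                             "matching labels must be unique on one side")
        one_side[sig] = series

    results: List[Labels] = []
    seen: Dict[Tuple[str, ...], set] = {}
    for series in lhs:
        sig = signature(series)
        match = one_side.get(sig)
        if match is None:
            continue  # left-hand series without a partner are dropped
        # Arithmetic operators drop the metric name; group_left copies the
        # `include` labels from the matching right-hand series.
        result = {k: v for k, v in series.items() if k != "__name__"}
        for name in include:
            if match.get(name, ""):
                result[name] = match[name]
            else:
                result.pop(name, None)
        # Within one match group, every result series must have a distinct label set.
        key = frozenset(result.items())
        group = seen.setdefault(sig, set())
        if key in group:
            raise ValueError("multiple matches for labels: "
                             "grouping labels must ensure unique matches")
        group.add(key)
        results.append(result)
    return results
```

For instance, two left-hand series `{__name__="gpu_util", pod="model-1"}` and `{__name__="mem_util", pod="model-1"}` matched `on(pod)` against a single right-hand series collide once `__name__` is dropped and raise the error, whereas two series that differ in a `gpu` label stay distinct and match fine.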

Re: [prometheus-users] Horizontal Pod Autoscaling using Nvidia GPU Metrics

2021-05-01 Thread Matthias Rampke
I realized that using the request metrics may not work because they can only be updated once a request is complete. Ideally you'd have a direct "is this pod occupied" 1/0 metric from each model pod, but I don't know if that's possible with the framework. For the GPU metrics, we need to match the
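The direct 1/0 "is this pod occupied" metric suggested above could look roughly like the following standard-library-only sketch. `BusyGauge` and the metric name `model_pod_busy` are hypothetical, and serving the rendered text on a `/metrics` HTTP endpoint is left out. Unlike a request-duration metric, the gauge flips to 1 as soon as a request starts rather than when it completes. In a real service the official `prometheus_client` library's `Gauge` (e.g. its `track_inprogress()` helper, which counts in-flight requests rather than reporting 1/0) would normally be used instead.

```python
import threading
from contextlib import contextmanager


class BusyGauge:
    """Hypothetical 1/0 "is this pod occupied" gauge for a model-serving pod."""

    def __init__(self, name: str = "model_pod_busy") -> None:
        self.name = name
        self._in_flight = 0
        self._lock = threading.Lock()

    @contextmanager
    def track(self):
        # Mark the pod busy when a request starts, not when it finishes.
        with self._lock:
            self._in_flight += 1
        try:
            yield
        finally:
            with self._lock:
                self._in_flight -= 1

    def value(self) -> int:
        # 1 while at least one request is being served, else 0.
        with self._lock:
            return 1 if self._in_flight > 0 else 0

    def render(self) -> str:
        # A single gauge sample in the Prometheus text exposition format.
        return (f"# HELP {self.name} 1 while the pod is serving a request, else 0.\n"
                f"# TYPE {self.name} gauge\n"
                f"{self.name} {self.value()}\n")
```

A request handler would wrap its inference call in `with gauge.track(): ...`, and the average of this metric across pods then reads directly as the fraction of pods that are busy.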

[prometheus-users] Horizontal Pod Autoscaling using Nvidia GPU Metrics

2021-05-01 Thread Ashraf Guitouni
Hi all. I'm trying to implement HPA based on GPU utilization metrics. My initial approach is to use DCGM Exporter which is a daemonset that runs a pod on every GPU node and exports GPU metrics. By setting an additional scrape config when installing kube-prometheus-community and a custom
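For reference, the Kubernetes HPA documentation gives the core scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change made while the ratio stays within a tolerance (0.1 by default). Below is a minimal Python sketch of that rule for a per-pod GPU-utilization target, assuming the DCGM values reach the HPA as a per-pod metric (for example through an adapter such as prometheus-adapter). `desired_replicas` is a hypothetical name, and min/max replica bounds, pod-readiness handling and stabilization windows are deliberately left out.

```python
import math
from typing import Sequence


def desired_replicas(current_replicas: int, per_pod_values: Sequence[float],
                     target_average: float, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(current * currentAverage / targetAverage)."""
    if not per_pod_values:
        raise ValueError("need at least one per-pod metric value")
    # Pods-type metrics are compared as an average across the pods.
    current_average = sum(per_pod_values) / len(per_pod_values)
    ratio = current_average / target_average
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

With a 50% GPU-utilization target, two pods at 90% and 70% average 80%, so the sketch asks for ceil(2 * 1.6) = 4 replicas; the real controller would then clamp that to the HPA's configured minimum and maximum.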