Hi,

I’ve been tinkering with the acct_gather_profile/influxdb plugin in order to 
visualize the CPU and memory usage of live jobs.
Both the influxdb backend and Grafana dashboards seem like a perfect fit for 
our needs.

I’ve run into an issue though and made a crude workaround for it, maybe someone 
knows a better way?

A few words about influxdb and the influxdb plugin:
InfluxDB is a NoSQL database that organizes its data in "series", which are 
unique combinations of "measurements" and "tags"; these correspond roughly to 
tables and their indexed fields in a relational database.
A single "series" can reference a multitude of timestamped records, described 
further by non-indexed "fields".
The acct_gather_profile/influxdb plugin defines its data points for each job 
task/step as follows:

Measurement: CPUTime          Tags: job, host, step, task   Fields: value   Timestamp
Measurement: CPUUtilization   Tags: job, host, step, task   Fields: value   Timestamp
…

For example, a single record looks like this:
CPUTime,job=12465711,step=0,task=3,host=node20307 value=20.80 1576054517

The default "Task" profile contains 8 such characteristics: CPUTime, 
CPUUtilization, CPUFrequency, RSS, VMSize, Pages, ReadMB, WriteMB

This data structure means that for each new combination of job, step, task, 
and host, 8 unique "series" are created, e.g. "CPUTime, job, step, task, 
host", "CPUUtilization, job, step, task, host", ...
Those "series" then reference the timestamped values of the respective 
measurements. The "tags" can be used to "group by" in queries, e.g. to look 
at the performance of a single job on a specified host.
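To make the layout concrete, here is a small sketch in Python of how the stock schema maps one sample interval to records (the real plugin is written in C; the function name and the field values other than CPUTime are made up for illustration):

```python
# Hypothetical sketch of the plugin's per-measurement layout: each of the 8
# characteristics becomes its own measurement, so one sample interval for one
# task yields 8 separate line-protocol records (and 8 distinct series).

def task_records(job, step, task, host, samples, timestamp):
    """Emit one InfluxDB line-protocol record per measurement."""
    tags = f"job={job},step={step},task={task},host={host}"
    return [f"{name},{tags} value={value} {timestamp}"
            for name, value in samples.items()]

# Values besides CPUTime are illustrative, not real measurements.
samples = {"CPUTime": 20.80, "CPUUtilization": 99.0, "CPUFrequency": 2600,
           "RSS": 1048576, "VMSize": 2097152, "Pages": 0,
           "ReadMB": 1.5, "WriteMB": 0.2}
for line in task_records(12465711, 0, 3, "node20307", samples, 1576054517):
    print(line)
```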


OK, so what’s the problem?
There are two: the number of created "series", and data redundancy.
InfluxDB limits the number of "series" to 1 million by default, and for good 
reason: each "series" increases RAM usage, since the series index is held in 
memory.
The number of "series", or "series cardinality", is one of the most important 
factors determining memory usage; the InfluxDB manual considers a cardinality 
above 10 million "probably infeasible".
When you consider that each new job/host/step/task combination creates 8 
"series", the default limit can be reached relatively quickly, and performance 
problems follow.
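A back-of-the-envelope estimate shows how quickly this adds up (the job and combination counts below are made-up illustrative figures, not numbers from our cluster):

```python
# Rough series-cardinality estimate; the jobs and series_per_job figures
# are hypothetical, chosen only to illustrate the scaling.
measurements = 8        # CPUTime, CPUUtilization, ... (the "Task" profile)
jobs = 50_000           # jobs retained in the database
series_per_job = 4      # average distinct (step, task, host) combinations

default_schema = jobs * series_per_job * measurements  # one series per measurement
proposed_schema = jobs * series_per_job                # one series per combination

print(default_schema)   # -> 1600000, already past InfluxDB's default 1M limit
print(proposed_schema)  # -> 200000
```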
As for data redundancy: at each timestamp, largely the same set of tags is 
stored multiple times under different "measurements".

The current workaround: store the 8 characteristics as "fields" rather than 
separate "measurements", thus creating 1 series per job/step/task/host 
combination instead of 8. This also reduces data redundancy, saving roughly 
70% of storage.

So a single "series" would be:
Measurement: acct_gather_profile_task   Tags: job, step, task, host
Fields: CPUTime, CPUUtilization, CPUFrequency, RSS, VMSize, Pages, ReadMB, WriteMB   Timestamp
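Mirroring the single-record example above, a consolidated record can be sketched like this (the measurement name matches the workaround; field values other than CPUTime are made up for illustration):

```python
# Sketch of the consolidated line-protocol record: all 8 characteristics
# become fields of one measurement. Values besides CPUTime are illustrative.
fields = {"CPUTime": 20.80, "CPUUtilization": 99.0, "CPUFrequency": 2600,
          "RSS": 1048576, "VMSize": 2097152, "Pages": 0,
          "ReadMB": 1.5, "WriteMB": 0.2}
tags = "job=12465711,step=0,task=3,host=node20307"
field_str = ",".join(f"{k}={v}" for k, v in fields.items())
line = f"acct_gather_profile_task,{tags} {field_str} 1576054517"
print(line)
```

One record per sample interval instead of eight, and the tag set is stored only once.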

Another benefit is that identical "measurement" names, e.g. "WriteMB", which 
are used by both the task and the lustre/fs profile plugins, can now be 
differentiated.

Further Ideas?

Kind regards,
Lech
