influxdb metrics reporter - 4k series per job restart

Filip Karnicki Thu, 30 Jun 2022 05:46:08 -0700

Hi All

We're using the InfluxDB reporter (Flink 1.14.3), which seems to create a
series per unique combination of these tags:
- [task|job]manager
- host
- job_id
- job_name
- subtask_index
- task_attempt_id
- task_attempt_num
- task_id
- tm_id


which amounts to about 4k new series each time our job restarts.
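
To illustrate why restarts multiply series, here is a minimal sketch. It relies on the InfluxDB rule that a series is identified by its measurement plus its full tag set. The measurement name and all tag values below are made up, and it assumes that task_attempt_id and task_attempt_num change on every restart:

```python
def series_key(measurement, tags):
    """Build an InfluxDB-style series key: measurement plus the sorted tag set."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_part}"

# Hypothetical tags that stay the same across restarts.
stable_tags = {
    "host": "tm-host-1",
    "job_name": "my-job",
    "subtask_index": "0",
    "task_id": "task-abc",
}

seen = set()
for restart in range(3):
    # Assumption: each restart yields a new attempt id and attempt number.
    tags = dict(
        stable_tags,
        task_attempt_id=f"attempt-{restart}",
        task_attempt_num=str(restart),
    )
    seen.add(series_key("example_task_metric", tags))

# One metric on one subtask has become one series per restart.
print(len(seen))
```

Multiply that by every metric and every subtask and the per-restart total adds up quickly.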

We are currently experiencing an unrelated problem with checkpoint timeouts
(checkpoints taking > 60s), so every 60 seconds our job restarts and creates
a further 4k series in InfluxDB.

Needless to say, the team managing InfluxDB is not too happy with the
number of series we create.

Is there anything I can do to either reduce the number of series, or reduce
the number of types of metrics in order to produce fewer series? (We don't
view all the available metrics in Grafana, so we don't necessarily have to
send all of them.)

The DB caps at 1M series, and with our current checkpointing problems
we go through that many in a matter of hours.
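
As a rough back-of-the-envelope check of that estimate, using only the figures above (about 4k series per restart, one restart every 60 seconds, a 1M cap) and assuming the database starts out empty:

```python
SERIES_PER_RESTART = 4_000   # approximate figure from above
RESTART_INTERVAL_S = 60      # job restarts roughly once a minute
SERIES_CAP = 1_000_000       # InfluxDB series limit

# Assumption: no series exist beforehand and none are dropped in between.
restarts_to_cap = SERIES_CAP // SERIES_PER_RESTART
hours_to_cap = restarts_to_cap * RESTART_INTERVAL_S / 3600

print(f"{restarts_to_cap} restarts, about {hours_to_cap:.1f} hours to reach the cap")
```

So around 250 restarts, or a little over four hours of continuous restarting, is enough to exhaust the limit.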

Many thanks
Fil
