Hi All

We're using the InfluxDB reporter (Flink 1.14.3), which seems to create a
separate series for each combination of:
- [task|job]manager
- host
- job_id
- job_name
- subtask_index
- task_attempt_id
- task_attempt_num
- task_id
- tm_id

This amounts to about 4k new series each time our job restarts.

We are currently experiencing (unrelated) problems with checkpoint timeouts
(duration > 60s), so every 60 seconds our job restarts and creates a further
4k series in InfluxDB.

Needless to say, the team managing InfluxDB is not too happy with the
number of series we create.

Is there anything I can do to reduce the number of series, either directly
or by reporting fewer types of metrics? (We don't view all the available
metrics in Grafana, so we don't necessarily have to send all of them.)

The database is capped at 1M series, and with our current checkpointing
problems we reach that limit in a matter of hours.
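
As a rough back-of-the-envelope check of that figure, here is a small sketch.
It assumes that every restart adds a full set of about 4k brand-new series and
that no old series expire in the meantime; the function name is only
illustrative.

```python
def hours_until_cap(series_cap: int, series_per_restart: int,
                    restart_interval_s: float) -> float:
    """Hours until the cap is hit if each restart adds a full set of new series."""
    # Number of restarts it takes to fill the database from empty.
    restarts_needed = series_cap / series_per_restart
    # Convert the total elapsed seconds to hours.
    return restarts_needed * restart_interval_s / 3600


# Figures from this mail: 1M cap, ~4k series per restart, a restart every ~60 s.
print(f"{hours_until_cap(1_000_000, 4_000, 60):.1f} hours")
```

With those numbers it takes 250 restarts, i.e. a little over 4 hours, to
reach the cap.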

Many thanks
Fil
