> I think this is a general issue with the Flink metrics.

Not quite. There are a few instances in Flink where code wasn't updated to encode metadata as additional labels, and the RocksDB metrics may be one of them. For RocksDB, you could try setting "state.backend.rocksdb.metrics.column-family-as-variable: true" to resolve this particular problem.

> If I define a custom metric, it is not supported to use labels

You can do so via MetricGroup#addGroup(String key, String value).
See https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/metrics/#user-variables
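
For illustration, here is a minimal sketch of a user function that registers a counter under a key/value group. The class name, the "tenant"/"acme" pair, and the "events_seen" metric name are made up for the example, and it assumes the Flink 1.17 API with the Flink dependencies on the classpath. With a label-aware reporter such as the Prometheus one, the user variable should then surface as a label (tenant="acme") rather than as part of the metric name.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Hypothetical example class; only the MetricGroup calls are Flink API.
public class TenantCountingMapper extends RichMapFunction<String, String> {

    // Metrics are registered per task in open(), so the field is not serialized.
    private transient Counter eventsSeen;

    @Override
    public void open(Configuration parameters) {
        eventsSeen = getRuntimeContext()
                .getMetricGroup()
                // Key/value group: defines the user variable tenant=acme
                // (both strings are placeholders chosen for this sketch).
                .addGroup("tenant", "acme")
                .counter("events_seen");
    }

    @Override
    public String map(String value) {
        // Count every record that passes through, then forward it unchanged.
        eventsSeen.inc();
        return value;
    }
}
```

If the value is only known at runtime (for example, derived from job parameters), the same addGroup call can be made with that value, as long as the metric is still registered once per distinct value rather than on every record.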

On 17/10/2023 14:31, Lars Skjærven wrote:
Hello,

We're experiencing difficulties in using Flink metrics in a generic way since various properties are included in the name of the metric itself. This makes it difficult to generate sensible (and general) dashboards (with aggregations).

One example is the metric for the RocksDB estimated live data size (state.backend.rocksdb.metrics.estimate-live-data-size). The name appears as: flink_taskmanager_job_task_operator_<my_state_descriptor_name>_state_rocksdb_estimate_live_data_size.

If, on the other hand, the state name were included as a label, this would facilitate aggregation across states, i.e.:
flink_taskmanager_job_task_operator_state_rocksdb_estimate_live_data_size{state_descriptor="my_state_descriptor"}

I think this is a general issue with the Flink metrics. If I define a custom metric, it is not supported to use labels (https://prometheus.io/docs/practices/naming/#labels) in a dynamic way.

Thanks !

Lars
