Hello,
1. Because no one has found the time to fix it. Unlike the remaining
byte/record metrics, input metrics for sources and output metrics for
sinks have to be implemented separately for every single source and
sink, each with its own semantics. The regular output metrics, by
contrast, are gathered at the boundary between operators, independent
of the actual operator implementation. Furthermore, this requires
system metrics (i.e. metrics that Flink itself creates) to be exposed
(and be mutable!) to user-defined functions, which is something I
generally wanted to avoid, but it appears to be a big enough pain point
to make an exception here.
2. Due to the above, it is currently not possible to know how many
reads/writes were made without modifying the code.
3. Do you mean aggregated metrics? The web UI allows the aggregation of
record/byte metrics on the task level. Beyond that we defer aggregation
to actual time-series databases that specialize in these things.
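
Regarding 2, here is a rough sketch of what such a code modification
could look like: wrap the actual write call and count successes,
failures and the time spent. Everything in it is hypothetical and uses
plain Java so that it runs without Flink on the classpath
(SimpleCounter, MeteredWriter and the Consumer-based delegate are
stand-ins, not Flink classes). In a real sink you would instead extend
RichSinkFunction and register the counters in open() via
getRuntimeContext().getMetricGroup().counter("..."), so that they show
up next to the system metrics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a metrics counter. In Flink this role would be
// played by org.apache.flink.metrics.Counter from the operator's metric group.
final class SimpleCounter {
    private long count;

    void inc() { count++; }

    void inc(long n) { count += n; }

    long getCount() { return count; }
}

// Hypothetical wrapper around the code that performs the actual write.
// It counts successful and failed writes and accumulates the time spent.
final class MeteredWriter<T> {
    final SimpleCounter successfulWrites = new SimpleCounter();
    final SimpleCounter failedWrites = new SimpleCounter();
    final SimpleCounter totalWriteNanos = new SimpleCounter();

    private final Consumer<T> delegate;

    MeteredWriter(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    void write(T record) {
        long start = System.nanoTime();
        try {
            delegate.accept(record);   // the real write to the external system
            successfulWrites.inc();    // only reached if the write did not throw
        } catch (RuntimeException e) {
            failedWrites.inc();
            throw e;                   // let the caller (or Flink) handle the failure
        } finally {
            // latency is recorded for successful and failed writes alike
            totalWriteNanos.inc(System.nanoTime() - start);
        }
    }
}

final class MeteredWriterDemo {
    public static void main(String[] args) {
        List<String> target = new ArrayList<>();
        MeteredWriter<String> writer = new MeteredWriter<>(target::add);

        writer.write("a");
        writer.write("b");

        System.out.println("successful writes: " + writer.successfulWrites.getCount());
        System.out.println("total write time (ns): " + writer.totalWriteNanos.getCount());
    }
}
```

The average write latency is then totalWriteNanos divided by the number
of writes. Keeping the raw sum and the count separately leaves any
further aggregation to whatever system collects the metrics.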
On 28.08.2017 19:08, Martin Eden wrote:
Hi all,
Just 3 quick questions, all related to Flink metrics, especially
around sinks:
1. In the Flink UI Sources always have 0 input records / bytes and
Sinks always have 0 output records / bytes? Why is it like that?
2. What is the best practice for instrumenting off the shelf Flink sinks?
Currently the only metrics available are num records/bytes in and out
at the operator and task scope. For the task scope there are extra
buffer metrics. However the output metrics are always zero (see
question 1). How can one know the actual number of successful writes
done by an off the shelf Flink sink? Or the latency of the write
operation?
3. Is it possible to configure Flink to get global job metrics for all
subtasks of an operator? Or are there any best practices around that?
Thanks,
M