Hello,

For Graphite, Flink uses the DropWizard metrics reporter. I don't know at the moment whether it supports any kind of reconnect functionality.

I'm not sure whether I understood you correctly; did you try upgrading the DropWizard metrics-core/metrics-graphite dependencies?

If that didn't do the trick, we could in fact implement this in Flink, although it would be a hack. When an error occurs we can simply re-instantiate the reporter, but we would have to know how the reporter communicates the connection drop, i.e. whether it throws some exception or not.
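
A rough sketch of the re-instantiation idea, assuming the connection drop does surface as an exception (which is exactly the open question above). The `Reporter` interface and the `ReconnectingReporter` class are hypothetical names used for illustration only; they are not part of the Flink or DropWizard API.

```java
import java.util.function.Supplier;

/** Hypothetical wrapper that rebuilds the underlying reporter after a failed report. */
public class ReconnectingReporter {

    /** Hypothetical stand-in for a metrics reporter; not the Flink or DropWizard interface. */
    public interface Reporter {
        void report() throws Exception;

        default void close() {
            // Nothing to release by default.
        }
    }

    private final Supplier<Reporter> factory;
    private Reporter current;

    public ReconnectingReporter(Supplier<Reporter> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    /** Reports once; returns true on success, false if the reporter had to be replaced. */
    public boolean report() {
        try {
            current.report();
            return true;
        } catch (Exception e) {
            // Assumption: a dropped connection shows up here as an exception.
            try {
                current.close();
            } catch (RuntimeException ignored) {
                // The old reporter is being discarded anyway.
            }
            // Re-instantiate so the next report starts with a fresh connection.
            current = factory.get();
            return false;
        }
    }
}
```

If the reporter only logs a warning and swallows the error, this approach does not work as written and the failure would have to be detected some other way.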

Could you check the log for any warning statements from the MetricRegistry?

Regards,
Chesnay

On 05.05.2017 13:26, Bruno Aranda wrote:
Hi,

We are using the Graphite reporter from Flink 1.2.0 to send the metrics via TCP. Due to our network configuration we cannot use UDP at the moment.

We have observed that if there is any problem with Graphite or the network (basically, the TCP connection times out or something similar), the metrics reporter does not recover. This is easy to reproduce by using iptables to block the port we send the metrics to. If we block the port for more than a minute or so, the problem occurs. After the port is reopened, Flink does not continue as before.

Is this a known issue? Googling shows some problems with the metrics-graphite package that should have been solved already. We have tried updating metrics-core/graphite to the latest version, with no success.

Any ideas?

Thanks!

Bruno

