Hi Chesnay,

Many thanks for your reply. In the end, we decided to change the infrastructure a bit and use StatsD instead. This way we don't need a custom reporter, and it works fine.
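For reference, a minimal sketch of a StatsD reporter setup in flink-conf.yaml, assuming Flink's bundled StatsD reporter (the flink-metrics-statsd jar placed in Flink's lib/ directory) is the one used. The reporter name "stsd", the host and the port are illustrative placeholders, not values taken from this setup.

```yaml
# flink-conf.yaml (sketch; reporter name, host and port are assumptions)
metrics.reporters: stsd
metrics.reporter.stsd.class: org.apache.flink.metrics.statsd.StatsDReporter
# Address of the StatsD daemon that receives the metrics (placeholder values)
metrics.reporter.stsd.host: localhost
metrics.reporter.stsd.port: 8125
```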
Thanks!

Bruno

On Fri, 5 May 2017 at 13:20 Chesnay Schepler <ches...@apache.org> wrote:

> Hello,
>
> for Graphite, Flink uses the DropWizard metrics reporter. I don't know
> at the moment whether it supports any kind of reconnecting functionality.
>
> I'm not sure whether I understood you correctly; did you try upgrading
> the DropWizard metrics-core/metrics-graphite dependencies?
>
> If that didn't do the trick we could in fact implement this in Flink; it
> would be a hack, though. When an error occurs we can simply re-instantiate
> the reporter, but we would have to know how the reporter communicates
> the connection drop, i.e. whether it throws some exception or not.
>
> Could you check the log for any warning statements from the MetricRegistry?
>
> Regards,
> Chesnay
>
> On 05.05.2017 13:26, Bruno Aranda wrote:
> > Hi,
> >
> > We are using the Graphite reporter from Flink 1.2.0 to send the
> > metrics via TCP. Due to our network configuration we cannot use UDP at
> > the moment.
> >
> > We have observed that if there is any problem with Graphite or the
> > network (basically, the TCP connection times out or something), the
> > metrics reporter does not recover. This is easy to reproduce by
> > blocking the port we are sending the metrics to using iptables. If we
> > block the port for more than a minute or so, the problem will happen.
> > After the port is reopened, Flink does not continue like before.
> >
> > Is this a known issue? Googling shows some problems with the
> > metrics-graphite package that should have been solved already. We have
> > tried updating metrics-core/graphite to the latest version, with no success.
> >
> > Any ideas?
> >
> > Thanks!
> >
> > Bruno