Hello,
We have set up Flink to report metrics to a Prometheus PushGateway. After the
program has been running for some time and a large amount of data has been
reported to the PushGateway, pushes start failing with a socket timeout
exception, and much of the metrics data is not reported. The exception is:
2023-12-12 04:13:07,812 WARN org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter [] - Failed to push metrics to PushGateway with jobName 00034937_20231211200917_54ede15602bb8704c3a98ec481bea96, groupingKey{}.
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_281]
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_281]
    at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_281]
    at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_281]
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_281]
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_281]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_281]
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) ~[?:1.8.0_281]
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) ~[?:1.8.0_281]
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593) ~[?:1.8.0_281]
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) ~[?:1.8.0_281]
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[?:1.8.0_281]
    at io.prometheus.client.exporter.PushGateway.doRequest(PushGateway.java:315) ~[flink-metrics-prometheus-1.13.5.jar:1.13.5]
    at io.prometheus.client.exporter.PushGateway.push(PushGateway.java:138) ~[flink-metrics-prometheus-1.13.5.jar:1.13.5]
    at org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter.report(PrometheusPushGatewayReporter.java:63) [flink-metrics-prometheus-1.13.5.jar:1.13.5]
    at org.apache.flink.runtime.metrics.MetricRegistryImpl$ReporterTask.run(MetricRegistryImpl.java:494) [flink-dist_2.11-1.13.5.jar:1.13.5]
Our testing indicates that this is caused by the large amount of data reported
to the PushGateway. Restarting the PushGateway server made the exception
disappear, but after several hours it reappeared.
So I would like to know how to configure Flink or the PushGateway to avoid this
exception.
Best regards,
leilinee