Hello,

Is anyone familiar with the blob server connection? We constantly see the
"Error while executing BLOB connection" error. When there are too many of
these connection errors, a job can get stuck in the middle of a run and
eventually fail. Most of the time, streaming mode recovers from the
failure in subsequent runs, but this slows down the entire process.

We tried adjusting blob.fetch.num-concurrent and some other blob
parameters, but that did not help much, so we would like to know what the
root cause might be. Are there any Flink metrics or tools that would help
us monitor the blob server connections?

We use:

   - Flink Kubernetes Operator
   - Flink 1.15.3 and 1.16.0
   - Kafka, filesystem (S3)
   - Hudi 0.11.1

Full error message:

java.io.IOException: Unknown operation 71
        at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:116) [flink-dist-1.15.3.jar:1.15.3]
2023-01-19 16:44:37,448 ERROR org.apache.flink.runtime.blob.BlobServerConnection           [] - Error while executing BLOB connection.
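
A side note on the operation code, offered as a sketch rather than a
diagnosis. Assuming the blob server treats the first byte it reads from a
new connection as the operation, "Unknown operation 71" would mean that
byte was 71. The helper below is hypothetical (it is not part of Flink) and
only prints a byte value as an ASCII character. 71 is 'G', which would be
consistent with a plain-text request such as an HTTP "GET" arriving on the
blob port, for example from a probe or scraper. That is an unconfirmed
assumption.

```python
# Hypothetical helper, not part of Flink: interpret the number from an
# "Unknown operation N" message as a single byte sent by the client.
def describe_operation_byte(code: int) -> str:
    """Return a readable form of a single byte value (0-255)."""
    if not 0 <= code <= 255:
        raise ValueError(f"not a byte value: {code}")
    # Printable ASCII is shown as the character, anything else as hex.
    if 32 <= code < 127:
        return f"{code} = ASCII {chr(code)!r}"
    return f"{code} = 0x{code:02x} (non-printable)"


print(describe_operation_byte(71))  # prints: 71 = ASCII 'G'
```

If the codes in the logs keep mapping to printable letters, it may be worth
checking what else connects to the blob server port.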


Best regards,
Yang
