Hi,

Recently we started seeing the following faulty behaviour in the HTTP communication between Flink Stateful Functions and external Python workers. It only occurs when the system is under heavy load.

The Java application sends HTTP messages to an external Python function, but the external function fails to parse the message with a "Truncated Message" error. Printouts show that the truncated message looks as follows:

------------------------------

<Start of Message>

my.protobuf.MyClass: <Protobuf Content>

my.protobuf.MyClass: <Protobuf Content>

my.protobuf.MyClass: <Protobuf Content>

my.protobuf.MyClass: <Protob

------------------------------

This leads to the following error in the Python worker:

------------------------------

Error Parsing Message: Truncated Message

------------------------------

Either the sender or the receiver (or something in between) seems to be truncating some (not all) messages at a random point in the payload. The source code in both Flink SDKs looks correct to us. We have temporarily worked around this by setting the "maxNumBatchRequests" parameter in the external function definition to a very low value. This is not an ideal solution, however, as we believe it adds considerable communication overhead between the Java and the Python functions.
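
To narrow down where the truncation happens, a check like the one below could be added to the Python worker before the payload is handed to the SDK. This is only a minimal sketch: the helper name body_is_complete is hypothetical and not part of the Stateful Functions SDK, and it assumes the worker's HTTP framework exposes the request headers as a mapping and the raw body as bytes.

```python
from typing import Mapping, Optional


def body_is_complete(headers: Mapping[str, str], body: bytes) -> Optional[bool]:
    """Compare the declared Content-Length with the bytes actually received.

    Hypothetical diagnostic helper, not part of any SDK.
    Returns True or False when the header is present and numeric, and None
    when the length cannot be checked (for example with chunked transfer
    encoding, where no Content-Length header is sent).
    """
    declared = None
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-length":
            declared = value.strip()
            break
    if declared is None or not declared.isdecimal():
        return None
    return int(declared) == len(body)
```

If this returns False for the failing requests, the body was cut short before it reached the worker code, which would point at the transport or the HTTP server rather than at the protobuf parsing. If it returns True and parsing still fails, the sender would appear to have declared and sent an already incomplete payload.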

The Stateful Functions version is 2.2.2, running on Java 8. The Java app and the external Python workers are deployed in the same Kubernetes cluster.


Has anyone ever seen this problem before?

Best regards

Jan

