Hello, I’m using Flink 1.4.0 with FlinkKafkaConsumer010 and have been for almost a year. Recently, I started receiving messages of the wrong length in Flink, which causes my deserializer to fail. Here is what I’ve learned:
1. All of my messages are exactly 520 bytes when my producer places them in Kafka.
2. About 1% of these messages hit this deserialization issue in Flink.
3. When it happens, I read 10104 bytes in Flink.
4. When I write the bytes my producer creates to a file on disk (rather than to Kafka), my code reads 520 bytes and consumes them from disk without issue.
5. When I use Kafka Tool (http://www.kafkatool.com/index.html) to dump the contents of my topic to disk and read each message one at a time from disk, my code reads 520 bytes per message and consumes them without issue.
6. When I write a simple Kafka consumer (not using Flink) and read one message at a time, each message is 520 bytes and my code runs without issue.

#5 and #6 are what lead me to believe that this issue is squarely a problem with Flink.

However, it gets more complicated. I took the messages I wrote out with both my simple consumer and Kafka Tool, loaded them into a local Kafka server, and attached a local Flink cluster, and I cannot reproduce the error. Yet I can reproduce it 100% of the time in something closer to my production environment. I realize this last point sounds suspicious, but I have not found anything in the Kafka docs indicating that I might have a configuration issue here. Meanwhile, the simple local setup that would have let me iterate on this and debug it has failed me.

I’m really quite at a loss here. I believe there’s a Flink Kafka consumer bug that occurs exceedingly rarely, as I went a year without seeing it. I can reproduce it in an expensive environment but not in a “cheap” environment.

Thank you for your time. I can provide my sample data set in case that helps. I dumped it on my Google Drive: https://drive.google.com/file/d/1h8jpAFdkSolMrT8n47JJdS6x21nd_b7n/view?usp=sharing. That’s the full data set, about 1% of which ends up failing. It’s really hard to figure out which messages fail, since I can’t read any of the messages that I receive and I get data out of order.
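
One way to pin down which records arrive with the wrong size is to check the payload length before deserializing and to report where the record came from. The following is only a minimal sketch, not something taken from the job itself: `PayloadLengthCheck` and `describeMismatch` are hypothetical names, and the 520-byte constant is simply the message size quoted above. It has no Flink or Kafka dependency so it can be tried in isolation.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of Flink or Kafka. */
final class PayloadLengthCheck {

    /** Size every message is expected to have, per the description above. */
    static final int EXPECTED_LENGTH = 520;

    private PayloadLengthCheck() {}

    /**
     * Returns a description of the problem if the payload does not have the
     * expected length, or an empty Optional if the length is correct.
     * A null payload is reported with a length of -1.
     */
    static Optional<String> describeMismatch(byte[] payload, String topic,
                                             int partition, long offset) {
        int actual = (payload == null) ? -1 : payload.length;
        if (actual == EXPECTED_LENGTH) {
            return Optional.empty();
        }
        return Optional.of(String.format(
                "unexpected payload length %d (expected %d) at %s-%d@%d",
                actual, EXPECTED_LENGTH, topic, partition, offset));
    }
}
```

In Flink 1.4 the topic, partition, and offset are passed to `KeyedDeserializationSchema.deserialize(messageKey, message, topic, partition, offset)`, so a check like this could be called from there and its result logged before the real deserializer runs. The same offsets could then be fetched with the plain Kafka consumer to compare the bytes side by side.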