Hi all I'm currently evaluationg Apache Beam to transfer messages from RabbitMq to kafka with some transform in between.
Doing, so, i've discovered some differences between direct runner behaviour and Google Dataflow runner. But first, a small introduction to what I know. From what I understand, elements transmitted between two different transforms are serialized/deserialized. This (de)serialization process is normally managed by Coder, in which the most used is obviously the Serializablecoder, which takes a serializable object and (de)serialize it using classical java mechanisms. On direct runner, i had issues with rabbitMq messages, as they contain in their headers objects that are LongString, an interface implemented solely in a private static class of RabbitMq, and used for large text messages. So I wrote a RabbitMqMessageCoder, and installed it in my pipeline (using pipeline.getCoderregistry().registerCoderForClass(RabbitMqMessage.class, new MyCoder()) And it worked ! well, not in dataflow runner. indeed, it seems like dataflow runner don't use this coder registry mechanism (for reasons I absolutely don't understand). So my fix didn't work. After various tries, I finally gave up and directly modified the RabbitMqIO class (from Apache Beam) to handle my case. This fix is available on my Beam fork on GitHub, and i would like to have it integrated. What is the procedure to do so ? Thanks !