Hi everyone, I think this is a fairly standard problem and the answer may be quick, but I can't find it anywhere online. We have Avro messages in a Kafka topic, written with an HWX (Hortonworks Schema Registry) schema reference. We are able to read them with, e.g., ConsumeKafkaRecord with an Avro reader.
Now we would like to merge the smaller flowfiles into larger files, because we load these files into HDFS. Which combination of processors gives us this with the highest performance?

Option 1: ConsumeKafkaRecord with AvroReader and AvroRecordSetWriter, then MergeRecord with AvroReader/AvroRecordSetWriter. It works and seems straightforward, but it looks like there are too many interpretations and rewrites of the records: each record interpretation incurs the unnecessary cost of deserialization and re-serialization through the Java heap.

Option 2: somehow configure ConsumeKafka and MergeContent to do this? We used this combination for simple JSON (with binary concatenation), but we can't get it right for Avro messages with a schema reference (the PutParquet processor can't read the merged files with an AvroReader). On the other hand, this should be the fastest approach, since there is no data interpretation, just a byte-for-byte copy. Maybe we just haven't tried the right combination of settings?

Or maybe there are other options? Thank you for any advice.

Krzysztof
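P.S. For reference, here is roughly how the two flows are configured. This is a sketch, not exact settings: the processor version suffixes and the numeric values are illustrative assumptions, only the property names are the standard NiFi ones.

```
# Option 1 (record-based; works, but every record is deserialized and re-serialized)
ConsumeKafkaRecord_2_0:
    Record Reader  = AvroReader             # configured with the HWX schema-reference access strategy
    Record Writer  = AvroRecordSetWriter
MergeRecord:
    Record Reader  = AvroReader
    Record Writer  = AvroRecordSetWriter
    Merge Strategy = Bin-Packing Algorithm
    Minimum Number of Records = 100000      # illustrative value

# Option 2 (byte-level; fast, but PutParquet cannot read the merged output)
ConsumeKafka_2_0:                           # raw bytes, no record interpretation
MergeContent:
    Merge Strategy = Bin-Packing Algorithm
    Merge Format   = Binary Concatenation   # this worked for JSON, fails for Avro with schema reference
```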
