Hello, I'm just bumping this thread up; if someone knows how to make Avro message consumption faster, I would be grateful. Some more info: when we switched from ConsumeKafka with JSONs to ConsumeKafkaRecord with Avro messages, we experienced a serious slowdown (multiple X). I can gather more precise data on the slowdown, but my question about a ConsumeKafka/MergeContent based flow becomes even more relevant to me. Or maybe I'm doing something wrong that makes ConsumeKafkaRecord so much slower?
BTW, I'm on NiFi 1.7.1.

Thank you,
Krzysztof Zarzycki

On Fri, 7 Dec 2018 at 22:24, Krzysztof Zarzycki <[email protected]> wrote:
> Hi everyone,
> I think I have quite a standard problem and maybe the answer would be
> quick, but I can't find it on the internet.
> We have Avro messages in a Kafka topic, written with an HWX schema
> reference. We're able to read them in with e.g. ConsumeKafkaRecord with
> an Avro reader.
>
> Now we would like to merge smaller flowfiles into larger files, because
> we load these files to HDFS. What combination of processors should we
> use to get this with the highest performance?
> Option 1: ConsumeKafkaRecord with AvroReader and AvroRecordSetWriter,
> then MergeRecord with AvroReader/AvroRecordSetWriter. It works and seems
> straightforward, but to me it looks like there are too many
> interpretations and rewrites of records. Each record interpretation is
> an unnecessary cost of deserialization and then serialization through
> the Java heap.
>
> Option 2: somehow configure ConsumeKafka and MergeContent to do this? We
> used this combination for simple JSONs (with binary concatenation), but
> we can't get it right with Avro messages with a schema reference (the
> PutParquet processor can't read the merged files with AvroReader). On
> the other hand, this should be the fastest, as there is no data
> interpretation, just byte-to-byte rewriting. Maybe we just haven't tried
> the right configuration combination?
>
> Maybe other options?
>
> Thank you for any advice.
> Krzysztof
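A toy sketch of why the binary-concatenation trick from Option 2 worked for the JSON flow but breaks for Avro: newline-delimited JSON has no per-file header, so concatenating two files yields a valid file, while Avro container files each carry their own header (and sync marker), so a reader hits the second file's header mid-stream and chokes. The "HDR1" format below is invented purely for illustration and is not real Avro framing:

```python
# Toy illustration (not NiFi code, not real Avro framing): binary
# concatenation is only safe for formats with no per-file header.
import json
import struct

# Newline-delimited JSON: a file is just records, so concat == merge.
json_a = b'{"id": 1}\n'
json_b = b'{"id": 2}\n'
merged_json = json_a + json_b
records = [json.loads(line) for line in merged_json.splitlines()]
assert records == [{"id": 1}, {"id": 2}]

# A header-carrying format (a stand-in for an Avro container file):
# each file starts with a magic+schema header, then length-prefixed records.
def write_file(records):
    out = b"HDR1" + b"<schema>"          # per-file header, like Avro's
    for rec in records:
        body = json.dumps(rec).encode()
        out += struct.pack(">I", len(body)) + body
    return out

def read_file(data):
    assert data[:4] == b"HDR1", "missing header -- not a valid file"
    pos = 4 + len(b"<schema>")
    recs = []
    while pos < len(data):
        (n,) = struct.unpack(">I", data[pos:pos + 4])
        recs.append(json.loads(data[pos + 4:pos + 4 + n]))
        pos += 4 + n
    return recs

# Concatenating two such files embeds the second header mid-stream,
# where the reader misparses it as record data and fails.
merged = write_file([{"id": 1}]) + write_file([{"id": 2}])
try:
    read_file(merged)
except Exception as e:
    print("naive concat broke the file:", e)
```

The same logic applies one level down: messages written with a schema-registry reference are bare Avro datums plus a reference prefix, not container files at all, so the concatenated bytes carry no Avro file header for AvroReader/PutParquet to find.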
