Not sure if I am approaching this problem correctly, but here is the basic outline:
I would like to send, say, 10,000 or more small Avro messages in a single Flume event for storage on HDFS. When I do this, the Avro file created on HDFS is corrupted because (I assume, based on a bit of reading) batching this way breaks the framing that Avro provides.

In short: if I send two Flume events, each containing 10,000 Avro messages, and the HDFS sink stores both "packets" of Avro messages in a single file on HDFS, the first 10,000 messages are readable, but message 10,001 is corrupt.

I am doing this for performance reasons. I need to send about 1500 * 3600 = 5,400,000 (yes, 5.4 million) small messages every ~4 seconds. I know that is a lot of messages. I can produce the messages at the required rate, but I cannot push them through Flume fast enough, because I have to create a Flume event with an Avro schema attached to each message. So I thought that if I could batch a bunch of them up at once, it would be more efficient.

Thanks in advance!
Q. Boiler
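To make the batching idea concrete, here is a minimal sketch (in Python, using only the standard library) of length-prefixed framing: many small serialized messages are packed into one event body, and the receiver can recover the original message boundaries. This is an illustrative stand-in, not Avro's actual container format and not Flume's API; the function names are my own. With real Avro, the analogous approach would be to write all records of a batch into a single Avro container (e.g. via Avro's DataFileWriter), so the schema and sync markers are written once per batch instead of once per message.

```python
import struct

def pack_messages(messages):
    """Pack many small serialized messages (bytes) into one event body.

    Each message is prefixed with its 4-byte big-endian length, so the
    receiver can recover the original message boundaries ("framing").
    Illustrative only -- Avro's container format does its own framing.
    """
    parts = []
    for msg in messages:
        parts.append(struct.pack(">I", len(msg)))
        parts.append(msg)
    return b"".join(parts)

def unpack_messages(body):
    """Split one packed event body back into the original messages."""
    messages = []
    offset = 0
    while offset < len(body):
        (length,) = struct.unpack_from(">I", body, offset)
        offset += 4
        messages.append(body[offset:offset + length])
        offset += length
    return messages

# Batch many small messages into a single event payload and round-trip them.
batch = [b"msg-%d" % i for i in range(10000)]
body = pack_messages(batch)
assert unpack_messages(body) == batch
```

The point of the sketch is that framing must be consistent end to end: if two independently framed payloads are simply concatenated into one file (as the HDFS sink does with two events), a reader that expects a single framed stream will fail at the boundary, which matches the "message 10,001 is corrupt" symptom described above.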
