I'm not sure whether I am approaching this problem correctly, but here is the
basic outline:

I would like to send, say, 10,000 or more small Avro messages in a
single Flume event for storage on HDFS.

When I do this, the Avro file created on HDFS is corrupted because (I
assume, based on a bit of reading) it interferes with the framing that
Avro provides.

The long and the short of it: if I send, say, two Flume events, each
containing 10,000 Avro messages, and the HDFS sink stores the two
"packets" of Avro messages in a single file on HDFS, the first 10,000
messages are readable, but everything from the 10,001st message onward is
corrupt.
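To illustrate what I think is happening (this is a sketch, not Avro's actual wire format): each Avro container file starts with its own header and framing, and a reader expects exactly one header at the start of the file. If the sink simply appends the second event's bytes, the second header is parsed as if it were record data. A minimal stand-in in Python, with a made-up 4-byte magic and length prefixes instead of Avro's real header and sync markers:

```python
import io
import struct

MAGIC = b"Obj\x01"  # stand-in for a container-file header (Avro's real magic differs)

def write_container(records):
    """Write records into one self-framed 'file': header, then length-prefixed records."""
    buf = io.BytesIO()
    buf.write(MAGIC)
    for r in records:
        buf.write(struct.pack(">I", len(r)))
        buf.write(r)
    return buf.getvalue()

def read_one_container(data):
    """A reader that expects a single header at offset 0, like an Avro file reader."""
    assert data[:4] == MAGIC
    pos, out = 4, []
    while pos < len(data):
        (n,) = struct.unpack(">I", data[pos:pos + 4])
        pos += 4
        out.append(data[pos:pos + n])
        pos += n
    return out

a = write_container([b"msg1", b"msg2"])  # first Flume event's payload
b = write_container([b"msg3"])           # second Flume event's payload

print(read_one_container(a))      # one container alone reads fine
print(read_one_container(a + b))  # concatenated: the second header is
                                  # misread as a record length, so the
                                  # third "record" comes out as garbage
```

The second container's header bytes get interpreted as a length prefix, so everything after the first event's records is unreadable, which matches the symptom I'm seeing.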


I am doing this for performance reasons: I need to send about
1500 * 3600 = 5,400,000 (yes, 5.4 million) small messages every ~4 seconds.

I know this is a lot of messages....

I can produce the messages at the required rate, but I cannot push them
through Flume fast enough, because I have to create a Flume event with an
Avro schema attached to each message. So I thought that if I could batch a
bunch of them into a single event, it would be more efficient.
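The batching I have in mind looks roughly like this (a sketch only; `pack_batch` and `unpack_batch` are hypothetical names, and in real Flume this packing/unpacking would live in Java, e.g. in a custom serializer on the sink side): serialize each small message once, then frame many of them into one event body with explicit counts and lengths, so the receiving side can split them back apart regardless of event boundaries:

```python
import struct

def pack_batch(messages):
    """Pack many already-serialized messages into one event body:
    a 4-byte record count, then each record as a 4-byte length prefix plus bytes."""
    parts = [struct.pack(">I", len(messages))]
    for m in messages:
        parts.append(struct.pack(">I", len(m)))
        parts.append(m)
    return b"".join(parts)

def unpack_batch(body):
    """Recover the individual messages from a packed event body."""
    (count,) = struct.unpack(">I", body[:4])
    pos, out = 4, []
    for _ in range(count):
        (n,) = struct.unpack(">I", body[pos:pos + 4])
        pos += 4
        out.append(body[pos:pos + n])
        pos += n
    return out

# One event body carrying 10,000 small records instead of 10,000 events:
batch = pack_batch([b"rec-%d" % i for i in range(10000)])
records = unpack_batch(batch)
```

Is something along these lines (or writing each event body as its own complete Avro file and splitting files per event on the sink) the intended way to do this, or is there a built-in mechanism I'm missing?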

Thanks in advance!

Q. Boiler
