Hello All,

I would like some suggestions on where I can start in the Avro project.


I want to be able to read from an Avro-formatted log file (specifically the
History Log file created at the end of a Hadoop job) and create a
comma-separated (CSV) file of certain log entries. I need a CSV file because
this is the format accepted by the post-processing software I am working with
(e.g. Matlab).


Initially I was using a BASH script to grep and awk from this file and
create my CSV file because I needed only a few values from it, and a quick
script just worked. I didn't try to find out what format the log file
was in and make use of that. (my bad!) Now that I need to scale up and
want a reliable way to parse the file, I would like to try and do it the
right way.


My question is this: For the above goal, could you please guide me with
steps I can follow, such as reading material and libraries I could try to
use? As I go through the Quick Start Guide and FAQ, I see that a lot of the
information there is geared to someone who wants to use the data
serialization and RPC functionality provided by Avro. Given that I only
want to be able to "read", where should I start?


I can comfortably script with BASH and Perl. Given that I only see support
for Java, Python and Ruby, I think I can take this as an opportunity to
learn Python and get up to speed.
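
To make the goal concrete, here is a minimal Python sketch of the CSV-writing
half only. It assumes the log entries have already been decoded into
dictionaries by whatever Avro reader turns out to be appropriate; the function
name records_to_csv and the field names (taskid, finishTime) are made up for
illustration and are not taken from the actual History Log schema.

```python
import csv
import io


def records_to_csv(records, fields, out):
    """Write the chosen fields of each record (a dict) as one CSV row.

    Keys not listed in `fields` are ignored; missing keys become empty cells.
    """
    writer = csv.DictWriter(
        out, fieldnames=fields, extrasaction="ignore", restval=""
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Hypothetical decoded log entries; real ones would come from the Avro reader.
sample = [
    {"type": "TASK_FINISHED", "taskid": "task_1", "finishTime": 1200},
    {"type": "TASK_FINISHED", "taskid": "task_2", "finishTime": 1350},
]

buf = io.StringIO()
records_to_csv(sample, ["taskid", "finishTime"], buf)
csv_text = buf.getvalue()
print(csv_text)
```

The part I do not know how to do is producing those dictionaries from the
History Log file in the first place, which is where I am hoping for pointers.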


Thanks a lot.


-Selvi
