Hello All,
I would like some suggestions on where to start with the Avro project. I want to read an Avro-formatted log file (specifically the History Log file created at the end of a Hadoop job) and create a comma-separated (CSV) file of certain log entries. I need a CSV file because that is the format accepted by the post-processing software I am working with (e.g. Matlab).

Initially I used a Bash script to grep and awk this file and create my CSV file, because I needed only a few values from it and a quick script just worked. I didn't try to find out what format the log file was in and make use of that. (my bad!) Now that I need to scale up and want a reliable way to parse the file, I would like to do it the right way.

My question is this: for the above goal, could you please guide me with steps I can follow, such as reading material and libraries I could try?

As I go through the Quick Start Guide and FAQ, I see that much of the information is geared toward someone who wants to use the data serialization and RPC functionality provided by Avro. Given that I only want to be able to "read", where should I start?

I can comfortably script in Bash and Perl. Since I only see support for Java, Python and Ruby, I think I can take this as an opportunity to learn Python and get up to speed.

Thanks a lot.
-Selvi