Sorry if the subject sounds really stupid! Basically, I am re-architecting our web log record format.

Currently we have a "multiple lines = 1 record" format (I have Hadoop jobs that parse the files and create columnar output for Hive tables):

[begin_unique_id]
Pipe delimited Blah....................
Pipe delimited Blah....................
Pipe delimited Blah....................
Pipe delimited Blah....................
Pipe delimited Blah....................
[end_unique_id]

I have created JSON serializers that will log records in the following way going forward:

<unique_id> <JSON-string>

This is the plan:
- Store the records in a two-column table in Hive
- Write JSON deserializers as Hive UDFs that will take these tables and create Hive tables pertaining to specific requirements
- Modify the current aggregation scripts in Hive

I was looking at the Avro format, but I don't see the value of using Avro when I feel JSON gives me pretty much the same thing?

Please poke holes in my thinking! Rip me apart!

Thanks
Regards
sanjay
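
P.S. To make the proposed "<unique_id> <JSON-string>" layout concrete, here is a minimal sketch in Python. It is only an illustration, not the actual serializer: the tab separator, the function names to_log_line and from_log_line, and the sample fields are all assumptions made for the example.

```python
import json


def to_log_line(unique_id, record):
    """Serialize one record as '<unique_id>' + TAB + '<JSON-string>' on one line.

    The tab separator is an assumption for this sketch; json.dumps escapes
    tabs and newlines inside string values, so the payload stays on one line.
    """
    if "\t" in unique_id or "\n" in unique_id:
        raise ValueError("unique_id must not contain tabs or newlines")
    return unique_id + "\t" + json.dumps(record, sort_keys=True)


def from_log_line(line):
    """Split a log line back into (unique_id, record dict)."""
    unique_id, payload = line.rstrip("\n").split("\t", 1)
    return unique_id, json.loads(payload)


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    line = to_log_line("req-0001", {"url": "/index.html", "status": 200})
    print(line)
    print(from_log_line(line))
```

Because each record is exactly one line with one separator, it maps directly onto a two-column table (id, JSON string), and the multi-line begin/end parsing step is no longer needed.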