Sorry if the subject sounds really stupid!

Basically, I am re-architecting our web log record format.

Currently we use a "multiple lines = 1 record" format (I have Hadoop jobs that
parse the files and create columnar output for Hive tables):

[begin_unique_id]
Pipe delimited Blah....................
Pipe delimited Blah....................
Pipe delimited Blah....................
Pipe delimited Blah....................
Pipe delimited Blah....................
[end_unique_id]
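Because one record spans several lines, any parser of the current layout has to carry state from line to line, unlike a one-record-per-line layout that line-oriented tools can split anywhere. A minimal Python sketch of that grouping step, assuming the markers literally look like [begin_<id>] and [end_<id>] and that body lines are pipe-delimited (both assumptions read off the sample above, not the actual job code):

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed marker syntax "[begin_<id>]" ... "[end_<id>]", inferred from the
# sample above; the real log format may differ.
BEGIN = re.compile(r"^\[begin_(?P<id>[^\]]+)\]$")
END = re.compile(r"^\[end_(?P<id>[^\]]+)\]$")


def parse_multiline_records(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Group pipe-delimited lines between begin/end markers into one record per id."""
    records: Dict[str, List[List[str]]] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines between records
        begin = BEGIN.match(line)
        if begin:
            current_id = begin.group("id")
            records[current_id] = []
            continue
        end = END.match(line)
        if end:
            if end.group("id") != current_id:
                raise ValueError(f"unmatched end marker for {end.group('id')}")
            current_id = None
            continue
        if current_id is None:
            raise ValueError(f"line outside any record: {line!r}")
        # Each body line becomes a list of its pipe-separated fields.
        records[current_id].append(line.split("|"))
    if current_id is not None:
        raise ValueError(f"record {current_id} was never closed")
    return records
```

For example, the lines "[begin_42]", "a|b|c", "d|e|f", "[end_42]" group into {"42": [["a", "b", "c"], ["d", "e", "f"]]}.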


I have created JSON serializers that will log records in the following format
going forward:
<unique_id>     <JSON-string>
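The main property this layout needs is that each record stays on exactly one line. A minimal Python sketch, assuming a single tab separates the id from the JSON (the sample only shows whitespace), with hypothetical helper names to_log_line and from_log_line:

```python
import json
from typing import Any, Dict, Tuple


def to_log_line(unique_id: str, record: Dict[str, Any]) -> str:
    """Serialize one record as: unique_id, a tab, then compact JSON, on one line."""
    if "\t" in unique_id or "\n" in unique_id:
        raise ValueError("unique_id must not contain tabs or newlines")
    # json.dumps escapes tabs and newlines inside string values,
    # so the payload never contains a literal line break.
    payload = json.dumps(record, separators=(",", ":"), sort_keys=True)
    return f"{unique_id}\t{payload}"


def from_log_line(line: str) -> Tuple[str, Dict[str, Any]]:
    """Split on the first tab only and parse the JSON part back into a dict."""
    unique_id, payload = line.rstrip("\n").split("\t", 1)
    return unique_id, json.loads(payload)
```

For example, to_log_line("42", {"status": 200, "path": "/index.html"}) yields the line 42, a tab, then {"path":"/index.html","status":200}. Sorting keys is optional; it just makes identical records produce identical lines.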

This is the plan:
- Store the records in a two-column table in Hive.
- Write JSON deserializers as Hive UDFs that take this table and create
Hive tables for specific requirements.
- Modify the current aggregation scripts in Hive.
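Hive already ships the built-in functions get_json_object and json_tuple, which may cover simple field extraction before any custom deserializer is needed. The projection logic itself is small; here is a Python sketch of it outside Hive, where the dotted path syntax and the field names request.method and status are hypothetical (get_json_object uses its own $.a.b path syntax):

```python
import json
from typing import Any, Iterable, Iterator, List, Optional, Sequence, Tuple


def extract_path(doc: Any, path: str) -> Optional[Any]:
    """Follow a dotted path such as 'request.method'; return None if any step is missing."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def project(rows: Iterable[Tuple[str, str]], paths: Sequence[str]) -> Iterator[List[Any]]:
    """Turn (unique_id, json_string) rows into flat rows for one target table."""
    for unique_id, payload in rows:
        doc = json.loads(payload)
        yield [unique_id] + [extract_path(doc, p) for p in paths]
```

Missing fields come back as None (NULL in a Hive table), so older records that lack a newly added field still project cleanly.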

I was looking at the Avro format, but I don't see the value of using Avro when
JSON seems to give me pretty much the same thing.
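One concrete difference: an Avro object container file stores the schema once in the file header and encodes each record's values in binary, whereas every JSON line repeats all of its field names. A rough way to judge whether that matters for these logs is to measure what share of each JSON line is keys. A Python sketch, assuming string keys and compact JSON encoding (hypothetical helper names):

```python
import json
from typing import Any


def key_bytes(value: Any) -> int:
    """Count bytes spent on quoted object keys, recursing into nested objects and lists."""
    if isinstance(value, dict):
        return sum(len(json.dumps(k).encode("utf-8")) + key_bytes(v) for k, v in value.items())
    if isinstance(value, list):
        return sum(key_bytes(v) for v in value)
    return 0


def key_overhead(record: dict) -> float:
    """Fraction of the compact JSON encoding taken up by key names (quotes included)."""
    encoded = json.dumps(record, separators=(",", ":")).encode("utf-8")
    return key_bytes(record) / len(encoded)
```

For {"status": 200} the encoding is 14 bytes, 8 of which are the quoted key, so the overhead is about 0.57. Running this over a sample of real records gives a number to weigh against Avro's extra tooling.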

Please poke holes in my thinking! Rip me apart!


Thanks
Regards

sanjay


