On Tue, Aug 14, 2012 at 11:08 AM, sudeep tokala sudeeptok...@gmail.com wrote:
Hi all,
How can I avoid serialization and deserialization overhead in a Hive join
query? Will this improve my query performance?
Regards
sudeep
You may want to be clearer. Is your question: how can I change the
serialization strategy of Hive? (If so, I will let other users answer, and I am
also interested in the answer.)
Otherwise the answer is simple. If you want to join data that cannot fit
in memory, you need to serialize it. The
From: sudeep tokala sudeeptok...@gmail.com
To: user@hive.apache.org
Sent: Tuesday, August 14, 2012 11:00 PM
Subject: Re: OPTIMIZING A HIVE QUERY
Hi Bertrand,
Thanks for the reply.
My question was: would every join in a Hive query constitute a MapReduce job?
A MapReduce job goes through serialization
Thanks for the reply, Bertrand.
On Tue, Aug 14, 2012 at 2:12 PM, Bertrand Dechoux decho...@gmail.com wrote:
My question was: would every join in a Hive query constitute a
MapReduce job?
In the general case, yes. BUT if one side of your join is small enough (i.e.
you can keep all of it in memory),
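
To illustrate the "small enough to keep in memory" case, here is a minimal Python sketch of the idea behind what is usually called a map-side join: the small side is loaded into a hash table once, and the large side is streamed past it, so no shuffle between a map and a reduce phase is needed. This is only a conceptual illustration, not Hive's actual implementation; the function name `map_side_join` and the sample rows are hypothetical.

```python
from collections import defaultdict


def map_side_join(small_rows, large_rows, key):
    """Join two iterables of dicts on `key`, holding only `small_rows` in memory."""
    # Build an in-memory hash table from the small side (assumed to fit in RAM).
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Stream the large side and probe the table row by row.
    # Rows without a match are dropped, as in an inner join.
    for row in large_rows:
        for match in lookup.get(row[key], []):
            yield {**match, **row}


# Hypothetical sample data: a small dimension table and a larger fact table.
countries = [
    {"code": "FR", "country": "France"},
    {"code": "US", "country": "United States"},
]
visits = [
    {"code": "FR", "user": "a"},
    {"code": "DE", "user": "b"},
    {"code": "US", "user": "c"},
]

for joined_row in map_side_join(countries, visits, "code"):
    print(joined_row)
```

The trade-off is memory: the approach only works while the hash table for the small side fits in each task's memory, which is why it applies to the "small enough" case only.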