Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread sudeep tokala
On Tue, Aug 14, 2012 at 11:08 AM, sudeep tokala sudeeptok...@gmail.com wrote: Hi all, how can I avoid the serialization and deserialization overhead in a Hive join query? Will this improve my query's performance? Regards, sudeep

2012-08-14 Thread Bertrand Dechoux
You may want to be clearer. Is your question: how can I change Hive's serialization strategy? (If so, I'll let other users answer; I am also interested in the answer.) Otherwise the answer is simple: if you want to join data that cannot be held in memory, you need to serialize it. The

2012-08-14 Thread Bejoy Ks
From: sudeep tokala sudeeptok...@gmail.com To: user@hive.apache.org Sent: Tuesday, August 14, 2012 11:00 PM Subject: Re: OPTIMIZING A HIVE QUERY Hi Bertrand, thanks for the reply. My question was: every join in a Hive query constitutes a MapReduce job, and a MapReduce job goes through serialization
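To make the point about serialization concrete, here is a minimal Python sketch (not Hive's actual code) of a reduce-side, or repartition, join. Every row from both tables is tagged with its source, serialized, grouped by join key in a simulated shuffle, and deserialized again before the reducer pairs rows up. The table contents and the function names (`map_phase`, `shuffle`, `reduce_phase`, `reduce_side_join`) are hypothetical, and `pickle` is only a stand-in for Hadoop's Writable serialization and Hive's SerDe layer.

```python
import pickle
from collections import defaultdict

# Hypothetical toy tables of (join_key, value) rows; illustrative only.
ORDERS = [(1, "order-a"), (2, "order-b"), (1, "order-c")]
CUSTOMERS = [(1, "alice"), (2, "bob"), (3, "carol")]


def map_phase(tag, rows):
    """Emit each row as a serialized (join_key, (tag, value)) record.
    pickle stands in for the serialization map output goes through
    before it is spilled to disk and shuffled (an assumption)."""
    for key, value in rows:
        yield pickle.dumps((key, (tag, value)))


def shuffle(serialized_records):
    """Deserialize the map output and group it by join key, sorted by key."""
    groups = defaultdict(list)
    for blob in serialized_records:
        key, tagged = pickle.loads(blob)
        groups[key].append(tagged)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Inner join: pair every left-side value with every right-side value."""
    for key, tagged in groups.items():
        left = [v for tag, v in tagged if tag == "L"]
        right = [v for tag, v in tagged if tag == "R"]
        for lval in left:
            for rval in right:
                yield (key, lval, rval)


def reduce_side_join(left_rows, right_rows):
    """Both inputs cross the (simulated) shuffle, so every row is serialized."""
    mapped = list(map_phase("L", left_rows)) + list(map_phase("R", right_rows))
    return list(reduce_phase(shuffle(mapped)))


if __name__ == "__main__":
    for row in reduce_side_join(ORDERS, CUSTOMERS):
        print(row)
```

Note that `CUSTOMERS` row 3 is serialized and shuffled even though it never matches; in this style of join, the cost is paid for every row of both inputs.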

2012-08-14 Thread sudeep tokala
Thanks for the reply, Bertrand. On Tue, Aug 14, 2012 at 2:12 PM, Bertrand Dechoux decho...@gmail.com wrote: My question was: every join in a Hive query constitutes a MapReduce job. In the general case, yes. BUT if one side of your join is small enough (i.e. you can keep it all in memory),
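When one side fits in memory, the usual technique is a map-side (broadcast hash) join, which Hive calls a map join (it can be requested with the `MAPJOIN` query hint). The minimal Python sketch below illustrates the general technique, not Hive's implementation: it builds a hash table from the small table, then streams the large table through it in a single pass, so there is no shuffle and no reduce step to serialize rows for. The function name `map_side_join` and the sample rows are hypothetical.

```python
def map_side_join(big_rows, small_rows):
    """Broadcast hash join of (join_key, value) rows.

    Assumes small_rows fits in memory. Rows of big_rows are probed one by
    one and never grouped or shuffled, so in a MapReduce setting the join
    could finish inside the map tasks.
    """
    # Build phase: join_key -> list of values from the small table.
    table = {}
    for key, value in small_rows:
        table.setdefault(key, []).append(value)

    # Probe phase: a single streaming pass over the big table.
    for key, value in big_rows:
        for match in table.get(key, []):
            yield (key, value, match)


if __name__ == "__main__":
    # Hypothetical sample data: a "large" fact table and a small dimension table.
    orders = [(1, "order-a"), (2, "order-b"), (1, "order-c")]
    customers = [(1, "alice"), (2, "bob"), (3, "carol")]
    for row in map_side_join(orders, customers):
        print(row)
```

The trade-off is memory: the whole small table must be loaded by every map task, which is why this only applies when one side of the join is small enough.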