Re: OPTIMIZING A HIVE QUERY

sudeep tokala Tue, 14 Aug 2012 14:01:38 -0700

Thanks for the reply, Bertrand.

On Tue, Aug 14, 2012 at 2:12 PM, Bertrand Dechoux <decho...@gmail.com> wrote:


>
> > My question was whether every join in a Hive query would translate into a
> > MapReduce job.
> In the general case, yes. BUT if one side of your join is small enough (i.e.
> it can be kept entirely in memory), a hash join (map join) can be performed,
> which is much faster because no reduce phase is required.
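
A minimal Python sketch of the idea behind a map join, assuming the smaller table fits in memory. The small side is loaded into a hash table once, and each row of the large side is probed against it as it streams past, so nothing has to be shuffled to reducers. In Hive, a map join can be requested with the MAPJOIN hint or chosen automatically when hive.auto.convert.join is enabled. The function name `map_join` and the row layout below are hypothetical illustrations, not Hive's actual implementation.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_join(small: Iterable[dict], large: Iterable[dict], key: str) -> Iterator[tuple[dict, dict]]:
    """Inner hash join: build an in-memory table on the small side, stream the large side."""
    # Build phase: the whole small table is loaded into a hash table keyed on the join key.
    table: defaultdict = defaultdict(list)
    for row in small:
        table[row[key]].append(row)
    # Probe phase: each large-side row is matched locally, so nothing is shuffled to reducers.
    for row in large:
        for match in table.get(row[key], ()):
            yield match, row


countries = [{"code": "FR", "name": "France"}, {"code": "US", "name": "United States"}]
orders = [{"id": 1, "code": "FR"}, {"id": 2, "code": "US"}, {"id": 3, "code": "DE"}]
print([(c["name"], o["id"]) for c, o in map_join(countries, orders, "code")])
# [('France', 1), ('United States', 2)]
```

The trade-off is memory: the build side must fit in each mapper's heap, which is why this only applies when one side is small.
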
>
> Bejoy KS has just provided the right link.
>
> > Store data in a smarter way? Can you please elaborate on this?
> That's not Hive-specific. The same logic applies to an RDBMS. You want to keep
> a normalized source of data, but sometimes denormalizing it can greatly
> improve your performance. That's one of the advantages of document stores.
> It depends heavily on your use case.
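
To illustrate the denormalization idea, here is a minimal Python sketch (all table and field names are hypothetical). The join is paid for once, when the data is written, producing wide records that later queries can scan and aggregate without any join. The cost is duplicated data and extra work whenever a customer attribute changes.

```python
def denormalize(customers: list[dict], orders: list[dict]) -> list[dict]:
    """Pay for the join once, at write time, by copying customer fields into each order."""
    by_id = {c["customer_id"]: c for c in customers}
    wide = []
    for order in orders:
        customer = by_id[order["customer_id"]]  # assumes every order has a known customer
        wide.append({**order, "customer_name": customer["name"], "country": customer["country"]})
    return wide


def revenue_by_country(wide_orders: list[dict]) -> dict[str, float]:
    """Read-time query on the denormalized records: a plain scan, no join."""
    totals: dict[str, float] = {}
    for order in wide_orders:
        totals[order["country"]] = totals.get(order["country"], 0.0) + order["amount"]
    return totals


customers = [
    {"customer_id": 1, "name": "Alice", "country": "FR"},
    {"customer_id": 2, "name": "Bob", "country": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 5.0},
    {"order_id": 11, "customer_id": 2, "amount": 7.5},
    {"order_id": 12, "customer_id": 1, "amount": 2.5},
]
print(revenue_by_country(denormalize(customers, orders)))  # {'FR': 7.5, 'US': 7.5}
```

Whether this pays off depends on how often the data is read versus updated, which is the "very dependent on your use case" part.
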
>
> Bertrand
>
> On Tue, Aug 14, 2012 at 7:30 PM, sudeep tokala <sudeeptok...@gmail.com> wrote:
>
>> Hi Bertrand,
>>
>> Thanks for the reply.
>>
>> My question was whether every join in a Hive query would translate into a
>> MapReduce job.
>> A MapReduce job goes through serialization and deserialization of objects.
>> Isn't that an overhead?
>>
>> Store data in a smarter way? Can you please elaborate on this?
>>
>> Regards
>> Sudeep
>>
>> On Tue, Aug 14, 2012 at 11:39 AM, Bertrand Dechoux
>> <decho...@gmail.com> wrote:
>>
>>> You may want to be clearer. Is your question: how can I change the
>>> serialization strategy of Hive? (If so, I will let other users answer; I am
>>> also interested in the answer.)
>>>
>>> Otherwise the answer is simple. If you want to join data that cannot be
>>> stored in memory, you need to serialize it. The only solution is to
>>> store the data in a smarter way that does not require the join at all.
>>> By the way, how do you know serialization is the bottleneck?
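
A toy Python simulation of a reduce-side (repartition) join, showing where the serialization cost comes from when neither side fits in memory. Every row of both tables is serialized, routed to the partition that owns its key, and deserialized again before it can be joined. Here `pickle` merely stands in for whatever serialization the real framework performs between the map and reduce phases; the function name and parameters are hypothetical, and this is not how Hive is implemented internally.

```python
import pickle
from collections import defaultdict


def reduce_side_join(left: list[dict], right: list[dict], key: str,
                     num_reducers: int = 2) -> list[tuple[dict, dict]]:
    """Toy repartition join: serialize, shuffle by key, deserialize, then join per key."""
    # Map phase: tag each row with its source table and serialize it for the shuffle.
    partitions: list[list[bytes]] = [[] for _ in range(num_reducers)]
    for tag, rows in (("L", left), ("R", right)):
        for row in rows:
            k = row[key]
            partitions[hash(k) % num_reducers].append(pickle.dumps((k, tag, row)))
    # Reduce phase: deserialize every record, group by key, and pair the two sides.
    joined = []
    for part in partitions:
        groups: defaultdict = defaultdict(lambda: {"L": [], "R": []})
        for blob in part:
            k, tag, row = pickle.loads(blob)
            groups[k][tag].append(row)
        for sides in groups.values():
            for lrow in sides["L"]:
                for rrow in sides["R"]:
                    joined.append((lrow, rrow))
    return joined


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 1, "b": "p"}, {"id": 1, "b": "q"}, {"id": 3, "b": "r"}]
print(sorted((lr["a"], rr["b"]) for lr, rr in reduce_side_join(left, right, "id")))
# [('x', 'p'), ('x', 'q')]
```

Every input row makes a pickle.dumps/pickle.loads round trip here; a map join avoids that shuffle round trip because the large side is never sent to reducers.
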
>>>
>>> Bertrand
>>>
>>>
>>> On Tue, Aug 14, 2012 at 5:11 PM, sudeep tokala
>>> <sudeeptok...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Aug 14, 2012 at 11:08 AM, sudeep tokala <sudeeptok...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> How can I avoid the serialization and deserialization overhead in a Hive
>>>>> join query? Will this improve my query performance?
>>>>>
>>>>> Regards
>>>>> sudeep
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>>
>
>
> --
> Bertrand Dechoux
>
