Thanks for the reply, Bertrand.

On Tue, Aug 14, 2012 at 2:12 PM, Bertrand Dechoux <decho...@gmail.com> wrote:
>
> > My question was every join in a hive query would constitute a
> > MapReduce job.
> In the general case, yes. BUT if one side of your join is small enough (i.e.
> you can keep it all in memory), a hash join/map join can be performed, which
> is much more performant (no reduce is required).
>
> Bejoy KS has just provided the right link.
>
> > Store data in the smarter way? can you please elaborate on this.
> That's not Hive related. The same logic applies to an RDBMS. You want to keep
> a normalized source of data, but sometimes denormalizing it can greatly
> improve your performance. That's one of the advantages of a document store.
> It is very dependent on your use cases.
>
> Bertrand
>
> On Tue, Aug 14, 2012 at 7:30 PM, sudeep tokala <sudeeptok...@gmail.com> wrote:
>
>> hi Bertrand,
>>
>> Thanks for the reply.
>>
>> My question was every join in a hive query would constitute a
>> MapReduce job.
>> A MapReduce job goes through serialization and deserialization of objects.
>> Isn't that an overhead?
>>
>> Store data in the smarter way? can you please elaborate on this.
>>
>> Regards
>> Sudeep
>>
>> On Tue, Aug 14, 2012 at 11:39 AM, Bertrand Dechoux
>> <decho...@gmail.com> wrote:
>>
>>> You may want to be clearer. Is your question: how can I change the
>>> serialization strategy of Hive? (If so, I will let other users answer, and
>>> I am also interested in the answer.)
>>>
>>> Otherwise the answer is simple. If you want to join data which cannot be
>>> held in memory, you need to serialize it. The only solution is to
>>> store the data in a smarter way which would not require you to do the join.
>>> By the way, how do you know the serialization is the bottleneck?
>>>
>>> Bertrand
>>>
>>>
>>> On Tue, Aug 14, 2012 at 5:11 PM, sudeep tokala
>>> <sudeeptok...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Aug 14, 2012 at 11:08 AM, sudeep tokala
>>>> <sudeeptok...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> How can I avoid serialization and deserialization overhead in a hive
>>>>> join query? Will this optimize my query performance?
>>>>>
>>>>> Regards
>>>>> sudeep
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>>
>
>
> --
> Bertrand Dechoux
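For reference, the hash join/map join mentioned in the thread can be sketched in a few lines. This is only an illustration of the idea in plain Python, not Hive's actual implementation: the function name, the row layout (dicts) and the sample tables are all made up for the example. The small side is loaded into an in-memory hash table, then the large side is streamed past it once, so nothing has to be sorted, shuffled or sent to a reducer.

```python
from collections import defaultdict


def map_side_hash_join(small_rows, large_rows, small_key, large_key):
    """Inner-join two row streams by hashing the small side in memory.

    Hypothetical helper for illustration; rows are plain dicts.
    """
    # Build phase: index every row of the small table by its join key.
    table = defaultdict(list)
    for row in small_rows:
        table[row[small_key]].append(row)

    # Probe phase: stream the large table once and look up each key.
    # No sort or shuffle of the large side is needed.
    for row in large_rows:
        for match in table.get(row[large_key], ()):
            yield {**match, **row}


# Made-up sample data: a small lookup table and a larger fact table.
countries = [
    {"code": "FR", "country": "France"},
    {"code": "IN", "country": "India"},
]
users = [
    {"user": "bertrand", "code": "FR"},
    {"user": "sudeep", "code": "IN"},
    {"user": "nobody", "code": "ZZ"},  # no matching country, dropped by the inner join
]

joined = list(map_side_hash_join(countries, users, "code", "code"))
```

The sketch only works when the small side really fits in memory, which is the condition stated above. In Hive itself this kind of join is requested with a `/*+ MAPJOIN(small_table) */` hint or the `hive.auto.convert.join` setting; the exact behaviour depends on the Hive version, so check the documentation for yours.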