+hive-users
On Tue, Jul 29, 2014 at 1:56 PM, Pala M Muthaia <mchett...@rocketfuelinc.com > wrote: > Hi, > > I am testing SMB join for 2 large tables. The tables are bucketed and > sorted on the join column. I notice that even though the table is large, > Hive attempts to generate hash table for the 'small' table locally, > similar to map join. Since the table is large in my case, the client runs > out of memory and the query fails. > > I am using Hive 0.12 with the following settings: > > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.input.format = > org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; > > My test query does a simple join and a select, no subqueries/nested > queries etc. > > I understand why a (bucket) map join requires hash table generation, but > why is that included for an SMB join? Shouldn't a SMB join just spin up one > mapper for each bucket and perform a sort merge join directly on the mapper? > > > Thanks, > pala > > > >