Hi Stephen,

In addition to join optimization, bucketing helps a lot with sampling as well: it lets you choose the sample space (i.e. n buckets out of m).
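To make the "n buckets out of m" idea concrete, here is a minimal Python sketch. It models a table bucketed into 32 buckets on an integer `id` column and a bucket sample in the spirit of Hive's `TABLESAMPLE(BUCKET x OUT OF y ON id)`. It is a simulation, not Hive's code. The functions `bucket_index`, `write_bucketed` and `sample_buckets` are hypothetical. The sketch assumes a non-negative integer key hashes to itself and that y divides the bucket count; Hive also accepts other cases that this sketch does not handle. What it shows is that only the matching bucket files need to be read, instead of scanning and filtering the whole table.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def bucket_index(key: int, num_buckets: int) -> int:
    # Hypothetical hash: a non-negative int key is assumed to hash to itself,
    # so the 0-based bucket is simply key mod num_buckets.
    return key % num_buckets


def write_bucketed(rows: List[Row], num_buckets: int) -> List[List[Row]]:
    # Simulate writing a table bucketed on "id": one list stands in for each bucket file.
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_index(row["id"], num_buckets)].append(row)
    return buckets


def sample_buckets(buckets: List[List[Row]], x: int, y: int) -> Tuple[List[int], List[Row]]:
    # Mimic TABLESAMPLE(BUCKET x OUT OF y ON id) for the case where y divides the
    # bucket count: x is 1-based, and only bucket files whose 0-based index i
    # satisfies i % y == x - 1 are read; all other files are skipped.
    n = len(buckets)
    if not (1 <= x <= y) or n % y != 0:
        raise ValueError("expected 1 <= x <= y and y dividing the bucket count")
    picked = [i for i in range(n) if i % y == x - 1]
    sample = [row for i in picked for row in buckets[i]]
    return picked, sample


rows = [{"id": i, "value": i * i} for i in range(1000)]
buckets = write_bucketed(rows, 32)
picked, sample = sample_buckets(buckets, x=3, y=16)

print([i + 1 for i in picked])  # 1-based bucket numbers read -> [3, 19]
print(len(sample))              # -> 63
```

Here only 2 of the 32 "files" (the 3rd and 19th buckets) are read. The result is the same set of rows you would get by filtering every row with `id % 16 == 2`, which is why a table bucketed on the sampling column makes this kind of sample cheap.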
Regards,
Bejoy KS

-----Original Message-----
From: Stephen Boesch <java...@gmail.com>
Date: Sun, 16 Jun 2013 11:20:49
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: When to use bucketed tables with/instead of partitioned tables

I am accustomed to using partitioned tables to obtain separate directories for the data files in each partition.

When looking at the documentation for bucketed tables, it seems they are typically used in conjunction with distribute by/sort by and an appropriate partitioning key, and thus provide the ability to do map-side joins.

An explanation of when to use bucketed tables by themselves (in lieu of partitioned tables), as well as in conjunction with partitioned tables, would be appreciated.

Thanks!
stephenb