Pig has implemented map-side merge joins in this way. If the storage
mechanism provides an index (e.g. Zebra), Pig can use it.
Alan.
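For readers unfamiliar with the technique, a merge join over two inputs that
are already sorted on the join key can be sketched roughly as below. This is
only a minimal Python illustration of the general idea, not Pig's actual
implementation; the function name merge_join and the sample records are
hypothetical, and it assumes both inputs fit in memory as sorted lists of
(key, value) pairs.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs, each sorted by key.

    Hypothetical helper for illustration only; assumes both inputs are
    already sorted ascending on the join key.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key is behind, advance left
        elif lk > rk:
            j += 1          # right key is behind, advance right
        else:
            # Find the run of right records sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left record with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    a = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    b = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
    for row in merge_join(a, b):
        print(row)
```

Because each input is read once in key order, no shuffle or reduce phase is
needed. An index on the sorted data would let a reader seek straight to the
first relevant key instead of scanning from the start.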
On Jul 21, 2010, at 5:22 PM, Deem, Mike wrote:
We are planning to use Hadoop to run a number of recurring jobs that involve
map side joins.
Rather than requiring that the joined datasets be partitioned into separate
part-* files, we are considering the following solution. Our concerns with the
partitioned approach include:
* All t