Specifically, replicated join -
http://pig.apache.org/docs/r0.10.0/perf.html#replicated-joins
On Fri, Feb 15, 2013 at 6:22 PM, David Boyd wrote:
> Use PIG it has specific directives for in memory joins of small
> data sets. The whole thing might require a half a dozen lines
> of code.
>
>
>
> On
Use Pig; it has specific directives for in-memory joins of small
data sets. The whole thing might require half a dozen lines
of code.
On 2/15/2013 4:25 PM, Yunming Zhang wrote:
Hi,
I am trying to do some work with an in-memory join MapReduce implementation;
it can be summarized as a join be
Why not look at Hive? It already implements the join that you are looking
for and has a MAPJOIN feature, i.e. it loads the small file into memory.
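
For illustration only, here is a minimal Python sketch of the idea behind a
map-side (in-memory) join: the small data set is loaded into a hash table and
the large one is streamed past it. This is not Hive's or Pig's actual
implementation; the function names, the tuple layout, and the sample rows are
all made up for the example.

```python
from collections import defaultdict


def build_hash_table(small_rows, key_index=0):
    """Load the small relation fully into memory, indexed by join key."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key_index]].append(row)
    return table


def map_side_join(large_rows, table, key_index=0):
    """Stream the large relation and probe the in-memory table per row."""
    for s_row in large_rows:
        # .get() avoids inserting empty entries for keys with no match.
        for r_row in table.get(s_row[key_index], []):
            yield r_row + s_row


# Hypothetical sample data: R is the small side, S is the large side.
R = [(1, "a"), (2, "b")]
S = [(1, "x"), (2, "y"), (1, "z"), (3, "w")]

joined = list(map_side_join(iter(S), build_hash_table(R)))
for row in joined:
    print(row)
```

In a real job each mapper would build the table once from a local copy of the
small file and then probe it for every record of its input split, so no
reduce phase is needed for the join itself.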
On Fri, Feb 15, 2013 at 1:25 PM, Yunming Zhang wrote:
> Hi,
>
> I am trying to do some work with in memory Join Map Reduce implementation,
>
> it can be
Hi,
I am trying to do some work with an in-memory join MapReduce implementation.
It can be summarized as a join between two data sets, R and S: one of them is
too large to fit into memory, and the other fits into memory reasonably
well
(size of R << size of S). The typical implementation