Specifically, Pig's replicated join:
http://pig.apache.org/docs/r0.10.0/perf.html#replicated-joins
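
For reference, the map-side logic behind this kind of join (the steps listed in the quoted message below) can be sketched in plain Python. This is only an illustration under simple assumptions, not Pig's or Hadoop's actual implementation; the function names `build_hash_table` and `map_side_join` and the sample records are made up for the example.

```python
from collections import defaultdict


def build_hash_table(small_relation, key_index=0):
    """Load the small relation R into memory, hashed by join key."""
    table = defaultdict(list)
    for record in small_relation:
        table[record[key_index]].append(record)
    return table


def map_side_join(large_split, r_table, key_index=0):
    """Stream over one split of S and probe the in-memory table for each tuple."""
    for s_record in large_split:
        # Inner join: S tuples without a match in R produce no output.
        for r_record in r_table.get(s_record[key_index], []):
            yield r_record + s_record


# Hypothetical sample data: R is small, S_split stands in for one input split of S.
R = [(1, "alice"), (2, "bob")]
S_split = [(1, "click"), (2, "view"), (1, "purchase"), (3, "click")]

r_table = build_hash_table(R)
joined = list(map_side_join(S_split, r_table))
for row in joined:
    print(row)
```

Each map task emits finished joined rows, so no reduce phase is needed. In Hadoop, the small data set would typically be shipped to every mapper through the distributed cache.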

On Fri, Feb 15, 2013 at 6:22 PM, David Boyd <db...@lorenzresearch.com> wrote:

> Use Pig; it has specific directives for in-memory joins of small
> data sets.  The whole thing might require half a dozen lines
> of code.
>
>
>
> On 2/15/2013 4:25 PM, Yunming Zhang wrote:
>
>> Hi,
>>
>> I am trying to do some work with an in-memory join implementation in MapReduce.
>>
>> It can be summarized as a join between two data sets, R and S: one of
>> them is too large to fit into memory, while the other fits into memory
>> reasonably well
>> (size of R << size of S). The typical implementation
>> 1) distributes or broadcasts R to all map tasks (each mapper loads R into
>> memory, hashed by join key),
>> 2) maps (streams) over S, dividing S into chunks and using each chunk as
>> input to a map task,
>> 3) within each map task, looks up the join key in R for every tuple in S,
>> 4) needs only a trivial reduce computation.
>>
>> If anyone could point me to a good implementation that I could use as a
>> reference, that would be great.
>> I do plan to write my own implementation, but it would be helpful to
>> see whether there are established implementations out there.
>>
>> Thanks
>> Yunming
>>
>
