Specifically, see Pig's replicated join: http://pig.apache.org/docs/r0.10.0/perf.html#replicated-joins
On Fri, Feb 15, 2013 at 6:22 PM, David Boyd <db...@lorenzresearch.com> wrote:

> Use Pig; it has specific directives for in-memory joins of small
> data sets. The whole thing might require half a dozen lines
> of code.
>
> On 2/15/2013 4:25 PM, Yunming Zhang wrote:
>
>> Hi,
>>
>> I am trying to do some work with an in-memory join MapReduce
>> implementation.
>>
>> It can be summarized as a join between two data sets, R and S: one of
>> them is too large to fit into memory, and the other fits into memory
>> reasonably well (size of R << size of S). The typical implementation:
>> 1) distributes or broadcasts R to all map tasks (each mapper loads R
>>    into memory, hashed by join key),
>> 2) maps (streams) over S, dividing S into datums that are used as
>>    input to each map task,
>> 3) within each map task, for every tuple in S, looks up the join key
>>    in R,
>> 4) leaves the reduce computation trivial.
>>
>> If anyone could point me to a good implementation that I could use as
>> a reference, that would be great.
>> I do plan to write my own implementation, but it would be helpful to
>> see whether there are established implementations out there.
>>
>> Thanks,
>> Yunming
>
> --
> David W. Boyd
> Vice President, Operations
> Lorenz Research, a Data Tactics corporation
> http://www.lorenzresearch.com/
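
For reference, steps 1) to 4) quoted above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Pig or Hadoop code: the function names, the toy relations R and S_SPLITS, and the choice of the first field as the join key are all assumptions made for the example. In a real job the small relation would be shipped to each mapper by the framework rather than passed as a Python list.

```python
from collections import defaultdict


def build_hash_table(small_rows, key_index=0):
    """Step 1: load the small relation R into memory, hashed by join key."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key_index]].append(row)
    return table


def map_side_join(r_table, s_split, key_index=0):
    """Steps 2-3: stream over one split of S and probe R for each tuple."""
    for s_row in s_split:
        # .get() avoids inserting empty entries for keys missing from R.
        for r_row in r_table.get(s_row[key_index], ()):
            yield r_row + s_row


# Hypothetical toy data: R is the small side, S arrives in two splits.
R = [(1, "alice"), (2, "bob")]
S_SPLITS = [
    [(1, "login"), (3, "logout")],
    [(2, "login"), (1, "purchase")],
]

r_table = build_hash_table(R)

# Step 4: no reduce phase is needed; the map outputs are simply collected.
joined = [row for split in S_SPLITS for row in map_side_join(r_table, split)]

for row in joined:
    print(row)
```

The S tuple with key 3 has no match in R and is dropped, so this behaves as an inner join. Each split only needs read access to the same hash table, which is why the join can run entirely on the map side.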