Hi,

Can you have a look at
http://pig.apache.org/docs/r0.11.1/basic.html#cross

Thanks

On Thu, Apr 18, 2013 at 7:47 PM, zheyi rong <zheyi.r...@gmail.com> wrote:

> Dear all,
>
> I am writing to ask for ideas on doing a Cartesian product in Hadoop.
> Specifically, I have two datasets, each of which contains 20 million
> lines.
> I want to do a Cartesian product of these two datasets, comparing lines
> pairwise.
>
> Most of the output of each comparison can be filtered out by a function
> (we do not output the whole result of this Cartesian product, only a
> small part).
>
> I guess one good way is to pass one block from dataset1 and another block
> from dataset2 to a mapper, then let the mappers do the product in memory
> to avoid IO.
>
> Any suggestions?
> Thank you very much.
>
> Regards,
> Zheyi Rong
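
For what it's worth, the block-pairing idea in the quoted message can be sketched outside Hadoop as follows. This is only a minimal in-memory illustration in plain Python, not Pig or MapReduce code. The names `blocks`, `block_cross`, and `same_first_char`, the block size, and the sample data are all hypothetical and stand in for whatever comparison function is actually used.

```python
from itertools import product


def blocks(lines, size):
    """Split a list of lines into consecutive blocks of at most `size` items."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def block_cross(left, right, keep, size=2):
    """Yield the pairs (a, b) from left x right for which keep(a, b) is true.

    Each (left block, right block) pair stands in for the input of one
    hypothetical mapper, which does its share of the product in memory.
    """
    for lb in blocks(left, size):
        for rb in blocks(right, size):
            for a, b in product(lb, rb):
                if keep(a, b):
                    yield (a, b)


def same_first_char(a, b):
    # Hypothetical filter: keep pairs whose lines start with the same character.
    return a[:1] == b[:1]


if __name__ == "__main__":
    d1 = ["apple", "banana", "cherry"]
    d2 = ["avocado", "blueberry", "date"]
    for pair in block_cross(d1, d2, same_first_char):
        print(pair)
```

Note that if dataset1 is split into B1 blocks and dataset2 into B2 blocks, there are B1 * B2 block pairs, so each block is read more than once. The block size therefore trades memory per task against the number of tasks.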