Re: Cartesian product in hadoop

2013-04-19 Thread zheyi rong
> On Thu, Apr 18, 2013 at 9:47 AM, zheyi rong wrote:
>
>> Dear all,
>>
>> I am writing to kindly ask for ideas of doing cartesian product in hadoop.
>> Specifically, now I have two datasets, each of which contains 20 million
>> lines.
>> I wan

Re: Cartesian product in hadoop

2013-04-19 Thread zheyi rong
hen output.
>>>
>>> Note -- You may not know the keys (K1, K2, …, Km) beforehand. If so,
>>> then you need one more pass over dataset1 to identify the keys and store
>>> them for use with dataset2.
>>>
>>> Regards,
>>> Ajay S

Re: Cartesian product in hadoop

2013-04-18 Thread Ted Dunning
reasonable one, but it will be much slower than methods where you can prune the comparisons.

On Thu, Apr 18, 2013 at 9:47 AM, zheyi rong wrote:
> Dear all,
>
> I am writing to kindly ask for ideas of doing cartesian product in hadoop.
> Specifically, now I have two datasets, each of wh
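
One common way to "prune the comparisons" is blocking: compare two lines only if they share a cheap key. The sketch below is an illustration under stated assumptions, not the method described in this thread. The `blocking_key` rule (first character of the line) and the sample data are hypothetical; a real job would pick a key that fits its own comparison. In MapReduce terms this roughly corresponds to a reduce-side join on the blocking key.

```python
from collections import defaultdict
from itertools import product


def blocking_key(line):
    # Hypothetical pruning rule: only lines that share a first character
    # are ever compared. Replace with a key suited to the real comparison.
    return line[:1]


def pruned_pairs(left, right, key=blocking_key):
    """Yield only the (l, r) pairs whose blocking keys match."""
    buckets = defaultdict(list)
    for r in right:
        buckets[key(r)].append(r)
    for l in left:
        # Lines whose key has no bucket on the right side are skipped.
        for r in buckets.get(key(l), ()):
            yield l, r


# Hypothetical sample data.
left = ["apple", "avocado", "banana"]
right = ["apricot", "blueberry", "cherry"]

pairs = list(pruned_pairs(left, right))
full = list(product(left, right))
print(len(pairs), "pairs compared instead of", len(full))
```

Blocking only helps when pairs with different keys can safely be skipped; if every pair really must be compared, the full product is unavoidable.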

Re: Cartesian product in hadoop

2013-04-18 Thread Ajay Srivastava
wrote:

Dear all,

I am writing to kindly ask for ideas of doing cartesian product in hadoop.
Specifically, now I have two datasets, each of which contains 20 million lines.
I want to do cartesian product on these two datasets, comparing lines pairwise.
The output of each comparis

Re: Cartesian product in hadoop

2013-04-18 Thread zheyi rong
>> On 18-Apr-2013, at 3:51 PM, Azuryy Yu wrote:
>>
>> This is not suitable for his large dataset.
>>
>> --Send from my Sony mobile.
>> On Apr 18, 2013 5:58 PM, "Jagat Singh" wrote:
>>
>>> Hi,
>>>

Re: Cartesian product in hadoop

2013-04-18 Thread Ajay Srivastava
obile.
On Apr 18, 2013 5:58 PM, "Jagat Singh" <jagatsi...@gmail.com> wrote:

Hi,

Can you have a look at
http://pig.apache.org/docs/r0.11.1/basic.html#cross

Thanks

On Thu, Apr 18, 2013 at 7:47 PM, zheyi rong <zheyi.r...@gmail.com> wrote:

Dear all,

I am writing

Re: Cartesian product in hadoop

2013-04-18 Thread zheyi rong
:
>
>> Hi,
>>
>> Can you have a look at
>>
>> http://pig.apache.org/docs/r0.11.1/basic.html#cross
>>
>> Thanks
>>
>>
>> On Thu, Apr 18, 2013 at 7:47 PM, zheyi rong wrote:
>>
>>> Dear all,
>>>

Re: Cartesian product in hadoop

2013-04-18 Thread Ajay Srivastava
"Jagat Singh" <jagatsi...@gmail.com> wrote:

Hi,

Can you have a look at
http://pig.apache.org/docs/r0.11.1/basic.html#cross

Thanks

On Thu, Apr 18, 2013 at 7:47 PM, zheyi rong <zheyi.r...@gmail.com> wrote:

Dear all,

I am writing to kindly ask for ideas of d

Re: Cartesian product in hadoop

2013-04-18 Thread Azuryy Yu
:47 PM, zheyi rong wrote:
>
>> Dear all,
>>
>> I am writing to kindly ask for ideas of doing cartesian product in hadoop.
>> Specifically, now I have two datasets, each of which contains 20 million
>> lines.
>> I want to do cartesian product on these two datasets, c

Re: Cartesian product in hadoop

2013-04-18 Thread Jagat Singh
Hi,

Can you have a look at
http://pig.apache.org/docs/r0.11.1/basic.html#cross

Thanks

On Thu, Apr 18, 2013 at 7:47 PM, zheyi rong wrote:
> Dear all,
>
> I am writing to kindly ask for ideas of doing cartesian product in hadoop.
> Specifically, now I have two datasets, each of wh

Cartesian product in hadoop

2013-04-18 Thread zheyi rong
Dear all,

I am writing to kindly ask for ideas of doing cartesian product in hadoop.
Specifically, now I have two datasets, each of which contains 20 million lines.
I want to do cartesian product on these two datasets, comparing lines pairwise.
The output of each comparison can be mostly
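
For scale, two datasets of 20 million lines each give 20,000,000 × 20,000,000 = 4 × 10^14 pairwise comparisons. A full product of that size is usually split into independent blocks so that each pair of blocks can run as a separate task. The following is a minimal single-machine sketch of that idea, not a Hadoop job and not something proposed in this thread. The names `chunks`, `compare`, and `blockwise_cross`, the block size, and the sample data are all hypothetical; `compare` stands in for whatever per-pair comparison is actually needed.

```python
from itertools import product


def chunks(lines, size):
    """Split a list into consecutive blocks of at most `size` lines."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def compare(a, b):
    # Placeholder comparison (assumption): here, simple equality.
    return a == b


def blockwise_cross(left, right, size):
    """Yield (a, b, result) for every pair, one block pair at a time.

    Each (left block, right block) combination is independent of the
    others, so in a distributed setting it could be handed to its own task.
    """
    for lblock in chunks(left, size):
        for rblock in chunks(right, size):
            for a, b in product(lblock, rblock):
                yield a, b, compare(a, b)


# Hypothetical sample data.
left = ["a", "b", "c", "d", "e"]
right = ["c", "d", "x"]

results = list(blockwise_cross(left, right, size=2))
matches = [(a, b) for a, b, same in results if same]
print(len(results), "comparisons,", len(matches), "matches")
```

Blocking the product does not reduce the number of comparisons; it only bounds how much of each dataset a single task has to hold at once.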