Re: map reduce to achieve cartesian product

2009-12-16 Thread Todd Lipcon
Hi Eguzki,

Is one of the tables vastly smaller than the other? If one is small enough to fit in RAM, you can do this like so:

1. Add the small file to the DistributedCache.
2. In the configure() method of the mapper, read the entire file into an ArrayList or somesuch in RAM.
3. Set the input path
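The recipe above can be illustrated with a plain-Java simulation that runs without a Hadoop cluster. This is a minimal sketch under stated assumptions: the class names CartesianMapSideSketch and SmallTableMapper are hypothetical, configure() and map() only mimic the lifecycle of an old-API (org.apache.hadoop.mapred) mapper, and the DistributedCache file is replaced by an in-memory reader over the small table's contents.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/**
 * Hypothetical plain-Java simulation of a map-side cartesian product.
 * No Hadoop dependency: the cached file is simulated with a StringReader.
 */
public class CartesianMapSideSketch {

    /** Mimics a mapper whose configure() loads the small table into RAM. */
    static class SmallTableMapper {
        private final List<String> smallTable = new ArrayList<>();

        // Stand-in for configure(): in a real job this reader would be opened
        // on the local copy of the DistributedCache file (assumption).
        void configure(BufferedReader cachedFile) throws IOException {
            String line;
            while ((line = cachedFile.readLine()) != null) {
                if (!line.isEmpty()) {
                    smallTable.add(line);
                }
            }
        }

        // Stand-in for map(): pairs one record of the large input with every
        // in-memory row, so each input record yields |smallTable| outputs.
        void map(String largeRecord, BiConsumer<String, String> collector) {
            for (String small : smallTable) {
                collector.accept(small, largeRecord);
            }
        }
    }

    /** Runs the simulated job: every large-table record streams through one mapper. */
    static List<String> run(String smallFileContents, List<String> largeTable) {
        SmallTableMapper mapper = new SmallTableMapper();
        try (BufferedReader reader = new BufferedReader(new StringReader(smallFileContents))) {
            mapper.configure(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> output = new ArrayList<>();
        for (String record : largeTable) {
            // Emit tab-separated (small, large) pairs.
            mapper.map(record, (k, v) -> output.add(k + "\t" + v));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> out = run("u1\nu2\n", List.of("a", "b", "c"));
        out.forEach(System.out::println);
    }
}
```

In this sketch the pairing happens entirely in the map step: the large table is the job input, and only the small table has to fit in memory on each mapper.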

Re: map reduce to achieve cartesian product

2009-12-16 Thread Eguzki Astiz Lezaun
Thanks Todd,

That was my plan B, or workaround. Anyway, I am glad to confirm there is no straightforward way to do it that I might have missed. The small list is a list of userId values (a dimension table), so I can assume it is small, but that could limit the scalability of our system. I will test the upper limits.

Re: map reduce to achieve cartesian product

2009-12-16 Thread Todd Lipcon
Hi Eguzki,

I wouldn't say the size of the list fitting into RAM would be the scalability bottleneck. If you're doing a full cartesian join of your users against a larger table, the fact that you're doing the full cartesian join is going to be the bottleneck first :)

-Todd
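Todd's point can be made concrete with a back-of-the-envelope count. Writing U for the user table and T for the larger table (symbols introduced here for illustration), each mapper holds only U in memory, but the join emits one record per pair:

```latex
|U \times T| = |U| \cdot |T|, \qquad
\text{memory per mapper} \propto |U|, \qquad
\text{output records} \propto |U| \cdot |T|
```

With hypothetical sizes |U| = 10^5 and |T| = 10^8, the in-memory list holds 10^5 user IDs while the job writes 10^13 output records, so the output volume becomes a problem long before the cached list does.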

Re: map reduce to achieve cartesian product

2009-12-16 Thread Edward Capriolo
On Wed, Dec 16, 2009 at 12:29 PM, Todd Lipcon t...@cloudera.com wrote:
> Hi Eguzki, I wouldn't say the size of the list fitting into RAM would be the scalability bottleneck. If you're doing a full cartesian join of your users against a larger table, the fact that you're doing the full cartesian