Hi Eguzki,
Is one of the tables vastly smaller than the other? If one is small enough
to fit in RAM, you can do this like so:
1. Add the small file to the DistributedCache
2. In the configure() method of the mapper, read the entire file into an
ArrayList or somesuch in RAM
3. Set the input path
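For illustration, here is a minimal sketch of the in-memory approach described in the steps above. It is plain Java with no Hadoop classes, so that it runs on its own. The class name `ReplicatedJoinSketch` and the tab-separated output format are assumptions made for this example, and the two methods only stand in for the mapper's configure() and map(). In a real job the small file's lines would be read from the local copy provided by the DistributedCache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Hadoop mapper; no Hadoop API is used here.
class ReplicatedJoinSketch {
    // The small table, held in RAM for the lifetime of the task.
    private final List<String> userIds = new ArrayList<>();

    // Plays the role of configure(): read the entire small file into memory.
    // Blank lines are skipped and surrounding whitespace is trimmed.
    void configure(Iterable<String> smallFileLines) {
        for (String line : smallFileLines) {
            String id = line.trim();
            if (!id.isEmpty()) {
                userIds.add(id);
            }
        }
    }

    // Plays the role of map(): pair one record of the large input with
    // every in-memory user id, i.e. one row of the full cartesian join.
    List<String> map(String largeRecord) {
        List<String> out = new ArrayList<>(userIds.size());
        for (String id : userIds) {
            out.add(id + "\t" + largeRecord);
        }
        return out;
    }

    public static void main(String[] args) {
        ReplicatedJoinSketch mapper = new ReplicatedJoinSketch();
        mapper.configure(Arrays.asList("u1", "u2"));
        for (String record : Arrays.asList("rowA", "rowB")) {
            for (String joined : mapper.map(record)) {
                System.out.println(joined);
            }
        }
    }
}
```

The point of the design is that the small list is loaded once per task rather than once per record, so each call to map() only does an in-memory loop.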
Thanks Todd,
That was my plan B, or workaround. Anyway, I am glad to see there is no
straightforward way to do this that I might have missed.
The small list is a list of userIds (a dimension table), so I can assume it is
small, but that could limit the scalability of our system. I
will test the upper limits.
Hi Eguzki,
I wouldn't say the size of the list fitting into RAM would be the
scalability bottleneck. If you're doing a full cartesian join of your users
against a larger table, the fact that you're doing the full cartesian join
is going to be the bottleneck first :)
-Todd
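To put rough numbers on that point, here is a small back-of-the-envelope sketch. All figures are hypothetical (one million user ids, about 50 bytes per id, one hundred million rows in the large table) and are not taken from the thread. The class and method names are made up for the example.

```java
// Hypothetical sizing helper; the numbers in main() are illustrative only.
class CrossJoinCost {
    // A full cartesian join emits one record per (user, row) pair.
    static long outputRecords(long users, long largeRows) {
        return Math.multiplyExact(users, largeRows);
    }

    // Approximate memory needed to hold the user list, ignoring
    // per-object overhead of the in-memory representation.
    static long userListBytes(long users, long bytesPerId) {
        return Math.multiplyExact(users, bytesPerId);
    }

    public static void main(String[] args) {
        long users = 1_000_000L;        // assumed size of the small table
        long largeRows = 100_000_000L;  // assumed size of the large table
        System.out.println("user list bytes: " + userListBytes(users, 50L));
        System.out.println("join output records: " + outputRecords(users, largeRows));
    }
}
```

Under these assumptions the user list needs on the order of 50 MB of raw data, while the join produces 10^14 output records, so the output volume becomes a problem long before the list stops fitting in RAM.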
On Wed, Dec 16, 2009 at 12:29 PM, Todd Lipcon t...@cloudera.com wrote:
Hi Eguzki,
I wouldn't say the size of the list fitting into RAM would be the
scalability bottleneck. If you're doing a full cartesian join of your users
against a larger table, the fact that you're doing the full cartesian