Thanks for the explanation. Now I just gotta find some time to do a
POC!
-Chuck
-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Friday, February 22, 2008 3:58 PM
To: core-user@hadoop.apache.org
Subject: Re: Calculations involve large datasets
Joins are easy.
Just reduce on a key composed of the stuff you want to join on. If the data
you are joining is disparate, leave some kind of hint about what kind of
record you have.
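Concretely, the map side of that might look something like this rough
sketch, using the classic org.apache.hadoop.mapred API; the
tab-separated record layout and the "T:"/"Q:" tag prefixes are made up
for illustration:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Map side of a reduce-side join over tab-separated trade records.
// Emits (join key, tagged record); the tag is the "hint" that tells
// the reducer which dataset each record came from.
public class TradeJoinMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, Text> output,
                  Reporter reporter) throws IOException {
    String[] fields = line.toString().split("\t");
    // Field 0 is assumed to hold the join key (say, a ticker symbol).
    // A second mapper over the other dataset would emit "Q:" instead.
    output.collect(new Text(fields[0]), new Text("T:" + line.toString()));
  }
}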
The reducer will be iterating through sets of records that have the same
key. This is similar to the results of
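A matching reduce side might look like this (again just a sketch; it
assumes the hypothetical "T:"/"Q:" tags from the mapper above and
emits the cross product of the two sides, i.e. an inner join):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Reduce side: Hadoop hands us all records sharing one join key in a
// single reduce() call, so we just split them by tag and pair them up.
public class JoinReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output,
                     Reporter reporter) throws IOException {
    List<String> trades = new ArrayList<String>();
    List<String> quotes = new ArrayList<String>();

    // Sort incoming records into buckets by their hint prefix.
    while (values.hasNext()) {
      String v = values.next().toString();
      if (v.startsWith("T:")) {
        trades.add(v.substring(2));
      } else if (v.startsWith("Q:")) {
        quotes.add(v.substring(2));
      }
    }

    // Emit the cross product of the two buckets: an inner join on key.
    for (String t : trades) {
      for (String q : quotes) {
        output.collect(key, new Text(t + "\t" + q));
      }
    }
  }
}

The shuffle does the expensive sort-and-group work, so the reducer
itself stays simple.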
Have you seen PIG:
http://incubator.apache.org/pig/
It generates Hadoop code and is more query-like, and (as far as I
remember) includes union, join, etc.
Tim
On Fri, 2008-02-22 at 09:13 -0800, Chuck Lan wrote:
> Hi,
>
> I'm currently looking into how to better scale the performance of our
> ca
See http://incubator.apache.org/pig/. Hope that helps. Not sure how joins
could be done in Hadoop.
Amar
On Fri, 22 Feb 2008, Chuck Lan wrote:
Hi,
I'm currently looking into how to better scale the performance of our
calculations involving large sets of financial data. It is currently using
a