RE: Calculations involve large datasets

2008-02-25 Thread Chuck Lan
Thanks for the explanation. Now I just gotta find some time to do a POC!

-Chuck

-Original Message-
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Friday, February 22, 2008 3:58 PM
To: core-user@hadoop.apache.org
Subject: Re: Calculations involve large datasets

Joins are easy

RE: Calculations involve large datasets

2008-02-22 Thread Runping Qi
m: Ted Dunning [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 22, 2008 3:58 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Calculations involve large datasets
>
> Joins are easy.
>
> Just reduce on a key composed of the stuff you want to join on. If the

Re: Calculations involve large datasets

2008-02-22 Thread Ted Dunning
Joins are easy. Just reduce on a key composed of the stuff you want to join on. If the data you are joining is disparate, leave some kind of hint about what kind of record you have. The reducer will be iterating through sets of records that have the same key. This is similar to the results of
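To make the reduce-side join described above concrete, here is a minimal sketch in plain Python that imitates the map, shuffle, and reduce phases in memory. It is not Hadoop code and not necessarily how Ted would write it. The record layouts (trades and prices keyed by symbol), the "trade"/"price" tags, and every function name are assumptions made up for illustration; the tags play the role of the "hint about what kind of record you have".

```python
from collections import defaultdict
from itertools import product


def map_trades(record):
    # Hypothetical trade record: (symbol, quantity).
    # Emit the join key plus a tagged value so the reducer can tell kinds apart.
    symbol, quantity = record
    return symbol, ("trade", quantity)


def map_prices(record):
    # Hypothetical price record: (symbol, price).
    symbol, price = record
    return symbol, ("price", price)


def shuffle(pairs):
    # Stand-in for the framework's shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    # The reducer sees every record that shares this key, of both kinds.
    trades = [v for tag, v in values if tag == "trade"]
    prices = [v for tag, v in values if tag == "price"]
    # Inner join: pair each trade with each price for the same key.
    return [(key, qty, price) for qty, price in product(trades, prices)]


def run_join(trades, prices):
    pairs = [map_trades(r) for r in trades] + [map_prices(r) for r in prices]
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_join(key, values))
    return joined
```

Calling run_join([("IBM", 10), ("MSFT", 5)], [("IBM", 100.0)]) yields only the IBM row, because a key with records from just one side produces no pairs. An outer join would instead emit those unmatched records with a placeholder.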

Re: Calculations involve large datasets

2008-02-22 Thread Tim Wintle
Have you seen PIG: http://incubator.apache.org/pig/

It generates Hadoop code and is more query-like, and (as far as I remember) includes union, join, etc.

Tim

On Fri, 2008-02-22 at 09:13 -0800, Chuck Lan wrote:
> Hi,
>
> I'm currently looking into how to better scale the performance of our
> ca

Re: Calculations involve large datasets

2008-02-22 Thread Amar Kamat
See http://incubator.apache.org/pig/. Hope that helps. Not sure how joins could be done in Hadoop.

Amar

On Fri, 22 Feb 2008, Chuck Lan wrote:

Hi, I'm currently looking into how to better scale the performance of our calculations involving large sets of financial data. It is currently using a