Re: How to mapreduce in the scenario

2012-05-30 Thread samir das mohapatra
Yes. Hadoop is only for huge dataset computation. It may not be good for small datasets. On Wed, May 30, 2012 at 6:53 AM, liuzhg liu...@cernet.com wrote: Hi Mike, Nitin, Devaraj, Soumya, Samir, Robert, thank you all for your suggestions. Actually, I want to know if Hadoop has any advantage

RE: How to mapreduce in the scenario

2012-05-30 Thread Wilson Wayne - wwilso
From: samir das mohapatra [mailto:samir.help...@gmail.com] Sent: Wednesday, May 30, 2012 8:33 AM To: common-user@hadoop.apache.org Subject: Re: How to mapreduce in the scenario Yes. Hadoop is only for huge dataset computation. It may not be good for small datasets. On Wed, May 30, 2012 at 6:53 AM, liuzhg liu

Re: How to mapreduce in the scenario

2012-05-29 Thread Michel Segel
Hive? Sure, assuming you mean that the id is a FK (foreign key) common amongst the tables... Sent from a remote device. Please excuse any typos... Mike Segel On May 29, 2012, at 5:29 AM, liuzhg liu...@cernet.com wrote: Hi, I wonder if Hadoop can effectively solve the following question:

Re: How to mapreduce in the scenario

2012-05-29 Thread Nitin Pawar
Hive is one approach (similar to routine databases, but not exactly the same). If you are looking at a MapReduce program, then use MultipleInputs: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html On Tue, May 29, 2012 at 4:02 PM,

RE: How to mapreduce in the scenario

2012-05-29 Thread Devaraj k
Hi Gump, MapReduce fits well for solving these types of problems (joins). I hope this will help you solve the described problem. 1. Map output key and value classes: write a map output key class (Text.class) and a value class (CombinedValue.class). Here the value class should be able to hold the
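A minimal plain-Java sketch of the reduce step that this kind of reduce-side join leads to, with no Hadoop classes involved. It assumes the value class carries a source tag plus the non-key fields; `CombinedValue` echoes the name in the post, but its fields, the "A"/"B" tags and the tab-separated output are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the reduce side of a reduce-side join (no Hadoop classes).
// CombinedValue echoes the class name in the post; its fields are hypothetical.
public class ReduceSideJoinSketch {

    // Map output value: a tag naming the source table plus the rest of the record.
    public static final class CombinedValue {
        final String source;  // "A" or "B" (hypothetical tags)
        final String payload; // non-key fields of the record

        public CombinedValue(String source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    // Work done by one reduce() call for a single join key (the shared id).
    public static List<String> reduce(String id, List<CombinedValue> values) {
        List<String> fromA = new ArrayList<>();
        List<String> fromB = new ArrayList<>();
        // Split the values by tag; arrival order does not matter.
        for (CombinedValue v : values) {
            if ("A".equals(v.source)) {
                fromA.add(v.payload);
            } else {
                fromB.add(v.payload);
            }
        }
        // Pair every A record with every B record for this id.
        List<String> joined = new ArrayList<>();
        for (String a : fromA) {
            for (String b : fromB) {
                joined.add(id + "\t" + a + "\t" + b);
            }
        }
        return joined; // empty when one side has no record for this id (inner join)
    }

    public static void main(String[] args) {
        List<CombinedValue> values = List.of(
                new CombinedValue("B", "12 Main St"),
                new CombinedValue("A", "Alice,30"));
        reduce("1", values).forEach(System.out::println);
    }
}
```

The tag is what lets the reducer tell the two tables apart, since the values for one key can arrive in any order.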

Re: How to mapreduce in the scenario

2012-05-29 Thread Soumya Banerjee
Hi, you can also try the Hadoop reduce-side join functionality. Look into contrib/datajoin/hadoop-datajoin-*.jar for the base Map and Reduce classes to do the same. Regards, Soumya. On Tue, May 29, 2012 at 4:10 PM, Devaraj k devara...@huawei.com wrote: Hi Gump, MapReduce fits

Re: How to mapreduce in the scenario

2012-05-29 Thread samir das mohapatra
Yes, it is possible by using MultipleInputs to feed multiple mappers (basically 2 different mappers). Step 1:
    MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, Mapper1.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, Mapper2.class);
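A rough plain-Java sketch of what the two mappers registered this way might emit, and how the shuffle brings both inputs together under one id. No Hadoop classes are used; the comma-separated line formats ("id,name,age" and "id,address"), the "A:"/"B:" value prefixes and the method names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of two mappers plus a stand-in shuffle (no Hadoop classes).
// Line formats, tags and method names are hypothetical.
public class TwoMapperSketch {

    // Mapper1 (assumed format "id,name,age"): key on id, tag the value with "A".
    public static String[] mapper1(String line) {
        String[] f = line.split(",", 2);
        return new String[] {f[0].trim(), "A:" + f[1].trim()};
    }

    // Mapper2 (assumed format "id,address"): key on id, tag the value with "B".
    public static String[] mapper2(String line) {
        String[] f = line.split(",", 2);
        return new String[] {f[0].trim(), "B:" + f[1].trim()};
    }

    // Stand-in for the shuffle: group every tagged value under its id, so a
    // single reduce call would see the records from both inputs together.
    public static Map<String, List<String>> shuffle(List<String> fileA, List<String> fileB) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : fileA) {
            String[] kv = mapper1(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        for (String line : fileB) {
            String[] kv = mapper2(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = shuffle(
                List.of("1,Alice,30", "2,Bob,25"),
                List.of("1,12 Main St"));
        System.out.println(g);
    }
}
```

In this sketch the values for an id happen to come out in A-then-B order, but a real Hadoop shuffle does not guarantee value order within a key (without a secondary sort), which is why each mapper tags its output.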

Re: How to mapreduce in the scenario

2012-05-29 Thread Robert Evans
Yes, you can do it. In Pig you would write something like:
    A = load 'a.txt' as (id, name, age, ...);
    B = load 'b.txt' as (id, address, ...);
    C = JOIN A BY id, B BY id;
    STORE C into 'c.txt';
Hive can do it similarly. Or you could write your own directly in map/reduce, or use the data_join jar.
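To make the result of `C = JOIN A BY id, B BY id;` concrete, here is a single-process Java sketch of the same inner equi-join: each output row is A's fields followed by the matching B row's fields, and ids found in only one input are dropped. This is an in-memory hash join used purely for illustration, not how Pig executes the JOIN on a cluster; the row representation (a list of string fields with id at position 0) is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory hash join illustrating inner-join semantics on the first field.
// Not Pig's execution strategy; rows are hypothetical lists of string fields.
public class PigJoinSemantics {

    public static List<List<String>> innerJoin(List<List<String>> a, List<List<String>> b) {
        // Build side: index B's rows by id (field 0).
        Map<String, List<List<String>>> bById = new HashMap<>();
        for (List<String> row : b) {
            bById.computeIfAbsent(row.get(0), k -> new ArrayList<>()).add(row);
        }
        // Probe side: for each A row, emit A's fields followed by each matching B row.
        List<List<String>> out = new ArrayList<>();
        for (List<String> rowA : a) {
            for (List<String> rowB : bById.getOrDefault(rowA.get(0), List.of())) {
                List<String> joined = new ArrayList<>(rowA);
                joined.addAll(rowB);
                out.add(joined);
            }
        }
        return out; // ids present in only one input produce no output
    }

    public static void main(String[] args) {
        List<List<String>> a = List.of(List.of("1", "Alice", "30"), List.of("2", "Bob", "25"));
        List<List<String>> b = List.of(List.of("1", "12 Main St"));
        System.out.println(innerJoin(a, b));
    }
}
```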

Re: How to mapreduce in the scenario

2012-05-29 Thread liuzhg
Hi Mike, Nitin, Devaraj, Soumya, Samir, Robert, thank you all for your suggestions. Actually, I want to know whether Hadoop has any performance advantage over a routine database for solving this kind of problem (joining data). Best Regards, Gump On Tue, May 29, 2012 at 6:53 PM, Soumya

Re: How to mapreduce in the scenario

2012-05-29 Thread Nitin Pawar
If you have a huge dataset (huge meaning around terabytes, or at least a few GBs), then yes, Hadoop has the advantage of being a distributed system and is much faster, but on a smaller set of records it is not as good as an RDBMS. On Wed, May 30, 2012 at 6:53 AM, liuzhg liu...@cernet.com wrote: Hi,