Re: Pregel

2009-09-04 Thread Ted Dunning
Are there any production applications that use Hama? On Thu, Sep 3, 2009 at 7:07 PM, Edward J. Yoon edwardy...@apache.org wrote: Just FYI, Hama (Hadoop Matrix, http://incubator.apache.org/hama) is also considering adopting this computing model based on bulk synchronous parallel. On Fri, Sep 4,

Some issues!

2009-09-04 Thread Sugandha Naolekar
Hello! Running a simple MR job, and setting a replication factor of 2. Now, after its execution, the output is split into files named part-0 and so on. What I want to ask is: can't we avoid having these keys, or key-value pairs, printed in the output files? I mean, I am getting the output in the

Re: Some issues!

2009-09-04 Thread zhang jianfeng
Hi Sugandha, if you only want the value, you need to set the key to NullWritable in the reduce, e.g. output.collect(NullWritable.get(), value); On Fri, Sep 4, 2009 at 12:46 AM, Sugandha Naolekar sugandha@gmail.com wrote: Hello! Running a simple MR job, and setting a replication
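
A minimal sketch of the reducer jianfeng describes, using the old mapred API from the thread; the Text types and class name are assumptions, not from the original mail:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class ValueOnlyReducer extends MapReduceBase
    implements Reducer<Text, Text, NullWritable, Text> {

  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<NullWritable, Text> output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      // NullWritable serializes to nothing, so only the value appears
      // in the part-* output files.
      output.collect(NullWritable.get(), values.next());
    }
  }
}

Remember to also declare the output key type on the job, e.g. conf.setOutputKeyClass(NullWritable.class), so the output format accepts it.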

Re: Some issues!

2009-09-04 Thread Amandeep Khurana
Or you can output the data in the keys and NullWritable as the value. That way you'll get only unique data... On 9/4/09, zhang jianfeng zjf...@gmail.com wrote: Hi Sugandha, if you only want the value, you need to set the key to NullWritable in the reduce, e.g.
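
A sketch of this variant: put the data in the key, emit NullWritable as the value, and let the shuffle group duplicates so the reducer writes each distinct record once (types and class name are again assumptions):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class UniqueKeyReducer extends MapReduceBase
    implements Reducer<Text, NullWritable, Text, NullWritable> {

  public void reduce(Text key, Iterator<NullWritable> values,
      OutputCollector<Text, NullWritable> output, Reporter reporter)
      throws IOException {
    // All duplicates of this key were grouped together by the shuffle;
    // emit once to keep only unique (and sorted) records.
    output.collect(key, NullWritable.get());
  }
}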

RE: multi core nodes

2009-09-04 Thread Amogh Vasekar
Before setting the task limits, do take into account the memory considerations (many archive posts on this can be found). Also, your tasktracker and datanode daemons will run on that machine as well, so you might want to set aside some processing power for them. Cheers! Amogh -Original

Re: Issues with performance on Hadoop/Hive

2009-09-04 Thread Brian Bockelman
On Sep 3, 2009, at 11:53 PM, Ramiya V wrote: Hi, Thanks Amandeep and Ashish! @Ashish: I have set the hive.metastore.warehouse.dir parameter as /home/hive/warehouse. This warehouse directory is on the local filesystem. So will the tables now get stored on the local filesystem or HDFS? I

RE: Some issues!

2009-09-04 Thread Amogh Vasekar
Have a look at JobClient; it should suffice. Cheers! Amogh -Original Message- From: bharath vissapragada [mailto:bharathvissapragada1...@gmail.com] Sent: Friday, September 04, 2009 9:15 PM To: common-user@hadoop.apache.org Subject: Re: Some issues! Hey, I have one more doubt,

RE: Issues with performance on Hadoop/Hive

2009-09-04 Thread Ashish Thusoo
Hi Ramya, yes, you have to explicitly give the HDFS path, so hdfs://namenode:port/home/hive/warehouse should work in case you want to keep the same path in HDFS. Ashish -Original Message- From: Brian Bockelman [mailto:bbock...@cse.unl.edu] Sent: Friday, September 04, 2009 5:43 AM To:

How To Run Multiple Map Reduce Functions In One Job

2009-09-04 Thread Boyu Zhang
Dear All, I am using Hadoop 0.20.0. I have an application that needs to run map-reduce functions iteratively. Right now, the way I am doing this is to create a new Job for each pass of the map-reduce. That seems to cost a lot. Is there any way to run map-reduce functions iteratively in one Job? Thanks a lot

Re: How To Run Multiple Map Reduce Functions In One Job

2009-09-04 Thread Amandeep Khurana
Wait... Why are you using the same mapper and reducer and calling them 10 times? Is the output of the first iteration being fed into the second one? What are these jobs doing? Tell us a bit more about that. There might be a way by which you can club some jobs together into one job and reduce the

Re: How To Run Multiple Map Reduce Functions In One Job

2009-09-04 Thread Amandeep Khurana
You can create different mapper and reducer classes and create separate job configs for them. You can pass these different configs to the Tool object in the same parent class... but they will essentially be different jobs being called together from inside the same Java parent class. Why do you
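
A minimal sketch of that pattern: one parent class implementing Tool that builds a separate JobConf per pass and runs them in sequence. The mapper/reducer class names and paths here are hypothetical placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ChainedJobs extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    // First job: reads the original input, writes an intermediate dir.
    JobConf job1 = new JobConf(getConf(), ChainedJobs.class);
    job1.setJobName("pass-1");
    job1.setMapperClass(FirstMapper.class);      // hypothetical class
    job1.setReducerClass(FirstReducer.class);    // hypothetical class
    FileInputFormat.setInputPaths(job1, new Path(args[0]));
    FileOutputFormat.setOutputPath(job1, new Path("intermediate"));
    JobClient.runJob(job1);  // blocks until the first job completes

    // Second job: consumes the intermediate output of the first.
    JobConf job2 = new JobConf(getConf(), ChainedJobs.class);
    job2.setJobName("pass-2");
    job2.setMapperClass(SecondMapper.class);     // hypothetical class
    job2.setReducerClass(SecondReducer.class);   // hypothetical class
    FileInputFormat.setInputPaths(job2, new Path("intermediate"));
    FileOutputFormat.setOutputPath(job2, new Path(args[1]));
    JobClient.runJob(job2);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new ChainedJobs(), args));
  }
}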

Re: How To Run Multiple Map Reduce Functions In One Job

2009-09-04 Thread Boyu Zhang
Dear Amandeep, thanks for the fast reply. I will try the method you mentioned. In my understanding, when a job is submitted, there will be a separate Java process in the jobtracker responsible for that job, and there will be an initialization and cleanup cost for each job. If every iteration is a

Re: How To Run Multiple Map Reduce Functions In One Job

2009-09-04 Thread Boyu Zhang
Yes, the output of the first iteration is the input of the second iteration. Actually, I am trying the PageRank problem. In the algorithm, you have to run several iterations, each using the output of the previous iteration as input and producing the output for the next. It is not a real life
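
A sketch of that loop in driver code, assuming hypothetical RankMapper/RankReducer classes and a per-iteration output path scheme; each pass's output directory becomes the next pass's input:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class IterativeDriver {
  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    int iterations = Integer.parseInt(args[1]);

    for (int i = 0; i < iterations; i++) {
      JobConf conf = new JobConf(IterativeDriver.class);
      conf.setJobName("pagerank-iter-" + i);
      conf.setMapperClass(RankMapper.class);    // hypothetical class
      conf.setReducerClass(RankReducer.class);  // hypothetical class
      Path output = new Path("ranks-" + i);
      FileInputFormat.setInputPaths(conf, input);
      FileOutputFormat.setOutputPath(conf, output);
      JobClient.runJob(conf);  // the per-job startup/cleanup cost is paid here
      input = output;          // this pass's output feeds the next pass
    }
  }
}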

Re: How To Run Multiple Map Reduce Functions In One Job

2009-09-04 Thread Boyu Zhang
OK. Thank you very much! That helps me a lot; I will try it. Boyu On Fri, Sep 4, 2009 at 3:25 PM, Amandeep Khurana ama...@gmail.com wrote: Ah ok.. Then I think you'll have to fire separate jobs. But they can all be fired from inside one parent job - the method I explained earlier. Try that out...

Copying directories out of HDFS

2009-09-04 Thread Kris Jirapinyo
Hi all, What is the best way to copy directories from HDFS to local disk in 0.19.1? Thanks, Kris.

Re: Some issues!

2009-09-04 Thread bharath vissapragada
Amogh, thanks for your reply. I will make my question more clear. Suppose I have an array that got updated in MRjob1, and I want to access it in MRjob2. This is what I intended in my previous question. I have gone through the JobConf class, but I haven't found anything useful. If
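
One common way to do this, sketched below under the assumption that the array is small: serialize it into the second job's configuration in the driver, then read it back in the mapper's configure() hook. The property name, class names, and types are made up for illustration; for large data, writing it to HDFS (or using DistributedCache) is the usual route instead.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ArrayHandoff {

  // Driver side, after MRjob1 finishes and you hold the updated array:
  static void configureJob2(JobConf job2, String[] updatedArray) {
    job2.setStrings("myapp.shared.array", updatedArray);
  }

  // MRjob2 side: recover the array once per task, before map() runs.
  public static class Job2Mapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    private String[] shared;

    @Override
    public void configure(JobConf conf) {
      shared = conf.getStrings("myapp.shared.array");
    }

    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      // ... use 'shared' here ...
    }
  }
}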

Re: Copying directories out of HDFS

2009-09-04 Thread Arvind Sharma
You mean programmatically or command line? Command line: bin/hadoop -get /path/to/dfs/dir /path/to/local/dir Arvind From: Kris Jirapinyo kjirapi...@biz360.com To: common-user common-user@hadoop.apache.org Sent: Friday, September 4, 2009 5:15:00 PM

Re: Copying directories out of HDFS

2009-09-04 Thread Jeff Zhang
Hi Arvind, you missed the fs. The command should be: bin/hadoop fs -get /path/to/dfs/dir /path/to/local/dir or bin/hadoop fs -copyToLocal /path/to/dfs/dir /path/to/local/dir Here is the link to the shell command documentation for your reference: http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html On
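
For the programmatic half of Arvind's question, a minimal sketch using the FileSystem API (the paths are placeholders); copyToLocalFile copies a directory recursively:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyOut {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // picks up the cluster config
    FileSystem fs = FileSystem.get(conf);      // the configured HDFS
    fs.copyToLocalFile(new Path("/path/to/dfs/dir"),
                       new Path("/path/to/local/dir"));
  }
}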