Re: [Doubt]: Submission of Mapreduce from outside Hadoop Cluster

2011-07-01 Thread Yaozhen Pan
Narayanan, regarding the client installation, you should make sure that the client and server use the same Hadoop version for submitting jobs and transferring data. If you use a different user on the client than the one that runs the Hadoop jobs, configure the Hadoop ugi property (sorry, I forget the exact name). On 2011-7-1…
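
For reference, a sketch of such a client-side submission, assuming the job's driver uses ToolRunner/GenericOptionsParser so the -fs/-jt generic options are honored; the host names, ports, jar, and class names here are hypothetical. The property Yaozhen mentions is, I believe, hadoop.job.ugi in 0.20-era releases:

    # Submit from a client box running the same Hadoop version as the cluster.
    hadoop jar myjob.jar com.example.MyJob \
        -fs hdfs://master:9000 \
        -jt master:9001 \
        /input /output
    # If submitting as a different user, 0.20-era releases read the
    # hadoop.job.ugi property ("user,group"), e.g.:
    #   -D hadoop.job.ugi=hadoop,hadoop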

Re: mapred.tasktracker.map.tasks.maximum is not taking into effect

2011-07-01 Thread Mostafa Gaber
If your datanode holds 2 HDFS chunks (blocks) of the input file, the scheduler will prefer to first run 2 map tasks on the tasktracker where this datanode resides. On Fri, Jul 1, 2011 at 10:33 PM, Juwei Shi wrote: > I think that Anthony is right. Task capacity has to be set in > mapred-default…

Re: mapred.tasktracker.map.tasks.maximum is not taking into effect

2011-07-01 Thread Juwei Shi
I think that Anthony is right. Task capacity has to be set in mapred-default.xml, and the cluster restarted. Anthony Urso 2011/7/2 > Are you sure? AFAIK all mapred.xxx properties can be set via the job config. I > also read in a Yahoo tutorial that this property can be set either in > hadoop-site…

Re: mapred.tasktracker.map.tasks.maximum is not taking into effect

2011-07-01 Thread Joey Echeverria
This property applies to a tasktracker rather than an individual job. Therefore it needs to be set in mapred-site.xml and the daemon restarted. -Joey On Jul 1, 2011 7:01 PM, wrote: > Are you sure? AFAIK all mapred.xxx properties can be set via the job config. I also read in a Yahoo tutorial that…
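
For reference, a minimal sketch of the relevant mapred-site.xml entries on each tasktracker (the value 4 is illustrative; restart the tasktracker daemons after editing):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>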

Re: mapred.tasktracker.map.tasks.maximum is not taking into effect

2011-07-01 Thread praveen.peddi
Are you sure? AFAIK all mapred.xxx properties can be set via the job config. I also read in a Yahoo tutorial that this property can be set either in hadoop-site.xml or in the job config. Maybe someone who has really used this property can confirm. Praveen On Jul 1, 2011, at 4:46 PM, "ext Anthony Urso"…

Re: hadoop job is run slow in multicluster configuration

2011-07-01 Thread Laurent Hatier
Check your /etc/hosts. I've had this problem, and I changed the 127.0.1.1 or 127.0.0.1 entry to the machine's real IP. Just try it :) 2011/7/1 Devaraj K > Can you check the logs on the tasktracker machine to see what is happening with > the task execution and the status of the task? > > Devaraj…
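
Concretely, the edit looks something like this (host name and addresses are hypothetical):

    # /etc/hosts -- before: the loopback alias that confuses Hadoop
    127.0.1.1      hadoop-node1
    # after: map the host name to the machine's real LAN address instead
    192.168.1.11   hadoop-node1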

Re: mapred.tasktracker.map.tasks.maximum is not taking into effect

2011-07-01 Thread Anthony Urso
On Fri, Jul 1, 2011 at 1:03 PM, wrote: > Hi all, > > I am using Hadoop 0.20.2. I am setting the property > mapred.tasktracker.map.tasks.maximum = 4 (same for reduce also) in my job > conf but I am still seeing a max of only 2 map and reduce tasks on each node. > I know my machine can run 4 maps and…

mapred.tasktracker.map.tasks.maximum is not taking into effect

2011-07-01 Thread praveen.peddi
Hi all, I am using Hadoop 0.20.2. I am setting the property mapred.tasktracker.map.tasks.maximum = 4 (same for reduce also) in my job conf but I am still seeing a max of only 2 map and reduce tasks on each node. I know my machine can run 4 map and 4 reduce tasks in parallel. Is this a bug in 0.20.2…

Re: Jobs are still in running state after executing "hadoop job -kill jobId"

2011-07-01 Thread Juwei Shi
Harsh, it works. Thanks a lot! 2011/7/2 Harsh J > Juwei, > > It's odd that a killed job should get "recovered" back into running > state. Can you not simply disable the JT recovery feature (I believe > it's turned off by default)? > > On Fri, Jul 1, 2011 at 10:47 PM, Juwei Shi wrote: > > Thanks…
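
For anyone else who hits this: the feature Harsh refers to is controlled by a jobtracker-side property. A sketch of disabling it in mapred-site.xml (restart the JT afterwards; false should already be the default):

    <property>
      <name>mapred.jobtracker.restart.recover</name>
      <value>false</value>
    </property>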

Re: Jobs are still in running state after executing "hadoop job -kill jobId"

2011-07-01 Thread Harsh J
Juwei, It's odd that a killed job should get "recovered" back into running state. Can you not simply disable the JT recovery feature (I believe it's turned off by default)? On Fri, Jul 1, 2011 at 10:47 PM, Juwei Shi wrote: > Thanks Harsh. > > There are "recovered" jobs after I reboot mapreduce/hdfs…

Re: Jobs are still in running state after executing "hadoop job -kill jobId"

2011-07-01 Thread Juwei Shi
Thanks Harsh. There are "recovered" jobs after I reboot mapreduce/hdfs. Is there any other way to delete the status records of the running jobs, so that they will not be recovered after restarting the JT? 2011/7/2 Harsh J > Juwei, > > Please do not cross-post to multiple lists. I believe this question > suits…

Re: Jobs are still in running state after executing "hadoop job -kill jobId"

2011-07-01 Thread Harsh J
Juwei, please do not cross-post to multiple lists. I believe this question suits the mapreduce-user@ list, so I am replying only there. On Fri, Jul 1, 2011 at 9:22 PM, Juwei Shi wrote: > Hi, > > I am facing a problem where jobs are still running after executing "hadoop > job -kill jobId". I rebooted…

Jobs are still in running state after executing "hadoop job -kill jobId"

2011-07-01 Thread Juwei Shi
Hi, I am facing a problem where jobs are still in the running state after executing "hadoop job -kill jobId". I rebooted the cluster but the jobs still cannot be killed. The Hadoop version is 0.20.2. Any ideas? Thanks in advance! -- Juwei
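
For reference, the commands involved look like this (the job ID is a placeholder):

    # list running jobs, kill one, then confirm its state
    hadoop job -list
    hadoop job -kill job_201107010000_0001
    hadoop job -status job_201107010000_0001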

Re: Relation between Mapper and Combiner

2011-07-01 Thread Lucian Iordache
OK, that is what I wanted to know. Thank you! Best regards, Lucian On Fri, Jul 1, 2011 at 2:47 PM, Devaraj K wrote: > Hi Lucian, > > For every map task, the combiner can be executed multiple times > before the map output is written. The combine step is not a separate task and…

RE: hadoop job is run slow in multicluster configuration

2011-07-01 Thread Devaraj K
Can you check the logs on the tasktracker machine to see what is happening with the task execution and the status of the task? Devaraj K

RE: Relation between Mapper and Combiner

2011-07-01 Thread Devaraj K
Hi Lucian, For every map task, the combiner can be executed multiple times before the map output is written. The combine step is not a separate task; it is part of the map task's execution. The reducer will copy the map task's output, which has already been reduced by the combiner. > For example: > If I have…
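
As a concrete illustration, a minimal sketch of wiring a combiner with the 0.20-era API (the WordCount* class names are hypothetical):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // The combiner is not a separate task: the framework may run it zero or
    // more times inside each map task, before the map output is written, so
    // it must be safe to apply repeatedly (or not at all).
    JobConf conf = new JobConf(WordCountDriver.class);
    conf.setMapperClass(WordCountMapper.class);
    conf.setCombinerClass(WordCountReducer.class); // runs within map tasks
    conf.setReducerClass(WordCountReducer.class);
    FileInputFormat.setInputPaths(conf, new Path("/input"));
    FileOutputFormat.setOutputPath(conf, new Path("/output"));
    JobClient.runJob(conf);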

Relation between Mapper and Combiner

2011-07-01 Thread Lucian Iordache
Hello guys, Can anybody tell me what the relation is between map tasks and combine tasks? I would like to know if there is a 1:1 relation between them, or a *:1 (many-to-one) relation. For example: if I have *2 map tasks* run on the same machine, will I have *1 combine task* on that machine to combine…

hadoop job is run slow in multicluster configuration

2011-07-01 Thread ranjith k
Hello, my MapReduce program is slow in a multi-node cluster configuration. The reduce task is stuck at 16%, but the same program runs much faster in pseudo-distributed mode (single node). What can I do? I only have two machines. Please help me. -- Ranjith K

Re: [Doubt]: Submission of Mapreduce from outside Hadoop Cluster

2011-07-01 Thread Harsh J
Narayanan, On Fri, Jul 1, 2011 at 12:57 PM, Narayanan K wrote: > So the report will be run from a different machine outside the cluster, so > we need a way to pass the parameters to the hadoop cluster (master) and > initiate a mapreduce job dynamically. Similarly, the output of the mapreduce job >…
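
A rough sketch of what such a client looks like with the 0.20-era API (the host names, ports, and MyJob class are hypothetical; as noted elsewhere in this thread, the client's Hadoop jars must match the cluster's version):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // Point the client-side configuration at the remote cluster, then
    // submit the job and wait for completion.
    JobConf conf = new JobConf(MyJob.class);
    conf.set("fs.default.name", "hdfs://master:9000"); // NameNode
    conf.set("mapred.job.tracker", "master:9001");     // JobTracker
    JobClient.runJob(conf);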

Re: [Doubt]: Submission of Mapreduce from outside Hadoop Cluster

2011-07-01 Thread Narayanan K
Hi Harsh, Thanks for the quick response... I have a few clarifications regarding the 1st point. Let me give the background first: we have set up a Hadoop cluster with HBase installed. We are planning to load HBase with data, perform some computations on the data, and present the data…