Shuffle stuck at 0.22.0

2011-12-19 Thread Markus Jelsma
Hi, on 0.22.0 we sometimes see a shuffle phase get stuck at a point where the framework does not kill it for lack of progress. The reducer's tasktracker log keeps filling up with two exceptions all night long: 2011-12-20 06:25:03,711 WARN org.mortbay.log: Committed before 410 getMapOutput …
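For reference, the progress timeout that would normally kill such an attempt is configurable; a minimal sketch, assuming the 0.22 property name and its usual 10-minute default (mapred-site.xml):

    <property>
      <name>mapreduce.task.timeout</name>
      <!-- milliseconds without a progress report before the framework
           fails the task attempt; 600000 = 10 minutes -->
      <value>600000</value>
    </property>

Note that a reducer stuck in shuffle may still be reporting progress, which could explain why the timeout never fires here.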

Re: Tasktracker Task Attempts Stuck (mapreduce.task.timeout not working)

2011-12-19 Thread Todd Lipcon
On Mon, Dec 19, 2011 at 7:29 PM, rajesh balamohan wrote: > Hi John, > > Which version of the JVM are you using (JDK 1.6.0.2xx?), and what JVM > arguments do you use for spawning the map/reduce slots? > > Check whether the JVM is stuck on the machine. Sometimes I have seen a task JVM > just launching …

Re: Tasktracker Task Attempts Stuck (mapreduce.task.timeout not working)

2011-12-19 Thread rajesh balamohan
Hi John, Which version of the JVM are you using (JDK 1.6.0.2xx?), and what JVM arguments do you use for spawning the map/reduce slots? Check whether the JVM is stuck on the machine. Sometimes I have seen a task JVM, just after launching, get into spinning mode and occupy 100% CPU. Can you check if …
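A minimal sketch of how one might check for such a spinning task JVM on the tasktracker node (the Child class name matches 0.20/0.22-era task JVMs; <pid> is a placeholder):

    # list task JVMs; child tasks run as org.apache.hadoop.mapred.Child
    jps -l | grep Child

    # per-thread CPU for a suspect JVM; a thread pinned near 100% suggests spinning
    top -H -p <pid>

    # take a thread dump to see where the JVM is actually stuck
    jstack <pid> > /tmp/task-jvm-stack.txt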

Re: Variable mapreduce.tasktracker.*.tasks.maximum per job

2011-12-19 Thread Arun C Murthy
Markus, The CapacityScheduler in 0.20.205 (in fact, since 0.20.203) supports the notion of 'high memory jobs', with which you can specify, for each job, the number of 'slots' each map/reduce task needs. For example, you can say for job1 that each map needs 2 slots, and so on. Unfortunately, I don't know how …
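A minimal sketch of the configuration this refers to, assuming the memory-based scheduling properties documented for the CapacityScheduler in 0.20.203+ (values are illustrative):

    <!-- cluster-wide, mapred-site.xml: memory that one map slot represents -->
    <property>
      <name>mapred.cluster.map.memory.mb</name>
      <value>2048</value>
    </property>

    <!-- per job: request 4096 MB per map task, i.e. two slots,
         making this a 'high memory' job -->
    <property>
      <name>mapred.job.map.memory.mb</name>
      <value>4096</value>
    </property>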

Re: Are job statistic logs right in the 0.23 version of Hadoop?

2011-12-19 Thread Mahadev Konar
Moving this to the mapreduce list. Sophie, this could just be a bug in 0.23; 0.23 does not have jobtrackers/tasktrackers. Could you see if you can reproduce this? If yes, please do file a jira on it. Thanks, mahadev. On Mon, Dec 19, 2011 at 12:22 PM, Raj V wrote: > Sophie > > > Are the clocks in sync …

Variable mapreduce.tasktracker.*.tasks.maximum per job

2011-12-19 Thread Markus Jelsma
Hi, We have many different jobs running on a 0.22.0 cluster, each with its own memory consumption. Some jobs can easily run with a large number of *.tasks per node, while others require much more memory and can only run with a minimal number of tasks per node. Is there any way to reconfigure …
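For reference, the settings in question are tasktracker-wide in 0.22, set in mapred-site.xml rather than per job; the values here are illustrative:

    <property>
      <name>mapreduce.tasktracker.map.tasks.maximum</name>
      <value>8</value>
    </property>
    <property>
      <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>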

Re: hadoop streaming job IOException

2011-12-19 Thread Robert Evans
I cannot tell you the exact error, but it looks like your Perl job crashed somehow. This caused the Java code that was writing to the Perl process's stdin to get a 'Broken pipe' error, which caused the Java process to exit and report that as the real error. Usually when this happens there is a race condition …
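The script's own error output usually ends up in the task attempt's stderr log on the tasktracker that ran it; a sketch, assuming a default userlogs layout (the job and attempt IDs are placeholders):

    # on the tasktracker node that ran the failed attempt
    less ${HADOOP_LOG_DIR}/userlogs/<job_id>/<attempt_id>/stderr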

Fwd: Reduce output is strange

2011-12-19 Thread Pedro Costa
Hi, In Hadoop MapReduce, I've executed the webdatascan example, and the reduce output is in a SequenceFile. The result is shown here (http://paste.lisp.org/display/126572). What's the trash (random characters), like "u 265 100 330 320 252 " \n # ; 374 5 211 V ' 340 376", in the output? …
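Those bytes are most likely the SequenceFile's binary container format (header, serialized key/value class names, sync markers) rather than corrupt data. A minimal sketch of how to view the contents decoded, assuming an illustrative output path:

    # 'hadoop fs -cat' dumps the raw binary container; '-text' decodes
    # SequenceFiles (and compressed files) into readable key/value text
    hadoop fs -text /user/pedro/output/part-00000 | head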

Re: Distcp from 0.20 to 0.22 [solved]

2011-12-19 Thread Markus Jelsma
It seems the files cannot be validated for some reason. The source files themselves are fine: they are not corrupt and can be read without issue. java.io.IOException: Validation of copy of file hftp://namenode01.openindex.io:50070/user/systems/segments/index/20111021161228/crawl_parse/part-00011 failed.

Distcp from 0.20 to 0.22

2011-12-19 Thread Markus Jelsma
Hi, Apologies for cross-posting. We're in the process of migrating data from an Apache Hadoop 0.20.203.0 cluster to a 0.22.0 cluster using distcp with an hftp source and hdfs destination, as described in the manual. During the copy, a handful of the following cryptic IOExceptions occurred and the job …
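For context, a sketch of the copy pattern described in the docs for crossing versions; it is typically run on the destination (0.22) cluster, since hftp is read-only (the destination namenode URI here is an assumption):

    # hftp reads over the source namenode's HTTP port (50070);
    # writes go to the destination cluster's HDFS
    hadoop distcp hftp://namenode01.openindex.io:50070/user/systems/segments \
                  hdfs://dst-namenode:8020/user/systems/segments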