RE: Map job hangs indefinitely

2011-06-21 Thread Devaraj K
With this info it is difficult to find out where the problem is coming from. Can you check the job tracker and task tracker logs related to these jobs? Devaraj K _ From: Sudharsan Sampath [mailto:sudha...@gmail.com] Sent: Wednesday, June 22, 2011 11:51 AM To: mapreduce-user@hadoop.apach

Map job hangs indefinitely

2011-06-21 Thread Sudharsan Sampath
Hi, I am starting a job from the map of another job. Following is a quick mock of the code snippets that I use. But the 2nd job hangs indefinitely after the 1st task attempt fails. There is not even a 2nd attempt. This runs fine on a cluster with one node but fails on a two node cluster. Can someo
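Since Sudharsan's actual snippets are not in the excerpt, here is a minimal, hypothetical sketch of the pattern he describes: a mapper of a "launcher" job that builds and submits a second job from inside map(). All class names, paths, and the job name are illustrative, and the old-API JobConf / JobClient.runJob route shown is just one common way to do it, not necessarily the one he used.

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.IdentityMapper;

    public class LauncherMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

      @Override
      public void map(LongWritable key, Text value,
                      OutputCollector<NullWritable, NullWritable> out, Reporter reporter)
          throws IOException {
        // Build and submit the second job from inside the first job's map task.
        // runJob() blocks until the child job finishes, so if the child job hangs
        // (as described in the thread), this map task hangs with it.
        JobConf childConf = new JobConf(LauncherMapper.class);
        childConf.setJobName("child-job");
        childConf.setMapperClass(IdentityMapper.class);
        FileInputFormat.setInputPaths(childConf, new Path("/tmp/child/input"));
        FileOutputFormat.setOutputPath(childConf, new Path("/tmp/child/output"));
        JobClient.runJob(childConf);
      }
    }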

RE: Tasktracker denied communication with jobtracker

2011-06-21 Thread Devaraj K
Hi Virajith, This exception will be thrown when the host name is present in the file given as the value of the property "mapred.hosts.exclude". If you don't mention anything for the "mapred.hosts" and "mapred.hosts.exclude" properties, or the mentioned files don't contain any hosts, job trac
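As a rough illustration of Devaraj's point (this diagnostic is not from the thread, and the class name is made up), a small check run on the JobTracker node can print the two properties and look for a given host name in the exclude file:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.net.InetAddress;
    import org.apache.hadoop.mapred.JobConf;

    public class CheckExcludeList {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();   // loads mapred-site.xml from the classpath
        String excludeFile = conf.get("mapred.hosts.exclude", "");
        System.out.println("mapred.hosts         = " + conf.get("mapred.hosts", "(not set)"));
        System.out.println("mapred.hosts.exclude = " + excludeFile);
        if (excludeFile.isEmpty()) {
          return;                        // no exclude file configured
        }
        String localHost = InetAddress.getLocalHost().getHostName();
        BufferedReader in = new BufferedReader(new FileReader(excludeFile));
        for (String line; (line = in.readLine()) != null; ) {
          if (line.trim().equalsIgnoreCase(localHost)) {
            System.out.println(localHost + " is listed in the exclude file");
          }
        }
        in.close();
      }
    }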

Re: Large startup time in remote MapReduce job

2011-06-21 Thread Gabor Makrai
Fortunately, DistributedCache solved my problem! I put a jar file into HDFS, which contains the necessary classes for the job, and I used this: *DistributedCache.addFileToClassPath(new Path("/myjar/myjar.jar"), conf);* Thanks for the fast answer! And sorry for my mistake (about the wrong list), that was
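For context, a minimal sketch of the submission side of what Gabor describes, assuming the dependency jar already sits in HDFS at the path from his message; the class name, job name, mapper, and input/output arguments are illustrative:

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;

    public class SubmitWithCachedJar {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitWithCachedJar.class);
        conf.setJobName("job-with-cached-jar");
        conf.setMapperClass(IdentityMapper.class);
        // The jar is already in HDFS; this puts it on the classpath of every task JVM,
        // so it does not have to be re-shipped from the client on each submission.
        DistributedCache.addFileToClassPath(new Path("/myjar/myjar.jar"), conf);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }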

Re: Large startup time in remote MapReduce job

2011-06-21 Thread Harsh J
Allen, On Wed, Jun 22, 2011 at 2:28 AM, Allen Wittenauer wrote: > > On Jun 21, 2011, at 1:31 PM, Harsh J wrote: > >> Gabor, >> >> If your jar does not contain code changes that need to get transmitted >> every time, you can consider placing them on the JT/TT classpaths > >        ... which means

Re: AW: How to split a big file in HDFS by size

2011-06-21 Thread Niels Basjes
Hi, On Tue, Jun 21, 2011 at 16:14, Mapred Learn wrote: > The problem is when 1 text file goes on HDFS as a 60 GB file, one mapper takes > more than an hour to convert it to a sequence file and finally fails. > > I was thinking how to split it from the client box before uploading to HDFS. Have a look
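Niels's suggestion is cut off by the archive, so the sketch below is only one possible alternative to splitting the file on the client box: a map-only job that converts the text file to a SequenceFile. Because plain (uncompressed) text input is splittable, the 60 GB file gets one map task per HDFS block instead of a single mapper. Paths and the job name are illustrative.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;

    public class TextToSequenceFile {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(TextToSequenceFile.class);
        conf.setJobName("text-to-seqfile");
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        conf.setMapperClass(IdentityMapper.class);
        conf.setNumReduceTasks(0);                    // map-only: no shuffle or sort
        conf.setOutputKeyClass(LongWritable.class);   // byte offset from TextInputFormat
        conf.setOutputValueClass(Text.class);         // the line itself
        FileInputFormat.setInputPaths(conf, new Path("/data/big-input.txt"));
        FileOutputFormat.setOutputPath(conf, new Path("/data/big-input-seq"));
        JobClient.runJob(conf);
      }
    }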

RE: tasktracker maximum map tasks for a certain job

2011-06-21 Thread GOEKE, MATTHEW (AG/1000)
Off-the-wall thought, but it might be possible to do this by rolling your own load manager using the fair scheduler. I know this is how people have set up custom job distributions based on current cluster utilization. Matt From: Jonathan Zukerman [mailto:zukermanjonat...@gmail.com] Sent: Tue

GenericWritableComparable?

2011-06-21 Thread tim likarish
Hi, In Tom White's Hadoop book, White discusses Writable collections and ends the section with the following: For lists of a single type of Writable, ArrayWritable is adequate, but to store different types of Writable in a single list, you can use GenericWritable to wrap the elements in an ArrayW
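A minimal sketch of the pattern that book passage describes: subclassing GenericWritable so that Writables of different concrete types can be wrapped and stored in one ArrayWritable. The subclass name and the chosen element types here are illustrative.

    import org.apache.hadoop.io.GenericWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    public class MyGenericWritable extends GenericWritable {
      @SuppressWarnings("unchecked")
      private static final Class<? extends Writable>[] TYPES =
          (Class<? extends Writable>[]) new Class<?>[] { IntWritable.class, Text.class };

      public MyGenericWritable() {}                           // needed for deserialization
      public MyGenericWritable(Writable value) { set(value); } // convenience wrapper

      @Override
      protected Class<? extends Writable>[] getTypes() {
        return TYPES;
      }
    }

A mixed list then looks like new ArrayWritable(MyGenericWritable.class, new Writable[] { new MyGenericWritable(new IntWritable(1)), new MyGenericWritable(new Text("a")) }).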

Re: tasktracker maximum map tasks for a certain job

2011-06-21 Thread Jonathan Zukerman
That is a bit problematic because I have other jobs running at the same time, and most of them don't care about the number of map tasks per tasktracker. Is there a way to implement this in my job project? What is the best way to do it? On Tue, Jun 21, 2011 at 8:08 PM, Joey Echeverria wrote: > The

Re: tasktracker maximum map tasks for a certain job

2011-06-21 Thread Joey Echeverria
The only way to do that is to drop the setting down to one and bounce the TaskTrackers. -Joey On Tue, Jun 21, 2011 at 12:52 PM, Jonathan Zukerman wrote: > Hi, > Is there a way to set the maximum map tasks for all tasktrackers in my > cluster for a certain job? > Most of my tasktrackers are confi
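"The setting" here is presumably mapred.tasktracker.map.tasks.maximum, the per-TaskTracker slot count. As the mapred-site.xml thread further down explains, it is read by each TaskTracker daemon at startup, which is why a per-job override such as the illustrative attempt below has no effect and the daemons have to be reconfigured and bounced instead:

    import org.apache.hadoop.mapred.JobConf;

    public class PerJobSlotLimitAttempt {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Ineffective: running TaskTrackers already read their own copy of this
        // property from mapred-site.xml when they started.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 1);
      }
    }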

tasktracker maximum map tasks for a certain job

2011-06-21 Thread Jonathan Zukerman
Hi, Is there a way to set the maximum map tasks for all tasktrackers in my cluster for a certain job? Most of my tasktrackers are configured to handle 4 maps concurrently, and most of my jobs don't care where the map function runs. But a small part of my jobs requires that no two map functions w

Re: When is mapred-site.xml read?

2011-06-21 Thread Alex Kozlov
*keep.failed.task.files* is also set by the client (also, HDFS block size, replication level, *io.sort.{mb,factor}*, etc.) On Tue, Jun 21, 2011 at 7:15 AM, John Armstrong wrote: > On Tue, 21 Jun 2011 06:37:50 -0700, Alex Kozlov > wrote: > > However, the job's tasks are executed in a separate JVM

Tasktracker denied communication with jobtracker

2011-06-21 Thread Virajith Jalaparti
Hi, I am trying to set up a hadoop cluster with 7 nodes, with the master node also functioning as a slave node (i.e. it runs a datanode and a tasktracker along with the namenode and jobtracker daemons). I am able to get HDFS working. However, when I try starting the tasktrackers (bin/start-mapred.sh), I

Re: When is mapred-site.xml read?

2011-06-21 Thread John Armstrong
On Tue, 21 Jun 2011 06:37:50 -0700, Alex Kozlov wrote: > However, the job's tasks are executed in a separate JVM and some > of the parameters, like max heap from *mapred.child.java.opts*, are set > during the job execution. In this case the parameter is coming from the > client side where the who

Re: When is mapred-site.xml read?

2011-06-21 Thread Alex Kozlov
Hi John, You are right: the *-site.xml files are read by the daemons on startup. However, the job's tasks are executed in a separate JVM, and some of the parameters, like the max heap from *mapred.child.java.opts*, are set during job execution. In this case the parameter is coming from the client side
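A small sketch of the client-side behaviour Alex describes, using the standard 0.20/1.x property names; the heap size is just an example, and in a real program these calls would sit alongside the usual input/output/mapper setup before JobClient.runJob:

    import org.apache.hadoop.mapred.JobConf;

    public class ClientSideParams {
      public static void main(String[] args) {
        JobConf conf = new JobConf();   // picks up *-site.xml from the client's classpath
        // Read at job-submission time on the client, not from the TaskTracker's files:
        conf.set("mapred.child.java.opts", "-Xmx512m");
        conf.setBoolean("keep.failed.task.files", true);  // see Alex's follow-up above
      }
    }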

When is mapred-site.xml read?

2011-06-21 Thread John Armstrong
One of my colleagues and I are a little confused about exactly when mapred-site.xml is read. The pages on hadoop.apache.org don't seem to specify it very clearly. One position is that mapred-site.xml is read by the daemon processes at startup, and so changing a parameter in mapred-si