Re: Mappers crashing due to running out of heap space during initialisation

2011-04-27 Thread James Hammerton
Hi, I lowered io.sort.mb from 200MB to 100MB, and that allowed my job to get through the map phase; thanks Chris. However, what I don't understand is why the memory got used up in the first place, when the mapper only buffers the previous input and the maximum serialised size of the objects
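For reference, the change James describes would be a mapred-site.xml override along these lines (the file and surrounding setup are assumptions about his cluster; the property name is the 0.20-era one used in this thread):

```xml
<!-- mapred-site.xml: shrink the map-side sort buffer so it fits in the task heap -->
<property>
  <name>io.sort.mb</name>
  <value>100</value> <!-- was 200; this buffer is allocated up front at task init -->
</property>
```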

Re: generating crc for files on hdfs

2011-04-27 Thread Harsh J
Hello Giridhar, Maybe this thread would be of interest to you: http://search-hadoop.com/m/nIPU2Ocbr41/crc+hdfs&subj=Reg+HDFS+checksum On Wed, Apr 27, 2011 at 11:57 AM, Giridhar Addepalli wrote: > Hi, > > > > How do I generate a CRC for files on HDFS? > > I copied files from HDFS to a remote machine, I
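As a rough illustration of what HDFS-style checksums are (a sketch, not Hadoop's actual client code): HDFS stores a CRC32 per fixed-size chunk of data (io.bytes.per.checksum, 512 bytes by default), which can be mimicked with java.util.zip.CRC32:

```java
import java.util.zip.CRC32;

// Sketch of per-chunk CRC32 checksumming, in the spirit of HDFS's
// io.bytes.per.checksum (512 bytes by default). Names here are illustrative.
public class ChecksumSketch {
    // Returns one CRC32 value per bytesPerChecksum-sized chunk of data.
    static long[] chunkChecksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] crcs = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            crcs[i] = crc.getValue();
        }
        return crcs;
    }

    public static void main(String[] args) {
        // "123456789" is the standard CRC-32 check input; its CRC is 0xCBF43926
        long[] crcs = chunkChecksums("123456789".getBytes(), 512);
        System.out.printf("%08X%n", crcs[0]);
    }
}
```

Comparing such per-chunk CRCs on both sides of a copy is one way to verify integrity; the thread linked above discusses the Hadoop-native options.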

Re: Mappers crashing due to running out of heap space during initialisation

2011-04-27 Thread Joey Echeverria
It was initializing a 200MB buffer in which to sort the map output. How much heap did you allocate to the task JVMs (mapred.child.java.opts in mapred-site.xml)? If you didn't change the default, it's set to 200MB, which is why you would run out of memory trying to allocate a 200MB buffer. -Joey O
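The knob Joey refers to would be set like this in mapred-site.xml (the -Xmx value shown is illustrative, not a recommendation):

```xml
<!-- mapred-site.xml: give each map/reduce task JVM more heap than the 200MB default -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```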

Re: Mappers crashing due to running out of heap space during initialisation

2011-04-27 Thread James Hammerton
Thanks Joey. I guess the puzzle then is why some of my mappers used up over 312MB, leaving insufficient room out of the 512MB total we allocate, when the job is no more complex than other jobs that run happily in that space. The memory usage is independent of the size of my data set, and even the l
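A back-of-the-envelope view of the figures in this thread (a sketch only; the framework needs heap beyond the sort buffer, which this ignores):

```java
// Rough heap accounting for a map task, using the numbers from this thread.
public class HeapBudget {
    // Heap left over once the io.sort.mb buffer has been allocated.
    static int remainingMb(int heapMb, int ioSortMb) {
        return heapMb - ioSortMb;
    }

    public static void main(String[] args) {
        int heapMb = 512;   // -Xmx from mapred.child.java.opts
        int ioSortMb = 200; // map-side sort buffer, allocated at task init
        System.out.println("Left for mapper + framework: "
                + remainingMb(heapMb, ioSortMb) + "MB");
    }
}
```

With over 312MB in use elsewhere, 200 + 312 already exceeds the 512MB heap, which is consistent with the OOM hitting exactly when the sort buffer is allocated at initialisation.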

Re: purpose of JobTracker copying job.jar

2011-04-27 Thread Arun C Murthy
So that it can instruct the TTs to copy it from the system dir when actually running jobs... hope that helps, Arun On Apr 26, 2011, at 11:57 AM, Raghu Angadi wrote: The JobTracker copies the job jar from the mapred system directory on HDFS to its local file system ( ${mapred.local.dir}/jobTracker ). Wh
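The two locations Raghu and Arun are discussing are governed by these properties (the values shown are the usual 0.20-era defaults, not necessarily the poster's settings):

```xml
<!-- HDFS staging area: the client uploads job.jar here and the JobTracker reads it -->
<property>
  <name>mapred.system.dir</name>
  <value>${hadoop.tmp.dir}/mapred/system</value>
</property>
<!-- Local disk root; the JobTracker keeps its copy under ${mapred.local.dir}/jobTracker -->
<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
</property>
```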