Re: Block not found during commitBlockSynchronization

2008-12-05 Thread Brian Bockelman
This is 0.19.0. Grepping around, it appears that the message for this block has been printed at 1-5 Hz throughout all our logs (oldest logs are 12-3). It has happened about 0.5 million times. If I grep for the "nextGenerationStamp" error message, it's happened 0.4 million times. Anything else I can provide…

Re: Block not found during commitBlockSynchronization

2008-12-05 Thread Tsz Wo (Nicholas), Sze
Which version are you using? Calling commitBlockSynchronization(...) with newgenerationstamp=0, newlength=0, newtargets=[] does not look normal. You may check the namenode log and the client log for the block blk_-4236881263392665762. Nicholas Sze - Original Message > From: Bria…
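Nicholas's suggestion (search the namenode log for the block ID) can be done with a plain grep. A minimal sketch: the log path and the sample line below are mocked up for illustration; on a real cluster you would point grep at the namenode log under your Hadoop log directory.

```shell
# Create a mock namenode log line (illustrative; real logs live under your Hadoop log dir)
printf '2008-12-05 19:20:00,534 INFO FSNamesystem: commitBlockSynchronization(lastblock=blk_-4236881263392665762_88597, newgenerationstamp=0, newlength=0, newtargets=[])\n' > /tmp/namenode-sample.log

# Count how many times the block is mentioned
grep -c 'blk_-4236881263392665762' /tmp/namenode-sample.log
# → 1
```

The same pattern, run against the real namenode and datanode logs, shows when the block was created, replicated, and (if ever) abandoned.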

Block not found during commitBlockSynchronization

2008-12-05 Thread Brian Bockelman
Hey, I'm seeing this message repeated over and over in my logs: 2008-12-05 19:20:00,534 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_-4236881263392665762_88597, newgenerationstamp=0, newlength=0, newtargets=[]) 2008-12-05 19:20:00,534 I…

Re: Issues with V0.19 upgrade

2008-12-05 Thread Michael Bieniosek
Not sure if anyone else answered... 1. You need to run hadoop dfsadmin -finalizeUpgrade. Be careful, because you can't go back once you do this. http://wiki.apache.org/hadoop/Hadoop_Upgrade I don't know about 2. -Michael On 12/3/08 5:49 PM, "Songting Chen" <[EMAIL PROTECTED]> wrote: 1. The…
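For reference, the finalize step Michael mentions is a one-line admin command run against the upgraded cluster (this is a sketch of the 0.19-era CLI; it permanently removes the pre-upgrade backup, so there is no rollback afterwards):

```shell
# Irreversible: discards the previous-version backup kept since the upgrade started
hadoop dfsadmin -finalizeUpgrade

# Check upgrade state before/after
hadoop dfsadmin -upgradeProgress status
```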

File loss at Nebraska

2008-12-05 Thread Brian Bockelman
We are continuing to see a small, consistent amount of block corruption leading to file loss. We have been upgrading our cluster lately, which means we've been doing a rolling decommissioning of our nodes (and then adding them back with more disks!). Previously, when I've had time to investigate…

Re: slow shuffle

2008-12-05 Thread Songting Chen
To summarize the slow shuffle issue: 1. I think one problem is that the Reducer starts very late in the process, slowing the entire job significantly. Is there a way to let the reducer start earlier? 2. Copying 300 files of 30 KB each took 3 minutes in total (after all maps finished). This really puzzles…
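On question 1: 0.19-era Hadoop has a tunable for when reduces are scheduled relative to map completion. A hedged sketch, assuming your release supports the mapred.reduce.slowstart.completed.maps property (added around 0.19; check your mapred-default.xml):

```xml
<!-- hadoop-site.xml (0.19-era config file name) -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <!-- Fraction of maps that must complete before reduces are scheduled.
       Lower values launch reducers earlier; 0.05 is the usual default. -->
  <value>0.05</value>
</property>
```

Note that even an early-launched reducer only copies map outputs as they finish; the final merge and the reduce() calls still wait for the last map.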

Re: slow shuffle

2008-12-05 Thread Songting Chen
I think one of the issues is that the Reducer starts very late in the process, slowing the entire job significantly. Is there a way to let the reducer start earlier? --- On Fri, 12/5/08, Songting Chen <[EMAIL PROTECTED]> wrote: > From: Songting Chen <[EMAIL PROTECTED]> > Subject: Re: slow shuffle…

Re: slow shuffle

2008-12-05 Thread Songting Chen
We have 4 testing data nodes with 3 reduce tasks. The parallel.copies parameter has been increased to 20, 30, even 50. But it doesn't really help... --- On Fri, 12/5/08, Aaron Kimball <[EMAIL PROTECTED]> wrote: > From: Aaron Kimball <[EMAIL PROTECTED]> > Subject: Re: slow shuffle > To: core-user…

Re: Strange behavior with bzip2 input files w/release 0.19.0

2008-12-05 Thread John Heidemann
On Thu, 04 Dec 2008 09:55:35 PST, "Alex Loddengaard" wrote: >Currently in Hadoop you cannot split bzip2 files: [...] >However, gzip files can be split: [...] >Hope this helps. Just to clarify, gzip…

Re: getting Configuration object in mapper

2008-12-05 Thread Sagar Naik
Check mapred.task.is.map. Craig Macdonald wrote: I have a related question - I have a class which is both a mapper and a reducer. How can I tell in configure() if the current task is a map or a reduce task? Parse the taskid? C Owen O'Malley wrote: On Dec 4, 2008, at 9:19 PM, abhinit wrote: I…
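Sagar's one-liner, expanded into a sketch against the 0.19 "old" mapred API. The mapred.task.is.map property name comes from this thread; the class name and the default value passed to getBoolean are illustrative assumptions, and this is not tested against a live cluster:

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

// A class used as both Mapper and Reducer can branch on the task type here.
public class CombinedTask extends MapReduceBase {
  private boolean isMapTask;

  @Override
  public void configure(JobConf job) {
    // Framework-set at task runtime; the fallback default here is an assumption
    isMapTask = job.getBoolean("mapred.task.is.map", true);
  }
}
```

This avoids parsing the task ID string, which is brittle across versions.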

Re: getting Configuration object in mapper

2008-12-05 Thread Craig Macdonald
I have a related question - I have a class which is both a mapper and a reducer. How can I tell in configure() if the current task is a map or a reduce task? Parse the taskid? C Owen O'Malley wrote: On Dec 4, 2008, at 9:19 PM, abhinit wrote: I have set some variables using the JobConf object. jo…

Re: slow shuffle

2008-12-05 Thread Aaron Kimball
How many reduce tasks do you have? Look into increasing mapred.reduce.parallel.copies from the default of 5 to something more like 20 or 30. - Aaron On Fri, Dec 5, 2008 at 10:00 PM, Songting Chen <[EMAIL PROTECTED]> wrote: > A little more information: > > We optimized our Map process quite a bit…
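Aaron's suggestion as a config fragment, for reference (a sketch against the 0.19-era property name; the value 20 is just his suggested starting point):

```xml
<property>
  <name>mapred.reduce.parallel.copies</name>
  <!-- Number of parallel threads each reduce uses to fetch map outputs; default is 5 -->
  <value>20</value>
</property>
```

With only 4 nodes and 300 tiny map outputs, the per-fetch overhead (connection setup, one HTTP round trip per output) tends to dominate, which is consistent with Songting's report that raising this did not help much.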

Re: slow shuffle

2008-12-05 Thread Songting Chen
A little more information: We have optimized our Map process quite a bit, so the Shuffle is now the bottleneck. 1. There are 300 Map tasks (128 MB block size), each taking about 13 sec. 2. The Reducer starts running at a very late stage (80% of maps are done). 3. Copying the 300 map outputs (shuffle) takes as…

Re: getting Configuration object in mapper

2008-12-05 Thread Owen O'Malley
On Dec 4, 2008, at 9:19 PM, abhinit wrote: I have set some variables using the JobConf object. jobConf.set("Operator", operator) etc. How can I get an instance of the Configuration object/JobConf object inside a map method so that I can retrieve these variables. In your Mapper class, implement…
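Owen's answer, sketched against the 0.19 "old" mapred API: override configure(JobConf), which the framework calls once per task before any map() calls. The "Operator" key is from the thread; the class and field names are illustrative:

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class MyMapper extends MapReduceBase {
  private String operator;

  @Override
  public void configure(JobConf job) {
    // Values set with jobConf.set(...) at submission time are visible here
    operator = job.get("Operator");
  }
  // map(...) can now read the 'operator' field instead of re-fetching the conf
}
```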

Re: stack trace from hung task

2008-12-05 Thread Ryan LeCompte
For what it's worth, I started seeing these when I upgraded to 0.19. I was using 10 reduces, but changed it to 30 reduces for my job and now I don't see these errors any more. Thanks, Ryan On Fri, Dec 5, 2008 at 2:44 PM, Sriram Rao <[EMAIL PROTECTED]> wrote: > Hi, > > When a task tracker kills a…

Re: slow shuffle

2008-12-05 Thread Songting Chen
It takes 50% of the total time. --- On Fri, 12/5/08, Alex Loddengaard <[EMAIL PROTECTED]> wrote: > From: Alex Loddengaard <[EMAIL PROTECTED]> > Subject: Re: slow shuffle > To: core-user@hadoop.apache.org > Date: Friday, December 5, 2008, 11:43 AM > These configuration options will be useful: >…

stack trace from hung task

2008-12-05 Thread Sriram Rao
Hi, When a task tracker kills a non-responsive task, it prints out a message "Task X not reported status for 600 seconds. Killing!". The stack trace it then dumps out is that of the task tracker itself. Is there a way to get the hung task to dump out its stack trace before exiting? Would be nice…
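One standard JVM-level workaround (not Hadoop-specific, and it has to happen before the task tracker sends its kill): SIGQUIT makes a HotSpot JVM write all thread stacks to its stdout without terminating, so the dump lands in the task's stdout log. A sketch, where the PID lookup is illustrative:

```shell
# Find the task JVM's PID on the tasktracker node (jps output format may vary by version)
jps -l

# Thread dump without killing the process; output goes to the task's stdout log
kill -QUIT <task-jvm-pid>   # <task-jvm-pid> is a placeholder
```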

Re: slow shuffle

2008-12-05 Thread Alex Loddengaard
These configuration options will be useful: > mapred.job.shuffle.merge.percent > 0.66 > The usage threshold at which an in-memory merge will be > initiated, expressed as a percentage of the total memory allocated to > storing in-memory map outputs, as defined by > mapred.job.shuffle.input.buffer.percent…
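The two properties Alex quotes, as a config fragment (values shown are my understanding of the 0.19-era defaults; verify against your mapred-default.xml):

```xml
<property>
  <name>mapred.job.shuffle.merge.percent</name>
  <!-- In-memory merge starts when buffered map outputs reach this fraction
       of the shuffle buffer -->
  <value>0.66</value>
</property>
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <!-- Fraction of reduce-task heap used to buffer map outputs during the copy phase -->
  <value>0.70</value>
</property>
```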

getting Configuration object in mapper

2008-12-05 Thread abhinit
I have set some variable using the JobConf object. jobConf.set("Operator", operator) etc. How can I get an instance of Configuration object/ JobConf object inside a map method so that I can retrieve these variables. Thanks -Abhinit

slow shuffle

2008-12-05 Thread Songting Chen
We encountered a bottleneck during the shuffle phase. However, there is not much data to be shuffled across the network at all - less than 10 MB in total (the combiner aggregated most of the data). Are there any parameters or anything we can tune to improve the shuffle performance? Thanks, -S

Re: JobTracker Failing to respond with OutOfMemory error

2008-12-05 Thread charles du
I found the following error message in hadoop-middleware-jobtracker-dd-9c32d01.off.tn.ask.com.out: Java HotSpot(TM) Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGTERM to handler - the VM may need to be forcibly terminated On Fri, Dec 5, 2008 at 10:58 AM,…

Re: JobTracker Failing to respond with OutOfMemory error

2008-12-05 Thread charles du
Any update on this? We got a similar problem after we ran a Hadoop job with a lot of mappers. Restarting the jobtracker solved the problem a few times, but now we get the out-of-memory error right after restarting the jobtracker. Thanks. On Wed, Nov 19, 2008 at 8:40 PM, Palleti, Pall…