Re: Setting the number of mappers to 0

2010-07-09 Thread Eric Sammer
Ravi: Currently there's no way to avoid the map stage and the sort and shuffle that comes with it. The only real option is to have an identity mapper that passes the keys / values through as you're doing now. On Fri, Jul 9, 2010 at 4:07 PM, Chinni, Ravi wrote: > I am trying to develop a MR appli
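[Editor's note: a minimal sketch of the identity-mapper approach Eric describes, assuming the new (org.apache.hadoop.mapreduce) API and LongWritable/Text record types purely for illustration; the real types would match Ravi's job. In the new API the base Mapper class already behaves this way, so job.setMapperClass(Mapper.class) — the default — is usually sufficient.]

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Identity mapper: emits each input record unchanged, so the framework's
    // partition/sort/shuffle still runs before the reducer.
    public class IdentityPassThroughMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);   // pass the record straight through
        }
    }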

Setting the number of mappers to 0

2010-07-09 Thread Chinni, Ravi
I am trying to develop an MR application. Due to the kind of application I am trying to develop, the mapper is a dummy (passes its input to its output) task and I am only interested in having a partitioner and reducer. The MR framework allows us to set the number of reducers to 0. Is there a way

Re: SequenceFile as map input

2010-07-09 Thread Alex Kozlov
Hi Alan, You don't need to do this complex trickery if you write to the Sequence File. How do you create the Sequence File? In your case it might make sense to create a Sequence File where the first object is the file name or complete path and the second is the content. Then you just call: pr
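[Editor's note: a hedged sketch of the approach Alex suggests — packing many small files into one SequenceFile keyed by path. The input/output paths ("small-files", "merged.seq") and the BytesWritable value type are placeholders, not from the thread.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SmallFilesToSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, new Path("merged.seq"), Text.class, BytesWritable.class);
            try {
                for (FileStatus stat : fs.listStatus(new Path("small-files"))) {
                    byte[] data = new byte[(int) stat.getLen()];
                    FSDataInputStream in = fs.open(stat.getPath());
                    try {
                        in.readFully(data);   // whole file fits in memory (files are only a few MB)
                    } finally {
                        in.close();
                    }
                    // key = full path of the original file, value = its raw contents
                    writer.append(new Text(stat.getPath().toString()),
                                  new BytesWritable(data));
                }
            } finally {
                writer.close();
            }
        }
    }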

Re: SequenceFile as map input

2010-07-09 Thread Alan Miller
Hi Alex, My original files are ASCII text. I was using and everything worked fine. Because my files are small (>2MB on avg.), I get one map task per file. For my test I had 2000 files, totalling 5GB, and the whole run took approx. 40 minutes. I read that I could improve performance by merging
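[Editor's note: once the small files are merged into a SequenceFile as sketched above, the job driver only needs to switch input formats so map tasks are created per block rather than per file. A hedged sketch assuming the new-API SequenceFileInputFormat; the pass-through mapper and the paths are placeholders.]

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SeqFileJobDriver {
        // Placeholder mapper: receives (path, contents) pairs from the SequenceFile.
        public static class PassThroughMapper
                extends Mapper<Text, BytesWritable, Text, BytesWritable> {
            @Override
            protected void map(Text path, BytesWritable contents, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(path, contents);   // real logic would process the contents here
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "process merged SequenceFile");
            job.setJarByClass(SeqFileJobDriver.class);
            // One large SequenceFile instead of 2000 small inputs means far fewer
            // map tasks: roughly one per HDFS block rather than one per file.
            job.setInputFormatClass(SequenceFileInputFormat.class);
            job.setMapperClass(PassThroughMapper.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(BytesWritable.class);
            FileInputFormat.addInputPath(job, new Path("merged.seq"));
            FileOutputFormat.setOutputPath(job, new Path("out"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }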

java.lang.OutOfMemoryError: Java heap space

2010-07-09 Thread Shuja Rehman
Hi All I am facing a hard problem. I am running a map reduce job using streaming but it fails and it gives the following error. Caught: java.lang.OutOfMemoryError: Java heap space at Nodemapper5.parseXML(Nodemapper5.groovy:25) java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): su
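[Editor's note: a common first step for heap-space failures in task code is raising the task JVM heap; a hedged sketch, with the 1024m figure chosen only as an example. Since the OutOfMemoryError above is thrown inside the Groovy mapper that streaming launches, the heap of that spawned process may be the limit that actually matters rather than this property.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class BiggerTaskHeap {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Raise the heap of each task's child JVM (the 0.20-era default is -Xmx200m).
            // A streaming job can pass the same setting on the command line with
            //   -D mapred.child.java.opts=-Xmx1024m
            conf.set("mapred.child.java.opts", "-Xmx1024m");
            Job job = new Job(conf, "job with larger task heap");
            // ... mapper/reducer/paths as usual ...
        }
    }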

Last day to submit your Surge 2010 CFP!

2010-07-09 Thread Jason Dixon
Today is your last chance to submit a CFP abstract for the 2010 Surge Scalability Conference. The event is taking place on Sept 30 and Oct 1, 2010 in Baltimore, MD. Surge focuses on case studies that address production failures and the re-engineering efforts that led to victory in Web Application

Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2010-07-09 Thread Ted Yu
Did you check the task tracker log and the log from your reducer to see if anything was wrong? Please also capture jstack output so that we can help you diagnose. On Friday, July 9, 2010, bmdevelopment wrote: > Hi, I updated to the version here: > http://github.com/kevinweil/hadoop-lzo > > However, when

Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2010-07-09 Thread bmdevelopment
Hi, I updated to the version here: http://github.com/kevinweil/hadoop-lzo However, when I use lzop for intermediate compression I am still having trouble - the reduce phase now freezes at 99% and eventually fails. No immediate problem, because I can use the default codec. But may be of concern to
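[Editor's note: for reference, a hedged sketch of how intermediate (map-output) LZO compression is typically enabled on 0.20-era Hadoop with the kevinweil hadoop-lzo codec; the property names and codec class reflect that library's usual setup and are assumptions, not taken from the thread.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class LzoIntermediateCompression {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Compress the map output moved during the shuffle, not the final job output.
            conf.setBoolean("mapred.compress.map.output", true);
            conf.set("mapred.map.output.compression.codec",
                     "com.hadoop.compression.lzo.LzoCodec");
            // The codec must also be registered in io.compression.codecs
            // (core-site.xml) and its native library installed on every node.
            Job job = new Job(conf, "job with LZO intermediate compression");
            // ... remaining job setup (mapper, reducer, paths) as usual ...
        }
    }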