custom Configuration values

2010-06-11 Torsten Curdt
I am setting some custom values on my job configuration: Configuration conf = new Configuration(); conf.set("job.time.from", time_from); conf.set("job.time.until", time_until); Cluster cluster = new

Re: Multithreaded Mapper and Map runner

2010-06-11 Aaron Kimball
This will likely break most programs you try to run. Many mapper implementations are not thread-safe. That said, if you want to force all programs using the old API (org.apache.hadoop.mapred.*) to run on the multithreaded map runner, you can do this by setting mapred.map.runner.class
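
To illustrate the thread-safety concern, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes the multithreaded runner calls one shared mapper instance from several threads at once. LineMapper, UnsafeMapper, SafeMapper and runShared are hypothetical names made up for this illustration; they are not Hadoop classes or interfaces.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class MapperThreadSafetySketch {

    /** Hypothetical stand-in for a mapper; NOT a Hadoop interface. */
    interface LineMapper {
        void map(String line);
        long recordsSeen();
    }

    /** Written for single-threaded use: a plain field updated without synchronization. */
    static class UnsafeMapper implements LineMapper {
        private long count = 0;
        public void map(String line) { count++; } // read-modify-write, not atomic
        public long recordsSeen() { return count; }
    }

    /** Same logic, but the shared state is updated atomically. */
    static class SafeMapper implements LineMapper {
        private final AtomicLong count = new AtomicLong();
        public void map(String line) { count.incrementAndGet(); }
        public long recordsSeen() { return count.get(); }
    }

    /** Calls map() on ONE shared mapper instance from several threads, then returns its count. */
    static long runShared(LineMapper mapper, int threads, int recordsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    mapper.map("line " + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("worker threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return mapper.recordsSeen();
    }

    public static void main(String[] args) {
        // The safe mapper always counts every record; the unsafe one may lose updates.
        System.out.println("safe:   " + runShared(new SafeMapper(), 8, 100_000));
        System.out.println("unsafe: " + runShared(new UnsafeMapper(), 8, 100_000));
    }
}
```

The unsafe variant may report fewer records than were actually processed, and the shortfall can change from run to run. A mapper that keeps mutable instance state would need a similar change (atomic or synchronized updates, or per-thread state) before it could be shared across threads.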

Mixing streaming and regular map reduce jobs

2010-06-11 Steve Lewis
I have a problem where I am using Java and the Hadoop APIs to run a map reduce job on data that can be considered as a set of lines of text. At the reduce stage I have a collection of lines of text to process in a convenient order. There are a number of programs written in Python or Perl which can

Out of Memory during Reduce Merge

2010-06-11 Ruben Quintero
Hi all, We have a MapReduce job writing a Lucene index (modeled closely after the example in contrib), and we keep hitting out of memory exceptions in the reduce phase once the number of files grows large. Here are the relevant non-default values in our mapred-site.xml: