Re: JVM reuse in Map Tasks

2012-06-04 Thread GUOJUN Zhu
For setup(), do you mean configure(JobConf)? We need to deserialize a big object and do some other preparation work on it inside configure() during setup. This takes a few seconds and is the same for all tasks. We just declare the object as static and do not recreate it if it is not nul
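
The static-caching pattern described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not code from the thread: BigModel, CachingMapper and JvmReuseDemo are hypothetical names, and the no-argument configure() stands in for the old-API configure(JobConf). It relies on the fact that a static field lives as long as the JVM does, so when several tasks run one after another in the same JVM (in Hadoop 0.20.x this is governed by the mapred.job.reuse.jvm.num.tasks property), only the first task pays the loading cost.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the "big object" that is expensive to deserialize.
final class BigModel {
    // Counts how many times the expensive load actually ran in this JVM.
    static final AtomicInteger LOADS = new AtomicInteger();

    static BigModel load() {
        LOADS.incrementAndGet();   // a real job would deserialize here
        return new BigModel();
    }
}

// Hypothetical stand-in for a mapper. The framework creates a new mapper
// instance for every task, so instance fields do not survive between tasks,
// but static fields do as long as the JVM is reused.
final class CachingMapper {
    private static BigModel cached;

    // Stand-in for configure(JobConf): runs once per task.
    void configure() {
        synchronized (CachingMapper.class) {
            if (cached == null) {
                cached = BigModel.load();   // only the first task in this JVM loads
            }
        }
    }

    static BigModel model() {
        return cached;
    }
}

final class JvmReuseDemo {
    // Simulates n tasks running sequentially in one reused JVM and
    // returns the total number of expensive loads performed so far.
    static int simulateTasks(int n) {
        for (int i = 0; i < n; i++) {
            new CachingMapper().configure();   // new instance per task
        }
        return BigModel.LOADS.get();
    }

    public static void main(String[] args) {
        System.out.println("loads after 3 tasks: " + simulateTasks(3));
    }
}
```

Note that configure() still runs for every task in this sketch; the static null check only skips the expensive part. Without JVM reuse each task starts in a fresh JVM, the static field starts out null again, and the object is reloaded every time.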

Re: JVM reuse in Map Tasks

2012-06-04 Thread Subroto
Hi Arpit, a point worth mentioning from http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/: If each task takes less than 30-40 seconds, reduce the number of tasks. The task setup and scheduling overhead is a few seconds, so if tasks finish very quickly, you’re wasting t

JVM reuse in Map Tasks

2012-06-04 Thread Arpit Wanchoo
Hi, I wanted to check what exactly we gain when JVM reuse is enabled in a MapReduce job. My question is about the mapper's setup() method. Is it called for a mapper even when that mapper runs in the JVM of a previously run mapper? If so, is there any way I can control it or stop it from being calle

MapReduce combiner issue : EOFException while reading Value

2012-06-04 Thread Arpit Wanchoo
Hi, I have been trying to set up a MapReduce job with Hadoop 0.20.203.1. Scenario: my mapper writes key-value pairs, with 13 types of keys in total and a corresponding value class for each. For each input record I write all of these, i.e. 13 key-value pairs, to the context. My combiner and reducer are doing