Re: Java Heap memory error : Limit to 2 Gb of ShuffleRamManager ?

2012-12-10 Thread Olivier Varene - echo
Here is the Jira issue, and the beginning of a patch: https://issues.apache.org/jira/browse/MAPREDUCE-4866. There is indeed a limitation on the byte array size (around Integer.MAX_VALUE). Maybe we could use BigArrays to overcome this limitation? What do you think? Regards, Olivier
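The ~2 GB ceiling mentioned above comes from Java's int-based array indexing: a single byte[] cannot hold more than Integer.MAX_VALUE elements, and casting a larger long size down to int wraps negative. A minimal sketch of that overflow (the variable names are illustrative, not from the patch):

```java
public class ShuffleLimitDemo {
    public static void main(String[] args) {
        // A Java array is indexed by int, so one byte[] tops out at
        // Integer.MAX_VALUE (~2 GB) elements. Casting a larger long
        // to int wraps to a negative value -- the kind of overflow
        // behind the ShuffleRamManager limit discussed above.
        long wanted = 3L * 1024 * 1024 * 1024; // 3 GB requested
        int asInt = (int) wanted;              // wraps negative
        System.out.println(asInt < 0);         // prints "true"
    }
}
```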

Re: attempt* directories in user logs

2012-12-10 Thread Tsuyoshi OZAWA
Hi Oleg, speculative tasks can be launched as TaskAttempts in MR jobs. And if no reducer class is set, MR launches the default Reducer class (IdentityReducer). Thanks, Tsuyoshi

Reg: Map output copy failure

2012-12-10 Thread Manoj Babu
Hi all, I got the exception below. Is the issue related to https://issues.apache.org/jira/browse/MAPREDUCE-1182? I am using CDH3U1. 2012-12-10 06:22:39,688 FATAL org.apache.hadoop.mapred.Task: attempt_201211120903_9197_r_24_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap

Re: Reg: Map output copy failure

2012-12-10 Thread Manoj Babu
ReduceTask.java has the following code: maxSize = (int)(conf.getInt(mapred.job.reduce.total.mem.bytes, (int)Math.min(Runtime.getRuntime().maxMemory(), Integer.MAX_VALUE)) * maxInMemCopyUse); But in the patch: maxSize = (long)Math.min(
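The difference between the two formulas can be sketched in isolation. This is a simplified stand-in, not the actual ReduceTask code: the int-based version clamps the heap size at Integer.MAX_VALUE before applying the shuffle fraction, so heaps beyond 2 GB cannot enlarge the in-memory shuffle buffer, while keeping the arithmetic in long preserves the larger value:

```java
public class MaxSizeOverflowDemo {
    public static void main(String[] args) {
        float maxInMemCopyUse = 0.70f;            // illustrative shuffle fraction
        long maxMemory = 4L * 1024 * 1024 * 1024; // pretend a 4 GB heap

        // Old-style path: min() is forced into int first, so anything
        // above Integer.MAX_VALUE is clamped before the multiply.
        int clamped = (int) Math.min(maxMemory, Integer.MAX_VALUE);
        long oldMaxSize = (long) (clamped * maxInMemCopyUse);

        // long-based path: apply the fraction before clamping, as the
        // patch direction quoted above suggests.
        long newMaxSize = (long) Math.min(maxMemory * maxInMemCopyUse,
                                          Integer.MAX_VALUE);

        // With a 4 GB heap the long-based result is larger.
        System.out.println(newMaxSize > oldMaxSize); // prints "true"
    }
}
```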

Re: attempt* directories in user logs

2012-12-10 Thread Vinod Kumar Vavilapalli
MR launches multiple attempts for a single Task in case of TaskAttempt failures or when speculative execution is turned on. In either case, a given Task will only ever have one successful TaskAttempt whose output will be accepted (committed). The number of reduces is set to 1 by default in

Re: Stop at CallObjectMethod when daemon running

2012-12-10 Thread Vinod Kumar Vavilapalli
I am not familiar with your apr stuff, but you should capture the getJobStatus() method instead of getAllJobs(). getJobStatus() is what is called for individual jobs; getAllJobs() is called only when you try to list jobs. Thanks, +Vinod Kumar Vavilapalli, Hortonworks Inc. http://hortonworks.com/

Re: Strange machine behavior

2012-12-10 Thread Bharath Mundlapudi
Are you seeing any performance impact with this cache increase? It is normal for a Linux system to use a high level of cache. -Bharath

Re: Reg: Map output copy failure

2012-12-10 Thread Bharath Mundlapudi
What was the job or query you were running? A couple of suggestions: 1. Reduce the data set size with job chaining. 2. Increase the reduce task heap. 3. If you are using Hive/Pig, you may want to tune your query. -Bharath

Re: Strange machine behavior

2012-12-10 Thread Robert Dyer
Yes, there is a performance impact; it should be visible from the graph I attached. Basically, the CPU is spending much more time in system mode, and user time is lowered. When this happens (if I don't do a drop_caches in time), the MR job winds up taking significantly longer than usual.

Re: Strange machine behavior

2012-12-10 Thread Robert Dyer
On Sun, Dec 9, 2012 at 5:45 AM, a...@hsk.hk wrote: Hi, I always set vm.swappiness = 0 for my Hadoop servers (PostgreSQL servers too). I have just done this for that machine. So far, I have not seen a re-occurrence of the strange behavior; it appears this might have solved the

Re: attempt* directories in user logs

2012-12-10 Thread Hemanth Yamijala
However, in the case Oleg is talking about, the attempts are: attempt_201212051224_0021_m_00_0, attempt_201212051224_0021_m_02_0, attempt_201212051224_0021_m_03_0. These aren't multiple attempts of a single task, are they? They are actually different tasks. If they were multiple
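The distinction drawn above lives in the attempt ID itself: the next-to-last field is the task number and the last field is the attempt number, so IDs differing in the task field are different tasks, not retries. A sketch using plain string splitting on a hypothetical full-form ID (Hadoop's own TaskAttemptID class does this parsing in real code):

```java
public class AttemptIdDemo {
    public static void main(String[] args) {
        // Layout: attempt_<job ts>_<job seq>_<m|r>_<task number>_<attempt number>
        // (hypothetical example ID, not one from the thread)
        String id = "attempt_201212051224_0021_m_000002_0";
        String[] parts = id.split("_");
        String taskNumber = parts[4];    // which task of the job
        String attemptNumber = parts[5]; // which attempt of that task
        // Different task numbers => different tasks; a speculative or
        // retried attempt would differ only in the last field.
        System.out.println(taskNumber + " " + attemptNumber);
    }
}
```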

What are the best practices to start Hadoop MR jobs?

2012-12-10 Thread Ivan Ryndin
Hi all! What are the best practices for starting MR jobs? Currently I start my jobs by cron. I also start jobs from an internal timer in my Java application (which is similar to starting by cron). What other approaches are there for starting MR jobs? Perhaps some best practices apply here? -- Best regards, Ivan
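The "internal timer" approach Ivan mentions can be sketched with a ScheduledExecutorService. The submitJob runnable here is a hypothetical placeholder for a real MR driver (e.g. configuring a Job and calling waitForCompletion()); only the scheduling skeleton is shown:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class JobSchedulerSketch {
    public static void main(String[] args) throws Exception {
        // In-process alternative to cron: fire the job driver on a
        // fixed schedule from inside the application.
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        CountDownLatch ran = new CountDownLatch(1);
        Runnable submitJob = () -> {
            // Placeholder: configure and submit the MapReduce job here.
            ran.countDown();
        };
        // First run immediately, then once per hour.
        scheduler.scheduleAtFixedRate(submitJob, 0, 1, TimeUnit.HOURS);
        System.out.println(ran.await(5, TimeUnit.SECONDS)); // prints "true"
        scheduler.shutdownNow();
    }
}
```

Unlike cron, this keeps scheduling and job configuration in one JVM, but it also ties job launches to that process staying alive, which is one reason workflow engines are often preferred.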

Re: What are the best practices to start Hadoop MR jobs?

2012-12-10 Thread Mohammad Tariq
Hello Ivan, instead of running them as cron jobs, you can launch MR jobs through Apache Oozie, the workflow engine for MapReduce, Pig, etc. For more details you can visit the Oozie home page at oozie.apache.org. Regards, Mohammad Tariq