RE: Hadoop 2.7.2 Yarn Memory Utilization

2016-10-07 Thread Guttadauro, Jeff
I encountered under-utilization while moving one of my MR jobs in particular to YARN/Hadoop 2.7 a while back. While it will probably depend on what your job is doing, in my case the biggest improvement came when I increased the split size (from its default of 128M up to 2G).
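The effect Jeff describes can be sketched with back-of-the-envelope math: the number of map tasks is roughly the input size divided by the split size, so a much larger split means far fewer tasks and far less per-task startup overhead. The input size below is a hypothetical figure, not one from the thread; in Hadoop 2.x the split size is typically raised via the `mapreduce.input.fileinputformat.split.minsize` job property.

```python
import math

MB = 1024 * 1024
input_size = 512 * 1024 * MB  # hypothetical 512 GB input data set

def num_splits(total_bytes, split_bytes):
    """Approximate number of input splits (ignores per-file remainders)."""
    return math.ceil(total_bytes / split_bytes)

# Default 128 MB splits vs. the 2 GB splits mentioned in the thread.
tasks_default = num_splits(input_size, 128 * MB)
tasks_large = num_splits(input_size, 2048 * MB)

print(tasks_default)  # 4096 map tasks
print(tasks_large)    # 256 map tasks
```

With 16x fewer tasks, each container does more useful work between scheduling rounds, which is one common reason a job that ran fine on MR1 under-utilizes a YARN cluster.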

Hbase Reducer NPE

2016-10-07 Thread Christopher Piggott
This question pertains to Hadoop core 1.2.1 and HBase 1.2.3. I wrote a simple map/reduce job that looks like this: the input to the mapper is whole HDFS files, one at a time, via a custom InputFormat; the output of the mapper is … The job is configured like …

RE: how to add a shareable node label?

2016-10-07 Thread Frank Luo
That is correct, Sunil. Just to confirm, the Node Labeling feature on 2.8 or 3.0 alpha won’t satisfy my need, right? From: Sunil Govind [mailto:sunil.gov...@gmail.com] Sent: Friday, October 07, 2016 12:09 PM To: Frank Luo ; user@hadoop.apache.org Subject: Re: how to add a
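For reference, the shareable (non-exclusive) variant of node labels that 2.8 introduces is set up roughly as follows; this is a sketch against a running cluster, and the label name `highmem` and hostname are assumptions, not values from the thread.

```shell
# Create a shareable (non-exclusive) cluster node label; exclusive=false
# means idle capacity on labeled nodes can be used by other queues (2.8+).
yarn rmadmin -addToClusterNodeLabels "highmem(exclusive=false)"

# Attach the label to a node (hostname is a placeholder).
yarn rmadmin -replaceLabelsOnNode "node1.example.com=highmem"
```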

could not delete hdfs://master...

2016-10-07 Thread Ziming Dong
I put it on stackoverflow. http://stackoverflow.com/questions/39913399/run-hadoop-wordcount-example-failed -- Ziming Dong *http://suiyuan2009.github.io/ *

Re: how to add a shareable node label?

2016-10-07 Thread Sunil Govind
Hi Frank, in that case preemption may not be needed, so jobs over-utilizing queueB's resources will keep running until they complete. Since queueA is under-served, any next free container could go to queueA, which is for Job_A. Thanks, Sunil. On Fri, Oct 7, 2016 at 9:58 PM Frank Luo
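The behavior Sunil describes (queueB borrowing idle capacity, queueA getting the next freed containers because it is under-served) is Capacity Scheduler elasticity. A minimal two-queue sketch of such a setup might look like this in `capacity-scheduler.xml`; the queue names and percentages are assumptions for illustration.

```xml
<!-- Each queue is guaranteed 50% of the cluster, but either may
     elastically grow to 100% while the other queue is idle.
     Without preemption enabled, borrowed containers are not killed;
     the under-served queue is simply favored for newly freed ones. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>queueA,queueB</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.queueA.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.queueA.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.queueB.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.queueB.maximum-capacity</name>
  <value>100</value>
</property>
```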

RE: how to add a shareable node label?

2016-10-07 Thread Frank Luo
Sunil, your description pretty much matches my understanding, except for "Job_A will have to run as per its schedule w/o any delay". My situation is that Job_A can be delayed; as long as it runs in queueA, I am happy. Just as you said, processes normally running in queueB might not be

Re: how to add a shareable node label?

2016-10-07 Thread Sunil Govind
Hi Frank, thanks for the details. I am not quite sure I understood your problem correctly. I think you are looking for a solution that ensures Job_A will run as per its schedule w/o any delay. Meanwhile, you also do not want to waste resources on those high-end machines where Job_A is