Hey, that's pretty good. So by changing the file split size, the number of maps running was reduced?
-Shubh

> On May 27, 2016, at 4:01 PM, Guttadauro, Jeff <jeff.guttada...@here.com> wrote:
>
> Hi, all.
>
> Just wanted to provide an update, which is that I'm finally getting good YARN
> cluster utilization (consistently within the 90-100% range!). I believe the
> biggest change was to increase the min split size. Since our input is all in
> S3 and data locality is not really an issue, I bumped it up to 2G to minimize
> the impact of allocation/deallocation of container resources; since each
> container is now up and working for longer, that churn occurs less frequently.
>
> <property><name>mapreduce.input.fileinputformat.split.minsize</name><value>2147483648</value><!-- 2G --></property>
>
> Not sure how much impact the following changes had, since they were made at
> the same time. Everything's humming along now, though, so I'm going to leave
> them.
>
> I also reduced the node heartbeat interval from 1000ms down to 500ms
> ("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms": "500" in the
> cluster configuration JSON), since I'm told that the ResourceManager will only
> allocate one container per node per heartbeat when dealing with non-localized
> data, as in our case since the input is in S3. I also doubled the memory given
> to the YARN ResourceManager from the default for the m3.xlarge node type I'm
> using ("YARN_RESOURCEMANAGER_HEAPSIZE": "5120" in the cluster configuration
> JSON).
>
> Thanks again to Sunil and Shubh (and my colleague, York) for the helpful
> guidance!
>
> Take care,
> -Jeff
>
> From: Shubh hadoopExp [mailto:shubhhadoop...@gmail.com]
> Sent: Wednesday, May 25, 2016 11:08 PM
> To: Guttadauro, Jeff <jeff.guttada...@here.com>
> Cc: Sunil Govind <sunil.gov...@gmail.com>; user@hadoop.apache.org
> Subject: Re: YARN cluster underutilization
>
> Hey,
>
> OFFSWITCH allocation means whether or not data locality is maintained. It has
> no relation to the heartbeat! The heartbeat is just used to clear the pipeline
> of container requests.
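[Editorial note: the split-size mechanics behind the fix above can be sketched with FileInputFormat's split-size rule, max(minSize, min(maxSize, blockSize)). This is a rough model only; the real logic also handles per-file remainders, unsplittable codecs, and S3 object boundaries.]

```python
def compute_split_size(min_size, max_size, block_size):
    # Hadoop FileInputFormat's rule: max(minSize, min(maxSize, blockSize)).
    # Raising minSize above the block size directly grows each split.
    return max(min_size, min(max_size, block_size))

MB, GB = 1024 ** 2, 1024 ** 3
LONG_MAX = 2 ** 63 - 1  # default maxSize (effectively unbounded)

default_split = compute_split_size(1, LONG_MAX, 128 * MB)     # 128M splits
tuned_split = compute_split_size(2 * GB, LONG_MAX, 128 * MB)  # 2G splits

# ~16x fewer map tasks for the same input, assuming a 128M block size.
print(tuned_split // default_split)  # 16
```

Fewer, longer-lived map tasks mean each container does more work per allocation, which is exactly the churn reduction described above.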
>
> -Shubh
>
> On May 25, 2016, at 3:30 PM, Guttadauro, Jeff <jeff.guttada...@here.com> wrote:
>
> Interesting stuff! I did not know about this handling of OFFSWITCH requests.
>
> To get around this, would you recommend reducing the heartbeat interval,
> perhaps to 250ms to get a 4x improvement in the container allocation rate (or
> is it not quite as simple as that)? Maybe doing this in combination with using
> a greater number of smaller nodes would help? Would overloading the
> ResourceManager be a concern if doing that? Should I bump up the
> "YARN_RESOURCEMANAGER_HEAPSIZE" configuration property (the current default
> for m3.xlarge is 2396M), or would you suggest any other knobs to turn to help
> the RM handle it?
>
> Thanks again for all your help, Sunil!
>
> From: Sunil Govind [mailto:sunil.gov...@gmail.com]
> Sent: Wednesday, May 25, 2016 1:07 PM
> To: Guttadauro, Jeff <jeff.guttada...@here.com>; user@hadoop.apache.org
> Subject: Re: YARN cluster underutilization
>
> Hi Jeff,
>
>> I do see the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
>> property set to 1000 in the job configuration
>
> OK, this makes sense; the node heartbeat seems to be at its default.
>
> If no locality is specified in the resource requests (i.e., they use
> ResourceRequest.ANY), then YARN will allocate only one container per node
> heartbeat. So your container allocation rate is slow, considering the 600k
> requests and only 20 nodes. And if many containers are also getting released
> quickly (I can see that some containers' lifetime is 80 to 90 secs), then this
> becomes more complex and the container allocation rate will be slower still.
>
> YARN-4963 <https://issues.apache.org/jira/browse/YARN-4963> is trying to allow
> more allocations per heartbeat for NODE_OFFSWITCH (ANY) requests, but it's not
> yet available in any release.
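[Editorial note: a rough back-of-envelope for why the heartbeat matters here, using numbers from this thread. The one-grant-per-heartbeat behavior is per Sunil's explanation; real allocation rates also depend on scheduler load and container release timing.]

```python
def max_any_allocations_per_sec(num_nodes, heartbeat_ms):
    # For ANY/OFFSWITCH requests, the scheduler grants at most one
    # container per node per heartbeat, capping the cluster-wide rate.
    return num_nodes * 1000.0 / heartbeat_ms

# 20-node cluster at the default 1000 ms heartbeat: ~20 containers/sec,
# so churning through ~600k short-lived map tasks takes a while.
print(max_any_allocations_per_sec(20, 1000))  # 20.0

# Halving the interval to 500 ms (as done later in the thread) doubles it.
print(max_any_allocations_per_sec(20, 500))   # 40.0
```

This also shows why a shorter heartbeat and fewer, longer-lived tasks attack the same bottleneck from two directions.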
>
> I guess you can investigate more along this line to confirm these points.
>
> Thanks,
> Sunil
>
> On Wed, May 25, 2016 at 11:00 PM Guttadauro, Jeff <jeff.guttada...@here.com> wrote:
>
> Thanks for digging into the log, Sunil, and making some interesting
> observations!
>
> The heartbeat interval hasn't been changed from its default, and I do see the
> yarn.resourcemanager.nodemanagers.heartbeat-interval-ms property set to 1000
> in the job configuration. I was searching the log for heartbeat interval
> information, but I didn't find anything. Where do you look in the log for the
> heartbeats?
>
> Also, you are correct about there being no data locality, as all the input
> data is in S3. The utilization has been fluctuating, but I can't really see a
> pattern or tell why. It actually started out pretty low, in the 20-30% range,
> and then managed to get up into the 50-70% range after a while, but that was
> short-lived, as it went back down into the 20-30% range for quite a while.
> While writing this, I saw it surprisingly hit 80%!! That's the first time I've
> seen it that high in the 20 hours it's been running... although it looks like
> it may be headed back down. I'm perplexed. Wouldn't you generally expect
> fairly stable utilization over the course of the job? (This is the only job
> running.)
>
> Thanks,
> -Jeff
>
> From: Sunil Govind [mailto:sunil.gov...@gmail.com]
> Sent: Wednesday, May 25, 2016 11:55 AM
> To: Guttadauro, Jeff <jeff.guttada...@here.com>; user@hadoop.apache.org
> Subject: Re: YARN cluster underutilization
>
> Hi Jeff,
>
> Thanks for sharing this information. I have some observations from these logs:
>
> - I think the node heartbeat is around 2-3 seconds here. Was it changed for
> some other reason?
> - All the mappers' resource requests seem to be asking for type ANY (there is
> no data locality);
> please correct me if I am wrong.
>
> If the resource request type is ANY, only one container will be allocated per
> heartbeat for a node. Here the node heartbeat delay is also longer. And I can
> see that containers are released very quickly too. So when you started your
> application, were you seeing better resource utilization, and then once
> containers started to get released/completed, did you start seeing the
> underutilization?
>
> Please look into this line of investigation. It may be the reason.
>
> Thanks,
> Sunil
>
> On Wed, May 25, 2016 at 9:59 PM Guttadauro, Jeff <jeff.guttada...@here.com> wrote:
>
> Thanks for your thoughts thus far, Sunil. I'm most grateful for any
> additional help you or others can offer. To answer your questions:
>
> 1. This is a custom M/R job, which uses mappers only (no reduce phase) to
> process GPS probe data and filter it based on inclusion within a provided
> polygon. There is actually a lot of upfront work done in the driver to make
> that task as simple as can be (it identifies a list of tiles that are
> completely inside the polygon and those that fall across an edge, for which
> more processing is needed), but the job is still more compute-intensive than
> wordcount, for example.
>
> 2. I'm running almost 84k mappers for this job. This is actually down from
> ~600k mappers, since one other thing I've done is increase
> mapreduce.input.fileinputformat.split.minsize to 536870912 (512M) for the
> job. The data is in S3, so loss of locality isn't really a concern.
>
> 3. For NodeManager configuration, I'm using EMR's default configuration for
> the m3.xlarge instance type, which is
> yarn.scheduler.minimum-allocation-mb=32,
> yarn.scheduler.maximum-allocation-mb=11520, and
> yarn.nodemanager.resource.memory-mb=11520. The YARN dashboard shows min/max
> allocations of <memory:32, vCores:1>/<memory:11520, vCores:8>.
>
> 4. Capacity Scheduler [MEMORY]
>
> 5. I've attached 2500 lines from the RM log.
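[Editorial note: a quick sanity check on the two split.minsize byte values that appear in this thread, 512M mid-thread and 2G in the final configuration.]

```python
MB, GB = 1024 ** 2, 1024 ** 3

# Values quoted for mapreduce.input.fileinputformat.split.minsize.
assert 536870912 == 512 * MB   # the 512M setting that cut ~600k maps to ~84k
assert 2147483648 == 2 * GB    # the final 2G setting
print("split.minsize values check out")
```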
> Happy to grab more, but they are pretty big, and I thought that might be
> sufficient.
>
> Any guidance is much appreciated!
> -Jeff
>
> From: Sunil Govind [mailto:sunil.gov...@gmail.com]
> Sent: Wednesday, May 25, 2016 10:55 AM
> To: Guttadauro, Jeff <jeff.guttada...@here.com>; user@hadoop.apache.org
> Subject: Re: YARN cluster underutilization
>
> Hi Jeff,
>
> It looks like you are allocating a lot of memory for the AM container. Most
> likely you do not need 6GB (as per the log). Could you please provide some
> more information:
>
> 1. What type of mapreduce application (wordcount, etc.) are you running? Some
> AMs may be CPU-intensive and some may not be, so based on the type of
> application, memory/CPU can be tuned for better utilization.
> 2. How many mappers (and reducers) are you trying to run here?
> 3. You have mentioned that each node has 8 cores and 15GB, but how much is
> actually configured for the NM?
> 4. Which scheduler are you using?
> 5. It would be better to attach the RM log if possible.
>
> Thanks,
> Sunil
>
> On Wed, May 25, 2016 at 8:58 PM Guttadauro, Jeff <jeff.guttada...@here.com> wrote:
>
> Hi, all.
>
> I have an M/R (map-only) job that I'm running on a Hadoop 2.7.1 YARN cluster
> that is being quite underutilized (utilization of around 25-30%). The EMR
> cluster is 1 master + 20 core m3.xlarge nodes, which have 8 cores each and
> 15G total memory (with 11.25G of that available to YARN).
> I've configured mapper memory with the following properties, which should
> allow for 8 containers running map tasks per node:
>
> <property><name>mapreduce.map.memory.mb</name><value>1440</value></property> <!-- Container size -->
> <property><name>mapreduce.map.java.opts</name><value>-Xmx1024m</value></property> <!-- JVM arguments for a map task -->
>
> It was suggested that perhaps my AppMaster was having trouble keeping up with
> creating all the mapper containers and that I should bulk up its resource
> allocation. So I did, as shown below, giving it 6G of container memory (5G
> task memory), 3 cores, and 60 task listener threads:
>
> <property><name>yarn.app.mapreduce.am.job.task.listener.thread-count</name><value>60</value></property> <!-- App Master task listener threads -->
> <property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>3</value></property> <!-- App Master container vcores -->
> <property><name>yarn.app.mapreduce.am.resource.mb</name><value>6400</value></property> <!-- App Master container size -->
> <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx5120m</value></property> <!-- JVM arguments for each Application Master -->
>
> Taking a look at the node on which the AppMaster is running, I'm seeing
> plenty of CPU idle time and free memory, yet there are still nodes with no
> utilization (0 running containers). The log indicates that the AppMaster has
> way more memory (physical/virtual) than it appears to need, with repeated log
> messages like this:
>
> 2016-05-25 13:59:04,615 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> (Container Monitor): Memory usage of ProcessTree 11265 for container-id
> container_1464122327865_0002_01_000001: 1.6 GB of 6.3 GB physical memory
> used; 6.1 GB of 31.3 GB virtual memory used
>
> Can you please help me figure out where to go from here to troubleshoot, or
> suggest any other things to try?
>
> Thanks!
> -Jeff
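[Editorial note: the per-node memory arithmetic behind "should allow for 8 containers" can be checked with the values quoted above. This is a back-of-envelope only; real packing also depends on vcore limits and where the AM lands.]

```python
node_mem_mb = 11520      # yarn.nodemanager.resource.memory-mb (m3.xlarge default)
map_container_mb = 1440  # mapreduce.map.memory.mb
am_container_mb = 6400   # yarn.app.mapreduce.am.resource.mb

# Map containers that fit on an ordinary node.
print(node_mem_mb // map_container_mb)                      # 8

# On the node hosting the AM, its 6400 MB leaves room for fewer maps.
print((node_mem_mb - am_container_mb) // map_container_mb)  # 3
```

So the AM's bulked-up allocation costs roughly five map slots on its node, which is worth weighing against the log evidence above that it only uses ~1.6 GB.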