Hey, that's pretty good. So by changing the file split size, the number of maps running was reduced?
-Shubh

> On May 27, 2016, at 4:01 PM, Guttadauro, Jeff <jeff.guttada...@here.com> wrote:
>
> Hi, all.
>
> Just wanted to provide an update, which is that I'm finally getting good YARN
> cluster utilization (consistently within the 90-100% range!). I believe the
> biggest change was to increase the min split size. Since our input is all in
> S3 and data locality is not really an issue, I bumped it up to 2G to minimize
> the impact of allocation/deallocation of container resources; since each
> container is now up and working for longer, that churn occurs less frequently.
>
> <property><name>mapreduce.input.fileinputformat.split.minsize</name><value>2147483648</value><!-- 2G --></property>
>
> Not sure how much impact the following changes had, since they were made at
> the same time. Everything's humming along now, though, so I'm going to leave
> them.
>
> I also reduced the node heartbeat interval from 1000ms down to 500ms
> ("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms": "500" in the
> cluster configuration JSON), since I'm told that the ResourceManager will only
> allocate one container per node per heartbeat when dealing with non-localized
> data, as in our case since the input is in S3. I also doubled the memory given
> to the YARN ResourceManager from the default for the m3.xlarge node type I'm
> using ("YARN_RESOURCEMANAGER_HEAPSIZE": "5120" in the cluster configuration
> JSON).
>
> Thanks again to Sunil and Shubh (and my colleague, York) for the helpful
> guidance!
>
> Take care,
> -Jeff
>
> From: Shubh hadoopExp [mailto:shubhhadoop...@gmail.com]
> Sent: Wednesday, May 25, 2016 11:08 PM
> To: Guttadauro, Jeff <jeff.guttada...@here.com>
> Cc: Sunil Govind <sunil.gov...@gmail.com>; user@hadoop.apache.org
> Subject: Re: YARN cluster underutilization
>
> Hey,
>
> OFFSWITCH allocation means whether or not data locality is maintained. It has
> no relation to the heartbeat! The heartbeat is just used to clear the pipeline
> of container requests.
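[Editorial note: the split-size mechanics behind the fix above can be sketched with FileInputFormat's split-size rule, max(minSize, min(maxSize, blockSize)). This is a rough model only; the real logic also handles per-file remainders, unsplittable codecs, and S3 object boundaries.]

```python
def compute_split_size(min_size, max_size, block_size):
    # Hadoop FileInputFormat's rule: max(minSize, min(maxSize, blockSize)).
    # Raising minSize above the block size directly grows each split.
    return max(min_size, min(max_size, block_size))

MB, GB = 1024 ** 2, 1024 ** 3
LONG_MAX = 2 ** 63 - 1  # default maxSize (effectively unbounded)

default_split = compute_split_size(1, LONG_MAX, 128 * MB)     # 128M splits
tuned_split = compute_split_size(2 * GB, LONG_MAX, 128 * MB)  # 2G splits

# ~16x fewer map tasks for the same input, assuming a 128M block size.
print(tuned_split // default_split)  # 16
```

Fewer, longer-lived map tasks mean each container does more work per allocation, which is exactly the churn reduction described above.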
>
> -Shubh
>
> On May 25, 2016, at 3:30 PM, Guttadauro, Jeff <jeff.guttada...@here.com> wrote:
>
> Interesting stuff! I did not know about this handling of OFFSWITCH requests.
>
> To get around this, would you recommend reducing the heartbeat interval,
> perhaps to 250ms to get a 4x improvement in the container allocation rate (or
> is it not quite as simple as that)? Maybe doing this in combination with using
> a greater number of smaller nodes would help? Would overloading the
> ResourceManager be a concern if doing that? Should I bump up the
> "YARN_RESOURCEMANAGER_HEAPSIZE" configuration property (the current default
> for m3.xlarge is 2396M), or would you suggest any other knobs to turn to help
> the RM handle it?
>
> Thanks again for all your help, Sunil!
>
> From: Sunil Govind [mailto:sunil.gov...@gmail.com]
> Sent: Wednesday, May 25, 2016 1:07 PM
> To: Guttadauro, Jeff <jeff.guttada...@here.com>; user@hadoop.apache.org
> Subject: Re: YARN cluster underutilization
>
> Hi Jeff,
>
>> I do see the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
>> property set to 1000 in the job configuration
>
> OK, this makes sense; the node heartbeat seems to be at its default.
>
> If no locality is specified in the resource requests (i.e., they use
> ResourceRequest.ANY), then YARN will allocate only one container per node
> heartbeat. So your container allocation rate is slow, considering the 600k
> requests and only 20 nodes. And if many containers are also getting released
> quickly (I can see that some containers' lifetime is 80 to 90 secs), then this
> becomes more complex and the container allocation rate will be slower still.
>
> YARN-4963 <https://issues.apache.org/jira/browse/YARN-4963> is trying to allow
> more allocations per heartbeat for NODE_OFFSWITCH (ANY) requests, but it's not
> yet available in any release.
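[Editorial note: a rough back-of-envelope for why the heartbeat matters here, using numbers from this thread. The one-grant-per-heartbeat behavior is per Sunil's explanation; real allocation rates also depend on scheduler load and container release timing.]

```python
def max_any_allocations_per_sec(num_nodes, heartbeat_ms):
    # For ANY/OFFSWITCH requests, the scheduler grants at most one
    # container per node per heartbeat, capping the cluster-wide rate.
    return num_nodes * 1000.0 / heartbeat_ms

# 20-node cluster at the default 1000 ms heartbeat: ~20 containers/sec,
# so churning through ~600k short-lived map tasks takes a while.
print(max_any_allocations_per_sec(20, 1000))  # 20.0

# Halving the interval to 500 ms (as done later in the thread) doubles it.
print(max_any_allocations_per_sec(20, 500))   # 40.0
```

This also shows why a shorter heartbeat and fewer, longer-lived tasks attack the same bottleneck from two directions.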
>
> I guess you can investigate more along this line to confirm these points.
>
> Thanks,
> Sunil
>
> On Wed, May 25, 2016 at 11:00 PM Guttadauro, Jeff <jeff.guttada...@here.com> wrote:
>
> Thanks for digging into the log, Sunil, and making some interesting
> observations!
>
> The heartbeat interval hasn't been changed from its default, and I do see the
> yarn.resourcemanager.nodemanagers.heartbeat-interval-ms property set to 1000
> in the job configuration. I was searching the log for heartbeat interval
> information, but I didn't find anything. Where do you look in the log for the
> heartbeats?
>
> Also, you are correct about there being no data locality, as all the input
> data is in S3. The utilization has been fluctuating, but I can't really see a
> pattern or tell why. It actually started out pretty low, in the 20-30% range,
> and then managed to get up into the 50-70% range after a while, but that was
> short-lived, as it went back down into the 20-30% range for quite a while.
> While writing this, I saw it surprisingly hit 80%!! That's the first time I've
> seen it that high in the 20 hours it's been running... although it looks like
> it may be headed back down. I'm perplexed. Wouldn't you generally expect
> fairly stable utilization over the course of the job? (This is the only job
> running.)
>
> Thanks,
> -Jeff
>
> From: Sunil Govind [mailto:sunil.gov...@gmail.com]
> Sent: Wednesday, May 25, 2016 11:55 AM
> To: Guttadauro, Jeff <jeff.guttada...@here.com>; user@hadoop.apache.org
> Subject: Re: YARN cluster underutilization
>
> Hi Jeff,
>
> Thanks for sharing this information. I have some observations from these logs:
>
> - I think the node heartbeat is around 2-3 seconds here. Was it changed for
> some other reason?
> - All the mappers' resource requests seem to be asking for type ANY (there is
> no data locality);
> please correct me if I am wrong.
>
> If the resource request type is ANY, only one container will be allocated per
> heartbeat for a node. Here the node heartbeat delay is also longer. And I can
> see that containers are released very quickly too. So when you started your
> application, were you seeing better resource utilization, and then once
> containers started to get released/completed, did you start seeing the
> underutilization?
>
> Please look into this line of investigation. It may be the reason.
>
> Thanks,
> Sunil
>
> On Wed, May 25, 2016 at 9:59 PM Guttadauro, Jeff <jeff.guttada...@here.com> wrote:
>
> Thanks for your thoughts thus far, Sunil. I'm most grateful for any
> additional help you or others can offer. To answer your questions:
>
> 1. This is a custom M/R job, which uses mappers only (no reduce phase) to
> process GPS probe data and filter it based on inclusion within a provided
> polygon. There is actually a lot of upfront work done in the driver to make
> that task as simple as can be (it identifies a list of tiles that are
> completely inside the polygon and those that fall across an edge, for which
> more processing is needed), but the job is still more compute-intensive than
> wordcount, for example.
>
> 2. I'm running almost 84k mappers for this job. This is actually down from
> ~600k mappers, since one other thing I've done is increase
> mapreduce.input.fileinputformat.split.minsize to 536870912 (512M) for the
> job. The data is in S3, so loss of locality isn't really a concern.
>
> 3. For NodeManager configuration, I'm using EMR's default configuration for
> the m3.xlarge instance type, which is
> yarn.scheduler.minimum-allocation-mb=32,
> yarn.scheduler.maximum-allocation-mb=11520, and
> yarn.nodemanager.resource.memory-mb=11520. The YARN dashboard shows min/max
> allocations of <memory:32, vCores:1>/<memory:11520, vCores:8>.
>
> 4. Capacity Scheduler [MEMORY]
>
> 5. I've attached 2500 lines from the RM log.
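[Editorial note: a quick sanity check on the two split.minsize byte values that appear in this thread, 512M mid-thread and 2G in the final configuration.]

```python
MB, GB = 1024 ** 2, 1024 ** 3

# Values quoted for mapreduce.input.fileinputformat.split.minsize.
assert 536870912 == 512 * MB   # the 512M setting that cut ~600k maps to ~84k
assert 2147483648 == 2 * GB    # the final 2G setting
print("split.minsize values check out")
```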
> Happy to grab more, but they are pretty big, and I thought that might be
> sufficient.
>
> Any guidance is much appreciated!
> -Jeff
>
> From: Sunil Govind [mailto:sunil.gov...@gmail.com]
> Sent: Wednesday, May 25, 2016 10:55 AM
> To: Guttadauro, Jeff <jeff.guttada...@here.com>; user@hadoop.apache.org
> Subject: Re: YARN cluster underutilization
>
> Hi Jeff,
>
> It looks like you are allocating a lot of memory for the AM container. Most
> likely you do not need 6GB (as per the log). Could you please provide some
> more information:
>
> 1. What type of mapreduce application (wordcount, etc.) are you running? Some
> AMs may be CPU-intensive and some may not be, so based on the type of
> application, memory/CPU can be tuned for better utilization.
> 2. How many mappers (and reducers) are you trying to run here?
> 3. You have mentioned that each node has 8 cores and 15GB, but how much is
> actually configured for the NM?
> 4. Which scheduler are you using?
> 5. It would be better to attach the RM log if possible.
>
> Thanks,
> Sunil
>
> On Wed, May 25, 2016 at 8:58 PM Guttadauro, Jeff <jeff.guttada...@here.com> wrote:
>
> Hi, all.
>
> I have an M/R (map-only) job that I'm running on a Hadoop 2.7.1 YARN cluster
> that is being quite underutilized (utilization of around 25-30%). The EMR
> cluster is 1 master + 20 core m3.xlarge nodes, which have 8 cores each and
> 15G total memory (with 11.25G of that available to YARN).
> I've configured mapper memory with the following properties, which should
> allow for 8 containers running map tasks per node:
>
> <property><name>mapreduce.map.memory.mb</name><value>1440</value></property> <!-- Container size -->
> <property><name>mapreduce.map.java.opts</name><value>-Xmx1024m</value></property> <!-- JVM arguments for a map task -->
>
> It was suggested that perhaps my AppMaster was having trouble keeping up with
> creating all the mapper containers and that I should bulk up its resource
> allocation. So I did, as shown below, giving it 6G of container memory (5G
> task memory), 3 cores, and 60 task listener threads:
>
> <property><name>yarn.app.mapreduce.am.job.task.listener.thread-count</name><value>60</value></property> <!-- App Master task listener threads -->
> <property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>3</value></property> <!-- App Master container vcores -->
> <property><name>yarn.app.mapreduce.am.resource.mb</name><value>6400</value></property> <!-- App Master container size -->
> <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx5120m</value></property> <!-- JVM arguments for each Application Master -->
>
> Taking a look at the node on which the AppMaster is running, I'm seeing
> plenty of CPU idle time and free memory, yet there are still nodes with no
> utilization (0 running containers). The log indicates that the AppMaster has
> way more memory (physical/virtual) than it appears to need, with repeated log
> messages like this:
>
> 2016-05-25 13:59:04,615 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> (Container Monitor): Memory usage of ProcessTree 11265 for container-id
> container_1464122327865_0002_01_000001: 1.6 GB of 6.3 GB physical memory
> used; 6.1 GB of 31.3 GB virtual memory used
>
> Can you please help me figure out where to go from here to troubleshoot, or
> suggest any other things to try?
>
> Thanks!
> -Jeff
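[Editorial note: the per-node memory arithmetic behind "should allow for 8 containers" can be checked with the values quoted above. This is a back-of-envelope only; real packing also depends on vcore limits and where the AM lands.]

```python
node_mem_mb = 11520      # yarn.nodemanager.resource.memory-mb (m3.xlarge default)
map_container_mb = 1440  # mapreduce.map.memory.mb
am_container_mb = 6400   # yarn.app.mapreduce.am.resource.mb

# Map containers that fit on an ordinary node.
print(node_mem_mb // map_container_mb)                      # 8

# On the node hosting the AM, its 6400 MB leaves room for fewer maps.
print((node_mem_mb - am_container_mb) // map_container_mb)  # 3
```

So the AM's bulked-up allocation costs roughly five map slots on its node, which is worth weighing against the log evidence above that it only uses ~1.6 GB.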