Hi,

From your SVG it looks like containers were re-used to run 2 waves of Map6 tasks. So your question (re-phrased) is: why did the job not end up getting enough containers to run all the maps in 1 wave? Is that correct?

Tez does not limit the number of containers that it asks for from the RM. So in this case, Tez would have asked for 1 container for each Map6 task. It has likely re-used containers for the second wave because it did not get containers allocated from the RM for those tasks. So it waited for some time for a container (as indicated by the gap in the swimlane between re-uses) and then decided to reuse an existing container.

Why did this happen?
1) The RM did not allocate containers due to queue limits or slowness in the allocation cycle.
2) Tez scheduling has a bug (which we have not seen elsewhere).

Could you please open a jira and attach the AM logs for the smaller application? We can easily verify 2.

Thanks
Bikas

From: Xiaoyong Zhu [mailto:[email protected]]
Sent: Thursday, September 10, 2015 10:26 PM
To: [email protected]
Subject: RE: how to allocate more containers?

Thanks for the information. Here's my understanding of the resource allocation (please correct me if I am wrong) and my scenario:

1. Assuming the cluster is dedicated to only one Tez application, I want to maximize the resource usage (Mem/CPU) of that single application.
2. Assuming I have changed all the configurations on the YARN side so that the memory/CPU allocation of each node is maximized (meaning each node can theoretically be fully utilized). The input is around 500GB~1TB.
3. Then I launched a Tez application (Hive on Tez). Tez chooses the number of tasks (in my case, there are usually 3K tasks), and each task usually runs for about 10~20 seconds. In this case, I don't think the number of Tez tasks should be increased (as each of them runs for just a couple of seconds, so I think each task is able to process its data). The swimlane picture is attached (for a smaller data size, but the DAG plans are the same). The container reuse switch is also on.

In order to maximize utilization, I would like to increase the number of containers so that more tasks can run in parallel, but I am not sure how the Tez AM decides how many containers to ask the RM for. Can I change the number of containers Tez asks for so that the job runs faster?

Xiaoyong

From: Jianfeng (Jeff) Zhang [mailto:[email protected]]
Sent: Friday, September 11, 2015 1:19 PM
To: [email protected]
Subject: Re: how to allocate more containers?

By default, I think container reuse is enabled. You may disable it to get more containers, but this comes with a trade-off: resources are used less efficiently.

Set tez.am.container.reuse.enabled = false

Best Regard,
Jeff Zhang

From: Jianfeng Zhang <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, September 11, 2015 at 12:52 PM
To: "[email protected]" <[email protected]>
Subject: Re: how to allocate more containers?

Resource usage depends more on your cluster configuration (the resource scheduler configuration). Do you intend to increase parallelism (more tasks) in order to get more containers?

There are also some configurations that you can use to get containers more quickly, with some other trade-offs, but they would not give you more containers.

Best Regard,
Jeff Zhang

From: Xiaoyong Zhu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, September 11, 2015 at 12:38 PM
To: "[email protected]" <[email protected]>
Subject: how to allocate more containers?

Hi,

I am wondering if there is a configuration I can change to allocate more containers for a certain Tez application? I am using Hive on Tez.

Thanks!

Xiaoyong
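
To make the "waves" discussed in this thread concrete, here is a minimal back-of-the-envelope sketch in Python. It is not Tez code and does not model the Tez scheduler: it only assumes that each container runs one task at a time and that all tasks take about the same time. The function names and the container counts are hypothetical; only the 3K task count and the 10~20 second task duration come from the thread.

```python
import math


def estimate_waves(num_tasks: int, num_containers: int) -> int:
    """Waves needed if each container runs one task at a time (assumption)."""
    if num_tasks < 0 or num_containers <= 0:
        raise ValueError("num_tasks must be >= 0 and num_containers must be > 0")
    return math.ceil(num_tasks / num_containers)


def estimate_vertex_seconds(num_tasks: int, num_containers: int,
                            task_seconds: float) -> float:
    """Rough vertex run time: waves * per-task time.

    Ignores container wait time, launch overhead and stragglers.
    """
    return estimate_waves(num_tasks, num_containers) * task_seconds


# 3000 tasks (from the thread) on a hypothetical 1500 containers: 2 waves.
print(estimate_waves(3000, 1500))                 # 2
# With one container per task, everything fits in a single wave.
print(estimate_waves(3000, 3000))                 # 1
# 2 waves of ~15 second tasks, ignoring the wait between waves.
print(estimate_vertex_seconds(3000, 1500, 15.0))  # 30.0
```

Under these assumptions, halving the number of waves only helps if the RM can actually grant the extra containers, which is why the queue limits and the AM logs mentioned above are the first things to check.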
