Tez executor processes run one task at a time. While the inputs/outputs of these 
tasks may have parallel threads, those threads are mostly doing IO. Essentially, 
the user code doing the processing runs on a single thread. Hence, giving a task 
more cores does not change much unless the user processing code (i.e. Hive 
operators in your case) can utilize them by running multi-threaded CPU-intensive 
code. For such essentially single-threaded apps, performance gains come from 
task parallelism instead.
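In practice that means tuning the number of tasks rather than the vcores per 
task. A minimal sketch of the kind of settings involved (standard Hive/Tez 
property names; the values are purely illustrative):

    -- smaller grouped splits => more map tasks running in parallel
    set tez.grouping.min-size=16777216;    -- 16 MB
    set tez.grouping.max-size=134217728;   -- 128 MB
    -- fewer bytes per reducer => more reduce tasks
    set hive.exec.reducers.bytes.per.reducer=67108864;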
DRF is a complex feature in YARN. Its primary design goal is to share the 
cluster properly between CPU-intensive and non-CPU-intensive tasks, such that 
the CPU-intensive tasks don't starve the others out.
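As a rough worked example of the idea (illustrative numbers): in a cluster with 
100 vcores and 1000 GB of memory, a task asking for <4 vcores, 10 GB> has a 
dominant share of 4% (CPU is its dominant resource), while a task asking for 
<1 vcore, 100 GB> has a dominant share of 10% (memory is dominant). DRF 
equalizes these dominant shares across users rather than balancing any single 
resource.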
Bikas
Date: Thu, 5 May 2016 12:23:17 +0530
Subject: Re: Varying vcores/ram for hive queries running Tez engine
From: [email protected]
To: [email protected]

Thanks Bikas and Hitesh for your inputs.

I confirmed that hive.tez.cpu.vcores allocates the desired number of vcores to 
task containers.
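For anyone trying the same thing, a minimal .hql snippet along these lines 
(values illustrative) is enough:

    set hive.tez.container.size=2048;
    set hive.tez.cpu.vcores=4;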

I carried out my benchmarking experiments and observed that increasing the 
number of vcores allocated to a container did not have any noticeable impact on 
the overall completion time of the query.

I have attached an excel sheet that documents the running times, and I have 
also referenced the query I used to benchmark. I have not done any 
optimization on the query side; I just wanted to observe the impact of 
changing container sizes and vcores.

I was on the HDP forum, and it was said there that parallelism in Tez is 
achieved by means of individual tasks and that increasing the cores would not 
help.

Can you confirm this behavior?

Thanks and regards,
Nitin 

On Thu, May 5, 2016 at 11:34 AM, Hitesh Shah <[email protected]> wrote:
Bikas’ comment (and mine below) is relevant only for task-specific settings. 
Hive does not override any settings for the Tez AM, so the Tez configs for the 
AM memory/vcores will take effect at runtime.
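For example, AM-level settings such as the following (standard Tez property 
names; values illustrative) can be set directly and will be honored:

    set tez.am.resource.memory.mb=2048;
    set tez.am.resource.cpu.vcores=2;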

I believe Hive has a proxy config - hive.tez.cpu.vcores - for (3), which may be 
why your setting for (3) is not taking effect. Additionally, Hive also tends to 
fall back to MR-based values if Tez-specific values are not specified, which 
might be something else you may wish to ask about on the Hive user list.
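As a sketch of the resulting lookup order (my understanding - worth confirming 
on the Hive user list): the Hive proxy wins, then the Tez setting, then the MR 
fallback:

    set hive.tez.cpu.vcores=2;           -- Hive proxy; takes precedence
    set tez.task.resource.cpu.vcores=2;  -- Tez setting; shadowed by the proxy
    -- with neither set, MR values such as mapreduce.map.cpu.vcores are consulted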

thanks

— Hitesh

> On May 4, 2016, at 10:14 PM, Bikas Saha <[email protected]> wrote:
>
> IIRC 1) will override 2), since 2) is the Tez config and 1) is the Hive config 
> that is a proxy for 2).
>
> Bikas
>
> Date: Mon, 25 Apr 2016 13:57:38 +0530
> Subject: Varying vcores/ram for hive queries running Tez engine
> From: [email protected]
> To: [email protected]; [email protected]
>
> I was trying to benchmark some hive queries. I am using the Tez execution 
> engine. I varied the values of the following properties:
>       • hive.tez.container.size
>       • tez.task.resource.memory.mb
>       • tez.task.resource.cpu.vcores
> Changes in the value of property 1 are reflected properly. However, it seems 
> that Hive does not respect changes in the value of property 3; it always 
> allocates one vcore per requested container (the RM is configured to use the 
> DominantResourceCalculator). This got me thinking about the precedence of 
> property values in Hive and Tez.
> I have the following questions with respect to these configurations:
>       • Does Hive respect the set values for properties 2 and 3 at all?
>       • If I set property 1 to a value of, say, 2048 MB and property 2 is set 
> to a value of, say, 1024 MB, does this mean that I am wasting about a GB of 
> memory for each spawned container?
>       • Is there a property in Hive, similar to property 1, that allows me to 
> use the 'set' command in the .hql file to specify the number of vcores to use 
> per container?
>       • Changes in the value of the property tez.am.resource.cpu.vcores are 
> reflected at runtime. However, I do not observe the same behaviour with 
> property 3. Are there other configurations that take precedence over it?
> Your inputs and suggestions would be highly appreciated.
>
> Thanks!
>
> PS: Tests conducted on a 5-node cluster running HDP 2.3.0
