Thanks Bikas for clearing that up.
Much appreciated!

Regards,
Nitin

On Thu, May 5, 2016 at 1:21 PM, Bikas Saha <[email protected]> wrote:

> Tez executor processes run 1 task at a time. While the inputs/outputs of
> these tasks may have parallel threads, they are mostly doing IO.
> Essentially the user code doing the processing is running on a single
> thread. Hence giving more cores does not change much unless the user
> processing code (ie. Hive operators in your case) can utilize that via
> running multi-threaded CPU intensive code. Hence performance gains for such
> (essentially single threaded) apps come from task parallelism.
>
> DRF is a complex feature in YARN whose primary design goal is the ability
> to properly share the cluster between CPU intensive and non-CPU intensive
> tasks, such that CPU intensive tasks don't starve the others out.
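
To make the point about task parallelism concrete, here is a minimal sketch. It assumes a simplified model in which a node holds as many containers as both its memory and its vcores allow (roughly what the DominantResourceCalculator accounts for) and each task runs user code on one thread. The node size (64 GB, 16 vcores), the 2 GB container size, and the function name are hypothetical; AM containers and scheduler rounding are ignored.

```python
def containers_per_node(node_mem_mb, node_vcores, container_mem_mb, container_vcores):
    """Number of task containers one node can hold when both memory and
    vcores are accounted for (simplified: ignores rounding and the AM)."""
    by_memory = node_mem_mb // container_mem_mb
    by_vcores = node_vcores // container_vcores
    return min(by_memory, by_vcores)


# Hypothetical node: 64 GB of memory and 16 vcores, running 2 GB containers.
NODE_MEM_MB = 64 * 1024
NODE_VCORES = 16

for vcores in (1, 2, 4):
    slots = containers_per_node(NODE_MEM_MB, NODE_VCORES, 2048, vcores)
    # Each task uses a single thread, so the number of concurrently
    # running tasks is what drives throughput in this model.
    print(f"{vcores} vcore(s) per container -> {slots} concurrent tasks per node")
```

Under these assumptions, going from 1 to 4 vcores per container drops the node from 16 to 4 concurrent tasks, while each single-threaded task runs no faster.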
>
> Bikas
>
> ------------------------------
> Date: Thu, 5 May 2016 12:23:17 +0530
> Subject: Re: Varying vcores/ram for hive queries running Tez engine
> From: [email protected]
> To: [email protected]
>
>
> Thanks Bikas and Hitesh for your inputs.
>
> I confirmed that hive.tez.cpu.vcores allocates the desired number of vcores
> to task containers.
>
> I carried out my benchmarking experiments and observed that increasing
> the number of vcores allocated to a container did not have any noticeable
> impact on the overall completion time of the query.
>
> I have attached an Excel sheet that documents the running times. I have
> also referenced the query
> <https://gist.github.com/NitinKumar94/fbca5d56caa6c150eaa4c8528a63252c> I
> used to benchmark. I have not done any optimization on the query side and
> just wanted to observe the impact of changing container sizes and vcores.
>
> I was on the HDP forum and it was said that parallelism in Tez is achieved
> by means of individual tasks and increasing the cores would not help.
>
> Can you confirm this behavior?
>
> Thanks and regards,
> Nitin
>
> On Thu, May 5, 2016 at 11:34 AM, Hitesh Shah <[email protected]> wrote:
>
> Bikas’ comment (and mine below) is relevant only for task specific
> settings. Hive does not override any settings for the Tez AM, so the Tez
> configs for the AM memory/vcores will take effect at runtime.
>
> I believe Hive has a proxy config - hive.tez.cpu.vcores - for (3), which
> may be why your setting for (3) is not taking effect. Additionally, Hive
> also tends to fall back to MR based values if Tez specific values are not
> specified, which might be something else you may wish to ask on the Hive
> user list.
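
The lookup order described here can be sketched as follows. This is only an illustration of the behaviour as stated in this thread, not Hive's actual code: the function name is hypothetical, and the MR fallback property names (mapreduce.map.memory.mb, mapreduce.map.cpu.vcores) and the default values are my assumptions.

```python
def resolve_task_resources(conf):
    """Pick task container memory/vcores in the order described in the
    thread: Hive's own settings first, then MR based values as a fallback.
    The tez.task.resource.* keys are deliberately never consulted here.
    Fallback property names and defaults are assumptions."""
    memory_mb = conf.get("hive.tez.container.size", -1)
    if memory_mb <= 0:
        memory_mb = conf.get("mapreduce.map.memory.mb", 1024)

    vcores = conf.get("hive.tez.cpu.vcores", -1)
    if vcores <= 0:
        vcores = conf.get("mapreduce.map.cpu.vcores", 1)

    return memory_mb, vcores


# Setting only the Tez property (3) leaves the container at one vcore.
print(resolve_task_resources({
    "hive.tez.container.size": 2048,
    "tez.task.resource.cpu.vcores": 4,
}))

# Setting the Hive proxy config is what changes the vcore request.
print(resolve_task_resources({
    "hive.tez.container.size": 2048,
    "hive.tez.cpu.vcores": 4,
}))
```

In this model the first call yields one vcore despite tez.task.resource.cpu.vcores being set, which matches the behaviour reported in the original question.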
>
> thanks
> — Hitesh
>
>
> > On May 4, 2016, at 10:14 PM, Bikas Saha <[email protected]> wrote:
> >
> > IIRC 1) will override 2) since 2) is the Tez config and 1) is the Hive
> > config that is a proxy for 2).
> >
> > Bikas
> >
> > Date: Mon, 25 Apr 2016 13:57:38 +0530
> > Subject: Varying vcores/ram for hive queries running Tez engine
> > From: [email protected]
> > To: [email protected]; [email protected]
> >
> > I was trying to benchmark some Hive queries. I am using the Tez
> > execution engine. I varied the values of the following properties:
> >       1. hive.tez.container.size
> >       2. tez.task.resource.memory.mb
> >       3. tez.task.resource.cpu.vcores
> > Changes in the value of property 1 are reflected properly. However, it
> > seems that Hive does not respect changes in the value of property 3; it
> > always allocates one vcore per requested container (the RM is configured to
> > use the DominantResourceCalculator). This got me thinking about the
> > precedence of property values in Hive and Tez.
> > I have the following questions with respect to these configurations:
> >       • Does Hive respect the set values for properties 2 and 3 at all?
> >       • If I set property 1 to a value of, say, 2048 MB and property 2 is
> >         set to a value of, say, 1024 MB, does this mean that I am wasting
> >         about a GB of memory for each spawned container?
> >       • Is there a property in Hive similar to property 1 that allows me
> >         to use the 'set' command in the .hql file to specify the number of
> >         vcores to use per container?
> >       • Changes in the value of the property tez.am.resource.cpu.vcores
> >         are reflected at runtime. However, I do not observe the same
> >         behaviour with property 3. Are there other configurations that
> >         take precedence over it?
> > Your inputs and suggestions would be highly appreciated.
> >
> > Thanks!
> >
> >
> > PS: Tests conducted on a 5 node cluster running HDP 2.3.0
>
>
>
