Thanks Bikas and Hitesh for your inputs.

I confirmed that hive.tez.cpu.vcores allocates the desired number of vcores
to task containers.

I ran my benchmarking experiments and observed that increasing the number
of vcores allocated to a container had no noticeable impact on the overall
completion time of the query.

I have attached an Excel sheet that documents the running times. I have
also linked the query I used for the benchmark:
<https://gist.github.com/NitinKumar94/fbca5d56caa6c150eaa4c8528a63252c>
I have not done any optimization on the query side; I just wanted to
observe the impact of changing container sizes and vcores.
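
A sweep of this kind could be scripted roughly as follows. This is only a
minimal sketch in Python, not the script used for the attached results: the
value ranges and the helper names (settings_preamble, sweep) are
hypothetical, and only the two property names come from this thread.

```python
from itertools import product

# Hypothetical sweep values; NOT the ones used for the attached results.
CONTAINER_SIZES_MB = [1024, 2048, 4096]
VCORES = [1, 2, 4]


def settings_preamble(container_mb, vcores):
    """Return the Hive 'set' statements to prepend to a .hql file."""
    return (
        f"set hive.tez.container.size={container_mb};\n"
        f"set hive.tez.cpu.vcores={vcores};\n"
    )


def sweep():
    """Yield (container_mb, vcores, preamble) for every combination."""
    for container_mb, vcores in product(CONTAINER_SIZES_MB, VCORES):
        yield container_mb, vcores, settings_preamble(container_mb, vcores)


if __name__ == "__main__":
    for container_mb, vcores, preamble in sweep():
        # Each preamble would be placed ahead of the benchmark query.
        print(f"-- run: {container_mb} MB, {vcores} vcore(s)")
        print(preamble)
```

Each generated preamble would go at the top of the .hql file ahead of the
query, so that every run differs only in container size and vcores.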

On the HDP forum I was told that parallelism in Tez is achieved by means of
individual tasks, so increasing the cores per container would not help.

Can you confirm this behavior?

Thanks and regards,
Nitin

On Thu, May 5, 2016 at 11:34 AM, Hitesh Shah <[email protected]> wrote:

> Bikas’ comment (and mine below) is relevant only for task-specific
> settings. Hive does not override any settings for the Tez AM, so the Tez
> configs for the AM memory/vcores will be reflected at runtime.
>
> I believe Hive has a proxy config - hive.tez.cpu.vcores - for (3), which
> may be why your setting for (3) is not taking effect. Additionally, Hive
> also tends to fall back to MR-based values if Tez-specific values are not
> specified, which might be something else you may wish to ask about on the
> Hive user list.
>
> thanks
> — Hitesh
>
>
> > On May 4, 2016, at 10:14 PM, Bikas Saha <[email protected]> wrote:
> >
> > IIRC 1) will override 2) since 2) is the tez config and 1) is the Hive
> > config that is a proxy for 2).
> >
> > Bikas
> >
> > Date: Mon, 25 Apr 2016 13:57:38 +0530
> > Subject: Varying vcores/ram for hive queries running Tez engine
> > From: [email protected]
> > To: [email protected]; [email protected]
> >
> > I was trying to benchmark some hive queries. I am using the tez
> > execution engine. I varied the values of the following properties:
> >       1. hive.tez.container.size
> >       2. tez.task.resource.memory.mb
> >       3. tez.task.resource.cpu.vcores
> > Changes to the value of property 1 are reflected properly. However, it
> > seems that Hive does not respect changes to the value of property 3; it
> > always allocates one vcore per requested container (the RM is configured
> > to use the DominantResourceCalculator). This got me thinking about the
> > precedence of property values in Hive and Tez.
> > I have the following questions with respect to these configurations
> >       • Does hive respect the set values for the properties 2 and 3 at
> >         all?
> >       • If I set property 1 to a value say 2048 MB and property 2 is set
> >         to a value of say 1024 MB does this mean that I am wasting about
> >         a GB of memory for each spawned container?
> >       • Is there a property in hive similar to property 1 that allows me
> >         to use the 'set' command in the .hql file to specify the number
> >         of vcores to use per container?
> >       • Changes in value for the property tez.am.resource.cpu.vcores are
> >         reflected at runtime. However I do not observe the same behaviour
> >         with property 3. Are there other configurations that take
> >         precedence over it?
> > Your inputs and suggestions would be highly appreciated.
> >
> > Thanks!
> >
> >
> > PS: Tests conducted on a 5 node cluster running HDP 2.3.0
>
>

Attachment: benchmark_results.xls
Description: MS-Excel spreadsheet
