Thanks Bikas for clearing that up. Much appreciated!

Regards,
Nitin
On Thu, May 5, 2016 at 1:21 PM, Bikas Saha <[email protected]> wrote:

> Tez executor processes run one task at a time. While the inputs/outputs
> of these tasks may have parallel threads, those are mostly doing IO.
> Essentially, the user code doing the processing runs on a single thread,
> so giving a task more cores does not change much unless the user
> processing code (i.e. the Hive operators in your case) can utilize them
> by running multi-threaded, CPU-intensive code. Hence, performance gains
> for such essentially single-threaded applications come from task
> parallelism.
>
> DRF is a complex feature in YARN. Its primary design goal is to share
> the cluster properly between CPU-intensive and non-CPU-intensive tasks,
> so that the CPU-intensive tasks don't starve the others out.
>
> Bikas
>
> ------------------------------
> Date: Thu, 5 May 2016 12:23:17 +0530
> Subject: Re: Varying vcores/ram for hive queries running Tez engine
> From: [email protected]
> To: [email protected]
>
> Thanks Bikas and Hitesh for your inputs.
>
> I confirmed that hive.tez.cpu.vcores allocates the desired number of
> vcores to task containers.
>
> I carried out my benchmarking experiments and observed that increasing
> the number of vcores allocated to a container did not have any
> noticeable impact on the overall completion time of the query.
>
> I have attached an Excel sheet that documents the running times. I have
> also referenced the query
> <https://gist.github.com/NitinKumar94/fbca5d56caa6c150eaa4c8528a63252c>
> I used to benchmark. I have not done any optimization on the query side;
> I just wanted to observe the impact of changing container sizes and
> vcores.
>
> On the HDP forum it was said that parallelism in Tez is achieved by
> means of individual tasks, and that increasing the cores would not help.
>
> Can you confirm this behavior?
>
> Thanks and regards,
> Nitin
>
> On Thu, May 5, 2016 at 11:34 AM, Hitesh Shah <[email protected]> wrote:
>
> Bikas’ comment (and mine below) is relevant only for task-specific
> settings. Hive does not override any settings for the Tez AM, so the Tez
> configs for the AM memory/vcores will take effect at runtime.
>
> I believe Hive has a proxy config - hive.tez.cpu.vcores - for (3), which
> may be why your setting for (3) is not taking effect. Additionally, Hive
> also tends to fall back to MR-based values if Tez-specific values are
> not specified, which might be something else you may wish to ask about
> on the Hive user list.
>
> thanks
> — Hitesh
>
> > On May 4, 2016, at 10:14 PM, Bikas Saha <[email protected]> wrote:
> >
> > IIRC, 1) will override 2), since 2) is the Tez config and 1) is the
> > Hive config that is a proxy for 2).
> >
> > Bikas
> >
> > Date: Mon, 25 Apr 2016 13:57:38 +0530
> > Subject: Varying vcores/ram for hive queries running Tez engine
> > From: [email protected]
> > To: [email protected]; [email protected]
> >
> > I was trying to benchmark some Hive queries. I am using the Tez
> > execution engine. I varied the values of the following properties:
> >
> > • hive.tez.container.size
> > • tez.task.resource.memory.mb
> > • tez.task.resource.cpu.vcores
> >
> > Changes in the value of property 1 are reflected properly. However, it
> > seems that Hive does not respect changes in the value of property 3;
> > it always allocates one vcore per requested container (the RM is
> > configured to use the DominantResourceCalculator). This got me
> > thinking about the precedence of property values in Hive and Tez.
> > I have the following questions with respect to these configurations:
> >
> > • Does Hive respect the set values for properties 2 and 3 at all?
> > • If I set property 1 to a value of, say, 2048 MB and property 2 is
> >   set to a value of, say, 1024 MB, does this mean that I am wasting
> >   about a GB of memory for each spawned container?
> > • Is there a property in Hive, similar to property 1, that allows me
> >   to use the 'set' command in the .hql file to specify the number of
> >   vcores to use per container?
> > • Changes in the value of the property tez.am.resource.cpu.vcores are
> >   reflected at runtime. However, I do not observe the same behaviour
> >   with property 3. Are there other configurations that take precedence
> >   over it?
> >
> > Your inputs and suggestions would be highly appreciated.
> >
> > Thanks!
> >
> > PS: Tests conducted on a 5-node cluster running HDP 2.3.0
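For reference, a minimal .hql sketch of the session-level settings discussed
in this thread (the numeric values are illustrative, not recommendations).
Per Bikas' and Hitesh's comments above, the Hive-side properties act as
proxies that take precedence over the corresponding tez.task.resource.*
values, while the AM settings have no Hive proxy:

    -- Task container resources: Hive proxies, which override the Tez values.
    set hive.tez.container.size=2048;      -- task container memory, in MB
    set hive.tez.cpu.vcores=2;             -- vcores per task container

    -- Tez-side equivalents, consulted only when the Hive proxies are unset.
    set tez.task.resource.memory.mb=2048;
    set tez.task.resource.cpu.vcores=2;

    -- The Tez AM is not overridden by Hive, so these apply directly.
    set tez.am.resource.memory.mb=4096;
    set tez.am.resource.cpu.vcores=1;

As Bikas notes, raising the per-task vcores only helps when the operator
code is itself multi-threaded and CPU-bound; for typical Hive queries the
gains come from running more tasks in parallel, not from more cores per
task.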

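To double-check what actually took effect in a session, a bare 'set'
command (with no value) prints a property's current setting. Whether the
vcore request also influences scheduling depends on the RM's resource
calculator, which is a cluster-side setting in capacity-scheduler.xml and
cannot be changed from Hive; it is shown below only as a comment:

    -- A bare 'set' prints the property's effective value for this session.
    set hive.tez.container.size;
    set hive.tez.cpu.vcores;

    -- Cluster-side prerequisite for vcores to count in scheduling
    -- (capacity-scheduler.xml; not settable from a Hive session):
    -- yarn.scheduler.capacity.resource-calculator =
    --     org.apache.hadoop.yarn.util.resource.DominantResourceCalculator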