To change Tez memory footprints through Hive, you need to set the following
configuration parameters:
- SET hive.tez.container.size=<numerical memory value> -- sets the size of the
  container spawned by YARN.
- SET hive.tez.java.opts=-Xmx<numerical max heap size> -- Java command-line
  options for Tez.
For example:
SET hive.tez.container.size=5120
SET hive.tez.java.opts=-Xmx4096m
"hive.tez.container.size" and "hive.tez.java.opts" are the parameters that
alter Tez memory settings in Hive. If
"hive.tez.container.size" is left at its default value of "-1", Hive picks up
the value of "mapreduce.map.memory.mb". Similarly, if
"hive.tez.java.opts" is not specified, Hive falls back to
"mapreduce.map.java.opts". Thus, if the Tez-specific memory settings are
left at their default values, memory sizes are taken from the MapReduce mapper
memory settings.
Please note that the heap size given in "hive.tez.java.opts" must be smaller
than the size specified in "hive.tez.container.size"
(or in "mapreduce.{map|reduce}.memory.mb" if "hive.tez.container.size" is not
specified). When setting either or both of them,
review both to ensure that "hive.tez.java.opts" is
smaller than "hive.tez.container.size", and that
"mapreduce.{map|reduce}.java.opts" is smaller than
"mapreduce.{map|reduce}.memory.mb".
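The container/heap relationship above can be sketched as a small helper. Note
that the 80% heap fraction used here is a common rule of thumb for leaving
non-heap JVM headroom, not something Hive enforces:

```python
# Illustrative helper (not Hive's actual logic): derive a -Xmx value for
# hive.tez.java.opts from hive.tez.container.size. The 0.8 heap fraction
# is an assumed rule of thumb that keeps the heap safely smaller than
# the YARN container.
def tez_java_opts(container_size_mb, heap_fraction=0.8):
    if container_size_mb <= 0:
        # -1 (the default) means Hive falls back to mapreduce.map.memory.mb
        raise ValueError("container size must be set explicitly here")
    heap_mb = int(container_size_mb * heap_fraction)
    return "-Xmx%dm" % heap_mb

print(tez_java_opts(5120))  # -Xmx4096m, matching the example above
```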
[email protected]
From: Juho Autio
Date: 2015-10-26 18:46
To: user
Subject: Re: Constant Full GC making Tez Hive job take almost forever
Thanks Jeff!
I did as you suggested and the job went through in a reasonable time. However,
the performance is not perfect; with the mr engine the performance was better.
I hadn't set hive.tez.container.size before. The default seems to be -1; I
wonder what that means in practice. Anyway, it seems that quite many containers
were spawned, so it must be using a rather small value.
SET hive.tez.container.size;
hive.tez.container.size=-1
Also I was running with this default value:
hive> SET hive.auto.convert.join.noconditionaltask.size;
hive.auto.convert.join.noconditionaltask.size=10000000
So I changed them to*:
SET hive.tez.container.size=4096;
SET hive.auto.convert.join.noconditionaltask.size=1252698795;
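For reference, hive.auto.convert.join.noconditionaltask.size is given in
bytes. A quick sketch of the arithmetic behind values like the one above; the
one-third-of-container sizing is the heuristic from the tuning guide mentioned
in the postscript, an assumption rather than a Hive default:

```python
# Convert the noconditionaltask.size value (bytes) back to MB, and compute
# what one third of a 4096 MB container would be in bytes. Both figures are
# illustrative, based on the values quoted in this thread.
container_mb = 4096
noconditionaltask_bytes = 1252698795

size_mb = noconditionaltask_bytes / 1024 / 1024
one_third_of_container = container_mb * 1024 * 1024 // 3

print(round(size_mb))            # ~1195 MB
print(one_third_of_container)    # 1431655765 bytes
```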
This seems to mean that tez container size needs to be configured depending on
yarn container configuration (so basically need to provide different values
depending on the cluster size). If the relationship is really so strict,
couldn't Tez pick a suitable container size automatically to avoid running out
of memory?
Anyway, thanks a lot for solving my problem!
Juho
*) P.S. I tried to follow this guide basically:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/hive_perf_best_pract_config_tez.html
Do you know if it takes some special Hive release to be able to specify 4096MB
(and ..noconditionaltask.size=1370MB)? For me, Hive didn't accept the MB
suffix.
Also this seems to be more or less the only document I've found about tuning
Tez for Hive. I wonder if there are some other options and if I could find
extensive documentation somewhere.
On Fri, Oct 23, 2015 at 11:25 AM, Jianfeng (Jeff) Zhang
<[email protected]> wrote:
Maybe your Tez container size is less than the container size of mr. You can
try increasing the Tez container size (you can check the container size of your
job in the RM WebUI),
e.g. SET hive.tez.container.size=4096
BTW, I'm not sure about your cluster environment, but I met a similar issue on
a cluster managed by Ambari. By default in Ambari the Hive container size is
less than the container size of the reducer in MR.
This may cause performance issues in the reduce stage when the input data size
is very large.
Best Regards,
Jeff Zhang
From: Juho Autio <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, October 23, 2015 at 4:08 PM
To: "[email protected]" <[email protected]>
Subject: Constant Full GC making Tez Hive job take almost forever
Hi,
I'm running a Hive script with tez-0.7.0. Progress is really slow, and in the
container logs I'm seeing constant Full GC lines, such that there doesn't seem
to be any time for the JVM to actually execute anything between the GC pauses.
When running the same Hive script with mr execution engine, the job goes
through normally.
So there's something specific to Tez's memory usage that causes the Full GC
issue.
Also, on similar clusters & configurations other Hive jobs have gone through
with Tez just fine. This issue happens when I add just a little more data to be
processed by the script; with a smaller workload it also goes through on the
Tez engine with the expected execution time.
For example an extract from one of the container logs:
application_1445328511212_0001/container_1445328511212_0001_01_000292/stdout.gz
791.208: [Full GC
[PSYoungGen: 58368K->56830K(116736K)]
[ParOldGen: 348914K->348909K(349184K)]
407282K->405740K(465920K)
[PSPermGen: 43413K->43413K(43520K)], 1.4063790 secs] [Times: user=5.22
sys=0.04, real=1.40 secs]
Heap
PSYoungGen total 116736K, used 58000K [0x00000000f5500000,
0x0000000100000000, 0x0000000100000000)
eden space 58368K, 99% used
[0x00000000f5500000,0x00000000f8da41a0,0x00000000f8e00000)
from space 58368K, 0% used
[0x00000000f8e00000,0x00000000f8e00000,0x00000000fc700000)
to space 58368K, 0% used
[0x00000000fc700000,0x00000000fc700000,0x0000000100000000)
ParOldGen total 349184K, used 348909K [0x00000000e0000000,
0x00000000f5500000, 0x00000000f5500000)
object space 349184K, 99% used
[0x00000000e0000000,0x00000000f54bb4b0,0x00000000f5500000)
PSPermGen total 43520K, used 43413K [0x00000000d5a00000,
0x00000000d8480000, 0x00000000e0000000)
object space 43520K, 99% used
[0x00000000d5a00000,0x00000000d84657a8,0x00000000d8480000)
If I understand the GC log correctly, it seems like ParOldGen is full and Full
GC doesn't manage to free space from there. So maybe Tez has created too many
objects that can't be released; it could be a memory leak. Or maybe the
minimum heap is just not big enough for Tez in general? I could probably fix
the problem by changing the configuration to simply have fewer containers and
thus a bigger heap per container? Still, changing to bigger nodes doesn't
seem like a solution that would eventually scale, so I would prefer to resolve
this properly.
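That reading of the log can be checked mechanically. A rough sketch, assuming
the HotSpot parallel-collector log format quoted above, that extracts the
ParOldGen before->after(capacity) figures from the Full GC line:

```python
import re

# The ParOldGen portion of the Full GC line quoted earlier in this mail.
line = "[ParOldGen: 348914K->348909K(349184K)]"

m = re.search(r"ParOldGen: (\d+)K->(\d+)K\((\d+)K\)", line)
before, after, capacity = map(int, m.groups())

freed = before - after          # how much the Full GC reclaimed
occupancy = after / capacity    # how full the old gen remains

print("freed %dK, old gen %.1f%% full after GC" % (freed, occupancy * 100))
# The collection frees almost nothing and the old gen stays ~99.9% full,
# consistent with the constant-Full-GC symptom.
```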
Please, could you help me with how to troubleshoot & fix this issue?
Cheers,
Juho
--
Juho Autio
Analytics Developer
Hatch
Rovio Entertainment Ltd
Mobile: + 358 (0)45 313 0122
[email protected]
www.rovio.com
This message and its attachments may contain confidential information and is
intended solely for the attention and use of the named addressee(s). If you are
not the intended recipient and / or you have received this message in error,
please contact the sender immediately and delete all material you have received
in this message. You are hereby notified that any use of the information, which
you have received in error in whatsoever form, is strictly prohibited. Thank
you for your co-operation.