Hi Xin,

Can you provide these:
1. Output of explain plan
2. Output of set -v (this will list the configs, so you might want to anonymize these)

In addition to that, it looks like vertex vertex_1495595408051_21107_2_03 failed with OOM. Using Tez counters you can find out the amount of data input to this vertex, which can further help you in narrowing down the root cause.

Hope this helps,
Vaibhav

From: "Yang, Xin" <xiy...@visa.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, July 6, 2017 at 10:37 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Tez query failed with OutOfMemoryError: Java heap space

Here is the version information:
Hive: 1.2.1
Tez: 0.8.5
Hadoop 2.6.0-cdh5.8.3

Please let me know if you need more information.

Regards,
Xin

From: "Yang, Xin" <xiy...@visa.com>
Date: Thursday, June 29, 2017 at 11:48 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Tez query failed with OutOfMemoryError: Java heap space

Hi,

We ran a Tez query and it failed with OOM. We then computed stats, but it still failed with the OOM.

Settings:

set hive.tez.container.size=4096;
set tez.am.resource.memory.mb=1024;
set hive.tez.java.opts=-Xmx3276m;
set hive.tez.dynamic.partition.pruning=false;
set hive.tez.dynamic.partition.pruning.max.event.size=1048576;
set hive.tez.dynamic.partition.pruning.max.data.size=104857600;
set hive.prewarm.enabled=true;
set hive.prewarm.numcontainers=10;
set tez.am.container.reuse.enabled=true;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=20971520;
set hive.mapjoin.hybridgrace.hashtable=false;
set hive.optimize.bucketmapjoin.sortedmerge=false;
set hive.map.aggr.hash.percentmemory=0.5;
set hive.map.aggr=true;
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;
set hive.vectorized.execution.reduce.groupby.enabled=false;
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=16;
set hive.exec.reducers.max=800;
set hive.optimize.reducededuplication=true;
set hive.optimize.reducededuplication.min.reducer=4;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=false;
set hive.merge.smallfiles.avgsize=16000000;
set hive.merge.size.per.task=256000000;
set hive.smbjoin.cache.rows=10000;
set hive.fetch.task.conversion=more;
set hive.optimize.sort.dynamic.partition=true;
set hive.tez.auto.reducer.parallelism=true;

Stacktrace:

Status: Failed
Vertex failed, vertexName=Map 3, vertexId=vertex_1495595408051_21107_2_03, diagnostics=[Task failed, taskId=task_1495595408051_21107_2_03_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: exceptionThrown=java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.tez.runtime.library.common.shuffle.MemoryFetchedInput.<init>(MemoryFetchedInput.java:38)
    at org.apache.tez.runtime.library.common.shuffle.impl.SimpleFetchedInputAllocator.allocate(SimpleFetchedInputAllocator.java:141)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:717)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:489)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:398)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:195)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:70)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
, errorMessage=Fetch failed:java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.tez.runtime.library.common.shuffle.MemoryFetchedInput.<init>(MemoryFetchedInput.java:38)
    at org.apache.tez.runtime.library.common.shuffle.impl.SimpleFetchedInputAllocator.allocate(SimpleFetchedInputAllocator.java:141)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:717)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:489)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:398)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:195)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:70)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:229)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:388)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:378)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:214)
    ... 15 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:386)
    ... 20 more
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:241)
    at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:217)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$LazyBinaryKvWriter.writeKey(MapJoinBytesTableContainer.java:235)
    at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:445)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:365)
    at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:191)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:288)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:173)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:169)
    at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:75)
    at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:92)
    ... 4 more
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:229)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
    ... 14 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1495595408051_21107_2_03 [Map 3] killed/failed due to:null]
Vertex killed, vertexName=Reducer 7, vertexId=vertex_1495595408051_21107_2_06, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:2, Vertex vertex_1495595408051_21107_2_06 [Reducer 7] killed/failed due to:null]
Vertex killed, vertexName=Map 6, vertexId=vertex_1495595408051_21107_2_05, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1495595408051_21107_2_05 [Map 6] killed/failed due to:null]
Vertex killed, vertexName=Map 5, vertexId=vertex_1495595408051_21107_2_04, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1495595408051_21107_2_04 [Map 5] killed/failed due to:null]
Vertex killed, vertexName=Map 1, vertexId=vertex_1495595408051_21107_2_02, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:41, Vertex vertex_1495595408051_21107_2_02 [Map 1] killed/failed due to:null]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:4

Please take a look. Thanks.

Regards,
Xin
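
For anyone reading this thread later, here is a rough sketch of the arithmetic behind the memory-related settings quoted above. It only parses the `set key=value;` lines from the message and reports how the JVM heap relates to the container size and how large the map-join threshold is in MiB. The helper names (`parse_settings`, `heap_mb`, `summarize`) are made up for illustration, and the `-Xmx` parsing assumes an `m` or `g` suffix. This is not a diagnosis of the failure; it just makes the numbers easier to compare against the Tez counters and the explain plan once those are available.

```python
import re

# Memory-related settings copied from the message above.
SETTINGS = """
set hive.tez.container.size=4096;
set tez.am.resource.memory.mb=1024;
set hive.tez.java.opts=-Xmx3276m;
set hive.auto.convert.join.noconditionaltask.size=20971520;
set hive.map.aggr.hash.percentmemory=0.5;
"""


def parse_settings(text):
    """Parse 'set key=value;' statements into a dict of strings."""
    return dict(re.findall(r"set\s+([\w.]+)=([^;]+);", text))


def heap_mb(java_opts):
    """Return the -Xmx value in MB (assumes an 'm' or 'g' suffix)."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    if match is None:
        raise ValueError("no -Xmx option found in: %r" % java_opts)
    value = int(match.group(1))
    return value * 1024 if match.group(2) in "gG" else value


def summarize(text):
    """Compute a few ratios that are handy when reading an OOM report."""
    s = parse_settings(text)
    container = int(s["hive.tez.container.size"])
    heap = heap_mb(s["hive.tez.java.opts"])
    threshold_bytes = int(s["hive.auto.convert.join.noconditionaltask.size"])
    return {
        "container_mb": container,
        "heap_mb": heap,
        "heap_fraction": round(heap / container, 3),
        "mapjoin_threshold_mib": threshold_bytes / (1024 * 1024),
    }


if __name__ == "__main__":
    for key, value in summarize(SETTINGS).items():
        print(key, value)
```

With the values from the message, the heap comes out at roughly 80% of the 4096 MB container, and the map-join conversion threshold is 20 MiB. The later task attempts in the stack trace fail inside `HashTableLoader.load`, so the size of the data feeding the map-join hash table on Map 3 is one number worth comparing against these figures.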