Just FYI, I'm able to make a custom UDF to apply the thread-safe code changes.
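The "thread-safe code changes" aren't shown in the thread, but the pattern the upstream HIVE-16196 discussion converges on is to stop sharing one static cache across executor threads. Below is a minimal, hedged sketch of a per-thread bounded LRU cache (class name `JsonPathCache` is hypothetical, not Hive's actual source): each thread gets its own `LinkedHashMap`, so concurrent `evaluate()` calls can never interleave a `put`-triggered resize with another thread's `get`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: per-thread bounded LRU cache. Because no two threads ever touch
// the same map instance, the HashMap$TreeNode corruption seen in the
// executor thread dumps cannot occur. Names here are illustrative only.
public class JsonPathCache {
    private static final int CAPACITY = 16;

    // ThreadLocal gives each executor thread its own map; no locking needed.
    private static final ThreadLocal<Map<String, Object>> CACHE =
        ThreadLocal.withInitial(() ->
            new LinkedHashMap<String, Object>(CAPACITY, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                    return size() > CAPACITY;  // evict least-recently-used entry
                }
            });

    public static Object get(String path) {
        return CACHE.get().get(path);
    }

    public static void put(String path, Object parsed) {
        CACHE.get().put(path, parsed);
    }
}
```

The trade-off of this approach is duplicated cache entries across threads, in exchange for zero synchronization on the hot path.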
Thanks a lot for your help
Guizhou

________________________________
From: Proust (Feng Guizhou) [FDS Payment and Marketing] <pf...@coupang.com>
Sent: Tuesday, July 24, 2018 4:34:49 PM
To: user@hive.apache.org
Subject: Re: UDFJson cannot Make Progress and Looks Like Deadlock

Thanks a lot for pointing this out; it makes the problem clear. As a quick, low-cost workaround that avoids an upgrade, I'm considering reimplementing the get_json_object UDF under a new name.

Thanks
Guizhou

________________________________
From: Peter Vary <pv...@cloudera.com>
Sent: Tuesday, July 24, 2018 4:24:12 PM
To: user@hive.apache.org
Subject: Re: UDFJson cannot Make Progress and Looks Like Deadlock

Hi Guizhou,

I would guess that this is caused by:

* HIVE-16196<https://issues.apache.org/jira/browse/HIVE-16196> UDFJson having thread-safety issues

Try upgrading to a CDH version that already includes this patch (5.12.0 or later).

Regards,
Peter

On Jul 24, 2018, at 10:15, Proust (Feng Guizhou) [FDS Payment and Marketing] <pf...@coupang.com<mailto:pf...@coupang.com>> wrote:

Hi, Hive Community

We are running Hive on Spark on a CDH cluster with Apache Hive (version 1.1.0-cdh5.10.1). Hive queries frequently hang and make no progress inside UDFJson.evaluate. In an example executor thread dump (below), 3 threads hang in java.util.HashMap$TreeNode.balanceInsertion(HashMap.java:2229) and 1 thread hangs in java.util.HashMap$TreeNode.find(HashMap.java:1873).

I could not find any existing Jira issue related to this problem, so I'm looking for a workaround or solution in case anyone has already encountered and solved it. My suspicion is a concurrency issue on a HashMap.
Detail Thread Dump:

java.util.HashMap$TreeNode.balanceInsertion(HashMap.java:2229)
java.util.HashMap$TreeNode.treeify(HashMap.java:1938)
java.util.HashMap$TreeNode.split(HashMap.java:2161)
java.util.HashMap.resize(HashMap.java:713)
java.util.HashMap.putVal(HashMap.java:662)
java.util.HashMap.put(HashMap.java:611)
org.apache.hadoop.hive.ql.udf.UDFJson.evaluate(UDFJson.java:151)
sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:965)
org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:148)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:89)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:748)

java.util.HashMap$TreeNode.find(HashMap.java:1873)
java.util.HashMap$TreeNode.getTreeNode(HashMap.java:1881)
java.util.HashMap.getNode(HashMap.java:575)
java.util.LinkedHashMap.get(LinkedHashMap.java:440)
org.apache.hadoop.hive.ql.udf.UDFJson.evaluate(UDFJson.java:144)
sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:965)
org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:148)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:89)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:748)

Thanks
Guizhou
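The dumps above show the classic signature of an unsynchronized HashMap mutated from multiple threads: a resize/treeify in one thread corrupts the tree links while another thread loops forever in find. If the cache must stay shared across threads (so repeated JSON paths are parsed only once per executor, not per thread), the simplest safe alternative is to wrap the LRU map in Collections.synchronizedMap. A hedged sketch of that pattern, with illustrative names, not Hive's actual code:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a shared, synchronized LRU cache. Every get/put takes the map's
// monitor, so a put-triggered resize can never interleave with another
// thread's lookup (the HashMap$TreeNode corruption seen above).
public class SharedJsonCache {
    private static final int CAPACITY = 16;

    private static final Map<String, Object> CACHE = Collections.synchronizedMap(
        new LinkedHashMap<String, Object>(CAPACITY, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > CAPACITY;  // bound the cache, LRU eviction
            }
        });

    public static Object get(String path) { return CACHE.get(path); }

    public static void put(String path, Object parsed) { CACHE.put(path, parsed); }

    public static int size() { return CACHE.size(); }
}
```

Note that the access-ordered LinkedHashMap mutates its internal links even on get, which is exactly why an unsynchronized shared instance is unsafe for concurrent readers; the synchronized wrapper covers reads as well as writes, at the cost of contention on the monitor.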