Hi Friends
Thanks it worked now.I dont think we need to compose it to any functions as we
are making use of stdin read as given in examples over the ne
From: [email protected]
To: [email protected]
Subject: RE: Running python UDF in hive
Date: Thu, 20 Aug 2015 19:24:24 +0000
remember that transform scripts in hive should receive data from STDIN and
return results to STDOUT. So, to properly test your transform script try this:
hive -e "select id from test limit 10" > testout.txt
cat testout.txt | python transform_value.py
if your transform script is working correctly, it will produce 10 lines of
output.
If the above test works, but your query still fails, you should be sure to test
to make sure that the datanodes are properly configured to run your python
script.
From: Manjee, Sunile [mailto:[email protected]]
Sent: Thursday, August 20, 2015 8:28 AM
To: [email protected]
Subject: Re: Running python UDF in hive
Did you test your python script stand alone to verify it works as expected?
From:
rakesh sharma <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, August 20, 2015 at 7:55 AM
To: "[email protected]" <[email protected]>
Subject: Running python UDF in hive
Hi all
I am running a python UDF in hive. I am getting the following error.
hive> select transform(id) using 'python transform_value.py' as (id string)
from test;
Query ID = 19659_20150820175050_ccb3b5e2-7e45-44a6-b16f-c4a4ad59e8f2
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id
application_1435674747354_0320)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 FAILED 1 0 0 1 4 0
--------------------------------------------------------------------------------
VERTICES: 00/01 [>>--------------------------] 0% ELAPSED TIME: 11.06 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1435674747354_0320_1_00,
diagnostics=[Task failed, taskId=task_1435674747354_0320_1_00_000000,
diagnostics=[TaskAttempt
0 failed, info=[Error: Failure while running task:java.lang.RuntimeException:
java.lang.RuntimeException: Hive Runtime Error while closing operators
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing
operators
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:333)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:177)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An
error occurred when trying to close the Operator running your custom script.
at
org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:550)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:309)
... 14 more
], TaskAttempt 1 failed, info=[Error: Failure while running
task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error
while closing operators
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing
operators
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:333)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:177)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An
error occurred when trying to close the Operator running your custom script.
at
org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:550)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:309)
... 14 more
], TaskAttempt 2 failed, info=[Error: Failure while running
task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error
while closing operators
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing
operators
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:333)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:177)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An
error occurred when trying to close the Operator running your custom script.
at
org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:550)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:309)
... 14 more
], TaskAttempt 3 failed, info=[Error: Failure while running
task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error
while closing operators
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing
operators
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:333)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:177)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An
error occurred when trying to close the Operator running your custom script.
at
org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:550)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:309)
... 14 more
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex
vertex_1435674747354_0320_1_00 [Map 1] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask
hive> [Error: Failure while running task:java.lang.RuntimeException:
java.lang.RuntimeException: Hive Runtime Error while closing operators
>
I am writing a simple example, with a table with one column of int type and
the python script increments it.
Can some one point to the type of error here so that I can go and debug, I am
clueless.
THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL
and may contain information that is privileged and exempt from disclosure under
applicable law. If you are neither the intended recipient nor responsible for
delivering the message to the intended recipient, please note that any
dissemination, distribution, copying or the taking of any action in reliance
upon the message is strictly prohibited. If you have received this
communication in error, please notify the sender immediately. Thank you.