pig-0.17.0bin/pig -x local
A very basic UDF file:
#!/usr/bin/python3
from pig_util import outputSchema

@outputSchema("as:int")
def square(num):
    if num == None:
        return None
    return ((num) * (num))

@outputSchema("word:chararray")
def concat(word):
    return word + word
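To rule out the Python logic itself, the two functions can be exercised outside Pig. This is only a minimal sketch: the outputSchema decorator below is a hypothetical stand-in for the one in pig_util (assumed here not to be importable outside Pig), and it says nothing about how Pig's streaming engine invokes the script.

```python
#!/usr/bin/python3
# Standalone check of the UDF bodies, run with plain Python rather than Pig.


def outputSchema(schema):
    """Hypothetical stand-in for pig_util.outputSchema (assumption).

    It only records the schema string on the function so the UDFs can be
    defined and called without Pig.
    """
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap


@outputSchema("as:int")
def square(num):
    # Pass nulls through unchanged, as in the original UDF.
    if num is None:
        return None
    return num * num


@outputSchema("word:chararray")
def concat(word):
    return word + word


if __name__ == "__main__":
    print(square(4))
    print(square(None))
    print(concat("ab"))
```

If these calls behave as expected, the failure is more likely to lie in how the script is launched by Pig than in the functions themselves.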
An exceedingly simple Pig script:
REGISTER '/home/scs/woodcock/SD411/lab_udf/test.py'
    USING org.apache.pig.scripting.streaming.Python.PythonScriptEngine AS myFuncs;
A = LOAD '/home/scs/woodcock/SD411/DATA/accident.csv' USING PigStorage(',')
    AS (state:int,name:chararray);
B = FOREACH A GENERATE myFuncs.square(state) AS state, name;
If I do a "DUMP A" I get exactly what I would expect.
But, on a "DUMP B", I get a failed job:
java.lang.Exception: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:506)
grunt> Exception in thread "Thread-82" java.lang.NullPointerException: Cannot invoke "java.util.concurrent.BlockingQueue.put(Object)" because the return value of "org.apache.pig.impl.builtin.StreamingUDF.access$500(org.apache.pig.impl.builtin.StreamingUDF)" is null
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessOutputThread.run(StreamingUDF.java:471)
2024-10-29 13:02:15,296 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map